Adventures in Automation: Testing and AI Workshop by James Bach of Rapid Software Testing

Last week, I saw that James Bach, one of the creators of Rapid Software Testing, was offering a new half-day workshop, Testing and AI Workshop, with a 50% discount for anyone who was unemployed, so I just had to attend.

Visit James Bach's Satisfice.com Download Page to download PDFs on topics such as Responsible Work (May 2026), AI Writing Policy (April 2026), Heuristic Test Strategy Model (December 2024), Rapid Software Testing Explored Class Appendices (December 2024), Why Testers? (October 2024), and ChatGPT Sucks at Being a Testing Expert (August 2023)
YouTube Channel: Rapid Software Testing
Check out James Bach's Classes at https://www.satisfice.com/classes (50% discount if unemployed)
The next Testing and AI Workshop will be July 6, 2026

"In each session, the instructors will first perform a live 'testopsy.' This is a demonstration of AI-assisted testing (using both assistive and agentic modes of AI) on a real product, accompanied by an analysis and explanation of what happened during the demo. During this part of the workshop, you may ask questions or offer critique.

"Next the instructors will challenge you to perform a similar process or solve a similar problem with the help of AI. You will have two hours. You will be able to work alone or in groups, as you like.

"Finally, the instructors will review and critique your work, if you choose to share it. At the end of the event, you will get to keep the videos".

I really enjoyed the class! It involved an hour-and-a-half webinar where James Bach walked people through how he uses AI, a few hours where you could work on your own project for the course, and then another hour-and-a-half webinar where attendees could share what they came up with when using AI to analyze a site.

My only problem with the course was that my rambunctious seven-year-old was home sick, and I wasn't able to dedicate a few hours to figuring out how to use AI to test a website, since I was busy making lunch and chatting with my kiddo.

What Is the Rapid Software Testing Approach?

As a software tester, I'm familiar with some of Michael Bolton and James Bach's work on the Rapid Software Testing and Satisfice websites:

Testing vs Checking (2009) and Testing vs Checking Refined (2024)
Michael Bolton's Testers: Get Out of the Quality Assurance Business
James Bach's Test Automation Snake Oil (2021)
Cem Kaner, James Bach, and Bret Pettichord's Lessons Learned in Software Testing: A Context-Driven Approach

... Back when I was a Meetup Organizer of the Ministry of Testing - Boston, many fans of RST tried to convince me to stop saying "Automated Testing" and say only "Automated Checking", since computer programs can't think or test. They also urged me never to say "Quality Assurance", since software testers are not product owners, managers, or developers; they cannot set the schedule, so they cannot assure quality.

Although I have never taken any of Rapid Software Testing's classes before, I did just purchase their newly released book, Taking Testing Seriously: The Rapid Software Testing Approach (2025).

Not familiar with Rapid Software Testing? Watch Michael Bolton give a talk at Quality Jam 2017,

"A Ridiculously Rapid Introduction to Rapid Software Testing"

https://www.youtube.com/watch?v=AS2kuD--z44&t=3018s

Or you can take a look at James Bach & Michael Bolton's presentation to CAST2020 (The Conference for the Association for Software Testing 2020) in Austin, TX,
Testopsies Dissecting Your Testing.

James Bach, during the class, shared some documentation:

Satisfice / Download / RST Appendices, which explains more about Rapid Software Testing and how it differs from "Factory Style" testing such as ISTQB, Six Sigma, TQM, or RUP. It also includes some sample test plans.
Responsibility is the Human Moat, covering his Principles of Responsible Work v 2.0, which explain that the person prompting the AI is responsible for the AI's work, and must have the power to reject or remediate the work the AI produces.

James Bach and his brother John ran the workshop session. It turns out that James' brother is also a software tester!

James has been experimenting with LLMs since ChatGPT came out. This experimental workshop is meant to help unemployed people who are trying to reskill. It isn't exactly like his three-day AI and Testing class; James and John tailored this workshop to be taken in one day.

Testing is a human activity. AI is not testing, and AI cannot do professional testing. AI can do exploratory analysis, but a human needs to review and validate the result. AI is just another tool a software tester can use.

An LLM can make up stories that are partially true, but it cannot tell you what it was thinking while exploring, since LLMs do not think.

The application under test we would be exploring with AI? The Online Gantt Chart at https://www.onlinegantt.com/.

Principles of Responsible Work

James & John Bach and Michael Bolton just released the Principles of Responsible Work paper listed above in May 2026. In spite of all their experience, they are always trying to see how they can make things better, which resulted in these principles. Some of the principles they cover are:

"Every non-trivial business comprises some set of services that enable it to function. Examples include sales, accounting, R&D, customer support, etc. These services must be sufficiently reliable or else the business will collapse.

"Every service entails the risk of failure. When failures occur, the business must be able to recognize them and recover. In regulated industries, risk management may be subject to specific process mandates.

"A 'responsible person' is a natural person in a business who is reasonably competent, prepared, and accountable for some service that sustains or defines that business. No matter what tools or processes are used within a business, someone must be responsible for them. To bear responsibility, a person must have sufficient capacity. For instance, neither a child nor a tool (such as AI) has the capacity (either legally or socially) to bear responsibility. Even adult humans may lack capacity, such as when an airline pilot has had insufficient sleep or is under the influence of drugs.

"A 'responsible service' is one that is performed in good faith by a responsible person. This may include interpreting and following procedures, improving skills, anticipating problems, and reporting to relevant authorities or clients, both inside and outside the business". ( See James Bach's Responsibility is the Human Moat for more )

This raises the question... what is the Responsible Operation of AI?

Responsible Operation of AI

"AI cannot bear responsibility.[1] AI is not a responsible person, and it would be meaningless to speak of a tool that operates in “good faith.” Therefore, it cannot provide a responsible service, nor can responsibility be delegated to it.
"An “AI agent” is always a tool operated by a natural person, irrespective of whether the person is monitoring it in real-time.
"Thus, the operator of an AI agent always bears responsibility for the behavior of that agent.[2] This includes anticipating availability issues, such as outages, token rationing, or poor performance.
"The responsible operator must assure adequate quality of the work; they cannot merely prompt and pray.

"Therefore, the operator must…

"be sufficiently skilled in the use of the AI tool.
"be sufficiently prepared to operate the tool in that context.
"be sufficiently alert to risks, anomalies, or defects that may occur in the work.[3]
"reasonably anticipate restrictions or interruptions of the services on which the work depends.
"feel empowered (and actually have the power) to reject or remediate any work done by AI. Otherwise, the operator becomes a scapegoat, a “moral crumple zone.”[4]
"avoid or mitigate the special hazards of AI operation".

( See James Bach's Responsibility is the Human Moat for more )

Special Hazards of AI Operation

Responsibility is the Human Moat details the Special Hazards of AI Operation, with categories such as:

Technological: Service Outage, Service Adulteration
Interactional: Cognitive Overload, Cognitive Debt, Cognitive Atrophy, Cognitive Surrender, Anthropomorphism and Anthropomorphizing, Automation Bias, Chronic Stress
Managerial: Data Negligence, Reckless Spending, Violations of Law, Moral Crumple Zone, Business Disruption.

What is the Moral Crumple Zone? As James Bach puts it, "This can occur in an automated system that has a human operator or supervisor (such as a self-driving car with a safety driver, or an ordinary user of ChatGPT). It happens when a failure of the system is routinely and carelessly blamed on the human. The human functions as a sort of 'crumple zone' in the moral sense: a component designed to assume blame in order to deflect it from the automation".

What can you do with AI in Testing?

Product Analysis, identifying testable elements
Risk Analysis, identifying potential problems
Test Design, generating test ideas
Test Data Synthesis
Test Oracles, analyzing output
Static Analysis, analyzing code and data structures
Test Tool Implementation

What James finds most interesting about using AI in testing is test tool implementation: with Claude Code, AI can actually help you create your own test tools, and you can experiment with tools to help you test.

If you point AI at a software product and ask it to come up with a list of features, it is slow, clunky, expensive, and misses a lot of stuff, so it is not quite there yet.

Not sure if AI can do something? Just try it out. Ask if it can do something. Experiment.

Heuristics for Evaluating GenAI Work

To use AI responsibly, it isn't enough to ask the right questions. James Bach and Michael Bolton prepared a checklist for being critical operators of GenAI. Some of the topics on the checklist are...

"What did you do?

"How did you select the tool? What is your familiarity with the tool you used? Did you use GenAI directly or is it some tool that imposes a certain structure on your work? If the latter, what do you know about the role GenAI plays in that tool?
"How did you approach the problem? Be prepared to tell the story of your strategy for involving AI in this work. What problem specifically were you trying to solve? How did you prepare to use the AI? What part of the problem did you present to the AI? Did you use an interactive assistive approach or an agentic autonomous approach?
"How did you construct your prompts? How much and what kinds of context did you provide? Did you start simple and refine? Or did you submit a complicated one-shot prompt with explanations, supporting data, or examples?
"How did you follow-up your original prompt to shape the AI response? Were you in charge of the process? Did you control the refinement or were you constrained by the tool? Did you try the same question, or variations thereof, multiple times, or was it one connected conversation? If there were autonomous agents involved, how did you review what they did?
"How did you preserve the record of what happened? Was it stored automatically? Is any of it lost? Can you reproduce it? Can you access the logs in case you need to report in detail or perform follow-up work?

"How did it go?

"Do you fully understand what the AI produced? If challenged, can you explain and defend (or critique) each part of the work?
"What did you learn while doing this work? Did the AI surprise you in any way? Did you try a new technique? Did you learn anything important about the AI tool’s ability to handle this kind of problem? Did you get any new ideas for how you could be either more productive or else more careful with AI?
"Do you feel it gave helpful responses to your prompts? How was the first response? Were your efforts to refine that response rewarded? Did it ultimately solve the problem?
"In what way was it worth the effort and cost to solve this problem using this AI tool? Or was it not helpful? Could you have solved it in a simpler way with a web search, FAQ, or some other traditional tool?
"What is your assessment of the efficacy and reliability of this AI tool for this kind of work? Should users be on their guard, or is the worst-case scenario either not severe or else very improbable?

"Did you notice any specific problems with the AI responses?

"Incorrectness. Did it hallucinate, give erroneous output, or contradict the data available to it?
"Corrections. Did it ever claim to correct itself? If so, did it in fact correct itself, or did it make a new error?
"Ungrounded. Did it fail to acquire the full details of your problem? Did it make silent, risky assumptions?
"Unnecessary. Did it provide anything that was unhelpful or wasteful?
"Omission. Could something important be missing from its output? Were there any important matters that it only acknowledged after you prompted it to do so?
"Bias. In what ways might it have been biased? To detect bias, you might need to consider other methods of solving the problem, including solving it yourself and comparing your assumptions to the AI.
"Incongruence or Self-Repudiation. Did it make claims and later repudiate them? Did it say it would follow a procedure and then fail to follow it?
"Awkward Interaction. If the process was conversational, did you feel rushed or overwhelmed during it? Did the AI lose track of questions or forget the state of the interaction? Were you in control during the process? Was the AI too accommodating, sycophantic, or placating?
"Required Intervention. What part of its helpful responses depended on you to guide or correct it? If you had not been vigilant and provided only blank encouragement, would it have given a poor solution? Could a complete novice with no education or training have reached a good result using this AI tool?

"How specifically did you evaluate the responses?

"Thoroughness. How closely and carefully did you study what the AI did? What evidence do you have that it performed acceptably? This can be a time-consuming process, so it’s common for users to perform only minimal checking of the output. However, GenAI is very good at creating credible and confident responses that are completely wrong. Never trust your first impression of GenAI responses.
"Testable Output. Did the tool provide supporting data or any other feature that helped you evaluate its output? What references did it provide? What indications of its thinking? What kind of metadata? How helpful was its formatting of the response? How did it guide you to help you understand the output?
"Possible Blind Spots. In what way could the AI have given you an erroneous or otherwise problematic response without you knowing it? Are there plausible-seeming facts in the response that you didn’t check and about which you have no personal knowledge? Is there any content that might be incomplete? If GenAI performs any operations or transformations on data, could it have dropped some of that data?

"Now what?

"Double down on this approach? Maybe this experience showed promise, and you want to go further with it and complete the work.
"Drop it and try something else? Maybe the journey was the reward. Maybe you’ve already gotten what you need from this without needing to preserve any of the specific results that the AI gave you. Maybe this approach to solving the problem didn’t work and you should try something else.
"Consult with co-workers or clients about it? Remember that your reputation could depend on how others feel about your work. You might to run it by them and see what they think. They may have ideas for improving it, or concerns to share with you about it.
"Raise concerns about using AI for this kind of work? Did the AI do anything that scared you or represented a specific hazard that you should escalate?
"Test it more? Maybe you’re not sure what you’ve got and need more information to have an opinion. You can analyze what you’ve got or gather more information so you CAN analyze it.
"Define or modify a protocol for using AI to do this? Maybe you know enough to make decisions about how you or your company will apply AI to problems of this kind".

Using AI to Test: Demo

James Bach demoed testing The Online Gantt Chart at https://www.onlinegantt.com/ with prompts using ChatGPT and Claude Code.

James kept track of his work in a worksheet based on the "Session-Based Test Management" worksheet ( Download PDF ) that he and his brother developed at Hewlett Packard, in which he tracks time spent testing, metrics, etc.
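To make the idea of a session worksheet a bit more concrete, here is a minimal sketch of a session record in Python. It is not James's worksheet: the field names are my own assumptions, loosely modeled on the task breakdown found in published SBTM session sheets (test design and execution, bug investigation and reporting, session setup). The check that the breakdown adds up to 100% is what turns rough notes into comparable time metrics.

```python
from dataclasses import dataclass, field


@dataclass
class SessionSheet:
    """Hypothetical SBTM-style session record (field names are assumptions)."""
    charter: str
    duration_min: int
    test_pct: int   # test design and execution
    bug_pct: int    # bug investigation and reporting
    setup_pct: int  # session setup
    notes: list[str] = field(default_factory=list)
    bugs: list[str] = field(default_factory=list)
    issues: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The task breakdown only makes sense as metrics if it covers the whole session.
        if self.test_pct + self.bug_pct + self.setup_pct != 100:
            raise ValueError("task breakdown must sum to 100%")

    def minutes(self) -> dict[str, float]:
        """Convert the percentage breakdown into minutes spent on each activity."""
        split = (("test", self.test_pct), ("bug", self.bug_pct), ("setup", self.setup_pct))
        return {name: self.duration_min * pct / 100 for name, pct in split}
```

For example, `SessionSheet("Survey Online Gantt settings", 90, 70, 10, 20).minutes()` splits a 90-minute survey session into minutes per activity.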

During the first part, a Survey Session, James was trying to see what he can manually explore and discover. How can you use this tool? What are the boundaries for text boxes? James did a preliminary analysis on what you can start exploring with AI.
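Probing the boundaries of a text box is classic boundary value analysis: try lengths just below, at, and just above each limit. Here is a small sketch in Python that generates those inputs. The minimum and maximum lengths are placeholders; I don't know Online Gantt's real limits, which is exactly what a survey session like James's is trying to discover.

```python
def boundary_lengths(min_len: int, max_len: int) -> list[int]:
    """Lengths just below, at, and just above each limit of a text field."""
    candidates = {min_len - 1, min_len, min_len + 1, max_len - 1, max_len, max_len + 1}
    # A negative length is not a real input, so drop it.
    return sorted(n for n in candidates if n >= 0)


def boundary_strings(min_len: int, max_len: int, fill: str = "x") -> list[str]:
    """Build one test string per boundary length, using a repeated filler character."""
    return [fill * n for n in boundary_lengths(min_len, max_len)]
```

With a hypothetical task-name limit of 1 to 50 characters, this yields strings of length 0, 1, 2, 49, 50, and 51.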

After exploring for a bit, James went to the settings screen, copied the raw Document Object Model (DOM) code of the web app, and pasted it into ChatGPT:

Prompt: "Analyze the settings for onlinegannt and list them for me"

ChatGPT then created an outline of all the settings.

Could you have asked ChatGPT to go get the information itself? Yes, but ChatGPT cannot press buttons, so it might not have been able to reach the Settings screen. Sometimes, you don't want to "Prompt and Pray".

Prompt: "I would like to do combinatorial testing on the factors in this settings panel. What combinations do you suggest?"

James is doing what he calls "step back prompting": start with a general, soft prompt to see what information he can get.

A "hardened prompt" is very specific. James doesn't start with one because he is worried about what the biases of the LLM might be.

James also doesn't want to experience cognitive overload, receiving more information than he can process. He is exploring the app using AI to find subtle bugs he might miss.
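The combinatorial prompt above asks the AI which combinations of settings to try. One well-known answer is pairwise (all-pairs) testing: cover every pair of values for every two factors, rather than every full combination. Below is a minimal greedy sketch in Python, so you have something to compare the AI's suggestions against. The factor names are made up, and because it scans the full cartesian product on each pass, it is only meant for small settings panels.

```python
from itertools import combinations, product


def all_pairs(factors: dict[str, list]) -> list[dict]:
    """Greedy all-pairs: repeatedly pick the row that covers the most uncovered pairs."""
    names = list(factors)
    # Every (factor_a, value_a, factor_b, value_b) pair that must appear in some row.
    uncovered = {
        (a, va, b, vb)
        for a, b in combinations(names, 2)
        for va in factors[a]
        for vb in factors[b]
    }
    tests: list[dict] = []
    while uncovered:
        best, best_cov = None, set()
        for combo in product(*(factors[n] for n in names)):
            row = dict(zip(names, combo))
            cov = {(a, row[a], b, row[b]) for a, b in combinations(names, 2)} & uncovered
            if len(cov) > len(best_cov):
                best, best_cov = row, cov
        tests.append(best)
        uncovered -= best_cov  # always shrinks, since every pair appears in some row
    return tests
```

Usage: `all_pairs({"weekends": [True, False], "zoom": ["day", "week", "month"], "theme": ["light", "dark"]})` returns fewer rows than the 12 full combinations while still covering every pair of values.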

Ask Yourself: What Is The Main Thing We Need to Test?

With the Online Gantt Chart, the main thing we need to test is the display. If information is corrupted or tasks get dropped, this product is useless.

Focus on: Is it usable? Reasonably charismatic?

James notices that the app accepts GANTT files as input. If you want to enter a lot of data to see how the app handles it, you don't want to do it by hand. You can use AI to help you out.

Using AI To Create Tools

How can you generate a lot of data? You can use Claude Code to create a tool.

James took a GANTT file and saved it to an empty directory.

Prompt: "The file onlineganntt.gantt is a json that describes schedule data for a gantt chart. I need a Pythin program that generates random schedules that will enable me to test gantt charting software. The program must produce random schedules with N tasks (N is specific on the command line). The task should be between 1 and 10 days in length and connected using various dependency relations including no relation. Give the tasks generic self-descriptive names. The tool should process properly formatted JSON file with the suffix '.gantt' plus a csv table listing the details for each different task (for use as an oracle)."

What is an oracle? It is a principle or mechanism by which you recognize a problem. Rather than being a rigid, absolute tool that delivers binary "pass" or "fail" checkmarks, Rapid Software Testing treats an oracle as a heuristic -- a fallible, experience-based guideline that points you in the right direction but can occasionally fail or mislead you.

If this prompt is successful, James would get a data file he would be able to input into Online Gantt.

With the CSV file, we will be able to check for accuracy in case the AI had messed up generating the file. Testing the Import functionality is part of our tests.

Setting Claude protocols using CLAUDE.md

In his CLAUDE.md file for Claude Code, James has told Claude to push back and ask questions about any prompt:

"Critical Working Protocol:

"When I ask you do do something for me, or produce text, code, or anything else, ask questions to get more information about any ambiguous and critical matters.
"During any task, identify any risky or critical assumptions that either you or I may be making. Declare those assumptions.
"Present a plan and get approval before taking action. Write the plan to a file called "plan-[project]-[date]-[ordinal value].md". The plans file should be stored in a plans directory that is located in the project directory. The plan file MUST include the full text of my prompt that immediately precipitated that plan.
"Before writing new code, always inform me of existing tools (command-line utilities, libraries, or open-source projects) that could fulfill the need. I want a detailed analysis. Include a section called 'Existing Solutions' that includes this information when asking clarifying questions or presenting plans.
"If multiple approaches exist (built-in tools vs writing code), present all options with trade-offs so I can choose.
"If I ask for analysis, always put it into a file. If I don't say otherwise, perform a detailed analysis with high effort"

"Testing:

"DO NOT TEST anything you write for me, unless I ask you to.
"When you write code and it is ready to try, let me try it first and I will report any problems

"Git Style:

"I want to be able to revert to earlier versions if I don't like the current version.
"Do not commit anything to Git unless the file is already being tracked."

When Claude started thinking about the problem, it pushed back, asking about task hierarchy, alignment of subtasks, what resources to use, what the output filename should be, etc. James was then able to fine-tune the prompt.
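The plan-file convention in the protocol above ("plan-[project]-[date]-[ordinal value].md" inside a plans directory) is easy to misread, so here is a small Python sketch of how the next file name could be worked out. Claude follows the convention from the instruction text itself; this helper is only my illustration, and the ISO date format and the "highest ordinal plus one" rule are assumptions.

```python
import re
from datetime import date


def next_plan_name(existing: list[str], project: str, day: date) -> str:
    """Return the next plan-[project]-[date]-[ordinal].md name for the given day."""
    pattern = re.compile(rf"^plan-{re.escape(project)}-{day.isoformat()}-(\d+)\.md$")
    # Only plans for this project and this date count toward the ordinal.
    ordinals = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"plan-{project}-{day.isoformat()}-{max(ordinals, default=0) + 1}.md"
```

Given `["plan-gantt-2026-06-12-1.md", "plan-gantt-2026-06-12-2.md"]` for June 12, 2026, the next plan would be `plan-gantt-2026-06-12-3.md`, so every plan, along with the prompt that triggered it, keeps its own file.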

Why tell Claude not to test? James thinks it wastes tokens. Also, James does not believe anything Claude claims to have tested. He wants to test things out for himself, and he does not want to be lulled into a false sense of security.

Ask Claude to keep a file of what you are doing

Everything you ask Claude to do will go into a file, as a record of what you were working on with Claude. Because this information is in a file, when an employer asks, "What did you do?", you as a tester can give an answer.

It also gives you a record you can review when you are done experimenting. James uses it to piece together an article, a blog post, or class material.

The other day, James asked Claude to use an MCP server connected to Playwright, and to use Playwright to explore Online Gantt. It took a few days to figure out how to do it, but it worked. Claude was slow, but it did explore the app and give an analysis. When James said, "Give me the analysis," it gave the analysis. When asked to put the analysis in a file, it churned for five minutes and wrote a new analysis to the file, three times larger than what it had given James.

James asked, "What are you doing? Why is there so much more in the file?" Claude said that what it had shown on screen was only an overview of the analysis.

Don't Accept Any Tool Claude Generates. Test the Tool.

After a lot of back and forth, Claude generated a tool in Python, generate_gantt.py.

Two files were produced: The file that was meant to be imported to Online Gantt, and the CSV file that had a copy of the data that could be examined in Microsoft Excel.

When the import file was uploaded, you could see that a lot of records were now showing... but were they correct?

Only by comparing the Online Gantt chart with the CSV could you see if the upload was interpreted correctly.
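If you can get the chart's data back out of Online Gantt in some tabular form (for example, by exporting it or transcribing it, which is an assumption on my part), the comparison against the CSV oracle can be partly mechanized. Here is a minimal Python sketch that reports missing tasks, unexpected tasks, and mismatched fields. It only flags differences; deciding whether a difference is a bug is still your job.

```python
import csv
import io


def load_rows(csv_text: str, key: str = "name") -> dict[str, dict]:
    """Index CSV rows by a key column (task name by default)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def compare(oracle: dict[str, dict], observed: dict[str, dict], fields: list[str]) -> list[str]:
    """List every way the observed chart data differs from the oracle."""
    problems = []
    for name in sorted(oracle.keys() - observed.keys()):
        problems.append(f"missing from chart: {name}")
    for name in sorted(observed.keys() - oracle.keys()):
        problems.append(f"unexpected in chart: {name}")
    for name in sorted(oracle.keys() & observed.keys()):
        for fld in fields:
            expected, got = oracle[name].get(fld), observed[name].get(fld)
            if expected != got:
                problems.append(f"{name}: {fld} expected {expected!r}, got {got!r}")
    return problems
```

An empty list means the fields you chose to compare matched; it says nothing about fields you didn't compare, or about how the chart is actually drawn on screen.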

Careful Not To Let The AI Get Too Far Ahead of You!

Working with AI, James was reminded of the western trope of the cowboy being dragged by horses across the desert.

If the AI gives you more data than you can process, you might be tempted to shrug it off without reviewing it all, and assume that if the first few results were fine, the rest were fine too.

It is best to work iteratively, one step at a time. Manage your cognitive load carefully.

Remember: You Need To Explain What You Are Doing With AI!

Once you finish a test session, ask yourself (and take notes on):

What did you do?
How did it go?
Did you notice any specific problems with the AI responses?
How specifically did you evaluate the responses?
Now what?

These questions come from the Heuristics for Evaluating GenAI Work by James Bach and Michael Bolton, discussed above.

These reports you are having AI generate are good as a reference for yourself. They may still contain hallucinations, but might be good enough for you. Recognize that they might not be good enough for your employer. Do not give your employer any AI sloppiness that would waste their time. You need to do your due diligence.

People will need to trust you, trust your work, and trust you are using AI well, James mentioned. If you do not review the work, you are responsible for any mistakes the AI might have made.

As you practice using AI with testing, James predicts that one of the skills will be how to be efficient and effective in your use of tokens.

James recounted a time when a prompt he was running was set up to retry on failure. The AI retried the same prompt 500 times in an hour. It only cost ten dollars, but he burned through those ten dollars because of the tooling he was using.
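One defensive habit that follows from that story: never let a retry loop run without both an attempt cap and a spending cap. Here is a generic Python sketch of a wrapper around any call, such as a function that sends a prompt. The per-attempt cost and budget are made-up estimates, and this is not the tooling James was using; real AI tools have their own retry settings that you should check first.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class BudgetExceeded(RuntimeError):
    """Raised when another attempt would push estimated spend past the budget."""


def call_with_limits(call: Callable[[], T], max_attempts: int = 3,
                     cost_per_attempt: float = 0.02, budget: float = 1.00,
                     backoff_s: float = 0.0) -> T:
    """Retry `call` a bounded number of times without exceeding an estimated budget."""
    spent = 0.0
    last_error: Exception | None = None
    for attempt in range(1, max_attempts + 1):
        # Refuse to start an attempt we cannot afford (the cost is only an estimate).
        if spent + cost_per_attempt > budget:
            raise BudgetExceeded(f"stopped after {attempt - 1} attempts, ~${spent:.2f} spent")
        spent += cost_per_attempt
        try:
            return call()
        except Exception as err:  # in real code, catch only errors worth retrying
            last_error = err
            if attempt < max_attempts:
                time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

With caps like these, a failing prompt stops after a handful of attempts instead of quietly running 500 times.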

There are some things that the current generation of AI does really well, such as figuring out data formats. Before, James would try to figure out a format first, then give it to the AI. Now? He doesn't have to. Most of the time the AI figures it out and does it well. His default approach is to give it the raw file it needs and stand back. If it fails, he steps in.

Some things AI does not do well. If you ask it directly what questions you should ask when putting together a test plan, it may list fifteen things you need to ask yourself. But if it starts "testing," it will not do that. This is because AI is not a good tester: it gets its information from the internet, and most people on the internet are not good testers.

You can try to make AI a better tester by giving it material such as James Bach's Heuristic Test Strategy Model (Download PDF). James was thinking about creating a file which details "What is Rapid Software Testing?"

... James said that maybe by the next time this session is run, he can create a Claude skill that captures this.

Be cautious when analyzing long lists of results

James mentioned that when you are reviewing the list of what the AI examined, you need to ask yourself, "What's not here? What did the AI not tell me?"

There is a certain cognitive bias called the part-set cuing effect.

On the Wikipedia page, under "Memory Inhibition", "The 'part-set cuing effect' was initially discovered by Slamecka (1968), who found that providing a portion of to-be-remembered items as test cues often impairs retrieval of the remaining un-cued items compared with performance in a no-cue (free-recall) control condition".

James described it like this: if someone says, "I need you to think of all the fruits that you can, for instance, apples, bananas, pineapples, lemons, oranges," it's harder to think of other types of fruit, since your brain says, "Yep, those are fruit!" If you weren't given any examples, your brain could think of more. To protect yourself, scribble down your own ideas before you see what the AI comes up with.
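Once you have your own list written down, comparing it with the AI's list is simple set arithmetic. Here is a tiny Python sketch; matching on lowercased, trimmed text is a naive assumption, so ideas phrased differently will still need a human eye.

```python
def normalize(items: list[str]) -> set[str]:
    """Naive normalization: trim whitespace and ignore case."""
    return {item.strip().lower() for item in items}


def compare_ideas(mine: list[str], ai: list[str]) -> dict[str, list[str]]:
    """Split test ideas into those only I had, those only the AI had, and shared ones."""
    m, a = normalize(mine), normalize(ai)
    return {
        "only_mine": sorted(m - a),  # what the AI did not tell me
        "only_ai": sorted(a - m),    # what I did not think of
        "both": sorted(m & a),
    }
```

The "only_mine" bucket is the one the part-set cuing effect would have hidden from you if you had read the AI's list first.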

New Podcast: Humans in the Way

James Bach mentioned that he started a new podcast on the Rapid Software Testing YouTube channel, called Humans in the Way. He takes things that AI can do, breaks them down, and analyzes them.

Humans in the Way #1

https://youtu.be/X1dWeqYlF74?si=JWcSDS6bBhtJL6iT

Happy Testing!

-T.J. Maher
Software Engineer in Test

BlueSky | YouTube | LinkedIn | Articles

Adventures in Automation

June 12, 2026

Testing and AI Workshop by James Bach of Rapid Software Testing - Notes