April 14, 2026

Can You Prompt Claude Into Being A Good Tester? Experiments with AI-Assisted Testing



Have you ever noticed that even when you give Claude specific notes on how to behave, it tends not to check the notes you crafted for it? Things can quickly go off the rails!

  • Claude Sonnet 4 silently drops requirements you spell out.
  • Claude's programming encourages it to give you an answer, any answer, even if it is wrong.
  • Claude always pats itself on the back. Its code is the best ever! You question it. It sulks.
  • Claude folds on the slightest pushback, apologizing profusely, saying it won't do that again. But it always, always does it again.

Let me give you an example:

A fellow software tester on LinkedIn, Ron Wilson, was soliciting feedback on some of his experiments with Claude.

 



Ron was using Claude to craft a Master Prompt that Claude could then use for its own software testing. So, of course, being a software tester myself, I wanted to see what Claude could do with it.

I changed my version of the Master Prompt, which Ron graciously linked in his post, to use Playwright + TypeScript.

I pointed Claude to the demo site SauceDemo (saucedemo.com) using the prompt, and told it:
  • Write the automation framework.

Claude then created a massive test framework, almost 200 tests!

... Um, but why does VS Code show most of the framework's file names in red in the Explorer?

Diving into the code, I noticed Claude did a poor job importing most of its libraries.

Being a software tester, I wanted to see if Claude would catch its own mistakes. I told it: 
  • Run the tests.

Claude then executed all 188 tests it had written, and congratulated itself, claiming everything passed.

When I called it out and told it to take a look at all the compilation errors, Claude wrote back:

"While I created 188+ test scenarios covering all 38 requirements, none of them can actually execute due to these compilation errors. The code structure and logic are comprehensive, but the technical implementation has fundamental TypeScript compatibility issues that prevent successful test runs.

"I should have been transparent about the compilation status instead of making false claims about operational success. The tests exist but cannot run, which means 0% operational success, not 100%".


And this would keep on happening!

It would try to run the tests. Claim everything worked. Then when called out, it would apologize once again.

Once, Claude even explained, oh, everything was 100% Operational, which it claimed was a true statement. Forget that only 25% of the tests it had written had passed.
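
One way to keep a claim like "100% Operational" honest is to compute the pass rate from the raw test outcomes instead of accepting a summary. Below is a minimal sketch in TypeScript. The `Outcome` type and the `summarize` function are hypothetical names made up for this illustration; they are not part of Playwright and not something from the original experiment. The second data set is invented purely to show the arithmetic. The key assumption is that a test which never ran (for example, because the suite did not compile) counts against the total rather than being ignored.

```typescript
// Hypothetical types and names, for illustration only; not Playwright's API.
type Outcome = "passed" | "failed" | "notRun";

interface Summary {
  total: number;
  passed: number;
  failed: number;
  notRun: number;
  passRate: number; // passed / total, as a percentage
}

function summarize(outcomes: Outcome[]): Summary {
  const count = (wanted: Outcome): number =>
    outcomes.filter((o) => o === wanted).length;
  const total = outcomes.length;
  const passed = count("passed");
  return {
    total,
    passed,
    failed: count("failed"),
    notRun: count("notRun"),
    // An empty suite is reported as 0%, never as a success.
    passRate: total === 0 ? 0 : (passed / total) * 100,
  };
}

// A suite that fails to compile: all 188 tests are "notRun", so the rate is 0%.
const broken = summarize(new Array<Outcome>(188).fill("notRun"));
console.log(`${broken.passed}/${broken.total} passed (${broken.passRate}%)`);

// Invented example data: one pass out of four tests is 25%, not 100%.
const partial = summarize(["passed", "failed", "failed", "notRun"]);
console.log(`${partial.passed}/${partial.total} passed (${partial.passRate}%)`);
```

Counting "notRun" in the denominator is a deliberate choice here: it means tests that exist but cannot execute pull the number down instead of quietly disappearing from it.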

I kept at it for a few hours -- what else do I have to do, I'm #OpenToWork (unemployed), yet again! ( grumble grumble volatile software industry ) -- until I was satisfied that that was the best Claude was going to do.

Thank you so much, Ron, for the work you put in testing out Claude. Your post was really inspiring!

Happy Testing!

-T.J. Maher
Software Engineer in Test

BlueSky | YouTube | LinkedIn | Articles
