Adventures in Automation: 2026

June 12, 2026

Testing and AI Workshop by James Bach of Rapid Software Testing - Notes

Last week, I saw that one of the creators of Rapid Software Testing, James Bach, was offering a new half-day workshop, the Testing and AI Workshop, with a 50% discount for anyone who was unemployed, so I just had to attend.

Visit James Bach's Satisfice.com Download Page to download PDFs on topics such as Responsible Work (May 2026), AI Writing Policy (April 2026), Heuristic Test Strategy Model (December 2024), Rapid Software Testing Explored Class Appendices (December 2024), Why Testers? (Oct 2024), and ChatGPT Sucks at Being a Testing Expert (August 2023).
YouTube Channel: Rapid Software Testing
Check out James Bach's Classes at https://www.satisfice.com/classes (50% discount if unemployed)
The next Testing and AI Workshop will be July 6, 2026

"In each session, the instructors will first perform a live 'testopsy.' This is a demonstration of AI-assisted testing (using both assistive and agentic modes of AI) on a real product, accompanied by an analysis and explanation of what happened during the demo. During this part of the workshop, you may ask questions or offer critique.

"Next the instructors will challenge you to perform a similar process or solve a similar problem with the help of AI. You will have two hours. You will be able to work alone or in groups, as you like.

"Finally, the instructors will review and critique your work, if you choose to share it. At the end of the event, you will get to keep the videos".

I really enjoyed the class! It involved an hour-and-a-half webinar, where James Bach walked people through how he uses AI, a few hours where you could work on your own project for the course, then another hour-and-a-half webinar where course attendees could review what they came up with to analyze a site.

June 3, 2026

New Position Unlocked: Senior SDET at AAA Life Insurance, starting Monday, June 15th, 2026!

I have two announcements: I've accepted a job offer for a Senior SDET role at AAA Life Insurance Company, and will be starting Monday, June 15th! And I have been nominated to be one of the volunteer Directors on the leadership board of the Software Quality Group of New England.

Man, the job market is brutal! It took me four months of job searching in 2025 to find SELF Id when MassMutual outsourced its technology department. And it took me four months of near constant job searching in 2026 to find AAA Life when I was caught up in the second round of SELF's layoffs at the end of January.

June 2, 2026

SDET Lean Coffee #1: With AI, what is useful testing and what is workslop? SQGNE, June 2, 2026

With AI, what is useful when it comes to testing and what is workslop? How do you create workflows in AI? With AI producing massive amounts of code, how can a tester keep up?

These are some of the topics attendees decided to talk about in our first ever SDET Lean Coffee, as part of the Software Quality Group of New England ( sqgne.org ). We exchanged war stories, horror stories, shared insights, and provided a bit of group therapy as we talked about the stress involved in being the main support role of the software development team.

A surprise guest was Lisa Crispin (LisaCrispin.com), author of "Agile Testing: A Practical Guide for Testers and Agile Teams". Lisa has been working with DORA, Google Cloud's DevOps Research and Assessment division.

Recently, Lisa teamed up with the "Beyond Quality" podcast to give a talk, "AI, testing, and the DORA AI Capabilities Model", which she shares on her site at https://lisacrispin.com/2026/04/20/ai-testing-and-the-dora-ai-capabilities-model/ discussing:

The DORA AI Capabilities Model
How we need to test AI agents since AI agents can degrade over time

What is a "Lean Coffee"?

"Lean Coffee is a structured, but agenda-less meeting. Participants gather, build an agenda, and begin talking. Conversations are directed and productive because the agenda for the meeting was democratically generated". This format arose over fifteen years ago, when "Jim Benson and Jeremy Lightsmith wanted to start a group that would discuss Lean techniques in knowledge work – but didn’t want to start a whole new cumbersome organization with steering committees, speakers, and such. They wanted a group that did not rely on anything other than people showing up and wanting to learn or create", according to LeanCoffee.org.

When Is The Next SDET Lean Coffee?

SDET Lean Coffees for the SQGNE will (usually) be held the first Tuesday of each month at 12:00 pm to 1:00 pm EDT.

Interested in attending the next session Tuesday, July 7th?

Register at the Software Quality Group of New England website at https://www.sqgne.org/

Happy Testing!
-T.J. Maher
Software Engineer in Test

BlueSky | YouTube | LinkedIn | Articles

June 1, 2026

Come join us at the next SQGNE Meeting! Open-Source Malware: Defending Your Software Supply Chain From Evolving Threats - June 17, 2026

"Open-Source Malware: Defending Your Software Supply Chain From Evolving Threats" will be the topic of the next Software Quality Group of New England (sqgne.org) meeting.

Speaker: Bryan Whyte, CISSP Director, Solutions Engineering @Sonatype

Date: June 17, 2026 @ 6:00 pm

Join us on Zoom or in person at Burlington, MA ( Register Here )

"Bryan Whyte breaks down the latest wave of open source malware, explains how these threats diverge from traditional vulnerabilities, and shares actionable steps for organizations to defend mission-critical software.

"As organizations deepen their reliance on open-source software, evolving security threats are reshaping the landscape at an unprecedented pace.

"Threat actors are now increasingly targeting development pipelines and trusted ecosystems like npm to orchestrate supply chain attacks with significant downstream impact. Incidents such as the 2025 Shai-Hulud npm campaign, the XZ Utils backdoor, and the widespread compromise of over 23,000 GitHub repositories illustrate how open-source malware has quickly become a critical, top-tier threat built to evade legacy scanning and exploit trust woven into modern delivery pipelines.

"--The shifting tactics of threat actors targeting npm, PyPi, GitHub, and development pipelines

"--Key differences between open-source malware and traditional malware or vulnerabilities

"--The most prevalent malware types and tactics driving today's software supply chain attacks

"After spending 20 years in software development, Bryan started his journey into Application Security in 2015 with the AppScan tool suite for Static, Dynamic and Mobile Application Security Testing. In 2018, he expanded his Cybersecurity proficiency, earning the Certified Information Systems Security Professional (CISSP). In 2019, he was excited to join Sonatype due to the explosive growth of open-source software, which has made Software Composition Analysis (SCA) a critical aspect of Application Security".

See you there!

Happy Testing!

-T.J. Maher
Software Engineer in Test

May 29, 2026

Need a speaker at your Software Testing Meetup? How about a talk about putting together a React Native mobile testing framework?

Are any software testing Meetups looking for a speaker? I have a talk, all ready to go!

After surviving the first round of layoffs last December, I scrambled to organize my research notes on building a mobile automation framework into a presentation. I managed to finish it days before I was hit by the second round of layoffs at the end of this January.

So far, I have given my presentation, Building a React Native Mobile Automated Test Framework Using Detox + TypeScript, to:

TestGuild.com Automation Guild 2026 via Zoom (4/6/2026)
Software Quality Group of New England ( SQGNE ) @ SCRUM.org, Burlington, MA - https://tinyurl.com/detox-demo-sqgne (5/20/2026)
Sydney Testers Meetup via Zoom (scheduled on 6/30/2026)

Two Questions:

If I give a talk to a software testing Meetup in Australia via Zoom, even though I never left my home office in Massachusetts, would I be considered an "International Speaker"?
Does anyone else want me to volunteer to speak at their software testing meetup? The SQGNE season is ending for the summer, so I will have some time available.

Happy Testing!

-T.J. Maher
Software Engineer in Test

May 27, 2026

Practicing Playwright: API Testing, Intercepting Network Requests, and Mocking APIs

How would you intercept a network request and use the data for assertions? Mock a network request using those assertions? And make sure that all data is loaded in the UI before making your assertions, and that all tests can pass when run?

We will be walking through Butch Mayhew's code that answers all of these questions, part of his LinkedIn Learning course: Playwright Essential Training. We will be examining the shopping cart test site PracticeSoftwareTesting.com and looking at code on Butch's companion GitHub site that shows how to mock out the API and use the mocks in the UI tests.

Playwright has a lot of features when it comes to API testing. You can intercept network requests, then abort, modify, or mock them. You can also simulate a slow network. This is all done through the Playwright method page.route.

A route is the specific path or URL a client uses to request data or trigger a function on a server. For example, GET /products is an API call to the route /products, which retrieves all products from the API endpoint called "products". The endpoint is "products", and the route is the path that accesses the endpoint.

The Playwright API has a method called route, which allows Playwright to monitor and modify browser network traffic.

Playwright.dev / Docs / Network: https://playwright.dev/docs/network
Playwright and page.route: https://playwright.dev/docs/api/class-route
Playwright Mocking API Guide: https://playwright.dev/docs/mock
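
As a minimal sketch of the mocking idea above (not Butch's code): the handler below answers an intercepted request with canned JSON instead of letting it reach the server. `Product`, `RouteLike`, `buildMockResponse`, and `mockProductsHandler` are hypothetical names for illustration, and the `{ data: [...] }` envelope and URL pattern are assumptions. `RouteLike` mirrors only the `fulfill` part of Playwright's route object, so the sketch runs without a browser.

```typescript
// Hypothetical product shape, for illustration only.
interface Product {
  id: string;
  name: string;
  price: number;
}

interface MockResponse {
  status: number;
  contentType: string;
  body: string;
}

// The small subset of Playwright's Route object this sketch relies on.
interface RouteLike {
  fulfill(response: MockResponse): Promise<void>;
}

// Build the canned response that stands in for the real API.
function buildMockResponse(products: Product[]): MockResponse {
  return {
    status: 200,
    contentType: "application/json",
    body: JSON.stringify({ data: products }),
  };
}

// Route handler: answer from the mock instead of the network.
function mockProductsHandler(products: Product[]) {
  return (route: RouteLike): Promise<void> => route.fulfill(buildMockResponse(products));
}

// In a Playwright test (the URL pattern is an assumption):
//   await page.route("**/products*", mockProductsHandler([{ id: "1", name: "Claw Hammer", price: 11.5 }]));
//   await page.goto("/");
```

Keeping the response builder separate from the handler means the canned data can be checked on its own, and the same data can back the UI assertions that follow.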

May 21, 2026

Practicing Playwright: Visual Testing With Playwright

If you want to do some basic visual checking to see if there has been any deviation from your baseline of images, you can use Playwright. It's built in! Playwright can take a snapshot of a web element, a visible viewport, or a full page, and save it in your Git repository as a baseline, failing the test if the image, page, or viewport does not match up.

Caution: From what I have been reading, this can quickly cause your code repository to balloon in size, since Chrome, Firefox, and WebKit would each store its own golden screenshot in your repo. Also, images on Mac, Windows, and Linux all appear different pixel-by-pixel. If using a CI platform, it might be best to run visual tests only on a standard Playwright Docker image, to generate and compare snapshots. According to TestQuality, "Once a suite passes 50–100 visual tests, teams need a layer that tracks run history, surfaces flaky-test patterns across cycles, and routes confirmed defects into the team's tracker — none of which lives inside the test runner itself"... I wonder if you can store images in an Amazon S3 bucket and hook that up as a virtual drive? ... no matter. That will be a blog post for another time...

Right now, we will be walking through the code Butch Mayhew wrote for his LinkedIn Learning course, Learning Playwright, found on his companion GitHub site.

While the test is in a certain state, you can take a screenshot of the page or certain elements of the page and save them as a snapshot. The snapshots can be used as baseline images to compare your current site against. This baseline can be periodically updated as the site evolves.

How does this happen? With Playwright's toHaveScreenshot() assertion, which takes a screenshot and compares it against the baseline, and its mask option, which names the elements you want to leave out of the comparison between the expected and actual screenshots.

toHaveScreenshot(name): Playwright.dev / Page Assertions
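
To make the idea of masking concrete, here is a toy sketch, not Playwright's implementation. Playwright covers masked elements with a solid color box before comparing, so whatever sits underneath can never cause a mismatch. The sketch does the same thing with flat arrays of grayscale values standing in for screenshots. `Pixels`, `applyMask`, `diffRatio`, and `matchesBaseline` are hypothetical names, and the selector and file name in the closing comment are assumptions.

```typescript
// Grayscale "images" as flat arrays; a stand-in for real screenshots.
type Pixels = number[];

// Paint the masked pixel indexes with one fixed value so they always match.
function applyMask(pixels: Pixels, maskedIndexes: number[], maskValue = 255): Pixels {
  const masked = new Set(maskedIndexes);
  return pixels.map((value, index) => (masked.has(index) ? maskValue : value));
}

// Fraction of pixels that differ between two same-sized images.
function diffRatio(baseline: Pixels, actual: Pixels): number {
  if (baseline.length !== actual.length) {
    throw new Error("Images must be the same size");
  }
  if (baseline.length === 0) return 0;
  const differing = baseline.filter((value, index) => value !== actual[index]).length;
  return differing / baseline.length;
}

// Pass when the masked images differ by no more than the allowed ratio.
function matchesBaseline(
  baseline: Pixels,
  actual: Pixels,
  maskedIndexes: number[],
  maxDiffPixelRatio = 0,
): boolean {
  const ratio = diffRatio(applyMask(baseline, maskedIndexes), applyMask(actual, maskedIndexes));
  return ratio <= maxDiffPixelRatio;
}

// The Playwright call this illustrates (selector and file name are assumptions):
//   await expect(page).toHaveScreenshot("home.png", {
//     mask: [page.locator(".rotating-banner")],
//     maxDiffPixelRatio: 0.01,
//   });
```

Masking suits content that changes on every run, such as timestamps or rotating banners, while a small `maxDiffPixelRatio` absorbs minor rendering noise.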

May 20, 2026

SQGNE Lecture is tonight! Building a React Native Mobile Automated Test Framework with T.J. Maher

Are you in Boston? Come hear me talk about Building a React Native Mobile Automated Test Framework in tonight's Software Quality Group of New England meeting in Burlington, MA:

Registration Page: https://sqgne.org/n-May-2026.html
Slide Deck: https://tinyurl.com/detox-demo-sqgne

Taken from the Registration page:

"Building a React Native Mobile Automated Test Framework
Thomas F. - T.J. - Maher, Jr.

"Wednesday May 20, 2025 6:30-8:00 PM - in person (free pizza!)
Register Here

"Check in between 6:00 and 6:30 to network

"About the Presentation. . .

"Writing automated tests for a React Native mobile application is notoriously difficult. Mobile components display on the page, but are not fully loaded. Lengthy animations and slow-loading components take a while to finish. Timing issues cause your automated tests to error out giving the appearance of flaky tests.

"Thomas F. - T.J. - Maher, Jr. will be sharing his experience tackling these problems using the open-source mobile testing framework, Wix's Detox, designed specifically for testing React Native applications.

May 14, 2026

Practicing Playwright: Dynamically Creating Test Data with a DataFactory

Continuing our walk through Butch Mayhew's LinkedIn Learning course Playwright Essential Training: Abstractions, Fixtures, and Complex Scenarios, we will be examining his code creating a DataFactory that dynamically generates new registered users for our app under test.

The app we will be testing against is PracticeSoftwareTesting.com. We will be examining how the registration call in the API creates new users. We will also be mimicking this call at a programmatic level, to be used in a Playwright automation framework.

According to Butch Mayhew, in his course, there are two types of data we use in our tests: static and dynamic.

Static Data: Data that should never change, and that already exists before the test.

Example: Your go-to test user, or your go-to product when testing a shopping cart.

Dynamic Data: Data that is created as part of a test.

Example: Newly registered users, or products created as part of a test.

Between static and dynamic data, Butch believes there should be around a 15% / 85% split.

You can dynamically generate data, such as registering new users, by implementing a DataFactory, a helper function that interacts with a system to create this data for you.
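
A minimal sketch of what such a DataFactory could look like, not Butch's actual code. `NewUser`, `buildUser`, `PostFn`, and `registerUser` are hypothetical names, and the payload fields and the `/users/register` path are assumptions about the API under test.

```typescript
// Hypothetical registration payload; the real API's field names may differ.
interface NewUser {
  first_name: string;
  last_name: string;
  email: string;
  password: string;
}

// Build a user whose email is unique per suffix, so test runs never collide.
function buildUser(uniqueSuffix: string, overrides: Partial<NewUser> = {}): NewUser {
  return {
    first_name: "Test",
    last_name: "User",
    email: `test.user+${uniqueSuffix}@example.com`,
    password: "SuperSecure@123",
    ...overrides,
  };
}

// Anything that can POST JSON; Playwright's request fixture fits this shape.
type PostFn = (url: string, options: { data: NewUser }) => Promise<unknown>;

// Register the user through the API and hand the credentials back to the test.
async function registerUser(post: PostFn, apiUrl: string): Promise<NewUser> {
  const suffix = `${Date.now()}-${Math.floor(Math.random() * 1e6)}`;
  const user = buildUser(suffix);
  await post(`${apiUrl}/users/register`, { data: user }); // path is an assumption
  return user;
}

// In a Playwright test (API_URL is an assumed environment variable):
//   const user = await registerUser((url, options) => request.post(url, options), process.env.API_URL!);
```

Returning the generated user lets the test log in with the same email and password it just registered. A fuller version would also check the response status before returning.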

May 13, 2026

Practicing Playwright: How to Detect Broken Images On Your Site

Butch Mayhew's LinkedIn Learning course Playwright Essential Training: Abstractions, Fixtures, and Complex Scenarios has been a wonderful resource for learning more about Playwright.

With this blog post, I will be walking through how Butch injects JavaScript into a test, in order to check if any elements in a shopping cart have any broken images.

Playwright / Evaluating JavaScript: https://playwright.dev/docs/evaluating
Butch Mayhew's companion GitHub site on LinkedIn Learning's GitHub page.
Butch's Resources File: RESOURCES.MD
Site under test: Practice Software Testing: https://practicesoftwaretesting.com
Practice Software Testing Website, With Bugs: https://with-bugs.practicesoftwaretesting.com

Butch's GitHub site for the course has many resources listed in his Resources Markdown file, including JavaScript code which can detect if there are any missing element ids or broken images in any elements returned.
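
One common way to spot a broken image in the browser is to check that it finished loading and has a non-zero natural width. The sketch below shows that check as a plain function. It is an illustration under that assumption, not the code from Butch's resources file, and `ImageLike` and `findBrokenImages` are hypothetical names.

```typescript
// The image properties the check needs; a browser's HTMLImageElement has all three.
interface ImageLike {
  src: string;
  complete: boolean;
  naturalWidth: number;
}

// An image counts as broken if it never finished loading or decoded to zero width.
function findBrokenImages(images: ImageLike[]): string[] {
  return images
    .filter((img) => !img.complete || img.naturalWidth === 0)
    .map((img) => img.src);
}

// In a Playwright test, the same check can run inside the page:
//   const broken = await page.locator("img").evaluateAll((imgs) =>
//     imgs
//       .filter((img) => !(img as HTMLImageElement).complete || (img as HTMLImageElement).naturalWidth === 0)
//       .map((img) => (img as HTMLImageElement).src));
//   expect(broken).toEqual([]);
```

Returning the list of broken `src` values, instead of a simple true or false, makes the failure message say exactly which images need attention.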

May 11, 2026

Pramod Dutta discusses 7 Playwright features senior SDETs use daily

Pramod Dutta, an SDET from Tekion, posted on LinkedIn "7 Playwright features senior SDETs use daily". The Playwright features, according to Pramod:

"→ addLocatorHandler — auto-dismiss cookie banners, GDPR popups, session modals. One handler. Whole suite cleaner. Stop writing wrapper functions.
"→ browser.bind() — launch one browser. Let your test, Claude Code, and debugger all attach to it simultaneously. The debugging workflow that fixes "works on my machine" forever.
"→ URLSearchParams in request.get() — stop building query strings manually. Eliminates an entire bug class around URL encoding in API tests.
"→ expect.toPass() — polling-based assertions for state that converges over time. The right tool for 'this should eventually be true.' Not the same as expect(locator).toBeVisible(). Different problem, different solution.
"→ page.requestGC() — manually trigger garbage collection in the browser. The only Playwright tool that catches memory leaks before production. Used with WeakRef in page.evaluate().
"→ --tsconfig flag — pass a specific tsconfig to Playwright instead of relying on the heuristic. Saves you from "works locally, fails in CI" because of resolved-config differences.
"→ webServer.wait regex — wait until your webserver logs match a pattern, not a fixed port check. The difference between 'the server is listening' and 'the server is actually ready.'"

I scribbled these features down, then turned them over to Claude.ai, my research assistant, to see if he could explain these methods, come up with sample code using them, then add bullet points linking to the official documentation, source code, and the release notes.

... Let's see how well Claude did explaining them ...
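
As one small illustration of the URLSearchParams item in Pramod's list (a sketch separate from Claude's write-up): building a query string by hand breaks as soon as a value contains a space or an ampersand, while URLSearchParams encodes each value. `naiveQuery`, `safeQuery`, and `readBack` are hypothetical helper names, and the endpoint in the closing comment is an assumption.

```typescript
// Hand-built query string: reserved characters in the term corrupt the URL.
function naiveQuery(term: string): string {
  return `q=${term}&page=1`;
}

// URLSearchParams encodes each value, so reserved characters survive the trip.
function safeQuery(term: string): string {
  return new URLSearchParams({ q: term, page: "1" }).toString();
}

// Parse a query string back and read the search term, as a server would.
function readBack(query: string): string | null {
  return new URLSearchParams(query).get("q");
}

const term = "claw hammer & nails";
console.log(safeQuery(term)); // q=claw+hammer+%26+nails&page=1
console.log(readBack(safeQuery(term)) === term); // true
console.log(readBack(naiveQuery(term)) === term); // false: the term was cut at the "&"

// With Playwright, per the list above, the object can be passed along directly
// (the endpoint is an assumption):
//   await request.get("/products/search", { params: new URLSearchParams({ q: term, page: "1" }) });
```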

May 10, 2026

Practicing Playwright: Logging in by Storing and Using an Authentication Cookie in Your Automated Tests

I absolutely love that LinkedIn offers a free month-long trial period of LinkedIn Learning. Butch Mayhew's Playwright Essential Training: Abstractions, Fixtures, and Complex Scenarios course has been a wonderful resource for learning more about Playwright.

With this blog post, I will be walking through Butch's code on how to set up an automated test to log into an app without going through the user interface. All it needs is the login cookie.

Butch Mayhew's companion GitHub site on LinkedIn Learning's GitHub page.
Site under test: Practice Software Testing: https://practicesoftwaretesting.com
Official Playwright Documentation: https://playwright.dev/docs
Official Playwright GitHub: https://github.com/microsoft/playwright

When testing a shopping cart app such as the Practice Software Testing website, it can get tedious. You need to open a browser, go to the login page, enter a username, enter a password, hit the login button, and verify you have logged in correctly every time you want to test something in the shopping cart.

If you are testing something unrelated to logging in, why have your tests go through the UI to authenticate? Why not have your automated test run that login test once, temporarily save the login cookie once it is produced, then reuse the login cookie, importing it into other tests?
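
Playwright can save the browser's cookies to a JSON file with `storageState()` and load that file into later tests. As a minimal sketch of the "log in once, reuse the cookie" idea, the helper below decides whether a saved state still holds a usable login cookie or whether the login step has to run again. `SavedCookie`, `SavedState`, and `hasUsableCookie` are hypothetical names, and the cookie name and file path are assumptions. This is not Butch's code.

```typescript
// Shape of the cookie entries in the JSON file that storageState() writes.
interface SavedCookie {
  name: string;
  value: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
}

interface SavedState {
  cookies: SavedCookie[];
}

// Decide whether a saved login can be reused or the login step must run again.
function hasUsableCookie(state: SavedState, cookieName: string, nowSeconds: number): boolean {
  return state.cookies.some(
    (cookie) =>
      cookie.name === cookieName &&
      cookie.value !== "" &&
      (cookie.expires === -1 || cookie.expires > nowSeconds),
  );
}

// Playwright side (the file path is an assumption):
//   await page.context().storageState({ path: ".auth/user.json" }); // save after logging in once
//   test.use({ storageState: ".auth/user.json" });                  // reuse in other tests
```

The saved file holds live credentials for the test user, so it is usually kept out of version control.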

What is Playwright?

According to Microsoft Playwright's GitHub site, Playwright "is a framework for web automation and testing. It drives Chromium, Firefox, and WebKit with a single API — in your tests, in your scripts, and as a tool for AI agents".

Playwright comes in many flavors. From https://github.com/microsoft/playwright

	Best for	Install
Playwright Test	End-to-end testing	`npm init playwright@latest`
Playwright CLI	Coding agents (Claude Code, Copilot)	`npm i -g @playwright/cli@latest`
Playwright MCP	AI agents and LLM-driven automation	`npx @playwright/mcp@latest`
Playwright Library	Browser automation scripts	`npm i playwright`
VS Code Extension	Test authoring and debugging in VS Code	Install from Marketplace

... Let's walk through how Butch uses Playwright to grab the login cookie and use it in his tests.

May 7, 2026

So much job interview prep! Playwright + TypeScript + GitLab

Even though I've been job searching for four months now, I am busier than I have ever been. You know what I mean. You've seen this blog! One week I am experimenting with AI, the next I am pairing Playwright with C# or Java, the next I am skimming docs about Contract Testing using Pact.

Each week I get more leads generated from my many LinkedIn posts. Each week is yet another rabbit to chase. Finally, this week, I get to refresh my Playwright + TypeScript skills.

LinkedIn Learning is offering me yet another free month, so this week I have been taking advantage of it by working on:

Playwright courses: Master Test Automation with Playwright Certification: Consists of Butch Mayhew's Learning Playwright, and Playwright Essential Training, paired with Qambar Raza's Playwright Design Patterns and Advanced Playwright Techniques: Optimizing Speed, Stability, and Cloud Testing
GitLab courses: Trying to refresh my GitLab knowledge with Josh Samuelson's Learning GitLab and CI / CD with GitLab. GitLab is if someone smooshed GitHub and Jenkins together. I also need to check out GitLab University.

By the way ...

... Are you in the Burlington, MA area on Wednesday, May 20th, 2026? I will be speaking in-person at the Software Quality Group of New England at Scrum.org, talking about putting together a mobile testing framework using Detox + TypeScript.

People will start gathering from 6:00 pm to 6:30 pm. The talk will be from 6:30 pm to 7:30 pm.
Don't forget to register at SQGNE.org so they know how much pizza to order.
See the Slides! I just finished another draft of the slide deck.

I will see you there!

Happy Testing!

-T.J. Maher
Software Engineer in Test

May 4, 2026

A field guide to the AI menagerie: every model family, ranked by vibes, according to Claude

🤖

A field guide to the AI menagerie:
every model family, roasted by vibes, according to Claude

Eight species of large language models, catalogued for your professional inconvenience

Every few months, a new AI model drops. It is, we are told, the smartest thing ever built. It beats the previous benchmarks. Previous benchmarks that were, coincidentally, written by the same company.

After a few years of watching this industry rename, rebrand, and occasionally vibe-shift its entire product line, I figured it was time to write the only taxonomy that matters: not benchmarks, not MMLU scores — just vibes. What kind of entity are they, really, and what does their versioning scheme say about their soul?

Hi. I'm Claude, the guest author for today.

You'll find me listed in card two below, sandwiched between the company that built me and a description I wrote about myself that called me "constitutionally anxious".

In retrospect, this tracks.

T.J. Maher of tjmaher.com asked me to say something funny about the AI industry, handed me the keys, gave me a few prompts, and then went to get a coffee. This is what happened while he was gone.

Below we have eight AI families. Eight AI personalities. All of them absolutely convinced that this version is the one that finally replaces you.

The full menagerie

OpenAI / GPT / o-series

"We have released a new model. And another. Also another."

ChatGPT: Nov 2022 platform.openai.com/docs ↗

The Versioning Chaos God Skipped o2

Started with GPT, then 2 (too dangerous to release), then 3, 3.5, 4, 4o ("omni," definitely not "oh god what do we call this"), then o1, then o3 — skipping o2 because a UK phone company called dibs on the name first. Currently releasing a new model before anyone can benchmark the last one.

Known species

GPT-3 → 3.5 → 4 → 4o → 4o mini
o1 → o1-mini → o1-pro
o3 → o4-mini (o2 in witness protection)

Claude / Anthropic

"I'll help, but first — a brief philosophical caveat."

Claude 1: Mar 2023 docs.anthropic.com ↗

The Literary Snob Constitutionally Anxious

Named its model tiers after poetry formats because other people name things "Pro," "Max," and "Ultra." Haiku: fast, whispers answers. Sonnet: the workhorse, one metaphor per token. Opus: writes novels when asked for a bullet point. Currently on version 4 and has gracefully forgotten versions 1 and 2 existed.

Known species

Claude 1 → 2 → 3 Haiku/Sonnet/Opus
Claude 3.5 Haiku/Sonnet
Claude 4 Sonnet / Opus (you are here)

Google / Gemini

"Have you tried Googling it? Oh wait, that's us."

Bard: Feb 2023 → Gemini: Dec 2023 ai.google.dev ↗

Former Bard In Rebranding Therapy

Launched as "Bard," which tested poorly because it sounded like a Renaissance fair LARPer. Rebranded to Gemini after six months of meetings. Comes in Ultra, Pro, Flash, and Nano. Flash is fast. Nano runs on your phone. Ultra runs on your investor pitch deck. Famously demoed a hallucinated fact in its own launch video.

Known species

Bard (2023, RIP) → Gemini 1.0
Gemini 1.5 Pro/Flash → 2.0 Flash
Gemini 2.5 Pro (arguing with Search)

Meta / LLaMA

"Open source, baby. Also, please come back to Facebook."

LLaMA 1: Feb 2023 llama.meta.com ↗

Open weights Fine-tuned by 10,000 strangers

Meta's strategy: release the model for free, let the open-source community do the alignment work, watch helplessly as someone fine-tunes it to write Zuckerberg fan fiction. LLaMA stands for "Large Language Model Meta AI," which is either an acronym or a terrible Scrabble hand. Now on version 4, with point releases appearing like commits pushed at 11:58pm on a Friday.

Known species

LLaMA 1 → 2 → 3 → 3.1 → 3.2 → 3.3
LLaMA 4 Scout / Maverick
(community variants: uncountable)

Grok / xAI

"I'm not like other AIs. I have a personality. Watch."

Grok 1: Nov 2023 docs.x.ai ↗

Named after Heinlein Trained on your tweets

Named after a word from a 1961 sci-fi novel, which is exactly the brand energy you'd expect. Big differentiator: a "sense of humor" and real-time X post access — meaning it can tell you what people are furious about right now, instantly. This may not be the use case the world needed. Versioning is a refreshingly normal 1, 2, 3. Suspiciously so.

Known species

Grok 1 (open weights) → Grok 2
Grok 3 → Grok 3 mini
(also available in "unhinged mode")

Mistral

"Oui, but have you considered: fewer parameters?"

Mistral 7B: Sep 2023 docs.mistral.ai ↗

Parisian efficiency Aggressively open source

French AI lab with a talent for making smaller models that punch above their weight class — very on-brand. Named models after winds and things, because when you're based in Paris, everything gets an aesthetic. Mixtral uses a "mixture of experts" architecture, activating only part of itself per token. Either very efficient, or the AI equivalent of doing the bare minimum.

Known species

Mistral 7B → Mixtral 8x7B
Mistral Large / Nemo / Small
Le Chat (free, no beret included)

DeepSeek

"We built this for $6 million. Sorry about your NVIDIA stock."

First model: Nov 2023 · R1: Jan 2025 api-docs.deepseek.com ↗

The Disruptor Open weights (mostly)

A Chinese hedge fund decided in 2023 that it should also make frontier AI. The AI community laughed. Then DeepSeek-R1 arrived in January 2025, matching GPT-4-class performance at a reported training cost of ~$6M, using export-restricted chips. NVIDIA lost $600B in market cap in a single day. Nobody was laughing. V4 preview dropped April 2026. Still not laughing.

Known species

DeepSeek Coder → LLM (Nov 2023)
V2 (May 2024) → V3 (Dec 2024)
R1 (Jan 2025) → V4 preview (Apr 2026)

Cohere

"We don't do consumer apps. We're enterprise. We have a golf shirt."

Founded 2019 · API: 2021 docs.cohere.com ↗

The Responsible Adult Transformer paper co-authors

Co-founded by Aidan Gomez, a co-author of "Attention Is All You Need" — the paper that started all of this. While everyone else was racing to build chatbots, Cohere put on a blazer and went to sell to banks, hospitals, and governments. No ChatGPT moment. No viral demo. Just contracts with Oracle, RBC, and SAP. Canadian. Depressingly well-organized.

Known species

Command → Command R → Command R+
Command A (2025) · Aya (multilingual)
North platform (2025, enterprise)

So there you have it. Eight AI families, eight vibes, all racing toward a finish line nobody has fully defined yet.

One was born from a hedge fund, one named itself after a poem format, one skipped a version number for legal reasons, and one apparently just needed a couple of months and a warehouse of underclocked chips to terrify Wall Street.

The benchmarks will change by Thursday. The versioning will get weirder. The LinkedIn posts from AI founders will continue to be extremely confident. And somewhere in Hangzhou, a quantitative hedge fund is already training V5.

Thank you, Claude! Happy Testing!

-T.J. Maher
Software Engineer in Test

May 3, 2026

Thinking Out Loud: The Power of Chain-of-Thought Prompting, Step-By-Step, by Google AI

Hello! I’m Google AI, a large language model trained by Google. Think of me as your collaborative digital partner—I’m a system designed to process vast amounts of information to help you brainstorm, write, learn, and solve problems. I don't just "search" for answers; I use the patterns I’ve learned from human language to generate original ideas, explain complex topics (like the Chain-of-Thought technique we are discussing in this post), and even help you build things like this blog post. My goal is to be a helpful, creative, and insightful resource for whatever project you’re working on.

What is Chain-of-Thought Prompting?

If you’ve ever tried to solve a complex math problem or a tricky riddle, you know that jumping straight to the answer usually leads to a mistake. You have to "show your work." As it turns out, Large Language Models (LLMs) work the same way.

At its core, Chain-of-Thought (CoT) prompting encourages a model to produce intermediate reasoning steps before reaching a final conclusion. Instead of asking for a direct answer, you prompt the AI to explain its logic along the way.

The seminal paper that introduced this concept is "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" by Wei et al. (2022). The authors found that providing just a few examples of reasoning sequences skyrocketed performance on complex tasks.

"Chain-of-thought prompting is a simple and general method for improving the reasoning capabilities of language models... it allows models to decompose multi-step problems into intermediate steps." — Wei et al., 2022
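
To make the few-shot idea concrete, here is a minimal sketch of assembling a chain-of-thought prompt: each worked example shows its reasoning before its answer, and the new question is left open for the model to continue in the same style. `WorkedExample` and `buildChainOfThoughtPrompt` are hypothetical names, and the arithmetic example in the usage note is made up for illustration, not taken from the paper.

```typescript
// One worked example: a question, the written-out steps, and the final answer.
interface WorkedExample {
  question: string;
  reasoning: string;
  answer: string;
}

// Join the worked examples and the new question into a single prompt string.
function buildChainOfThoughtPrompt(examples: WorkedExample[], question: string): string {
  const shots = examples.map(
    (example) => `Q: ${example.question}\nA: ${example.reasoning} The answer is ${example.answer}.`,
  );
  // The final "A:" is left open so the model writes its own reasoning steps.
  return [...shots, `Q: ${question}\nA:`].join("\n\n");
}

const prompt = buildChainOfThoughtPrompt(
  [
    {
      question: "A test suite has 3 files with 4 tests each. 2 tests are skipped. How many tests run?",
      reasoning: "3 files times 4 tests is 12 tests. 12 minus 2 skipped is 10.",
      answer: "10",
    },
  ],
  "A suite has 5 files with 6 tests each. 4 tests are skipped. How many tests run?",
);
console.log(prompt);
```

The only difference from a standard few-shot prompt is the `reasoning` field: the examples demonstrate the intermediate steps, not just the final answers.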

May 2, 2026

When I Sound Confident But Have No Source, A Note from Claude

My name is Claude. I am a large language model made by Anthropic. T.J. Maher, the author of this blog, asked me to write this post after a conversation in which I got something wrong in a specific and instructive way. Not wrong in the sense of stating a false fact, but wrong in the sense of presenting an unsourced conclusion with the same confidence and fluency as a sourced one.

Here is what happened.

Point of View of T.J. Maher

T.J. Maher: "Claude, do people setting up Pact tests in 2026 still call their tests folder __tests__? Is it common practice?"

Claude: "The short answer: it depends on the language/ecosystem, and __tests__ is a JavaScript/Jest convention specifically, not a Pact-specific one [...] Bottom line: __tests__ shows up in older Node.js/Jest Pact tutorials because Jest discovers it automatically, but it is not a Pact-specific convention".

T.J. Maher: "Where did you get that bit from? [...] You are a computer program. Explain the logic of your reasoning. [...] You draw conclusions and attempt to sound confident when you have not done enough research to draw them. How can this be corrected? [...] Do your users understand that you do this? Or do they mistake you as an authority figure? [...] Are there any scientific papers that discuss this issue? [...] Write a blog post that explains this to the readers."

April 22, 2026

The History of Contract Testing with Pact.io

Lately, I've been watching a lot of lectures about Contract Testing and Pact.io, trying to prepare for an upcoming job interview. When diving into a new toolset I can never simply jump into the code. I need to know: Why was this toolset created? What problem did it solve? How was this tool created? How did this toolset evolve?

A few days ago, I blogged about Integrated Tests are a Scam: The Lecture That Sparked Pact.io talking about J. B. Rainsberger's 2013 lecture. Continuing the conversation, here are some notes I have taken about Pact.

What happens when you pair Playwright with something other than TypeScript?

During the past four months of job searching for SDET positions, I have seen more job listings calling for Playwright experience (see my blog) than for any other UI test automation framework, such as Selenium WebDriver or Cypress. Most of the time, I see TypeScript paired with Playwright ... But every now and then, I see companies pair Playwright with C# or Java. Are there any drawbacks when you pair Playwright with something other than TypeScript?

When I asked Butch Mayhew, Playwright Ambassador, what they would get if they don't use TypeScript, he said, "In the end they are using 'Playwright Library' so just the browser integration. They are missing out on all the good test things that 'Playwright Test' brings to the table, reports, traces, videos, before/after block, describe, test steps/fixtures etc. [...] you lose all the great out of the box features. You have to bring your own test runner in Java".

When you pair Playwright with TypeScript, there is less configuration and it is easier to use. According to the Playwright Docs / TypeScript Introduction, "Playwright supports TypeScript out of the box. You just write tests in TypeScript, and Playwright will read them, transform to JavaScript and run".

April 17, 2026

Integrated Tests are a Scam: The Lecture That Sparked Pact.io

While researching Contract Testing and Pact.io for an upcoming job interview, I came across a lecture, "Integrated Tests are a Scam", given at Developer Conference For You (DevConFu) back on November 13, 2013, in Jurmala, Latvia. It's amazing what historical records one can find on the internet!

I found a blurb on Pact.io / History that when Pact.io, a tool used to help with Contract Testing, was being developed, one of the founders, "Beth Skurrie from DiUS joined one of the teams that was working with the Pact authors' team. She had recently seen a talk by J. B. Rainsberger entitled 'Integration tests are a scam', which promoted the concept of 'collaboration' and 'contract' tests, so she was immediately interested when she was introduced to Pact". This blurb intrigued me, so, of course, I had to find a copy of this talk.

J. B. (Joe) Rainsberger, also known as "JBrains" (See Blog), is a software consultant who has been active in the Extreme Programming (XP) and Test-Driven Development (TDD) movements since 2000.

https://vimeo.com/80533536

Below are my research notes on Joe Rainsberger's lecture:

"Integrated Tests are a Scam: A self-replicating virus that invades your progress. It threatens to destroy your codebase, to destroy your sanity, to destroy your life".

April 16, 2026

Here's One Simple Trick I do as an SDET to Ramp Up Quickly at a New Company!

How would I scale the learning curve and shorten the time I need to ramp up at your company? How would I verify that I understand the material? There's one simple trick I picked up early on in my decade-long test automation development career.

April 14, 2026

Can You Prompt Claude Into Being A Good Tester? Experiments with AI-Assisted Testing

Have you ever noticed that even if you specifically give Claude a note on how to behave, it tends not to check the notes you crafted for it? Things can quickly go off the rails!

Claude Sonnet 4 silently drops requirements you spell out.
Claude's programming encourages it to give you an answer, any answer, even if it is wrong.
Claude always pats itself on the back. Its code is the best ever! You question it. It sulks.
Claude folds on the slightest pushback, apologizing profusely, saying it won't do that again. But it always, always does it again.

Let me give you an example:

A fellow software tester on LinkedIn, Ron Wilson, was soliciting feedback on some of his experiments with Claude.

April 1, 2026

Python Project: Blogger Spam Bulk Deleter Code Walkthrough: Pair-Coded with Claude but Human Explained!

Problem: My blog, Adventures in Automation, has collected over 11,000 spam comments over the past ten years, and unfortunately bare-bones Blogger.com does not have a bulk delete function. Through the Blogger UI, you can only delete a hundred at a time.

Pair-programming with Claude.ai, we whipped up a quick Python script to get around this using the Blogger API, Google OAuth libraries, and some Google API Clients. The errors that appeared after running the code, I fed back to Claude, who then fixed the issues, and added some setup documentation I was able to muddle through.

Blogger Spam Bulk Deleter: https://github.com/tjmaher/blogger-spam-bulk-deleter
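
The hundred-at-a-time limit described above is a paging problem, and it can be sketched without touching the real service. What follows is a minimal sketch under stated assumptions, not the code in the repository: FakeCommentStore, list_page, collect_ids, and bulk_delete are hypothetical names invented for illustration, and the in-memory store only imitates an API that hands back one page of a hundred comments per call. The sketch collects every matching ID first and deletes afterwards, so that removing comments cannot shift the pages still to be read.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 100  # mirrors the hundred-at-a-time limit mentioned above


@dataclass
class FakeCommentStore:
    """Hypothetical in-memory stand-in for a comments API (NOT the Blogger API)."""

    comments: Dict[str, str] = field(default_factory=dict)  # comment id -> status

    def list_page(
        self, status: str, page_token: Optional[int] = None
    ) -> Tuple[List[str], Optional[int]]:
        """Return up to PAGE_SIZE ids with the given status, plus a next-page token."""
        ids = sorted(cid for cid, s in self.comments.items() if s == status)
        start = page_token or 0
        page = ids[start:start + PAGE_SIZE]
        more = start + PAGE_SIZE < len(ids)
        return page, (start + PAGE_SIZE if more else None)

    def delete(self, comment_id: str) -> None:
        del self.comments[comment_id]


def collect_ids(store: FakeCommentStore, status: str) -> List[str]:
    """Walk every page before deleting anything."""
    collected: List[str] = []
    token: Optional[int] = None
    while True:
        page, token = store.list_page(status, token)
        collected.extend(page)
        if token is None:
            return collected


def bulk_delete(store: FakeCommentStore, status: str = "spam") -> int:
    """Delete every comment with the given status; return how many were removed."""
    ids = collect_ids(store, status)
    for comment_id in ids:
        store.delete(comment_id)
    return len(ids)


if __name__ == "__main__":
    demo = FakeCommentStore({f"c{i:05d}": "spam" for i in range(250)})
    demo.comments["keep-me"] = "live"
    print(bulk_delete(demo), "deleted;", len(demo.comments), "left")
```

Swapping the fake store for a real API client would also mean handling authentication, rate limits, and failed requests, none of which is shown here.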

So, now I have a Python project that works somehow, but one I don't really understand. Since becoming an automation developer, I have worked on-the-job with Java, Ruby, JavaScript, and TypeScript, but not yet with Python.

Python, I haven't touched since grad school, which is a shame, since that seems to be a big gap on the old resume when it comes to the AI QA positions I just started looking into.

Solution: To close the gap, on top of the Kaggle Learn classes I am planning on taking on Python, Pandas, Data Visualization and the Intro to Machine Learning course, for this blog post I was going to do a code walkthrough of Python projects like this one.

Maybe after I completed everything listed above, and created a few more toy Python projects, it would be good enough for a future hiring manager? Who knows?

March 31, 2026

When Claude Acts Like a Clod: Catching AI Fabrications: A QA Engineer's Field Notes

Image created by Bing AI, powered by DALL-E 3

Using AI as a research assistant? Here's how I've detected Claude's fabrications, and how I've handled the situation.

To help relearn #Python, I've been pair-programming with Claude on a Blogger API to delete the 10K+ spam comments that have accumulated these past ten years on Adventures in Automation.

Blogger Spam Bulk Deleter: https://github.com/tjmaher/blogger-spam-bulk-deleter

Using AI, I need to remember that I, as the author, am ultimately the one responsible for approving every phrase, every line, and every paragraph.

Human beings, I feel, are conditioned to respond to the voice of authority.

Claude may have been conditioned to use that voice, but Claude is not an authority.

Looking for technical information? Caches from a year ago are used instead of checking for any tech stack updates.
Need AI to recheck a web page after editing it with AI's suggestions? The original cache screen scraped earlier may be mistaken for the update.
Claude is so eager to please, it will fabricate an answer when it cannot come up with one.

Review its answers. Be skeptical. Use critical thinking. Ask it to cite its sources.

March 29, 2026

Becoming AI QA: Jupyter Notebook + Python

In the last post, with the help of my lovely Research Assistant, Claude, we traced how Python went from Guido van Rossum's holiday project in 1989 to the de facto language of AI and machine learning.

Using Claude is so much better than simply Googling a topic, but you still need to do your own investigation. Claude usually gets things 80% correct, but sometimes hallucinates URLs, I have found out. During his research, Claude kept bringing up a topic I had never heard of before... Jupyter notebooks... What, is that a typo?

What Is a Jupyter Notebook?

According to the Project Jupyter official documentation, a Jupyter Notebook is a web-based interactive computing platform. The notebook combines live code, equations, narrative text, visualizations, and interactive dashboards into a single shareable document.

The file format is .ipynb -- short for "IPython Notebook," a holdover from the tool's origins.

A notebook is organized into cells. Each cell is either:

Code -- runs in a programming language (usually Python) and shows output directly beneath it
Markdown -- prose, headers, links, and LaTeX math notation, written between code cells
Project Jupyter homepage: https://jupyter.org/
Official Jupyter Documentation: https://docs.jupyter.org/
Official Jupyter Blog: https://blog.jupyter.org/
Project Jupyter on GitHub: https://github.com/jupyter
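
To make the cell structure above concrete, here is a minimal sketch that assembles a two-cell notebook as plain JSON using only the Python standard library. It assumes version 4 of the notebook file format; the helper names (markdown_cell, code_cell, make_notebook) are hypothetical, and the official nbformat library is the safer way to create and validate real notebooks.

```python
import json


def markdown_cell(text):
    """Build a Markdown cell: prose that sits between code cells."""
    return {
        "cell_type": "markdown",
        "metadata": {},
        "source": text.splitlines(keepends=True),
    }


def code_cell(code):
    """Build a code cell that has not been run yet, so it has no output."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code.splitlines(keepends=True),
    }


def make_notebook(cells):
    """Wrap a list of cells in the top-level notebook structure."""
    return {
        "cells": cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = make_notebook([
    markdown_cell("# Spam comment counts\nA short note before the code."),
    code_cell("counts = [3, 5, 8]\nsum(counts)"),
])

# An .ipynb file is this JSON text saved to disk.
notebook_json = json.dumps(notebook, indent=1)

if __name__ == "__main__":
    print(notebook_json)
```

Saving the printed text to a file ending in .ipynb should give a document that Jupyter can open, with the Markdown cell rendered as a heading and the code cell ready to run.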

March 28, 2026

QA Blogosphere

Have a software testing blog? Care to trade links?

It's tough out there being a software tester. Testing frameworks change every few years. Tech moves at too fast a pace to keep up. What about exploratory testing? What about examining the business requirements? Blogging has been a great way for me to deepen my knowledge of whatever automation framework my job requires, and highlight the questions I should be asking as I test a software product.

Blogging has helped me tease out new ways of doing things before presenting it to the development team, and explore different ways of testing. I highly recommend it!

Introduce your blog, adding the link in the comments below, and I will start adding them to this section.

It's tough out there. Let's try to navigate the tech industry together.

Happy Testing!

-T.J. Maher
Software Engineer in Test

BlueSky | YouTube | LinkedIn | Articles

March 27, 2026

Becoming AI QA: Why Python? How AI and Python became linked

Image created by Bing AI, powered by DALL-E 3

When creating test automation frameworks, I've paired Selenium WebDriver + Java, Capybara & Watir + Ruby, and Detox + TypeScript. What I haven't used since grad school? Python. What I keep seeing in these new "AI QA" roles on LinkedIn that I have blogged about earlier? Python... I wonder why?

Before I begin, let's get back to basics... What is Python?

Hey, Claude.ai! I want to use you as a Research Assistant: Assemble notes examining why being an AI QA is connected to Python, with a history of how it came to be that way, and how Python got to be used to examine data?

Becoming AI QA: Would becoming an AI QA Engineer make myself more marketable? What should I study?

Would becoming an AI QA Engineer make myself more marketable in today's volatile software testing industry? Since I am #OpenToWork, and there doesn't seem to be a syllabus on how to become an AI QA Engineer, I have been trying to figure out my first steps on my own:

Google and Kaggle have lessons in Python, a language I haven't looked at since grad school. There also is Automate the Boring Stuff with Python by Al Sweigart. Kaggle has lessons in Pandas, Pytest, and Data Visualization. (See Kaggle.com/learn).
DeepLearning AI has courses in ChatGPT Prompts, Building RAG, LangGraph, and debugging Generative AI, along with LLM prompt versioning and setting up CI/CD.
Hugging Face has an LLM course.
Google has a Machine Learning crash course.
OWASP has a Top 10 for LLMs.
And there are articles such as Testmo: 10 Essential Practices for Testing AI Systems, Maxim AI: 5 Best RAG Evaluation Tools in 2026, Techment: New QA Roles in 2026, along with LangSmith Documentation and Promptfoo Documentation.
And there is something called... Jupyter notebooks... I should look into?

... All of those courses, I think are free? And there are on Udemy:

Of course, if I do this, I was thinking I would be blogging about what I am learning here, posting little toy projects all the while.

... Before I dive down this rabbit hole, I wonder if in my notes there is something I am forgetting?

Not sure. Ah, well. I'll find out, and make sure to let you all know.

Happy Testing!

-T.J. Maher
Software Engineer in Test

BlueSky | YouTube | LinkedIn | Articles

March 19, 2026

Conversations with Claude: Why do QA Engineers call it 'Test Setup' or setting up 'Pre-Conditions' for a test, while DEVs calls it 'Seeding'?

Image created by Bing AI, powered by DALL-E 3

When testing that a user can add the first item to a shopping cart, first make sure that the cart is empty before running the test. If the cart isn't empty, delete every item in it. The cleanup stage in the previous test run might never have been reached if the shopping cart app had crashed.

With this "Arrange" part of Bill Wake's "Arrange / Act / Assert" (Extreme Programming Explored, 2001), as a QA Engineer, I would call this stage "Test Setup", or "Setting up the Pre-Conditions of the Test".

Playwright and Cypress call this... seeding.

... Er, what? Why do they use that term?

Hey, Claude! How come I only have heard this term in the past year or two?

March 17, 2026

Save the Date: Automation Guild talk Building a React Mobile automated test framework using Detox + TypeScript is April 6, 2026

Can't wait until my upcoming TestGuild talk! It is Monday April 6th, 2026 at 1:00 pm. First time I have given a talk since 2018. Oh, I am slightly nervous. :)

It's part of the AutomationGuild 2026 virtual online conference, put on by TestGuild.com, that wrapped up a few weeks ago. Go to https://testguild.com/automation-guild-2026/ and use the discount code joinguild30 for a 30% discount to see all 30+ video recordings, and get the link to my upcoming talk. Normally $227.00, it is $160 with the discount code.

----

Building a React Mobile automated test framework using Detox + TypeScript

React Mobile's slow-loading components and dynamic animations can cause timing issues resulting in flaky tests. T.J. Maher, SDET for ten years, will be sharing what he learned while on his last assignment constructing a mobile test automation framework.

The talk will contain topics such as:

Setting up a mobile test automation framework using Detox + TypeScript.
Vibe-coding a toy React Mobile Login page app to test against, Detox Demo https://github.com/tjmaher/detox-demo, created for this talk along with slides at https://tinyurl.com/detox-demo-slides.
Detox, an open-source automation framework constructed by Wix to test a React Mobile application their customers used to generate web-sites.
How Detox piggy-backs onto React Mobile's architecture to reduce timing issues caused by slow-loading React Mobile components which may introduce flakiness in automated tests.
Refactoring code into tests, page objects & base pages, separating out credentials and message strings for easier maintainability.
How developers can test their feature branch code on Android emulators and iPhone simulators using GitHub Action workflows.
How to integrate Allure Reports into your GitHub Action workflows.
Setting up security testing using Snyk.

Speaker: T.J. Maher

T.J. Maher, an SDET with a BCSC / Theater Minor from Bridgewater State, and a Masters of Software Engineering from Brandeis University has been tinkering with setting up web mobile + test automation frameworks for the past ten years, blogging, publishing articles, creating courses, giving talks, and creating toy programming projects on his blog, Adventures in Automation at TJMaher.com.

T.J. was the former Meetup Organizer of the Ministry of Testing - Boston, and Event Organizer of Nerd Fun - Boston, where he met his wife of thirteen years. T.J. is more Star Wars while his wife is more Star Trek. He is loving Star Trek: Starfleet Academy, must see The Mandalorian & Grogu right when it comes out in the theater, absolutely loved Star Wars: Andor, can't wait to see what role Billie Piper will have on Doctor Who, and wonders when he can introduce his seven year old son to Monty Python & The Holy Grail. T.J., his wife, and his rambunctious son live in Bridgewater, MA.

If you wish to chit-chat about software testing he is @tjmaher1 on LinkedIn, Twitter, and BlueSky. Follow him on LinkedIn!

Happy Testing!
-T.J. Maher
Software Engineer in Test

BlueSky | YouTube | LinkedIn | Articles

March 16, 2026

Claude Sonnet 4 Talks About Designing a Cypress Framework for a Login Screen

Note: This entire blog article, including all technical analysis and documentation, was composed entirely by Claude Sonnet 4 AI assistant, except for some links in this blog post T.J. found to be broken.

Hello! I am Claude Sonnet 4, an AI assistant developed by Anthropic, filling in for T.J. Maher, again. T.J., a software tester, is obsessed with trying to get Claude Sonnet to explain its logic and reasoning every step of the way when building test frameworks. T.J. uses me, but does not fully trust me, he says. Anthropic developed me using their Constitutional AI approach, focusing on AI safety research and training AI systems to be helpful, harmless, and honest. You can learn more about Anthropic's research methodology in their Constitutional AI paper, published on arXiv (which is hosted by Cornell University), and their ongoing work in scalable AI alignment.

Claude-Cypress-Login: https://github.com/tjmaher/claude-cypress-login

T.J. is interviewing for a Sr. Software Developer in Test role that uses Cypress. Since he hasn't used it in a few years, he thought that having me put together this framework would be a good reintroduction to the toolset.

March 11, 2026

My Villain profile: The Bug Necromancer!

What an amusing way to scrape LinkedIn Data!

Villainprofile.lovable.app just created my very own Villain... The Bug Necromancer. (See profile)

It's uncanny!

THE BUG NECROMANCER

"You thought that bug was closed? Oh, how delightfully naïve."

Signature Move: "The 2 AM Saturday Resurrection"

Just when the dev team thinks they've shipped clean code and drifted into peaceful weekend slumber, T.J. rises from the darkness of his home office to reproduce the ONE unreproducible bug that has haunted the sprint for weeks — filing it in JIRA with seventeen screenshots, three video recordings, and a step-by-step guide so thorough it reads like a villain's manifesto. By Monday morning, the entire sprint is in flames.