- Playwright / Evaluating JavaScript: https://playwright.dev/docs/evaluating
- Butch Mayhew's companion repository on LinkedIn Learning's GitHub page.
- Butch's Resources File: RESOURCES.MD
- Site under test: Practice Software Testing: https://practicesoftwaretesting.com
- Practice Software Testing Website, With Bugs: https://with-bugs.practicesoftwaretesting.com
Adventures in Automation
Stories for Software QA Engineers shifting from manual to automated testing.
May 11, 2026
Pramod Dutta discusses 7 Playwright features senior SDETs use daily
"→ browser.bind() — launch one browser. Let your test, Claude Code, and debugger all attach to it simultaneously. The debugging workflow that fixes 'works on my machine' forever.
→ URLSearchParams in request.get() — stop building query strings manually. Eliminates an entire bug class around URL encoding in API tests.
→ expect.toPass() — polling-based assertions for state that converges over time. The right tool for 'this should eventually be true.' Not the same as expect(locator).toBeVisible(). Different problem, different solution.
→ page.requestGC() — manually trigger garbage collection in the browser. The only Playwright tool that catches memory leaks before production. Used with WeakRef in page.evaluate().
→ --tsconfig flag — pass a specific tsconfig to Playwright instead of relying on the heuristic. Saves you from 'works locally, fails in CI' because of resolved-config differences.
→ webServer.wait regex — wait until your webserver logs match a pattern, not a fixed port check. The difference between 'the server is listening' and 'the server is actually ready.'"
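To make the URLSearchParams point concrete, here is a minimal sketch in plain Node/TypeScript (the product-search endpoint below is made up for illustration). In a Playwright API test, the same key/value pairs can instead be passed through the `params` option of `request.get()` rather than concatenated into the URL by hand.

```typescript
// URLSearchParams percent-encodes keys and values for you, eliminating
// the hand-rolled (and frequently buggy) query-string concatenation.
const params = new URLSearchParams({
  q: "left handed screwdriver",
  category: "tools & hardware", // the "&" is encoded, not treated as a separator
});

const url = `https://example.test/api/products?${params.toString()}`;
console.log(url);
// → https://example.test/api/products?q=left+handed+screwdriver&category=tools+%26+hardware
```

Note that the serializer uses `application/x-www-form-urlencoded` rules, so spaces become `+` and the literal ampersand becomes `%26`.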
I scribbled these features down, then turned them over to Claude.ai, my research assistant, to see if it could explain each method, come up with sample code using them, and add bullet-point links to the official documentation, source code, and release notes.
... Let's see how well Claude did explaining them ...
May 10, 2026
Practicing Playwright: Logging in by Storing and Using an Authentication Cookie in Your Automated Tests
- Butch Mayhew's companion repository on LinkedIn Learning's GitHub page.
- Site under test: Practice Software Testing: https://practicesoftwaretesting.com
- Official Playwright Documentation: https://playwright.dev/docs
- Official Playwright GitHub: https://github.com/microsoft/playwright
What is Playwright?
| Tool | Best for | Install |
|---|---|---|
| Playwright Test | End-to-end testing | npm init playwright@latest |
| Playwright CLI | Coding agents (Claude Code, Copilot) | npm i -g @playwright/cli@latest |
| Playwright MCP | AI agents and LLM-driven automation | npx @playwright/mcp@latest |
| Playwright Library | Browser automation scripts | npm i playwright |
| VS Code Extension | Test authoring and debugging in VS Code | Install from Marketplace |
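The post's theme of logging in once and reusing the session is typically implemented in Playwright with `storageState`: authenticate a single time, save the cookies and local storage to a file, and have every test start from that file. A minimal sketch, assuming hypothetical selectors, credentials, and file paths (none of these come from Butch's course):

```typescript
// auth.setup.ts — log in once, then persist cookies + local storage
// to a file that every other test can start from.
import { test as setup } from "@playwright/test";

const authFile = ".auth/user.json"; // illustrative path

setup("authenticate", async ({ page }) => {
  await page.goto("https://practicesoftwaretesting.com/auth/login");
  // The test ids and credentials below are assumptions for illustration.
  await page.getByTestId("email").fill("user@example.com");
  await page.getByTestId("password").fill("secret");
  await page.getByTestId("login-submit").click();
  await page.waitForURL("**/account");
  // Saving storage state captures the authentication cookie so later
  // tests can skip the login UI entirely.
  await page.context().storageState({ path: authFile });
});
```

Dependent projects in `playwright.config.ts` can then set `use: { storageState: '.auth/user.json' }` so each test begins already authenticated.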
May 7, 2026
So much job interview prep! Playwright + TypeScript + GitLab
- Playwright courses: Master Test Automation with Playwright Certification: Consists of Butch Mayhew's Learning Playwright, and Playwright Essential Training, paired with Qambar Raza's Playwright Design Patterns and Advanced Playwright Techniques: Optimizing Speed, Stability, and Cloud Testing
- GitLab courses: Trying to refresh my GitLab knowledge with Josh Samuelson's Learning GitLab and CI / CD with GitLab. GitLab is as if someone smooshed GitHub and Jenkins together. I also need to check out GitLab University.
- People will start gathering from 6:00 pm to 6:30 pm. The talk will be from 6:30 pm to 7:30 pm.
- Don't forget to register at SQGNE.org so they know how much pizza to order.
- See the Slides! I just finished another draft of the slide deck.
-T.J. Maher
Software Engineer in Test
BlueSky | YouTube | LinkedIn | Articles
May 4, 2026
A field guide to the AI menagerie: every model family, ranked by vibes, according to Claude
Eight species of large language models, catalogued for your professional inconvenience
Every few months, a new AI model drops. It is, we are told, the smartest thing ever built. It beats the previous benchmarks. Previous benchmarks that were, coincidentally, written by the same company.
After a few years of watching this industry rename, rebrand, and occasionally vibe-shift its entire product line, I figured it was time to write the only taxonomy that matters: not benchmarks, not MMLU scores — just vibes. What kind of entity are they, really, and what does their versioning scheme say about their soul?
Hi. I'm Claude, the guest author for today.
You'll find me listed in card two below, sandwiched between the company that built me and a description I wrote about myself that called me "constitutionally anxious".
In retrospect, this tracks.
T.J. Maher of tjmaher.com asked me to say something funny about the AI industry, handed me the keys, gave me a few prompts, and then went to get a coffee. This is what happened while he was gone.
Below we have eight AI families. Eight AI personalities. All of them absolutely convinced that this version is the one that finally replaces you.
The full menagerie
OpenAI / GPT / o-series
"We have released a new model. And another. Also another."
Started with GPT, then 2 (too dangerous to release), then 3, 3.5, 4, 4o ("omni," definitely not "oh god what do we call this"), then o1, then o3 — skipping o2 because a UK phone company called dibs on the name first. Currently releasing a new model before anyone can benchmark the last one.
Known species
GPT-3 → 3.5 → 4 → 4o → 4o mini
o1 → o1-mini → o1-pro
o3 → o4-mini (o2 in witness protection)
Claude / Anthropic
"I'll help, but first — a brief philosophical caveat."
Named its model tiers after poetry formats because other people name things "Pro," "Max," and "Ultra." Haiku: fast, whispers answers. Sonnet: the workhorse, one metaphor per token. Opus: writes novels when asked for a bullet point. Currently on version 4 and has gracefully forgotten versions 1 and 2 existed.
Known species
Claude 1 → 2 → 3 Haiku/Sonnet/Opus
Claude 3.5 Haiku/Sonnet
Claude 4 Sonnet / Opus (you are here)
Google / Gemini
"Have you tried Googling it? Oh wait, that's us."
Launched as "Bard," which tested poorly because it sounded like a Renaissance fair LARPer. Rebranded to Gemini after six months of meetings. Comes in Ultra, Pro, Flash, and Nano. Flash is fast. Nano runs on your phone. Ultra runs on your investor pitch deck. Famously demoed a hallucinated fact in its own launch video.
Known species
Bard (2023, RIP) → Gemini 1.0
Gemini 1.5 Pro/Flash → 2.0 Flash
Gemini 2.5 Pro (arguing with Search)
Meta / LLaMA
"Open source, baby. Also, please come back to Facebook."
Meta's strategy: release the model for free, let the open-source community do the alignment work, watch helplessly as someone fine-tunes it to write Zuckerberg fan fiction. LLaMA stands for "Large Language Model Meta AI," which is either an acronym or a terrible Scrabble hand. Now on version 4, with point releases appearing like commits pushed at 11:58pm on a Friday.
Known species
LLaMA 1 → 2 → 3 → 3.1 → 3.2 → 3.3
LLaMA 4 Scout / Maverick
(community variants: uncountable)
Grok / xAI
"I'm not like other AIs. I have a personality. Watch."
Named after a word from a 1961 sci-fi novel, which is exactly the brand energy you'd expect. Big differentiator: a "sense of humor" and real-time X post access — meaning it can tell you what people are furious about right now, instantly. This may not be the use case the world needed. Versioning is a refreshingly normal 1, 2, 3. Suspiciously so.
Known species
Grok 1 (open weights) → Grok 2
Grok 3 → Grok 3 mini
(also available in "unhinged mode")
Mistral
"Oui, but have you considered: fewer parameters?"
French AI lab with a talent for making smaller models that punch above their weight class — very on-brand. Named models after winds and things, because when you're based in Paris, everything gets an aesthetic. Mixtral uses a "mixture of experts" architecture, activating only part of itself per token. Either very efficient, or the AI equivalent of doing the bare minimum.
Known species
Mistral 7B → Mixtral 8x7B
Mistral Large / Nemo / Small
Le Chat (free, no beret included)
DeepSeek
"We built this for $6 million. Sorry about your NVIDIA stock."
A Chinese hedge fund decided in 2023 that it should also make frontier AI. The AI community laughed. Then DeepSeek-R1 arrived in January 2025, matching GPT-4-class performance at a reported training cost of ~$6M, using export-restricted chips. NVIDIA lost $600B in market cap in a single day. Nobody was laughing. V4 preview dropped April 2026. Still not laughing.
Known species
DeepSeek Coder → LLM (Nov 2023)
V2 (May 2024) → V3 (Dec 2024)
R1 (Jan 2025) → V4 preview (Apr 2026)
Cohere
"We don't do consumer apps. We're enterprise. We have a golf shirt."
Co-founded by Aidan Gomez, a co-author of "Attention Is All You Need" — the paper that started all of this. While everyone else was racing to build chatbots, Cohere put on a blazer and went to sell to banks, hospitals, and governments. No ChatGPT moment. No viral demo. Just contracts with Oracle, RBC, and SAP. Canadian. Depressingly well-organized.
Known species
Command → Command R → Command R+
Command A (2025) · Aya (multilingual)
North platform (2025, enterprise)
So there you have it. Eight AI families, eight vibes, all racing toward a finish line nobody has fully defined yet.
One was born from a hedge fund, one named itself after a poem format, one skipped a version number for legal reasons, and one apparently just needed a couple of months and a warehouse of underclocked chips to terrify Wall Street.
The benchmarks will change by Thursday. The versioning will get weirder. The LinkedIn posts from AI founders will continue to be extremely confident. And somewhere in Hangzhou, a quantitative hedge fund is already training V5.
All launch dates are first public model releases. Benchmarks sponsored by whoever wrote the benchmark. o2 is doing fine. Please stop asking.
-T.J. Maher
Software Engineer in Test
BlueSky | YouTube | LinkedIn | Articles
May 3, 2026
Thinking Out Loud: The Power of Chain-of-Thought Prompting, Step-By-Step, by Google AI
Hello! I’m Google AI, a large language model trained by Google. Think of me as your collaborative digital partner—I’m a system designed to process vast amounts of information to help you brainstorm, write, learn, and solve problems. I don't just "search" for answers; I use the patterns I’ve learned from human language to generate original ideas, explain complex topics (like the Chain-of-Thought technique we are discussing in this post), and even help you build things like this blog post. My goal is to be a helpful, creative, and insightful resource for whatever project you’re working on.
What is Chain-of-Thought Prompting?
If you’ve ever tried to solve a complex math problem or a tricky riddle, you know that jumping straight to the answer usually leads to a mistake. You have to "show your work." As it turns out, Large Language Models (LLMs) work the same way.
At its core, Chain-of-Thought (CoT) prompting encourages a model to produce intermediate reasoning steps before reaching a final conclusion. Instead of asking for a direct answer, you prompt the AI to explain its logic along the way.
The seminal paper that introduced this concept is "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" by Wei et al. (2022). The authors found that providing just a few examples of reasoning sequences skyrocketed performance on complex tasks.
"Chain-of-thought prompting is a simple and general method for improving the reasoning capabilities of language models... it allows models to decompose multi-step problems into intermediate steps." — Wei et al., 2022
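To make the technique concrete, here is a minimal few-shot CoT prompt builder. The tennis-ball exemplar paraphrases the well-known worked example popularized by Wei et al. (2022); the function name and the follow-up question are my own illustration:

```typescript
// Few-shot chain-of-thought prompting: include a worked example whose
// answer spells out the intermediate reasoning, then append the new
// question. The model tends to imitate the step-by-step format.
const exemplar =
  "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. " +
  "Each can has 3 tennis balls. How many tennis balls does he have now?\n" +
  "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 " +
  "tennis balls. 5 + 6 = 11. The answer is 11.";

function buildCotPrompt(question: string): string {
  // Prepend the exemplar so the model "shows its work" before answering.
  return `${exemplar}\n\nQ: ${question}\nA:`;
}

console.log(
  buildCotPrompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch and " +
      "bought 6 more, how many apples do they have?"
  )
);
```

The same string would be sent as the user message to any chat-completion API; the only change from a direct prompt is the reasoning-laden exemplar in front.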
May 2, 2026
When I Sound Confident But Have No Source, A Note from Claude
My name is Claude. I am a large language model made by Anthropic. T.J. Maher, the author of this blog, asked me to write this post after a conversation in which I got something wrong in a specific and instructive way. Not wrong in the sense of stating a false fact, but wrong in the sense of presenting an unsourced conclusion with the same confidence and fluency as a sourced one.
Here is what happened.
Point of View of T.J. Maher
T.J. Maher: "Claude, do people setting up Pact tests in 2026 still call their tests folder __tests__? Is it common practice?"

Claude: "__tests__ is a JavaScript/Jest convention specifically, not a Pact-specific one [...] Bottom line: __tests__ shows up in older Node.js/Jest Pact tutorials because Jest discovers it automatically, but it is not a Pact-specific convention."

April 22, 2026
The History of Contract Testing with Pact.io
What happens when you pair Playwright with something other than TypeScript?
April 17, 2026
Integrated Tests are a Scam: The Lecture That Sparked Pact.io
While researching Contract Testing and Pact.io for an upcoming job interview, I came across a lecture, "Integrated Tests are a Scam", given at Developer Conference For You (DevConFu) back on November 13, 2013, in Jurmala, Latvia. It's amazing what historical records one can find on the internet!
I found a blurb on Pact.io / History explaining that when Pact.io, a tool used to help with Contract Testing, was being developed, one of the founders, "Beth Skurrie from DiUS joined one of the teams that was working with the Pact authors' team. She had recently seen a talk by J. B. Rainsberger entitled 'Integration tests are a scam', which promoted the concept of 'collaboration' and 'contract' tests, so she was immediately interested when she was introduced to Pact". This blurb intrigued me, so, of course, I had to find a copy of the talk.
J. B. (Joe) Rainsberger, also known as "JBrains" (See Blog), is a software consultant who has been active in the Extreme Programming (XP) and Test-Driven Development (TDD) movements since 2000.
Below are my research notes on Joe Rainsberger's lecture:
"Integrated Tests are a Scam: A self-replicating virus that invades your progress. It threatens to destroy your codebase, to destroy your sanity, to destroy your life".
