QA in the Era of AI

Summary

I've worked at companies with entire QA departments: rooms of people clicking through the same flows before every release, filing tickets, arguing about repro steps. Most of that job is now something you can wire up. Not because testing got less important, but because agents got good at exactly the parts that burned humans out: reviewing every pull request, walking the same three flows every morning, catching the console error nobody looked for, filing the ticket with the screenshot actually attached.

This guide is the system for doing that on purpose. It started with a post about my dialer QA bot, where Claude Code drives a browser through real call flows and files GitHub issues for whatever breaks. Here we build the whole department around that idea: AI code review as the first gate (and the real tradeoffs between the tools), agents that write tests plus the mutation-testing trick that keeps those tests honest, browser agents that smoke-test every preview deploy, visual and accessibility passes, and the loop that turns every failure, from staging or production, into a filed issue that another agent fixes. (If you want to build agents like these yourself, that craft is covered in The Agentic Playbook; this guide is about putting them to work.)

And because I'd rather you trust this thing for the right reasons, we spend real time on where it breaks: agents that pass tests they should fail, self-healing tools that heal around genuine bugs, prompt injection hiding in the very pages your QA agent reads, and the work that still belongs to a human with product judgment. The goal isn't zero humans. It's humans doing the 30% that was always the actual job, with a tireless department underneath them.

This is a living document and will be updated as the tools and patterns evolve.

The QA Department, Unbundled

The First Gate: AI Code Review

Tests Written by Agents (and How to Keep Them Honest)

A Browser in the Agent's Hands

Teaching the Agent What Correct Means

The Pipeline: Wiring QA into CI

From Failure to Filed Issue to Fix

Visual Diffs, Accessibility, and the Self-Healing Question

How It Fails (and Who's Attacking It)

What's Left for Humans (and How to Roll This Out)

Toolkit: A Starter QA Pass for Your CLAUDE.md

Toolkit: The Pipeline Readiness Checklist


Keep reading

The Agentic Playbook

Everyone agrees you should be building agents. Nobody agrees on how. One camp says drag nodes on a canvas and ship this afternoon. The other says real agents live in code, in your repo, behind your own API. They're both right, and the argument is a distraction: the loop is the same either way. A model, a set of tools, a memory, and a stopping condition. Once you see that, the question stops being "which side is right" and becomes "which lane fits this job." This playbook walks both lanes properly. The first half builds agents in n8n: the AI Agent node, tools it can actually call (including MCP servers), memory and RAG on the canvas, human approval gates, multi-agent patterns, and the Evaluations feature that tells you whether any of it works. The second half builds the same ideas in TypeScript with the Vercel AI SDK inside an Astro site: a streaming chat endpoint, real tool definitions with schemas, the ToolLoopAgent, approval gates in code, structured output, and MCP as the bridge that lets your n8n workflows and your code agents share the same tools. It pairs with [Mastering n8n](/guides/mastering-n8n) (which covers hosting and hardening the platform itself) and [Roll Your Own Coding Agent](/guides/roll-your-own-coding-agent) (which builds the raw loop from nothing), and once you're shipping agents, [QA in the Era of AI](/guides/qa-in-the-era-of-ai) shows what happens when you point them at your test suite. This one is about shipping: picking a lane, building the agent, and knowing when to switch lanes as the job outgrows the canvas. _This is a living document and will be updated as the tools and patterns evolve._


Mastering Hermes

Most of this series treats Hermes as the place its ideas land. Building Your Agentic OS pivots to it, Running the Fleet orchestrates on it, Self-Hosting the Agentic Stack deploys it. What none of them do is sit down and cover Hermes itself, the whole thing, every feature, the way I'd walk a competent developer through a tool they've never run in anger. That's this guide. Hermes is Nous Research's open-source, self-hosted agent, the one with a built-in learning loop: it writes its own skills from experience, curates its own memory, searches its own past conversations, and builds a deepening model of who you are across sessions. It installs with one command, runs on a five-dollar box or a GPU cluster, talks to whatever model you point it at, and you can message it from Telegram while it works on a cloud VM. That surface is a lot bigger than the rest of the series has needed to show. So here we go wide instead of deep: install and first contact, the model layer and Nous Portal, the terminal interface, the messaging gateway across six platforms, the six places it can run, the learning loop, context files and personality, tools and toolsets, MCP, scheduled automations, delegation and subagents, the security model, day-two operations, and migrating in from OpenClaw. Where a topic has its own field guide in this series, I point you there instead of repeating it. This is the manual that ties the rest together. _This is a living document and will be updated as Hermes updates._


Agent Skills: A Field Guide to the Third Pillar

Your agent can write code. But does it know how your team cuts a release? Can it run your incident playbook the same way twice, or does it improvise something a little different every time? That gap, between raw capability and a repeatable way of doing one specific job, is exactly what skills fill. A skill is procedural memory you write down once: a packaged, reusable how-to that the agent loads when it's relevant and runs the same way every time. This is the third leg of a trilogy with [Agent Memory](/guides/agent-memory-field-guide) and [The Agent's Self](/guides/agent-self-personality-identity), the three pillars from [Building Your Agentic OS](/guides/building-your-agentic-os). Identity is who the agent is, memory is what it knows, skills are how it does things. We start with what a skill really is, and what it isn't, then build one from a plain folder and a single file. We dig into the two halves of the craft that actually matter: writing a description that makes the agent reach for the skill at the right moment, and writing a body that makes it succeed once it does. We cover progressive disclosure (why the whole skill isn't sitting in context all the time), how to tell a skill apart from a memory or a tool, and how to version and share skills across a fleet without letting them rot. By the end you'll be able to take a capable, general-purpose agent and turn it into a specialist that does your specific jobs your specific way, on demand, every time. _This is a living document and will be updated as the tools and patterns evolve._
