Summary
I've worked at companies with entire QA departments: rooms of people clicking through the same flows before every release, filing tickets, arguing about repro steps. Most of that job is now something you can wire up. Not because testing got less important, but because agents got good at exactly the parts that burned humans out: reviewing every pull request, walking the same three flows every morning, catching the console error nobody looked for, filing the ticket with the screenshot actually attached.
This guide is the system for doing that on purpose. It started with a post about my dialer QA bot, where Claude Code drives a browser through real call flows and files GitHub issues for whatever breaks. Here we build the whole department around that idea: AI code review as the first gate (and the real tradeoffs between the tools), agents that write tests plus the mutation-testing trick that keeps those tests honest, browser agents that smoke-test every preview deploy, visual and accessibility passes, and the loop that turns every failure, from staging or production, into a filed issue that another agent fixes. (And if you want to build agents like these yourself, that craft is The Agentic Playbook; this guide is about putting them to work.)
And because I'd rather you trust this thing for the right reasons, we spend real time on where it breaks: agents that pass tests they should fail, self-healing tools that heal around genuine bugs, prompt injection hiding in the very pages your QA agent reads, and the work that still belongs to a human with product judgment. The goal isn't zero humans. It's humans doing the 30% that was always the actual job, with a tireless department underneath them.
This is a living document and will be updated as the tools and patterns evolve.