verify-app — E2E Test Subagent for Claude Code
Open-source Claude Code subagent that runs end-to-end tests on recent changes and triages failures. Inspired by Boris Cherny's verify-app setup.
What verify-app Is and Why It Matters
verify-app is a Claude Code subagent that runs end-to-end tests against the files Claude just modified, then reports failing assertions with a one-line root-cause hypothesis. The pattern is named directly by Boris Cherny, the creator of Claude Code at Anthropic, on his public setup page howborisusesclaudecode.com. The community-written equivalent in this skill captures the same contract that Boris describes: identify changed files, locate matching E2E tests, run only those tests, surface broken assertions with reproduction hints. Setup time is under 1 minute, and it works with Claude Code 1.x and above.
The leverage is high because end-to-end coverage is where regressions hide. According to the official Playwright documentation, end-to-end tests "emulate real user scenarios" by driving a real browser, which is the only reliable way to validate multi-component behavior after a feature implementation session (Playwright Test docs). Without a subagent gate, you ship the diff to CI and wait 10-plus minutes for the same answer Claude could have given you locally in 90 seconds.
How verify-app Works (the Contract)
The subagent definition is a single Markdown file saved to .claude/agents/verify-app.md inside your project. The frontmatter declares the four tools the agent is authorized to invoke: Bash, Read, Grep, Glob. The body of the file is the system prompt the agent runs under. The contract has 5 numbered steps:
- Identify changed files via git status --short and git diff --name-only.
- For each changed file, locate E2E tests that exercise it by searching tests/e2e/, cypress/, playwright/, or e2e/ directories and matching by import paths and feature names.
- Run only the matched tests using npm run test:e2e -- <pattern> or the equivalent for your framework.
- For each failure, capture the failing assertion plus file:line, identify the most likely changed file responsible, and emit a one-line root-cause hypothesis.
- Output a structured summary with totals and per-failure detail.
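To make the first and third steps of the contract concrete, here is a minimal Python sketch of how the output of the two git commands could be merged into one list of changed paths, and how the scoped test command could be assembled. The function names (parse_status_short, changed_files, build_test_command) are hypothetical and not part of the skill; the subagent itself does this by running Bash commands under its prompt, not through a script. The sketch assumes plain, unquoted paths and the npm script name test:e2e quoted above.

```python
def parse_status_short(output: str) -> list[str]:
    """Extract paths from `git status --short` output.

    Each line is two status characters, one space, then the path.
    Renames are shown as `old -> new`; only the new path is kept.
    Assumption: paths are plain (no quoting of special characters).
    """
    paths = []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # too short to hold a status code and a path
        path = line[3:]
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.append(path.strip())
    return paths


def changed_files(status_output: str, diff_output: str) -> list[str]:
    """Union of both git listings, de-duplicated, first-seen order kept."""
    diff_paths = [line.strip() for line in diff_output.splitlines() if line.strip()]
    return list(dict.fromkeys(parse_status_short(status_output) + diff_paths))


def build_test_command(pattern: str) -> list[str]:
    """Argument list for the scoped run: npm run test:e2e -- <pattern>."""
    return ["npm", "run", "test:e2e", "--", pattern]
```

Returning an argument list rather than a single string means the pattern can be handed to subprocess.run without shell quoting concerns.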
The subagent prompt also declares hard boundaries: do not run unit tests (that belongs to a different subagent), do not fix anything yourself (the contract is "test, don't touch"), and if no E2E tests match the changed files, escalate with the exact line "No E2E coverage for X. Add a smoke test?" so the human decides whether to add a smoke test or proceed.
The Anthropic Claude Code documentation calls this a subagent and confirms the conventions used here: subagents live in .claude/agents/, are activated with /agents or detected on restart, and run with their own scoped tool list (Claude Code subagents documentation).
Why Map Changed Files to Tests Instead of Running the Full Suite
A full Playwright or Cypress suite for a mid-size product typically runs 8 to 25 minutes. The verify-app contract scopes execution to the matched subset, which usually returns in 60 to 180 seconds. That keeps the inner loop fast enough that you actually run it after every Claude session instead of waiting for CI. The Playwright team explicitly recommends "running only impacted tests" through their --grep and project filters specifically for this reason (Playwright command line).
The matching logic in the prompt is intentionally simple: look at the changed file's import graph, glob for spec files that reference any of those modules, and union with feature-name matches. There are 4 supported framework conventions out of the box: tests/e2e/, cypress/, playwright/, and a generic e2e/ directory. For other frameworks, edit the Bash invocation in the subagent file. There is no auto-discovery for non-standard layouts because the prompt is meant to be readable in 30 seconds, not bulletproof against every monorepo shape.
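The matching heuristic described above can be sketched in a few lines of Python. This is an illustrative approximation, not the skill's implementation: the real agent performs the search with Grep and Glob under its prompt. The sketch checks only direct import targets rather than a full import graph, and the helper names (imported_stems, feature_name, match_specs) and the regular expression are assumptions made for the example.

```python
import re
from pathlib import PurePosixPath

# The four directory conventions named in the text.
E2E_DIRS = ("tests/e2e/", "cypress/", "playwright/", "e2e/")

# Matches the target of `from "x"` and `require("x")` in JS/TS sources.
IMPORT_RE = re.compile(r"""(?:from|require\()\s*['"]([^'"]+)['"]""")


def imported_stems(spec_text: str) -> set[str]:
    """Last path segment (without extension) of every import target."""
    return {PurePosixPath(target).stem for target in IMPORT_RE.findall(spec_text)}


def feature_name(spec_path: str) -> str:
    """tests/e2e/admin/audit.spec.ts -> audit"""
    return PurePosixPath(spec_path).name.split(".", 1)[0].lower()


def match_specs(changed: list[str], specs: dict[str, str]) -> list[str]:
    """Return spec paths matched by import OR by feature name.

    `specs` maps each candidate spec path to its file contents.
    """
    changed_stems = {PurePosixPath(path).stem for path in changed}
    matched = []
    for spec_path, text in sorted(specs.items()):
        if not spec_path.startswith(E2E_DIRS):
            continue  # outside the supported E2E directories
        by_import = bool(imported_stems(text) & changed_stems)
        feature = feature_name(spec_path)
        by_feature = bool(feature) and any(feature in path.lower() for path in changed)
        if by_import or by_feature:
            matched.append(spec_path)
    return matched
```

A stem-based comparison is deliberately loose: it can over-match when two modules share a file name, which errs on the side of running an extra spec rather than skipping a relevant one.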
Quick Start: 3 Steps to Live
The prompt_template ships a literal Quick Use block:
- Save the file in "How verify-app Works" to .claude/agents/verify-app.md in your project.
- Restart Claude Code, or run /agents reload.
- After Claude finishes a coding session, say: "Run verify-app on the changes."
That is the entire setup. There is no npm install, no config file, no environment variable. The agent's behavior is defined entirely by the system prompt in the Markdown file, which is also why iteration is fast: edit the prompt, restart Claude Code, retry. The Anthropic guide on building agents emphasizes this "prompt as the contract" model: agents are characterized primarily by their system prompt and tool list, not by code (Anthropic engineering: Building effective agents).
When to Run It and When to Skip It
The prompt_template calls out the use cases explicitly. Run verify-app:
- At the end of a feature-implementation session, before opening a PR.
- After running /loop or /ralph-loop. Long autonomous sessions accumulate untested code, and verify-app is the natural gate.
- Before merging a long-lived branch where the diff is wider than your memory.
Do not run verify-app:
- Mid-feature, while Claude is still writing. Let it finish. Half-implemented features will trigger noisy failures that have nothing to do with regressions.
- For pure refactors with no behavior change. Use code-simplifier instead, which is designed for that case.
The rule of thumb: verify-app is for the moment when you are about to type git push. If you are not at that moment, you are using the wrong tool.
The Output Format You Will See
The subagent emits a fixed-shape report. The prompt_template defines it verbatim:
verify-app summary
==================
Changed files: N
E2E tests run: N
Passed: N
Failed: N
Failures (if any):
1. <test_path:line> — <assertion>
   Likely cause: <changed_file>
   Hypothesis: <one-line>
The shape matters. A predictable structure means you can pipe the output into another agent, into a Slack message, or into a PR comment without parsing prose. The example session in the skill shows exactly what this looks like in practice: 5 files written, 4 E2E tests found in tests/e2e/admin/, 3 pass, 1 fails at tests/e2e/admin/audit.spec.ts:24 with the hypothesis "timestamp format changed from ISO to epoch" pointing at src/lib/auditLogger.ts. That is enough to fix in 60 seconds without a debugger.
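As an illustration of why the fixed shape helps downstream tooling, the following Python sketch turns a report in the format above into structured data. It is a hedged example rather than anything shipped with the skill: the Failure and Summary classes and the parse_summary function are hypothetical names, and the parser assumes the labels and the dash separator appear exactly as in the template.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Failure:
    location: str            # test_path:line
    assertion: str
    likely_cause: str = ""
    hypothesis: str = ""


@dataclass
class Summary:
    totals: dict[str, int] = field(default_factory=dict)
    failures: list[Failure] = field(default_factory=list)


TOTAL_RE = re.compile(r"^(Changed files|E2E tests run|Passed|Failed):\s*(\d+)$")
# The template separates location and assertion with an em dash (U+2014).
FAILURE_RE = re.compile(r"^\d+\.\s+(\S+)\s+\u2014\s+(.*)$")


def parse_summary(report: str) -> Summary:
    """Parse a verify-app style report; unrecognized lines are ignored."""
    summary = Summary()
    for raw in report.splitlines():
        line = raw.strip()
        if m := TOTAL_RE.match(line):
            summary.totals[m.group(1)] = int(m.group(2))
        elif m := FAILURE_RE.match(line):
            summary.failures.append(Failure(m.group(1), m.group(2)))
        elif line.startswith("Likely cause:") and summary.failures:
            # Detail lines attach to the most recent numbered failure.
            summary.failures[-1].likely_cause = line.split(":", 1)[1].strip()
        elif line.startswith("Hypothesis:") and summary.failures:
            summary.failures[-1].hypothesis = line.split(":", 1)[1].strip()
    return summary
```

With the totals in a dictionary and each failure as a record, posting to Slack or a PR comment becomes a matter of formatting fields rather than scraping prose.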
How verify-app Compares to Adjacent Subagents
verify-app is one node in a small ecosystem of named subagents. Each owns a different verification surface, and stacking them produces a 4-stage pre-PR pipeline.
Frequently Confused Points
It is not Boris's exact subagent. Boris Cherny named verify-app on howborisusesclaudecode.com, but his private .claude/agents/verify-app.md has not been published by Anthropic. The skill is a community-written equivalent that captures the same contract from his public description. If Anthropic releases the original, this skill is structured so that you can drop the official version into the same path with no other change.
It does not fix anything. The contract is test, don't touch. If you want auto-repair, pair verify-app with a separate fixer subagent and let the human decide whether to invoke the fixer based on the verify-app summary. According to MDN's testing fundamentals, separating verification from repair is a core hygiene rule: "a test that fixes is no longer a test" (MDN: Cross-browser testing strategies).
It does not replace CI. It runs locally during your Claude session. The value is getting the same signal roughly 10 minutes earlier, not duplicating what CI already checks.
It does not run unit tests. Unit-level coverage is a different subagent's job. verify-app stays focused on E2E so that the failure signal is unambiguous: if it breaks, the user-visible behavior broke.
Statistics and Concrete Wins
- Setup time: under 1 minute (1 file, 1 restart, 1 invocation phrase).
- Tools required: 4 (Bash, Read, Grep, Glob).
- Workflow steps in the system prompt: 5 numbered actions.
- Supported framework directory conventions out of the box: 4 (tests/e2e/, cypress/, playwright/, e2e/).
- FAQ entries shipped in the prompt_template: 5.
- Inner-loop time saved versus CI: roughly 10 minutes per cycle, depending on your CI queue depth.
- Boris-named subagents in the public reference set: 5, with verify-app being one of them.
These numbers come directly from the skill's prompt_template and the public Boris Cherny description, not from estimation.
Pairing It With Other TokRepo Skills
The full pre-PR loop becomes powerful when verify-app is one stage in a chain. A common way to end a Claude Code session looks like this: run a long autonomous loop with the /loop scheduler, then run verify-app on the final diff, then run code-simplifier to clean dead branches, then run commit-push-pr to ship. The combination is what Boris is describing when he refers to his "setup" rather than a single agent. Each step has a single responsibility and a deterministic output, which is the pattern Anthropic recommends in its guide on building effective agents.
Source Attribution and Trust
The prompt_template is explicit that this skill is a community-written equivalent and not Anthropic's private setup. The Source and Thanks block links to:
- howborisusesclaudecode.com (verify-app section), Boris Cherny's public reference page.
- Boris Cherny on Threads: https://www.threads.com/@boris_cherny/post/DTBVroqEg_K/, which is where the verify-app pattern was first surfaced publicly.
- Pragmatic Engineer: https://newsletter.pragmaticengineer.com/p/building-claude-code-with-boris-cherny, the long-form interview about how Boris's team uses Claude Code internally.
When you adopt the skill, keep the citation block intact. Trust in this kind of community port depends on the lineage being legible.
Common Setup Pitfalls and How to Avoid Them
The single-file install model is forgiving, but a few details trip up new users. First, the file path must be exactly .claude/agents/verify-app.md relative to the project root. Claude Code does not search parent directories. Second, the YAML frontmatter is required and must declare name and description; Claude Code treats the tools key as optional, but this skill declares it so the agent is limited to the four tools it needs. Skipping the description means the agent will not be auto-discoverable when you ask Claude to delegate. Third, the tools list is comma-separated and case-sensitive: Bash, Read, Grep, Glob. A typo here silently disables a capability and you get cryptic permission errors mid-run. Fourth, after editing the file you must restart Claude Code or run /agents reload; the prompt is loaded once at session start. The official Anthropic subagent docs cover these conventions, and following them strictly is what makes the skill drop-in across teams.
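The frontmatter pitfalls above lend themselves to a quick pre-flight check. The sketch below is a hypothetical helper, not part of the skill or of Claude Code: it parses only flat key: value pairs (not full YAML), accepts commas or spaces between tool names, and reports the problems this section describes. The names read_frontmatter and check_agent_file are assumptions for illustration.

```python
import re

# Keys and tool names this skill's agent file is expected to declare.
EXPECTED_KEYS = ("name", "description", "tools")
EXPECTED_TOOLS = {"Bash", "Read", "Grep", "Glob"}


def read_frontmatter(markdown: str) -> dict[str, str]:
    """Return flat `key: value` pairs found between the leading --- fences.

    Returns an empty dict when the opening or closing fence is missing.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # closing fence never found


def check_agent_file(markdown: str) -> list[str]:
    """List human-readable problems; an empty list means the file looks fine."""
    fields = read_frontmatter(markdown)
    if not fields:
        return ["missing or unterminated YAML frontmatter"]
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if not fields.get(key)]
    if fields.get("tools"):
        # Tool names are case-sensitive, so `bash` is not `Bash`.
        tools = {t for t in re.split(r"[,\s]+", fields["tools"]) if t}
        problems += [f"unknown tool: {t}" for t in sorted(tools - EXPECTED_TOOLS)]
        problems += [f"tool not granted: {t}" for t in sorted(EXPECTED_TOOLS - tools)]
    return problems
```

Running a check like this against .claude/agents/verify-app.md before restarting Claude Code surfaces a misspelled tool name up front instead of as a permission error mid-run.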
Frequently Asked Questions
No. It is a community-written equivalent that captures the same contract from Boris's public description on howborisusesclaudecode.com. Anthropic has not open-sourced his private prompt, so this version reproduces the workflow steps and output shape rather than the literal text.
No. The subagent auto-detects Playwright, Cypress, and generic tests/e2e and e2e directories. For other frameworks, edit the Bash invocation in the subagent Markdown file to call your test runner. The four supported conventions cover most JavaScript and TypeScript projects.
No. The contract is test, do not touch. The subagent only reports failing assertions with a likely cause and a one-line hypothesis. Pair it with a separate fixer subagent if you want auto-repair, so the verification surface stays trustworthy.
No. It runs only the tests matched to the files you changed in the current session. That keeps each cycle in the 60 to 180 second range instead of the 8 to 25 minutes a full Playwright or Cypress suite typically takes on a mid-size product.
It runs locally during your Claude Code session, before push. Catching regressions roughly 10 minutes earlier than your CI queue means tighter inner loops and fewer abandoned PRs. CI still runs as the merge gate; verify-app is the pre-PR gate.
Four: Bash, Read, Grep, and Glob. Bash runs git and the test command. Read and Glob locate spec files. Grep matches imports and feature names. The agent is intentionally minimal so the surface area is auditable.
Source & Thanks
Inspired by Boris Cherny's verify-app subagent — referenced on howborisusesclaudecode.com and his Threads post. Original concept: the Anthropic Claude Code team's daily workflow.
Citations:
- howborisusesclaudecode.com (verify-app section)
- Boris Cherny on Threads: https://www.threads.com/@boris_cherny/post/DTBVroqEg_K/
- Pragmatic Engineer: https://newsletter.pragmaticengineer.com/p/building-claude-code-with-boris-cherny