AI Test Generation + E2E Pack
Ten picks for the dev who wants AI to write the missing unit tests, generate property tests from a spec, drive a real browser for E2E, and triage snapshots in CI. Test Engineer agent + Vitest + Jest + pytest + Hypothesis + MSW + Playwright + Playwright MCP + Playwright Tester agent + verify-app. In install order.
What's in this pack
This is the pack for the engineer who finally accepted that AI is faster than they are at writing the boring tests — the ones for the validator that takes 6 string permutations, the integration where the response body has 14 fields, the E2E that clicks through onboarding. You don't want to author those by hand anymore. You want a system: an agent that reads the file and proposes the test plan, a runner that doesn't make you wait, a property-based gen for the cases your brain misses, a mock so tests don't hit prod, a real browser an agent can drive, and a CI subagent that triages the red lights.
The ten picks below are the install order for that system. JavaScript/TypeScript is the spine (most of the modern web), Python is the second track (most of the AI/data side), and the bridge between them is the Test Engineer agent that picks the right runner per language. Every pick is open-source and lives on TokRepo so an AI coding agent can install it from inside a session.
Who this is for: a dev with a real codebase that has <40% coverage, who has tried to write tests during sprint time and given up, and who now wants Claude / Codex / Cursor to do it under supervision. By the end of step 10 you have unit + integration + E2E + snapshot tests running on every commit, with a subagent that reads failed runs and explains what broke.
Install in this order
- Test Engineer (Claude Code agent): Start here. This is the meta-agent that reads your codebase, picks the runners, drafts the test plan, and delegates the rest. Without it you'll install eight tools and never wire them together. Invoke it as @test-engineer and let it propose the strategy before you install any specific framework.
- Vitest: The fast unit runner for anything Vite-flavored (Nuxt, modern React, Svelte, Solid). Native ESM, TypeScript out of the box, Jest-compatible API, HMR-style watch mode that reruns in ~50 ms. Install this first among the runners because it's the lowest-friction win: a Vitest suite is a one-file vitest.config.ts away.
- Jest: The fallback. Pre-Vite codebases (Create React App, older Node, anything CommonJS) still ride Jest. Same expect API as Vitest, so tests written for one mostly port to the other. Install only if Vitest's Vite assumption doesn't fit; otherwise skip.
- pytest: The Python side. Fixtures, parameterization, plugins, the works. AI agents love pytest because the assertion failure messages are unusually readable: when Claude reads a failed pytest run, it knows exactly what to fix.
- Hypothesis: Property-based testing for Python. Instead of writing 20 example inputs, you write the property ("reversing a list twice gives the original") and Hypothesis generates the inputs, shrinking failures to the minimal reproducer. This is where AI test generation goes from "plausible" to "actually finds bugs." Pairs with pytest natively.
- MSW (Mock Service Worker): Network-layer mocking. Intercepts fetch and XHR at the network level (a Service Worker in the browser, request interception in Node), so your integration tests don't hit production. The AI angle: when an agent writes a test that should hit /api/users, MSW gives it a deterministic response without secrets, rate limits, or flakes. This is the boundary between unit-style and E2E-style.
- Playwright: The E2E framework. Cross-browser (Chromium, Firefox, WebKit), auto-wait so flakes drop ~90%, video + trace on failure for debugging. If you only install one E2E tool, install this. Its codegen recorder generates tests from a recorded session, which an agent can then refine.
- Playwright MCP: The MCP server that exposes Playwright to AI agents. Now Claude Code / Cursor / Codex can navigate pages, fill forms, click buttons, and take snapshots, driving a real browser instead of guessing about the DOM. This is what makes "AI runs my E2E" actually work versus the agent hallucinating selectors.
- Playwright Tester (Claude Code agent): A specialist subagent that writes Playwright specs. Feed it a user flow ("signup → onboard → first project") and it produces a .spec.ts file with proper locators, auto-waits, and assertions. Without this, you're typing page.click(...) by hand; with it, you're code-reviewing tests an agent drafted.
- verify-app (E2E test subagent for Claude Code): The CI layer. After a Claude Code session that touched the codebase, verify-app runs the relevant E2E tests on the changed surface, triages failures, and reports back in plain English ("the signup test failed because the button's attribute changed from data-cta to data-test-id"). This closes the loop: tests run, tests fail, agent explains, you fix, agent reruns.
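To make the Hypothesis pick concrete, here is a minimal sketch of the property quoted above ("reversing a list twice gives the original"), plus one more property in the same style. The test names and the choice of integer lists are illustrative assumptions, not something the pack prescribes.

```python
from hypothesis import given, strategies as st


@given(st.lists(st.integers()))
def test_reverse_twice_is_identity(xs):
    # Property from the text: reversing a list twice gives the original.
    assert list(reversed(list(reversed(xs)))) == xs


@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    # Sorting an already sorted list must not change it.
    once = sorted(xs)
    assert sorted(once) == once
```

Because the functions are named test_*, pytest collects them like any other test; Hypothesis supplies the inputs and, if a property fails, reports a shrunken counterexample.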
How they fit together
Test Engineer (#1)
│
└─ reads codebase, picks runners, drafts plan
│
Unit layer:
Vitest (#2) ← primary (Vite-flavored)
Jest (#3) ← fallback (legacy / non-Vite)
pytest (#4) ← Python side
│
Property layer:
Hypothesis (#5) generates inputs from spec
│
Integration boundary:
MSW (#6) mocks the network so tests stay hermetic
│
E2E layer:
Playwright (#7) — framework
Playwright MCP (#8) — agent drives the real browser
Playwright Tester agent (#9) — writes the specs
│
CI layer:
verify-app subagent (#10) — runs E2E on diff, triages failures
The Test Engineer + Hypothesis + verify-app trio is the agentic backbone. Take those three away and the rest is just a normal test stack. Keep them and the loop closes: agent plans, generators surface edge cases, runner reports, subagent triages, you make decisions instead of typing assertions.
Tradeoffs you'll hit
- Vitest vs Jest — Vitest is faster, native ESM, no transform config — but it assumes Vite. Jest is older, slower, but works in every JS environment ever shipped. Rule of thumb: new project = Vitest; codebase you inherited = whatever's already there.
- Hypothesis vs example-based tests — Property tests catch bugs your examples never would, but they're harder to write and read. Use Hypothesis for pure functions (parsers, validators, math) and stick with examples for I/O-heavy code (the property is just "it doesn't crash," which isn't useful).
- MSW vs real test server — MSW is faster and deterministic, but it lies — your code passes against a mock that doesn't match prod schemas. Combat this by generating MSW handlers from your OpenAPI spec, so the mock and prod can't drift silently.
- Playwright vs Playwright MCP — The framework runs your scripted tests. The MCP server lets an agent improvise. Both, not either: scripted Playwright runs in CI for regression; Playwright MCP is for ad-hoc "agent, click through the new onboarding and tell me what's broken."
- Test Engineer agent vs writing your own plan — The agent is right ~80% of the time on strategy and wrong about your specific business invariants. Treat its plan as a draft; edit before executing.
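The mock-drift risk in the MSW tradeoff can be sketched in a few lines. MSW handlers themselves are JavaScript; the Python below shows only the drift check, comparing a mock payload with a schema fragment. USER_SCHEMA, mock_drift, and the field names are hypothetical, and a real setup would likely lean on a proper OpenAPI or JSON Schema validator rather than this hand-rolled check.

```python
from typing import Any

# Hypothetical fragment of an OpenAPI component schema for a user object.
USER_SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["id", "email"],
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "name": {"type": "string"},
    },
}

# Maps a few JSON Schema primitive names to Python types.
JSON_TYPES = {"integer": int, "number": (int, float), "string": str}


def _matches(type_name: str, value: Any) -> bool:
    """Check one value against a primitive type name (unknown names pass)."""
    if type_name == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        # bool is a subclass of int in Python; do not accept it as a number.
        return False
    expected = JSON_TYPES.get(type_name)
    return expected is None or isinstance(value, expected)


def mock_drift(schema: dict[str, Any], payload: dict[str, Any]) -> list[str]:
    """Return readable mismatches between a mock payload and the schema."""
    problems = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing required field: {name}")
    for name, value in payload.items():
        if name not in props:
            problems.append(f"field not in schema: {name}")
        elif not _matches(props[name].get("type", ""), value):
            problems.append(f"wrong type for {name}: expected {props[name]['type']}")
    return problems
```

Running a check like this over every mocked response in CI turns silent drift into a failing test, which is the same goal as generating the handlers from the spec.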
Common pitfalls
- Installing all 10 at once: Don't. Pick your stack and install only the core steps first (JS: 1, 2, 6, 7, 10; Python: 1, 4, 5, 7, 10), ship a green pipeline, then add the rest. A pack is a menu, not a mandate.
- Letting the agent write 200 tests on day one: Quality matters more than count. Have the Test Engineer agent draft 10 critical-path tests, code-review each, then expand. 200 mediocre tests are technical debt with a friendly wrapper.
- MSW handlers that never refresh: Treat MSW handlers like type definitions: regenerate when the API changes. Stale mocks are how "all tests green" still ships broken code.
- Playwright MCP in CI: Don't. MCP is for interactive sessions where an agent explores. CI should run scripted Playwright specs (faster, reproducible, no LLM cost per run). Use Playwright Tester (#9) to write the spec; let CI run it deterministically.
- Skipping verify-app because "my CI already runs tests": Your CI runs tests. verify-app explains failures. The first time a Claude Code session breaks a test and verify-app tells you which selector changed, you'll see the difference.
- Hypothesis on impure code: Property-testing a function that touches the database is a recipe for flakes. Refactor the pure logic out first, property-test that, and leave the I/O for example-based pytest cases.
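The last pitfall is easier to see in code. This is a rough sketch, assuming a hypothetical order-discount feature: apply_discount is the pure core that gets a property test, while charge_order is the I/O shell that gets one example-based test against an in-memory fake. Every name here (apply_discount, charge_order, FakeDb, get_total, save_total) is invented for illustration.

```python
from hypothesis import given, strategies as st


def apply_discount(total_cents: int, percent: int) -> int:
    """Pure logic: no I/O, so it is safe to property-test."""
    return total_cents - (total_cents * percent) // 100


def charge_order(db, order_id: str, percent: int) -> int:
    """Impure shell: `db` is a hypothetical object with get_total/save_total."""
    total = db.get_total(order_id)
    discounted = apply_discount(total, percent)
    db.save_total(order_id, discounted)
    return discounted


@given(
    total_cents=st.integers(min_value=0, max_value=10**9),
    percent=st.integers(min_value=0, max_value=100),
)
def test_discount_stays_in_range(total_cents, percent):
    # Property: a 0-100% discount never goes negative or above the original.
    assert 0 <= apply_discount(total_cents, percent) <= total_cents


class FakeDb:
    """In-memory stand-in used by the example-based test for the I/O path."""

    def __init__(self, totals):
        self.totals = dict(totals)

    def get_total(self, order_id):
        return self.totals[order_id]

    def save_total(self, order_id, value):
        self.totals[order_id] = value


def test_charge_order_saves_discounted_total():
    # Example-based test: one known input, one known output.
    db = FakeDb({"order-1": 1000})
    assert charge_order(db, "order-1", 25) == 750
    assert db.totals["order-1"] == 750
```

The property test never touches storage, so it can run hundreds of generated cases without flaking; the database path is covered once, deterministically.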
Frequently asked questions
Do I really need both Vitest AND Jest in the same project?
No. Pick one. The pack lists both because different projects ride different stacks — Vite-flavored codebases install Vitest, legacy CRA / Node CommonJS codebases stay on Jest. If you're starting fresh, pick Vitest and skip step 3 entirely. The Test Engineer agent (#1) can inspect your repo and tell you which one applies in 30 seconds.
What's the difference between Playwright (#7), Playwright MCP (#8), and the Playwright Tester agent (#9)?
Playwright is the framework — runs .spec.ts files in CI. Playwright MCP is a server that lets a coding agent drive a real browser interactively (great for exploration, terrible for CI cost). Playwright Tester agent is a specialist that writes the .spec.ts files for you. The pipeline: Tester agent writes specs → CI runs Playwright on them → MCP is for ad-hoc when something weird needs a real browser session right now.
Why include Hypothesis when AI can generate examples directly?
AI generates the examples it can imagine. Hypothesis generates examples from the input strategies the test declares: strings with Unicode edge cases, integers near boundaries, empty lists and lists with duplicates. These are the cases a human (or an LLM) wouldn't think to try. The two pair well: have Claude propose the property statement, let Hypothesis hunt for counterexamples, then ship the test.
Which three should I install if I only have an afternoon?
Test Engineer agent (#1), Playwright (#7), and verify-app (#10). The agent plans, Playwright runs the highest-leverage tests (E2E catches bugs unit tests never will), and verify-app explains the failures. Add Vitest (#2) or pytest (#4) on day two depending on language. The middle picks (MSW, Hypothesis, Playwright MCP) are upgrades that pay off in week two.
Does this pack assume Claude Code specifically?
The three agent picks (#1 Test Engineer, #9 Playwright Tester, #10 verify-app) are Claude Code-native. Everything else (Vitest, Jest, pytest, Hypothesis, MSW, Playwright, Playwright MCP) is not tied to any one agent and works under Cursor, Codex CLI, Cline, Roo, and plain CLI runs. If you're not on Claude Code, swap the three agent picks for the equivalent in your toolchain and the rest of the install order still holds.