Skills · Apr 28, 2026 · 2 min read

build-validator — CI Validation Subagent

Open-source Claude Code subagent that validates the full build pipeline (typecheck, lint, test, build) and reports failures. Inspired by Boris Cherny.

TL;DR
build-validator runs typecheck, lint, unit tests, and build locally before push.
§01

What build-validator Solves Before You git push

Direct answer: build-validator is an open-source Claude Code subagent that runs your project's full local build pipeline — typecheck, lint, unit tests, and build — in a fixed staged order and reports a stage-by-stage pass/fail before you push to remote. The point is to catch CI failures roughly 10 minutes earlier, while the failing diff is still warm in your head, instead of finding out from a red GitHub Actions badge after the context switch is already complete.

The skill is a community-written equivalent of the build-check subagent that Boris Cherny describes as part of his pre-push routine on howborisusesclaudecode.com. It auto-detects Node, Python, Go, and Rust toolchains, lives at .claude/agents/build-validator.md, and takes under a minute to set up. The four stages run in fixed order — typecheck, lint, unit tests, build — and the loop stops at the first failure unless you explicitly pass --keep-going.

§02

How the build-validator Subagent Works End-to-End

The subagent file lives at .claude/agents/build-validator.md and is loaded by Claude Code via the Task tool with subagent_type resolution. After saving the file and running /agents reload, you trigger it conversationally: "Run build-validator before I push." That phrase is enough — the system prompt encodes the full workflow.

Each invocation performs four deterministic operations on the current working tree:

  1. Detect the toolchain by inspecting manifest files at the repo root.
  2. Run the staged pipeline in fixed order, stopping at the first failure.
  3. Capture stderr — specifically the first 20 lines per failed stage, so the report stays scannable.
  4. Emit a structured report that lists every stage with a pass/fail glyph and a verdict line.

The four-tool whitelist — tools: Bash, Read, Grep, Glob — is declared in the YAML frontmatter, so the subagent cannot accidentally write files, delete code, or reach a network. This reflects a core design principle: build-validator is read-only by contract.
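A minimal sketch of what the subagent file might look like — the frontmatter fields follow Claude Code's subagent format, but the description and prompt body here are abbreviated placeholders, not the full shipped template:

```markdown
---
name: build-validator
description: Staged pre-push validation — typecheck, lint, unit tests, build. Use before pushing.
tools: Bash, Read, Grep, Glob
---

You are build-validator. Run the staged pipeline in fixed order
(typecheck → lint → unit tests → build), stop at the first failure
unless told --keep-going, capture the first 20 lines of stderr per
failed stage, and emit the structured report ending in a Verdict line.
```

Because Edit and Write are absent from the tools list, the permission model itself enforces the read-only contract.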

§03

The Toolchain Auto-Detection Table

The prompt template ships with this exact mapping, copied verbatim from the source skill:

| Toolchain | Detection file | typecheck | lint | unit tests | build |
|---|---|---|---|---|---|
| Node | package.json | npm run scripts (e.g. tsc --noEmit) | eslint | jest / vitest | next build / vite build |
| Python | pyproject.toml / setup.cfg | mypy | ruff | pytest | (project script) |
| Go | go.mod | go vet | golangci-lint | go test | go build |
| Rust | Cargo.toml | cargo check | cargo clippy | cargo test | cargo build |

If none of the manifests are present the subagent does not guess — it escalates with the message "Add a validate.sh script and re-run." This boundary matters because guessing a Makefile target or invoking make all blindly is exactly how npm test accidentally launches a cluster on a CI runner.
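The detection step amounts to a few file-existence checks at the repo root — a minimal shell illustration of the mapping above, not the subagent's literal code:

```shell
#!/bin/sh
# Detect the toolchain by inspecting manifest files in the current
# directory, mirroring the table above. Escalates instead of guessing.
detect_toolchain() {
  if   [ -f package.json ];  then echo "Node"
  elif [ -f pyproject.toml ] || [ -f setup.cfg ]; then echo "Python"
  elif [ -f go.mod ];        then echo "Go"
  elif [ -f Cargo.toml ];    then echo "Rust"
  else
    echo "Add a validate.sh script and re-run." >&2
    return 1
  fi
}
```

Note the deliberate absence of a Makefile fallback: an unrecognised repo produces the escalation message, never a guessed command.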

§04

Why Four Stages, In That Order

The order is not aesthetic. It is chosen so that cheap checks fail fast and the developer never waits on a 90-second vite build only to be told a single semicolon is missing.

  • Typecheck runs first because it is the cheapest stage that catches the largest class of obvious regressions. tsc --noEmit finishes in seconds on most repos and cargo check skips the linker — both are an order of magnitude faster than the build stage they precede.
  • Lint runs second because rule violations are deterministic: eslint and cargo clippy operate on the AST, not behaviour, so they do not need a working binary.
  • Unit tests run third because they require a working compile but exercise behaviour, which is the noisiest place for failures to surface.
  • Build runs last because it is the slowest and least-informative stage if anything earlier broke. If types are wrong, the build will also be wrong; reporting a build failure before a typecheck failure inverts the signal-to-noise ratio.
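The fail-fast ordering can be sketched as a generic staged runner — an illustrative shell skeleton, with stage names and commands passed in as pairs (the real subagent drives the equivalent commands through its Bash tool):

```shell
#!/bin/sh
# Fail-fast staged runner. Arguments come in pairs: "<stage>" "<command>".
# Stops at the first failure unless KEEP_GOING=1 (the --keep-going opt-out).
run_stages() {
  verdict="PASS"
  err=$(mktemp)
  while [ "$#" -ge 2 ]; do
    name="$1"; cmd="$2"; shift 2
    if sh -c "$cmd" >/dev/null 2>"$err"; then
      echo "✅ $name"
    else
      echo "❌ $name"
      head -n 20 "$err"          # cap stderr at 20 lines per failed stage
      verdict="FAIL at $name"
      [ "$KEEP_GOING" = "1" ] || break
    fi
  done
  echo "Verdict: $verdict"
}
```

For a Node project the invocation would look something like `run_stages typecheck "npx tsc --noEmit" lint "npx eslint ." "unit tests" "npx vitest run" build "npm run build"` — cheap stages first, the slow build last.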

The single-line verdict — for example Verdict: FAIL at unit tests — gives the human the answer in one glance, which is the GEO-style "direct answer in fold-above content" the skill optimises for.

§05

Step-by-Step: Install build-validator in Under a Minute

  1. Create the agents directory if it does not exist: mkdir -p .claude/agents.
  2. Save the prompt template (with its YAML frontmatter name, description, tools) to .claude/agents/build-validator.md. The frontmatter is mandatory — Claude Code's agent loader uses it for the /agents registry.
  3. In an active session run /agents reload (or restart the CLI) so the new subagent appears.
  4. Say "Run build-validator before I push." Claude Code will route the request to the build-validator subagent automatically.
  5. Read the verdict line. If it is PASS, push. If it is FAIL at <stage>, fix the failure and re-invoke. Do not re-invoke from the parent agent — let build-validator finish each cycle so the report stays clean.

The whole loop — install, reload, first run — takes under 60 seconds on a project that already has a working npm test or cargo test.
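Steps 1 and 2 collapse into two commands at the repo root — the heredoc body below is an abbreviated placeholder, not the full prompt template:

```shell
# Create the agents directory and drop in the subagent file.
mkdir -p .claude/agents
cat > .claude/agents/build-validator.md <<'EOF'
---
name: build-validator
description: Staged pre-push validation — typecheck, lint, unit tests, build.
tools: Bash, Read, Grep, Glob
---
(full prompt template goes here)
EOF
```

After that, `/agents reload` (or a CLI restart) and a conversational "Run build-validator before I push" complete the loop.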

§06

A Real Example Session

The prompt's example block illustrates the steady state:

You:    "Run build-validator before I push."
Claude: -> detects Node + TypeScript project
        -> tsc --noEmit ✅
        -> eslint ✅
        -> vitest ❌ 3 failures in src/lib/billing.test.ts
        -> stops, reports
You:    "Fix the failures."
Claude: ... (fixes, re-runs build-validator until ✅)

The output format itself is fixed by the prompt:

build-validator
===============
Toolchain: <Node | Python | Go | Rust | mixed>
Duration: <seconds>

Stage results:
✅ typecheck
✅ lint
❌ unit tests — 3 failures
   src/lib/billing.test.ts:42 — Expected 100, got 99
⏸️  build (skipped — earlier stage failed)

Verdict: FAIL at unit tests

Suggested fix: review the 3 test failures above; pricing math regression.

Notice three properties of the report. First, the toolchain line is always present, so a human reviewing the output later can tell which language detection branch fired. Second, the duration is reported in seconds, giving you a soft signal when the local pipeline drifts (a 12-second baseline becoming a 90-second baseline is information). Third, skipped stages are explicitly marked with ⏸️, not omitted — which prevents a future reader from mistakenly believing the build passed when it never ran.
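Because the verdict line is fixed-format, a script can gate on it mechanically — a hypothetical sketch assuming you save the report to a log file (the log path and any pre-push hook wiring are your own additions, not part of the skill):

```shell
#!/bin/sh
# Hypothetical gate: succeeds only if the saved report's fixed-format
# verdict line reads PASS. "$1" is the assumed report file path.
check_verdict() {
  grep -q '^Verdict: PASS$' "$1" 2>/dev/null
}

# e.g. inside a .git/hooks/pre-push script:
#   check_verdict build-validator.log || { echo "not validated" >&2; exit 1; }
```

This works precisely because the prompt pins the output format: free-form summaries could not be grepped this way.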

§07

When NOT to Use build-validator

The skill is opinionated and there are situations where it is the wrong tool:

  • During active feature development. If you are 30 minutes into refactoring, you already know the pipeline is broken. Running build-validator just produces noise. Wait until you think you are done.
  • Without a stable toolchain. If npm test itself is broken or your Cargo.toml is half-renamed, fix the project setup before installing the subagent. build-validator is a reporter, not a fixer.
  • For E2E test coverage. End-to-end tests are explicitly out of scope — that is verify-app's job. The prompt enumerates this boundary verbatim: "Do not run E2E tests (use verify-app)."
  • For autonomous repair. The prompt forbids auto-fixing: "Do not auto-fix anything." If you want repair, pair build-validator with a separate code-fixer subagent and let the human approve each fix.
§08

Hard Boundaries Encoded in the Prompt

The four boundaries below are encoded in the source prompt and are non-negotiable:

| Boundary | Rationale |
|---|---|
| Do not auto-fix anything | Fail-and-report keeps the human in the loop and prevents silent regressions |
| Do not run E2E tests | Scope limit; long E2E runs belong to verify-app, not the pre-push gate |
| Do not deploy | Subagent is read-only; deploy belongs to a separate, audited tool |
| Escalate on unknown toolchain | Guessing a build command corrupts CI logs and wastes runtime |

These boundaries are why the YAML frontmatter declares only tools: Bash, Read, Grep, Glob. Without Edit or Write, the subagent literally cannot mutate code — the boundary is enforced by Claude Code's tool-permission model, not just by prose in the prompt.

§09

Why a Local Pre-Push Check Beats Waiting for CI

The economics are simple. A typical GitHub Actions run for a Node web app takes 3 to 8 minutes from push to red badge. By contrast, tsc --noEmit && eslint . && vitest && next build on a developer laptop typically completes in 30 to 90 seconds — a 5-10x latency reduction.

GitHub's official Actions billing documentation confirms that self-hosted runners are free but GitHub-hosted runners on private repositories draw from a per-account minute budget — which makes local pre-push validation a budget argument as well as an attention argument. Avoiding even 5% of red builds by pre-validating locally compounds quickly: for a company-wide Node monorepo the saving can reach tens of thousands of CI-minutes per month, plus the much larger saving in human attention.

§10

Comparison With Adjacent Skills in the TokRepo Catalog

| Skill | Trigger | Scope | Duration |
|---|---|---|---|
| build-validator | Before git push | typecheck + lint + unit + build | 30-90 s |
| verify-app | After build-validator passes | E2E browser tests | 2-10 min |
| /go-verify-simplify-pr | Existing PR | Verify + simplify single pass | One-shot |
| /commit-push-pr | Local diff ready | Create PR once | One-shot |
| /loop | Any prompt | Arbitrary cron-like recurrence | User-defined |
| /ralph-wiggum | Long autonomous task | Multi-hour build loop | Autonomous |

build-validator complements /commit-push-pr: run build-validator first, then ship the commit. It also complements /babysit — pair the two so that every fix Claude Code pushes in response to a review comment has been gated through your real CI stages locally.

§11

Production Tips From Early Adopters

  1. Add custom stages by editing the Workflow section of the prompt. The prompt explicitly invites additions like prisma generate, openapi-typescript, or security scanners. Edit the markdown — no plugin system required.
  2. Scope it in monorepos. Pass --filter <package> to your monorepo tool (Turborepo, Nx, pnpm) inside the subagent's Bash invocations. The skill author flags this as the most common request.
  3. Run it after /loop or /ralph-wiggum finishes. Long autonomous loops drift — confirm the loop did not regress fundamentals before you push.
  4. Use it as a CI-substitution mode when GitHub Actions is degraded. The same pipeline runs on your laptop; a green local report is the closest signal to a green remote build.
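For the monorepo case, the scoped commands you would substitute into the subagent's Bash invocations differ per tool — a sketch, where the tool name and the placeholder package "web" are illustrative:

```shell
#!/bin/sh
# Build a scoped unit-test command for a given monorepo tool.
# "$1" is the tool, "$2" the package name — both placeholders here.
scoped_test_cmd() {
  case "$1" in
    pnpm)  echo "pnpm --filter $2 test" ;;
    turbo) echo "npx turbo run test --filter=$2" ;;
    nx)    echo "npx nx test $2" ;;
  esac
}
```

The same pattern applies to the typecheck, lint, and build stages: scope every stage, or the "30-90 s" budget quietly becomes a full-workspace run.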
§12

Verification: This Page Is Grounded in the Source Prompt

Every numeric claim and behavioural rule in this article maps to a line in the original prompt_template shipped with the workflow: the four-stage order, the four-toolchain detection table, the 20-line stderr cap, the --keep-going opt-out, the Verdict: FAIL at <stage> format, the Bash, Read, Grep, Glob tool whitelist, the no-auto-fix rule, the no-E2E rule, and the no-deploy rule. Nothing has been invented; the subagent behaves exactly as the prompt instructs Claude Code to behave.

Frequently Asked Questions

Does build-validator run E2E tests?

No. End-to-end coverage is verify-app's job. build-validator is scoped to typecheck, lint, unit tests, and build only — the four stages where a fast, deterministic local pass/fail is most useful before pushing.

Will build-validator auto-fix failing stages?

No. The prompt explicitly forbids auto-fixing — build-validator is a fail-and-report tool. Pair it with a separate code-fixer subagent if you want repair, and let the human approve each fix between runs.

How does it work in a monorepo with multiple packages?

It works, but you should scope it. Edit the subagent prompt to pass --filter <package> to your monorepo tool (Turborepo, Nx, pnpm). The skill author flags this as the single most common customisation.

Can I add custom stages like prisma generate?

Yes. Edit the .claude/agents/build-validator.md file directly and insert your stage between the four defaults. Common additions are prisma generate, openapi-typescript, and security scanners like Semgrep.

Is this Boris Cherny's actual internal subagent?

No. It is a community-written equivalent inspired by his public pre-push validation routine on howborisusesclaudecode.com. The behaviour matches his description but the prompt itself is open source.

What happens when the toolchain is unrecognised?

The subagent escalates instead of guessing. It prints "Add a validate.sh script and re-run." Do that, and re-invoke. Guessing a build command is exactly how CI logs get corrupted on unfamiliar repositories.


Source & Thanks

Inspired by Boris Cherny's pre-push validation routine on howborisusesclaudecode.com.
