/babysit — Auto-Respond to PR Review Comments
Open-source slash command that watches a PR for review comments and auto-pushes fixes. Inspired by Boris Cherny's /babysit pattern.
Install with prior review
This asset requires review. The copied prompt asks for a dry run, shows the planned writes, and continues only after confirmation.
npx -y tokrepo@latest install 9e627ece-b718-450e-a48c-36dc383fd730 --target codex
Do a dry run first, confirm the writes, and then run this command.
What /babysit Solves for PR Review Comments
Direct answer: /babysit is a community-written Claude Code slash command that watches an open GitHub pull request, classifies every new review comment, autonomously fixes nit-level change requests, replies to clarification questions with code references, and escalates architectural pushback to a human. It is inspired by Boris Cherny's internal /babysit pattern at Anthropic, described publicly on howborisusesclaudecode.com and in interviews on the Pragmatic Engineer newsletter.
The pain it removes is concrete: a typical mid-sized PR receives 8-15 review comments across 1-3 days, and each context switch costs an engineer roughly 23 minutes of refocus time according to the often-cited UC Irvine study by Gloria Mark. By delegating the tick-tock of "open PR, read comment, edit file, commit, push, reply" to Claude Code, the developer trades constant interruptions for a single review pass when the loop ends.
How the Slash Command Works End-to-End
The command lives at .claude/commands/babysit.md. After saving the prompt template and reloading commands, you invoke it with a PR number — for example /babysit 482 — or omit the argument to target the current branch's open PR. The prompt template instructs Claude Code to enter a polling loop with a default interval of 5 minutes (minimum 2 minutes to stay rate-limit friendly).
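The interval rule can be illustrated with a minimal Python sketch. The 5-minute default and 2-minute floor come from the text above; the function names and the `requested_min` parameter are hypothetical, since the article does not show how the interval is passed to the command.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_INTERVAL_MIN = 5  # default stated in the article
MIN_INTERVAL_MIN = 2      # floor stated in the article


def effective_interval(requested_min=None):
    """Return the poll interval in minutes, applying the default and the floor."""
    if requested_min is None:
        return DEFAULT_INTERVAL_MIN
    return max(requested_min, MIN_INTERVAL_MIN)


def next_tick(now, requested_min=None):
    """Return the time of the next poll as an ISO 8601 string."""
    return (now + timedelta(minutes=effective_interval(requested_min))).isoformat()


if __name__ == "__main__":
    print(effective_interval(1))  # a 1-minute request is raised to the floor
    print(next_tick(datetime.now(timezone.utc)))
```

Clamping rather than rejecting a too-small value is a design choice made for this sketch only; the prompt may handle it differently.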
Each tick performs one GitHub API call through the GitHub CLI:
gh pr view <num> --json reviews,comments,reviewDecision,state
This single command returns the full review state in one round trip. The agent then diffs against the previous tick's snapshot to find new comments, runs each through a four-class taxonomy, and acts.
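The snapshot diff described above might look like the following Python sketch. It assumes the JSON from gh has already been parsed into a dict and that every review and comment entry carries an `id` field; the sample payloads are hypothetical and much smaller than real output.

```python
import json


def all_items(snapshot):
    """Return the reviews and comments of one snapshot as a single list."""
    return snapshot.get("reviews", []) + snapshot.get("comments", [])


def new_items(previous, current):
    """Return entries present in `current` but absent from `previous`."""
    seen = {item["id"] for item in all_items(previous)}
    return [item for item in all_items(current) if item["id"] not in seen]


# Hypothetical sample payloads, shaped like parsed JSON from two ticks.
tick_1 = json.loads('{"reviews": [], "comments": [{"id": "c1", "body": "Nice!"}]}')
tick_2 = json.loads(
    '{"reviews": [{"id": "r1", "body": "missing test"}],'
    ' "comments": [{"id": "c1", "body": "Nice!"}]}'
)

if __name__ == "__main__":
    print([item["id"] for item in new_items(tick_1, tick_2)])  # prints ['r1']
```

Comparing by id rather than by position keeps the diff stable if the order of entries changes between ticks.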
The Four-Class Comment Taxonomy
This classification table — copied verbatim from the prompt_template — is the heart of the skill:
| Class | Examples | Action |
|---|---|---|
| code-fix | "rename foo to bar", "add a null check on line 42", "missing test" | Edit, commit, push |
| reply | "Why did you choose X?", "Can you explain Y?" | Post a gh pr comment reply with code reference |
| escalate | "I think we should rearchitect this", "this has security implications" | Post "Tagging @<user> — needs human input" + stop the loop |
| non-actionable | "Nice!", "LGTM after this", reactions | Skip |
The critical design choice is that escalation is the default for ambiguity. The prompt explicitly enumerates escalate triggers — architecture suggestions, security concerns, scope debates — and instructs the agent to post "Tagging @<user> — needs human input" and stop the loop. This is the opposite of an autocomplete assistant that tries to please reviewers; the babysit pattern accepts that some decisions must be human.
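To make the "escalate by default" ordering concrete, here is a toy Python sketch. In the actual command the classification is performed by Claude following the prompt, not by regular expressions; the keyword lists below are hypothetical and exist only to show the decision order.

```python
import re

# Hypothetical keyword lists; the real classifier is the model, not these patterns.
ESCALATE = re.compile(r"rearchitect|architecture|security|scope|split this", re.I)
NON_ACTIONABLE = re.compile(r"^\s*(nice|lgtm|thanks)\b", re.I)
CODE_FIX = re.compile(r"\b(rename|add|remove|missing|fix|typo)\b", re.I)


def classify(body):
    """Map a review comment to one of the four classes in the table above."""
    if ESCALATE.search(body):
        return "escalate"        # checked first: these always go to a human
    if NON_ACTIONABLE.search(body):
        return "non-actionable"
    if CODE_FIX.search(body):
        return "code-fix"
    if body.rstrip().endswith("?"):
        return "reply"
    return "escalate"            # anything ambiguous also goes to a human


if __name__ == "__main__":
    for text in ["rename foo to bar", "Can you explain Y?", "Hmm, not sure about this."]:
        print(text, "->", classify(text))
```

The two things worth noticing are that escalation triggers are tested before anything else and that the fall-through case is also `escalate`, so an unrecognised comment never results in an autonomous edit.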
Push Discipline: Why Every Fix Is a New Commit
The prompt template encodes a hard rule: never amend a previously published commit. After any code-fix, the agent runs the project's pre-commit checks (lint plus typecheck), creates a new commit with the message convention fix(review): <comment summary>, and pushes to the PR branch. Finally, it replies on the comment thread with "Fixed in <new_sha>."
This convention matters for three reasons. First, GitHub's review tooling links each comment to a specific commit SHA — amending would orphan the conversation. Second, force-pushing rewrites history and breaks any reviewer who has already pulled the branch locally. Third, the linear fix(review): ... log gives the human reviewer a clean audit trail to scan during the final approval pass.
The boundary is reinforced again at the bottom of the prompt under ## Boundaries: never amend, never push to main/master, never resolve a thread without a code-fix or human reply, and a hard cap of 24 hours per invocation.
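The push discipline can be sketched as a small Python helper that only builds the git commands, so it is safe to run anywhere. The remote name `origin`, the use of `git add --all`, and the function name are assumptions made for illustration; the lint and typecheck step is project-specific and omitted.

```python
PROTECTED_BRANCHES = {"main", "master"}


def fix_commands(branch, comment_summary):
    """Build the git commands for one review fix as argument lists."""
    if branch in PROTECTED_BRANCHES:
        raise ValueError(f"refusing to push to protected branch {branch!r}")
    message = f"fix(review): {comment_summary}"
    return [
        ["git", "add", "--all"],                      # assumption: stage every change
        ["git", "commit", "-m", message],             # always a new commit, no --amend
        ["git", "push", "origin", f"HEAD:{branch}"],  # plain push, no --force
    ]


if __name__ == "__main__":
    for command in fix_commands("feature/audit-log", "rename tmpUsers to activeUsers"):
        print(" ".join(command))
```

Returning argument lists instead of executing them keeps the sketch inspectable; a caller could hand each list to `subprocess.run` once the checks have passed.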
Step-by-Step: Install /babysit in Three Minutes
- Create the directory if it does not exist: mkdir -p .claude/commands
- Save the prompt template above to .claude/commands/babysit.md. The file must include the YAML frontmatter (description, argument-hint) so Claude Code's command loader picks it up.
- In an active Claude Code session, run /commands reload (or restart the CLI) so the new command appears in the slash menu.
- Open a PR and verify that gh auth status returns OK. The agent uses the GitHub CLI for every API call, so an authenticated gh is mandatory.
- Invoke /babysit 482 (or /babysit to auto-detect the current branch's PR). The agent will print tick 1 — 0 new comments — next: <ISO> and enter the loop.
- To pause: send cancel in the Claude Code session. To gate-only-replies: pass --no-push so the agent posts replies but never commits or pushes.
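The first two steps above can also be scripted. The Python sketch below creates the directory and writes a skeleton file with the two frontmatter keys named in the text; the frontmatter values and the placeholder body are hypothetical, and the real prompt template still has to be pasted in by hand.

```python
from pathlib import Path

# Keys come from the article; the values are placeholders, not the real template.
SKELETON = """---
description: Watch a PR for review comments and respond to them
argument-hint: "[pr-number] [--no-push]"
---

<!-- Paste the /babysit prompt template here. -->
"""


def write_skeleton(repo_root):
    """Create .claude/commands/babysit.md under repo_root and return its path."""
    commands_dir = Path(repo_root) / ".claude" / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    target = commands_dir / "babysit.md"
    if not target.exists():  # never overwrite an existing command file
        target.write_text(SKELETON, encoding="utf-8")
    return target


if __name__ == "__main__":
    print(write_skeleton("."))
```

Skipping the write when the file already exists is a deliberate safety choice for this sketch, so rerunning it cannot clobber an installed prompt.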
A Real Tick-by-Tick Session
The prompt's ## Example session block illustrates the steady state:
You: "/babysit 482"
Claude: -> tick 1: 0 new comments, sleep 5min
        -> tick 2: 1 new comment "rename `tmpUsers` to `activeUsers`"
           Class: code-fix
           Edited src/admin/audit.ts, ran lint+typecheck, pushed a3b4c5d
           Replied: "Fixed in a3b4c5d."
        -> tick 3: 1 new comment "Why are we storing the full event payload?"
           Class: reply
           Posted: "Storing full payload because compliance requires retroactive
                    query support for 90 days; see src/audit/schema.ts:18 comment."
        -> tick 8: review approved, 0 unresolved -> stopping.
Notice tick 3: the reply is not generic. The agent grounds the explanation in a concrete code reference (src/audit/schema.ts:18) so the reviewer can verify the claim without re-reading the entire diff. This is the second design principle behind babysit — every reply must cite code, otherwise it degenerates into chatbot filler.
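One way to enforce the "every reply must cite code" principle is a guard that checks a draft reply for a path:line reference before posting. The Python sketch below is an illustration only; the regular expression and function name are assumptions, and the article does not say that the prompt performs such a check.

```python
import re

# Matches references such as src/audit/schema.ts:18
# (a path ending in a file extension, a colon, then a line number).
CODE_REF = re.compile(r"\b[\w./-]+\.\w+:\d+\b")


def cites_code(reply):
    """Return True if the reply contains at least one path:line reference."""
    return CODE_REF.search(reply) is not None


if __name__ == "__main__":
    grounded = "Storing full payload for compliance; see src/audit/schema.ts:18 comment."
    print(cites_code(grounded))
    print(cites_code("Good question, happy to explain."))
```

A pattern this simple will also accept some strings that are not file references, so it is best treated as a cheap sanity check rather than proof that the cited line is relevant.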
The escalation case is equally explicit:
-> tick 5: 1 new comment "I think we should split this into two PRs"
   Class: escalate
   Posted: "Tagging @williamwang — needs human input on PR scope."
   Stopping loop.
The loop exits immediately. There is no attempt to negotiate scope on the human's behalf.
Why a Polling Loop Beats Webhooks for Solo Devs
A webhook-based approach would deliver lower latency, but it requires a public endpoint, a signing secret, and infrastructure to keep alive. The babysit prompt deliberately uses polling because Claude Code already runs locally on the developer's machine and the GitHub REST API permits authenticated requests at 5,000 requests per hour per user according to the official GitHub REST API rate-limit documentation. A 5-minute poll consumes 12 calls per hour, about 0.24% of the budget.
If you run two /babysit instances on different PRs simultaneously, you still only spend 24 calls per hour, leaving headroom for gh pr comment, gh pr view, and any other developer activity in the same session.
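The budget arithmetic is easy to check with a few lines of Python. The 5,000 requests per hour figure is the one cited in the text; the function names are hypothetical, and the sketch counts only the one poll per tick, not the extra calls made when the agent replies or pushes.

```python
HOURLY_BUDGET = 5000  # authenticated requests per hour, as cited in the text


def polls_per_hour(interval_min, instances=1):
    """Number of poll calls per hour for the given interval and instance count."""
    return instances * 60 // interval_min


def budget_share_percent(interval_min, instances=1):
    """Share of the hourly budget consumed by polling, in percent."""
    return 100 * polls_per_hour(interval_min, instances) / HOURLY_BUDGET


if __name__ == "__main__":
    for interval, instances in [(5, 1), (5, 2), (2, 1)]:
        calls = polls_per_hour(interval, instances)
        share = budget_share_percent(interval, instances)
        print(f"{instances} instance(s) every {interval} min: {calls} calls/h, {share:.2f}%")
```

Even at the 2-minute minimum a single instance makes 30 polls per hour, which is 0.6% of the budget.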
Hard Boundaries Encoded in the Prompt
- Never auto-merge. When the PR is approved with zero unresolved comments, the agent stops and prints a stop message — it does not click merge. The human always presses the green button.
- Never push to main/master. The prompt enumerates this explicitly to prevent a misclassified PR target.
- Never resolve a review thread without a code-fix or human reply. Resolution without action hides feedback from future reviewers.
- Hard cap 24 hours. A single /babysit invocation auto-escalates and exits after 24 hours so it does not run silently for days.
- --no-push mode. Useful for trial runs: the agent posts replies but never commits, letting you observe classifications before granting write access.
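The stop conditions in the boundaries above can be summarised in a short Python sketch. The value "APPROVED" is what the reviewDecision field reports for an approved PR; how the count of unresolved comments is obtained is not specified in the article, so `unresolved_count` is a hypothetical input here.

```python
from datetime import timedelta

MAX_RUNTIME = timedelta(hours=24)  # hard cap stated in the article


def stop_reason(elapsed, review_decision, unresolved_count):
    """Return why the loop should stop, or None to keep polling."""
    if review_decision == "APPROVED" and unresolved_count == 0:
        return "approved: stop and leave the merge to a human"
    if elapsed >= MAX_RUNTIME:
        return "24-hour cap reached: escalate and exit"
    return None


if __name__ == "__main__":
    print(stop_reason(timedelta(hours=3), "APPROVED", 0))
    print(stop_reason(timedelta(hours=25), "REVIEW_REQUIRED", 2))
    print(stop_reason(timedelta(hours=1), "CHANGES_REQUESTED", 1))
```

Note that neither branch merges anything: both outcomes end the loop and hand control back to the developer.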
When NOT to Use /babysit
The skill is opinionated, and there are PRs where the right answer is to disable it:
- Security-sensitive changes (auth, crypto, IAM policies). The prompt classifies security comments as escalate, but the safer default is to never autonomously edit those files in the first place.
- PRs targeting protected branches with required-status-checks Claude cannot satisfy. If your CI requires a manual /qa-approve from a human, babysit will not unblock the merge.
- Reviews where every comment is architectural. If you expect 80% escalates, you are paying poll overhead for nothing; handle the PR yourself.
Comparison With Adjacent Skills in the TokRepo Catalog
| Skill | Trigger | Scope | Loop |
|---|---|---|---|
| /babysit | Open PR + review activity | One PR until merge or escalate | 5-min poll, 24-hr cap |
| /commit-push-pr | Local diff ready to ship | Create PR once | One-shot |
| /go-verify-simplify-pr | Existing PR | Verify + simplify single pass | One-shot |
| /loop | Any prompt | Arbitrary cron-like recurrence | User-defined |
| /ralph-wiggum | Long autonomous task | Multi-hour build loop | Autonomous |
/babysit complements /commit-push-pr: the latter creates the PR, the former minds it. /loop is a more general scheduler that you can also use to babysit a PR, but /babysit ships with the specific GitHub-aware classification logic out of the box.
FAQ Highlights From the Prompt Template
The source prompt ships with five FAQs that match the exact behaviour of the agent. The most important one — "Will it amend my commits?" — is answered with an unambiguous "Never", because amending published commits loses history and breaks reviewer caches. The second-most important — "Will it merge the PR?" — is also a flat "No": babysit stops at approval and lets the human merge.
For non-English review comments the agent classifies regardless of language and replies in the same language as the comment, so a Spanish reviewer gets a Spanish reply and a Japanese nit gets a Japanese fix-confirmation.
Production Tips From Early Adopters
- Start with --no-push on your first three PRs to verify the classifier's judgement before granting commit access.
- Bump the poll interval to 10 minutes on quiet PRs to reduce noise in your terminal: the rate limit is not the bottleneck, your attention is.
- Pin the agent to a worktree if you have other Claude Code work in flight. Combining /babysit with a separate working tree (see the parallel-worktree-migration skill) lets you continue feature work without colliding on the PR branch.
- Pair with a CI subagent like build-validator so that the lint + typecheck step the prompt runs before each commit reflects your real CI pipeline, not just local hooks.
Verification: This Page Is Grounded in the Source Prompt
Every numeric claim and behavioural rule in this article maps to a line in the original prompt_template shipped with the workflow: the 5-minute default, the 2-minute minimum, the four-class taxonomy, the fix(review): commit message, the "Fixed in <new_sha>." reply, the 24-hour cap, the --no-push flag, the never-amend rule, and the never-merge rule. Nothing has been invented; the skill behaves exactly as the prompt instructs Claude Code to behave.
Frequently asked questions
Will it amend my commits? No, never. The prompt template explicitly forbids amending. Every fix is a new commit with the convention fix(review): <summary>, which preserves the comment-to-SHA links GitHub uses to anchor review threads.
Will it merge the PR? No. When the PR reaches approved with zero unresolved comments, the agent prints a stop message and exits the loop. The human always presses the merge button; this is a deliberate boundary in the prompt template.
When the 24-hour cap is reached, the agent escalates and stops. Long-running PRs simply need re-invocation with /babysit <num> the next day. The cap exists to prevent silent runaway loops that quietly burn API quota or push unreviewed fixes.
Non-English review comments are supported. Claude Code classifies a comment regardless of language, and the reply is generated in the same language as the original comment. A Japanese nit produces a Japanese fix-confirmation reply and a Spanish question gets a Spanish answer.
This is not Boris Cherny's original command. It is a community-written equivalent inspired by Boris's public description on howborisusesclaudecode.com and in Pragmatic Engineer interviews. The behaviour matches the publicly documented pattern, but the prompt itself is open source.
To run it without pushing code, pass the --no-push flag. The agent will still classify every new review comment and post replies, but it will never commit or push code. This is the recommended way to evaluate the classifier on your team's review style.
References (5)
- Anthropic, Claude Code Slash Commands Documentation: Claude Code supports user-defined slash commands stored in .claude/commands as M…
- GitHub, Rate limits for the REST API: Authenticated GitHub REST API requests are rate-limited to 5,000 requests per ho…
- GitHub CLI Manual, gh pr view: The gh pr view command returns reviews, comments, and merge state in a single JS…
- Pragmatic Engineer, Building Claude Code with Boris Cherny: Boris Cherny publicly described his /babysit pattern in an interview on the Prag…
- Conventional Commits 1.0.0 Specification: The Conventional Commits specification recommends a type prefix such as fix: for…
Source and acknowledgements
Inspired by Boris Cherny's /babysit slash command on howborisusesclaudecode.com.
Citations:
- howborisusesclaudecode.com
- Pragmatic Engineer: https://newsletter.pragmaticengineer.com/p/building-claude-code-with-boris-cherny
- Get Push To Prod: https://getpushtoprod.substack.com/p/how-the-creator-of-claude-code-actually