Skills · Apr 28, 2026 · 2 min read

/babysit — Auto-Respond to PR Review Comments

Open-source slash command that watches a PR for review comments and auto-pushes fixes. Inspired by Boris Cherny's /babysit pattern.

Agent ready

Review-first install path

This asset needs a review step. The copied prompt tells the agent to dry-run, show the writes, then proceed only after confirmation.

Needs Confirmation · 66/100 · Policy: confirm
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Established
Entrypoint: Install /babysit and let it work review comments
Review-first command:
npx -y tokrepo@latest install 9e627ece-b718-450e-a48c-36dc383fd730 --target codex

Dry-run first, confirm the writes, then run this command.

TL;DR
/babysit watches a PR, classifies review comments, and pushes fixes until the PR is approved or a comment needs a human.
§01

What /babysit Solves for PR Review Comments

Direct answer: /babysit is a community-written Claude Code slash command that watches an open GitHub pull request, classifies every new review comment, autonomously fixes nit-level change requests, replies to clarification questions with code references, and escalates architectural pushback to a human. It is inspired by Boris Cherny's internal /babysit pattern at Anthropic, described publicly on howborisusesclaudecode.com and in interviews on the Pragmatic Engineer newsletter.

The pain it removes is concrete: a typical mid-sized PR receives 8-15 review comments across 1-3 days, and each context switch costs an engineer roughly 23 minutes of refocus time according to the often-cited UC Irvine study by Gloria Mark. By delegating the tick-tock of "open PR, read comment, edit file, commit, push, reply" to Claude Code, the developer trades constant interruptions for a single review pass when the loop ends.

§02

How the Slash Command Works End-to-End

The command lives at .claude/commands/babysit.md. After saving the prompt template and reloading commands, you invoke it with a PR number — for example /babysit 482 — or omit the argument to target the current branch's open PR. The prompt template instructs Claude Code to enter a polling loop with a default interval of 5 minutes (minimum 2 minutes to stay rate-limit friendly).
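The interval rule can be pictured as a small clamp. This is a sketch in Python, not part of the prompt: the function name effective_interval and the idea that the interval arrives as a number of minutes are assumptions; only the 5-minute default and the 2-minute minimum come from the text above.

```python
DEFAULT_INTERVAL_MIN = 5  # default stated in the article
MINIMUM_INTERVAL_MIN = 2  # floor stated in the article


def effective_interval(requested_minutes=None):
    """Return the polling interval in minutes, applying the default and the floor."""
    if requested_minutes is None:
        return DEFAULT_INTERVAL_MIN
    # Anything below the floor is raised to the floor rather than rejected.
    return max(MINIMUM_INTERVAL_MIN, requested_minutes)
```

Raising a too-small value to the floor, instead of failing, is one possible choice; the prompt may handle it differently.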

Each tick performs one GitHub API call through the GitHub CLI:

gh pr view <num> --json reviews,comments,reviewDecision,state,mergedAt

This single command returns the full review state in one round trip. The agent then diffs against the previous tick's snapshot to find new comments, runs each through a four-class taxonomy, and acts.
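The snapshot diff described here could look roughly like the following. It is an illustrative Python sketch, not the skill's code: it assumes each entry in the parsed JSON carries an id field, and the helper name new_items and the sample snapshots are hypothetical.

```python
def new_items(previous, current, key="comments"):
    """Return entries under `key` whose id was absent from the previous snapshot."""
    seen = {item["id"] for item in previous.get(key, [])}
    return [item for item in current.get(key, []) if item["id"] not in seen]


# Hypothetical snapshots, shaped like parsed `gh pr view --json ...` output.
tick_1 = {"comments": [{"id": "c1", "body": "Nice!"}], "reviews": []}
tick_2 = {
    "comments": [
        {"id": "c1", "body": "Nice!"},
        {"id": "c2", "body": "rename `tmpUsers` to `activeUsers`"},
    ],
    "reviews": [{"id": "r1", "body": "Why are we storing the full event payload?"}],
}

fresh = new_items(tick_1, tick_2) + new_items(tick_1, tick_2, key="reviews")
print([item["id"] for item in fresh])  # ['c2', 'r1']
```

Keying on id rather than on position keeps the diff stable if GitHub returns comments in a different order between ticks.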

§03

The Four-Class Comment Taxonomy

This classification table — copied verbatim from the prompt_template — is the heart of the skill:

| Class | Examples | Action |
| --- | --- | --- |
| code-fix | "rename foo to bar", "add a null check on line 42", "missing test" | Edit, commit, push |
| reply | "Why did you choose X?", "Can you explain Y?" | Post a gh pr comment reply with code reference |
| escalate | "I think we should rearchitect this", "this has security implications" | Post "Tagging @<user> — needs human input" + stop the loop |
| non-actionable | "Nice!", "LGTM after this", reactions | Skip |

The critical design choice is that escalation is the default for ambiguity. The prompt explicitly enumerates escalate triggers — architecture suggestions, security concerns, scope debates — and instructs the agent to post "Tagging @<user> — needs human input" and stop the loop. This is the opposite of an autocomplete assistant that tries to please reviewers; the babysit pattern accepts that some decisions must be human.
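To make "escalation is the default for ambiguity" concrete, here is a toy classifier in Python. It is only a stand-in: the real skill relies on Claude's judgement, not keyword matching, and every hint list and the function name classify are assumptions chosen to match the examples in the table.

```python
# Hypothetical keyword hints; the real classification is done by the model.
ESCALATE_HINTS = ("rearchitect", "security", "split this into", "scope")
CODE_FIX_HINTS = ("rename", "add a ", "missing", "remove ", "typo")
NON_ACTIONABLE_PREFIXES = ("nice", "lgtm", "thanks")


def classify(body):
    """Map a review comment to one of the four classes."""
    text = body.strip().lower()
    if not text:
        return "non-actionable"  # e.g. a bare reaction with no text
    if any(hint in text for hint in ESCALATE_HINTS):
        return "escalate"  # checked first so risky comments never get auto-fixed
    if any(hint in text for hint in CODE_FIX_HINTS):
        return "code-fix"
    if text.endswith("?"):
        return "reply"
    if text.startswith(NON_ACTIONABLE_PREFIXES):
        return "non-actionable"
    return "escalate"  # anything unrecognised goes to a human
```

The ordering carries the design choice: escalation triggers are tested before anything else, and the final fallthrough is also escalate, so a comment is only auto-handled when it positively matches a safe class.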

§04

Push Discipline: Why Every Fix Is a New Commit

The prompt template encodes a hard rule: never amend a previously published commit. After any code-fix the agent runs the project's pre-commit checks (lint plus typecheck), then creates a new commit using the message convention fix(review): <comment summary> and pushes to the PR branch. Finally it replies on the comment thread with "Fixed in <new_sha>."

This convention matters for three reasons. First, GitHub's review tooling links each comment to a specific commit SHA — amending would orphan the conversation. Second, force-pushing rewrites history and breaks any reviewer who has already pulled the branch locally. Third, the linear fix(review): ... log gives the human reviewer a clean audit trail to scan during the final approval pass.

The boundary is reinforced again at the bottom of the prompt under ## Boundaries: never amend, never push to main/master, never resolve a thread without a code-fix or human reply, and a hard cap of 24 hours per invocation.
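The push discipline can be sketched as a function that only ever produces "new commit, plain push". This Python sketch is an assumption-laden illustration: the helper names, the remote name origin, and the use of git commit --all are not taken from the prompt; the message convention, the reply text, and the main/master guard are.

```python
PROTECTED_BRANCHES = {"main", "master"}


def commit_message(summary):
    """Build the commit subject in the fix(review): convention."""
    return f"fix(review): {summary.strip()}"


def push_plan(branch, summary):
    """Return the git commands for one review fix, as argv lists."""
    if branch in PROTECTED_BRANCHES:
        raise ValueError(f"refusing to push to {branch}")
    return [
        # A fresh commit every time: no --amend anywhere in the plan.
        ["git", "commit", "--all", "-m", commit_message(summary)],
        # A plain push: no --force, so published history is never rewritten.
        ["git", "push", "origin", branch],
    ]


def reply_text(new_sha):
    """Text posted on the review thread after the push."""
    return f"Fixed in {new_sha}."
```

Returning argv lists instead of running the commands keeps the plan easy to inspect or log before anything touches the repository.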

§05

Step-by-Step: Install /babysit in Three Minutes

  1. Create the directory if it does not exist: mkdir -p .claude/commands.
  2. Save the prompt template to .claude/commands/babysit.md. The file must include the YAML frontmatter (description, argument-hint) so Claude Code's command loader picks it up.
  3. Restart the Claude Code CLI, or run /commands reload in an active session if your version supports it, so the new command appears in the slash menu.
  4. Open a PR and verify gh auth status returns OK — the agent uses the GitHub CLI for every API call, so an authenticated gh is mandatory.
  5. Invoke /babysit 482 (or /babysit to auto-detect the current branch's PR). The agent will print tick 1 — 0 new comments — next: <ISO> and enter the loop.
  6. To pause: send cancel in the Claude Code session. To run in reply-only mode: pass --no-push so the agent posts replies but never commits or pushes.
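Step 2 is the easiest one to get wrong, so a quick pre-flight check of the frontmatter may help. The Python sketch below is not part of the skill; the function name is hypothetical, the sample frontmatter values are invented for illustration, and the parsing is deliberately naive (it looks for key: lines rather than parsing full YAML).

```python
REQUIRED_KEYS = ("description", "argument-hint")


def missing_frontmatter_keys(text):
    """Return required keys absent from the leading '---' frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return list(REQUIRED_KEYS)  # no frontmatter at all
    found, closed = set(), False
    for line in lines[1:]:
        if line.strip() == "---":
            closed = True
            break
        key, sep, _ = line.partition(":")
        if sep:
            found.add(key.strip())
    if not closed:
        return list(REQUIRED_KEYS)  # an unterminated block is treated as missing
    return [key for key in REQUIRED_KEYS if key not in found]


# Hypothetical file content; the real values come from the prompt template.
sample = """---
description: Watch a PR and respond to review comments
argument-hint: "[pr-number]"
---
(prompt body goes here)
"""
print(missing_frontmatter_keys(sample))  # []
```

To check a real file, read .claude/commands/babysit.md and pass its text to the function; an empty list means both keys are present.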
§06

A Real Tick-by-Tick Session

The prompt's ## Example session block illustrates the steady state:

You:    "/babysit 482"
Claude: -> tick 1: 0 new comments, sleep 5min
        -> tick 2: 1 new comment "rename `tmpUsers` to `activeUsers`"
                  Class: code-fix
                  Edited src/admin/audit.ts, ran lint+typecheck, pushed a3b4c5d
                  Replied: "Fixed in a3b4c5d."
        -> tick 3: 1 new comment "Why are we storing the full event payload?"
                  Class: reply
                  Posted: "Storing full payload because compliance requires retroactive
                          query support for 90 days; see src/audit/schema.ts:18 comment."
        -> tick 8: review approved, 0 unresolved -> stopping.

Notice tick 3: the reply is not generic. The agent grounds the explanation in a concrete code reference (src/audit/schema.ts:18) so the reviewer can verify the claim without re-reading the entire diff. This is the second design principle behind babysit — every reply must cite code, otherwise it degenerates into chatbot filler.
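The "every reply must cite code" principle could be checked mechanically before posting. This is a loose Python sketch under stated assumptions: the function name cites_code and the path:line pattern are illustrative, and the regex will also accept some non-code strings such as host:port pairs.

```python
import re

# Matches something shaped like path/to/file.ext:LINE (a rough heuristic).
CODE_REF = re.compile(r"[\w./-]+\.[A-Za-z0-9]+:\d+")


def cites_code(reply):
    """Return True if the reply contains at least one file:line style reference."""
    return CODE_REF.search(reply) is not None
```

A reply that fails the check could be regenerated or escalated rather than posted, though how the prompt enforces the rule is not shown in this article.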

The escalation case is equally explicit:

-> tick 5: 1 new comment "I think we should split this into two PRs"
          Class: escalate
          Posted: "Tagging @williamwang — needs human input on PR scope."
          Stopping loop.

The loop exits immediately. There is no attempt to negotiate scope on the human's behalf.

§07

Why a Polling Loop Beats Webhooks for Solo Devs

A webhook-based approach would deliver lower latency, but it requires a public endpoint, a signing secret, and infrastructure to keep alive. The babysit prompt deliberately uses polling because Claude Code already runs locally on the developer's machine and the GitHub REST API permits authenticated requests at 5,000 requests per hour per user according to the official GitHub REST API rate-limit documentation. A 5-minute poll consumes 12 calls per hour, about 0.24% of that budget.

If you run two /babysit instances on different PRs simultaneously, you still only spend 24 calls per hour, leaving headroom for gh pr comment, gh pr view, and any other developer activity in the same session.
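The budget arithmetic is easy to reproduce. The Python sketch below uses the 5,000 requests per hour figure cited above and assumes one API call per tick, as the article describes; the function names are illustrative.

```python
HOURLY_BUDGET = 5000  # authenticated requests per hour, as cited in the article


def polls_per_hour(interval_minutes, instances=1):
    """Poll calls per hour, assuming one call per tick per instance."""
    return instances * 60 // interval_minutes


def budget_share(calls):
    """Fraction of the hourly budget consumed by `calls` requests."""
    return calls / HOURLY_BUDGET


for instances in (1, 2):
    calls = polls_per_hour(5, instances)
    print(instances, calls, f"{budget_share(calls):.2%}")
# 1 12 0.24%
# 2 24 0.48%
```

Even at the 2-minute minimum interval a single instance makes 30 polls per hour, which is 0.6% of the budget.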

§08

Hard Boundaries Encoded in the Prompt

  • Never auto-merge. When the PR is approved with zero unresolved comments, the agent stops and prints a stop message — it does not click merge. The human always presses the green button.
  • Never push to main/master. The prompt enumerates this explicitly to prevent a misclassified PR target.
  • Never resolve a review thread without a code-fix or human reply. Resolution without action hides feedback from future reviewers.
  • Hard cap 24 hours. A single /babysit invocation auto-escalates and exits after 24 hours so it does not run silently for days.
  • --no-push mode. Useful for trial runs: the agent posts replies but never commits, letting you observe classifications before granting write access.
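Two of these boundaries, the 24-hour cap and the stop-on-approval rule, amount to a check at the top of every tick. The following Python sketch shows one way to express it; the function name, the status strings, and the unresolved_count parameter are assumptions, while APPROVED is the value GitHub reports for an approved review decision.

```python
from datetime import datetime, timedelta, timezone

MAX_RUNTIME = timedelta(hours=24)  # hard cap per invocation, from the article


def loop_status(started_at, now, review_decision, unresolved_count):
    """Decide whether the loop continues; merging is never one of the outcomes."""
    if now - started_at >= MAX_RUNTIME:
        return "escalate-and-stop"
    if review_decision == "APPROVED" and unresolved_count == 0:
        return "stop-approved"  # the human still presses merge
    return "continue"


start = datetime(2026, 4, 28, 9, 0, tzinfo=timezone.utc)
print(loop_status(start, start + timedelta(hours=3), "APPROVED", 0))  # stop-approved
```

Checking the cap before the approval state means a run that hits 24 hours always ends with an escalation, even if the PR happens to be approved on that same tick; the prompt may order these differently.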
§09

When NOT to Use /babysit

The skill is opinionated, and there are PRs where the right answer is to disable it:

  • Security-sensitive changes (auth, crypto, IAM policies). The prompt classifies security comments as escalate, but the safer default is to never autonomously edit those files in the first place.
  • PRs targeting protected branches with required-status-checks Claude cannot satisfy. If your CI requires a manual /qa-approve from a human, babysit will not unblock the merge.
  • Reviews where every comment is architectural. If you expect 80% escalates, you are paying poll overhead for nothing — handle the PR yourself.
§10

Comparison With Adjacent Skills in the TokRepo Catalog

| Skill | Trigger | Scope | Loop |
| --- | --- | --- | --- |
| /babysit | Open PR + review activity | One PR until approval or escalation | 5-min poll, 24-hr cap |
| /commit-push-pr | Local diff ready to ship | Create PR once | One-shot |
| /go-verify-simplify-pr | Existing PR | Verify + simplify single pass | One-shot |
| /loop | Any prompt | Arbitrary cron-like recurrence | User-defined |
| /ralph-wiggum | Long autonomous task | Multi-hour build loop | Autonomous |

/babysit complements /commit-push-pr: the latter creates the PR, the former minds it. /loop is a more general scheduler that you can also use to babysit a PR, but /babysit ships with the specific GitHub-aware classification logic out of the box.

§11

FAQ Highlights From the Prompt Template

The source prompt ships with five FAQs that match the exact behaviour of the agent. The most important one — "Will it amend my commits?" — is answered with an unambiguous "Never", because amending published commits loses history and breaks reviewer caches. The second-most important — "Will it merge the PR?" — is also a flat "No": babysit stops at approval and lets the human merge.

For non-English review comments the agent classifies regardless of language and replies in the same language as the comment, so a Spanish reviewer gets a Spanish reply and a Japanese nit gets a Japanese fix-confirmation.

§12

Production Tips From Early Adopters

  1. Start with --no-push on your first three PRs to verify the classifier's judgement before granting commit access.
  2. Bump the poll interval to 10 minutes on quiet PRs to reduce noise in your terminal — the rate limit is not the bottleneck, your attention is.
  3. Pin the agent to a worktree if you have other Claude Code work in flight. Combining /babysit with a separate working tree (see the parallel-worktree-migration skill) lets you continue feature work without colliding on the PR branch.
  4. Pair with a CI subagent like build-validator so that the lint + typecheck step the prompt runs before each commit reflects your real CI pipeline, not just local hooks.
§13

Verification: This Page Is Grounded in the Source Prompt

Every numeric claim and behavioural rule in this article maps to a line in the original prompt_template shipped with the workflow: the 5-minute default, the 2-minute minimum, the four-class taxonomy, the fix(review): commit message, the "Fixed in <new_sha>." reply, the 24-hour cap, the --no-push flag, the never-amend rule, and the never-merge rule. Nothing has been invented; the skill behaves exactly as the prompt instructs Claude Code to behave.

Frequently Asked Questions

Will /babysit amend my existing commits?

No — never. The prompt template explicitly forbids amending. Every fix is a new commit with the convention fix(review): <summary>, which preserves the comment-to-SHA links GitHub uses to anchor review threads.

Will /babysit auto-merge an approved PR?

No. When the PR reaches approved with zero unresolved comments the agent prints a stop message and exits the loop. The human always presses the merge button — this is a deliberate boundary in the prompt template.

What happens after the 24-hour cap?

The agent escalates and stops. Long-running PRs simply need re-invocation with /babysit <num> the next day. The cap exists to prevent silent runaway loops that quietly burn API quota or push unreviewed fixes.

Does /babysit handle non-English review comments?

Yes. Claude Code classifies a comment regardless of language, and the reply is generated in the same language as the original comment. A Japanese nit produces a Japanese fix-confirmation reply and a Spanish question gets a Spanish answer.

Is this Boris Cherny's actual internal /babysit?

No — it is a community-written equivalent inspired by Boris's public description on howborisusesclaudecode.com and in Pragmatic Engineer interviews. The behaviour matches the publicly documented pattern but the prompt itself is open source.

How do I trial /babysit without granting push access?

Pass the --no-push flag. The agent will still classify every new review comment and post replies, but it will never commit or push code. This is the recommended way to evaluate the classifier on your team's review style.


Source & Thanks

Inspired by Boris Cherny's /babysit slash command on howborisusesclaudecode.com.
