Main
Use it to evolve SKILL.md overnight: the README describes a loop that scores assertions, applies an edit, and commits only when the score improves.
- Treat eval.json as your benchmark: make assertions binary (true/false) so the loop can optimize reliably.
- Use --dry-run to score without mutating git history, then enable commits once behavior is trusted.
- Keep the target skill inside a git repo; README notes rollback relies on git reset/commit behavior.
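The commit-only-on-improvement rule described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the tool's actual code: the names decide and run, the action labels, and the sample scores are all hypothetical, and no git commands are executed here.

```python
def decide(best_score: float, candidate_score: float, dry_run: bool = True) -> str:
    """Return the action for one candidate edit (hypothetical helper).

    In dry-run mode nothing is committed or reset; the score is only reported.
    """
    if dry_run:
        return "report-only"
    # Strict improvement is required; a tie is treated as no improvement.
    return "commit" if candidate_score > best_score else "reset"


def run(baseline: float, candidate_scores: list[float], dry_run: bool = True):
    """Replay a sequence of candidate scores and track the best kept score."""
    best = baseline
    actions = []
    for score in candidate_scores:
        action = decide(best, score, dry_run)
        actions.append(action)
        if action == "commit":
            best = score  # only a committed edit raises the bar
    return best, actions


if __name__ == "__main__":
    # Made-up pass rates for four candidate edits.
    print(run(0.5, [0.4, 0.7, 0.7, 0.9], dry_run=False))
    print(run(0.5, [0.4, 0.7, 0.7, 0.9], dry_run=True))
```

Requiring a strict improvement means an edit that merely ties the current score is discarded, which keeps the history free of no-op commits; whether the real loop treats ties this way is not stated in the source.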
Source-backed notes
- README frames the loop as a Claude Code-native adaptation of Karpathy autoresearch, using binary assertions for scoring.
- README provides quick start install steps that link Claude Code agents/commands and a /autoimprove command interface.
FAQ
- Will it rewrite my history?: It can commit/reset; start with --dry-run and run in a branch if you’re unsure.
- Do I need Python scripts?: README says it runs with Claude Code agents + commands, no external Python runtime required.
- What’s the metric?: Binary assertion pass rate in eval.json; keep assertions precise and checkable.
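To make the pass-rate metric concrete, here is a minimal sketch of how a binary pass rate could be computed. The layout of eval.json is not shown in this summary, so the keys "assertions", "text", and "passed" are hypothetical, as is the sample data. The tool itself needs no Python; this only illustrates the arithmetic.

```python
import json


def pass_rate(eval_doc: dict) -> float:
    """Fraction of assertions that passed, assuming a hypothetical schema."""
    assertions = eval_doc["assertions"]
    if not assertions:
        raise ValueError("no assertions to score")
    for item in assertions:
        # Reject anything that is not strictly True/False, so the metric
        # stays binary and comparable between runs.
        if not isinstance(item["passed"], bool):
            raise TypeError(f"non-binary result for: {item['text']!r}")
    return sum(item["passed"] for item in assertions) / len(assertions)


# Made-up example results, not taken from a real eval.json.
sample = json.loads("""
{
  "assertions": [
    {"text": "Output includes a usage section", "passed": true},
    {"text": "Output stays under 500 words", "passed": false},
    {"text": "Output names the target file", "passed": true}
  ]
}
""")

if __name__ == "__main__":
    print(f"pass rate: {pass_rate(sample):.2f}")
```

Rejecting graded or free-text results up front is one way to enforce the "precise and checkable" advice: an assertion that cannot be answered true or false never reaches the score.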