Practical Notes
- Setup time: ~20 minutes (install + margin check + one dry-run)
- Two measurable checks: `margin --version` works, and a run bundle is produced under your output folder
- GitHub stars + forks (verified): see Source & Thanks
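The two checks above can be scripted so every developer runs the same sanity test. This is a minimal sketch, assuming the run bundle lands in an `out/` folder; the folder name and the "non-empty directory" heuristic are illustrative, not a Margin Eval convention:

```python
import shutil
import subprocess
from pathlib import Path

def setup_ok(output_dir: str = "out") -> bool:
    """Sanity check: the CLI resolves and a run bundle was produced."""
    # Check 1: `margin --version` works (CLI installed and on PATH).
    if shutil.which("margin") is None:
        return False
    version = subprocess.run(["margin", "--version"], capture_output=True, text=True)
    if version.returncode != 0:
        return False
    # Check 2: a run bundle exists under the output folder. Here we only
    # require the folder to be non-empty; adjust to your bundle layout.
    out = Path(output_dir)
    return out.is_dir() and any(out.iterdir())
```

Wire this into your dry-run step so a broken install fails fast instead of surfacing mid-eval.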
Margin Eval is strongest when you standardize “what counts as success” for tool-using agent runs:
- Use a shared suite repo for scenarios and fixtures.
- Keep agent configs in version control (so changes are reviewed).
- Compare agents side-by-side using the same suites and eval configs.
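A side-by-side comparison can be as simple as computing per-agent pass rates over the same suite. The result-record shape below is made up for illustration; substitute whatever your run bundles actually contain:

```python
from collections import defaultdict

def pass_rates(results):
    """results: records like {"agent": str, "scenario": str, "passed": bool},
    all from the same suite. Returns {agent: fraction of scenarios passed}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for r in results:
        totals[r["agent"]] += 1
        passed[r["agent"]] += int(r["passed"])
    return {agent: passed[agent] / totals[agent] for agent in totals}

results = [
    {"agent": "agent-a", "scenario": "s1", "passed": True},
    {"agent": "agent-a", "scenario": "s2", "passed": False},
    {"agent": "agent-b", "scenario": "s1", "passed": True},
    {"agent": "agent-b", "scenario": "s2", "passed": True},
]
```

Because both agents ran the same scenarios with the same eval config, the rates are directly comparable.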
If you run multiple providers, treat auth as part of the harness: keep keys out of logs, and make sure dry-run is part of every developer’s setup.
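Keeping keys out of logs is easiest with a scrub pass applied to every line before it is persisted. A sketch, with illustrative (not exhaustive) patterns; extend them for the providers you actually use:

```python
import re

# Likely secret shapes. These are examples only, not a complete list.
KEY_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{10,}"),            # OpenAI-style keys
    re.compile(r"(?i)(api[_-]?key\s*[=:]\s*)\S+"),   # key=value pairs
]

def redact(line: str) -> str:
    """Scrub likely secrets from a log line before writing it anywhere."""
    for pattern in KEY_PATTERNS:
        # Keep the "api_key=" prefix (group 1) when present, drop the value.
        line = pattern.sub(
            lambda m: (m.group(1) if m.lastindex else "") + "[REDACTED]",
            line,
        )
    return line
```

Run this at the single choke point where harness output is written, so no provider call can leak a key path you forgot about.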
FAQ
Q: Why evaluate locally instead of only in CI? A: Local evals shorten iteration loops. You can reproduce a failure immediately before pushing.
Q: Do I need Docker? A: The README lists Docker as a prerequisite for the quickstart.
Q: What should I store long-term? A: Store the run bundle/traces and a small summary so regressions can be audited later.
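The "small summary" can live next to the run bundle as a machine-readable file, so an audit doesn't require replaying traces. A sketch; the `summary.json` filename and fields are assumptions, not a Margin Eval format:

```python
import json
from pathlib import Path

def write_summary(bundle_dir: str, summary: dict) -> Path:
    """Write a small JSON summary alongside the run bundle/traces."""
    path = Path(bundle_dir) / "summary.json"
    path.write_text(json.dumps(summary, indent=2, sort_keys=True))
    return path
```

For example, `write_summary(bundle, {"suite": "checkout", "passed": 17, "failed": 2})` leaves enough behind to spot a regression months later, while the full traces stay available for deeper digging.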