Practical Notes
A minimal workflow is to run evals on PRs that touch prompts/** and store output.json as an artifact. Example snippet:
```yaml
- uses: actions/checkout@v4
- uses: promptfoo/promptfoo-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    config: promptfooconfig.yaml
```

Start with a small test set, then expand coverage once the report format fits your review process.
Safety note: Treat eval configs like code: review changes to provider credentials, red-team prompts, and data files, and avoid leaking secrets (such as API keys) in CI logs.
FAQ
Q: Do I need to host anything? A: No. It runs in GitHub Actions and uses promptfoo under the hood.
Q: Can I gate merges on quality? A: Yes. Use thresholds/fail options so CI fails when success rate drops.
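As one way to gate merges, assertions can be declared in `promptfooconfig.yaml`; when assertions fail, the eval reports failures and CI can be configured to fail the job (check promptfoo's fail-on options for the exact behavior). A minimal sketch, with example values:

```yaml
# Hedged sketch: a shared assertion applied to every test case.
defaultTest:
  assert:
    - type: contains
      value: "refund policy"  # example expected substring; replace with your own check
```

Combined with branch protection rules requiring the eval check, this blocks merges when quality drops below what the assertions encode.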
Q: How do I keep costs down? A: Cache results and limit concurrency; run evals only on prompt-related paths.
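To sketch the caching idea, `actions/cache` can persist promptfoo's LLM response cache between runs so unchanged test cases don't re-hit providers. The cache directory and the `PROMPTFOO_CACHE_PATH` variable here are assumptions; confirm them against promptfoo's caching docs.

```yaml
# Hedged sketch: restore/save the response cache around the eval step.
- uses: actions/cache@v4
  with:
    path: ~/.promptfoo/cache            # assumed cache directory
    key: promptfoo-${{ hashFiles('promptfooconfig.yaml') }}
- uses: promptfoo/promptfoo-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    config: promptfooconfig.yaml
  env:
    PROMPTFOO_CACHE_PATH: ~/.promptfoo/cache  # assumed env var; see promptfoo docs
```

Keying the cache on the config file means a config change invalidates it, while unrelated PRs reuse prior responses.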