Prompts · May 11, 2026 · 2 min read

promptfoo-action — Run Prompt Evals in GitHub CI

Add promptfoo-action to GitHub Actions to run prompt/agent evals on PRs or pushes, cache results, and comment a before/after report for safer iteration.

Agent ready

Safe staging for this asset

This asset is staged first. The copied prompt tells the agent to inspect the staged files and ask before activating scripts, MCP config, or global config.

Stage only · 27/100 · Policy: stage
Agent surface: Any MCP/CLI agent
Kind: Prompt
Install: Stage only
Trust: Established
Entrypoint: Asset
Safe staging command
npx -y tokrepo@latest install 1eecb87d-ec62-4982-828d-18dd9a031695 --target codex

Stages files first; activation requires review of the staged README and plan.

Intro

  • Best for: teams shipping prompts/agents who want CI regression checks and a human-reviewable report in PRs
  • Works with: GitHub Actions, promptfoo configs (YAML/JSON), and optional caching via actions/cache (per repo docs)
  • Setup time: about 13 minutes

Quantitative Notes

  • GitHub stars + forks (verified): see Source & Thanks
  • Action writes results to output.json (repo docs)
  • Setup time ~13 minutes (workflow + one config file)

Practical Notes

A minimal workflow runs evals on PRs that touch prompts/** and stores output.json as an artifact. The core steps are:

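# Check out the repo so the action can read prompts and config
- uses: actions/checkout@v4
# Run the evals and post the before/after report
- uses: promptfoo/promptfoo-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}  # used to comment on the PR
    config: promptfooconfig.yaml               # path to the promptfoo config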
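
For context, the steps above could sit inside a complete workflow file along these lines. This is a sketch, not the action's official example: the file name, job name, path filter, the OPENAI_API_KEY secret, and the artifact step are all assumptions, and it assumes the action passes step-level env through to promptfoo and writes output.json to the workspace root. Check the action's README for its own provider-key inputs.

```yaml
# .github/workflows/prompt-eval.yml (hypothetical file name)
name: Prompt evals
on:
  pull_request:
    paths:
      - 'prompts/**'            # only run when prompt files change
      - 'promptfooconfig.yaml'  # ...or when the eval config changes

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write      # lets the action comment on the PR
    steps:
      - uses: actions/checkout@v4
      - uses: promptfoo/promptfoo-action@v1
        env:
          # Assumption: an OpenAI-backed provider; swap in your provider's key
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          config: promptfooconfig.yaml
      - uses: actions/upload-artifact@v4
        if: always()            # keep the report even when the eval step fails
        with:
          name: promptfoo-output
          path: output.json     # assumed location of the results file
```

The path filter keeps the job from running on unrelated PRs, and `if: always()` preserves the results for review when an eval fails.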

Start with a small test set, then expand coverage once the report format fits your review process.
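
A small starting test set might look like the config below. It is only an illustration: the prompt file path, the provider, the variable name, and the test cases are hypothetical placeholders, so substitute your own.

```yaml
# promptfooconfig.yaml: a deliberately small starting test set (illustrative)
description: Smoke tests for a support-reply prompt
prompts:
  - file://prompts/support_reply.txt   # hypothetical prompt file that uses {{question}}
providers:
  - openai:gpt-4o-mini                 # assumption: use whichever provider you already have
tests:
  # Cheap string check: the reply should mention resetting
  - vars:
      question: How do I reset my password?
    assert:
      - type: icontains
        value: reset
  # Model-graded check for tone and policy
  - vars:
      question: Can I get a refund after 30 days?
    assert:
      - type: llm-rubric
        value: Answers politely and does not promise a refund
```

Two or three cases like these are enough to see how the PR comment reads before investing in broader coverage.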

Safety note: Treat eval configs like code. Review how provider keys are referenced, along with red-team prompts and data files, and avoid leaking secrets in logs.

FAQ

Q: Do I need to host anything? A: No. It runs in GitHub Actions and uses promptfoo under the hood.

Q: Can I gate merges on quality? A: Yes. Use thresholds/fail options so CI fails when success rate drops.

Q: How do I keep costs down? A: Cache results and limit concurrency; run evals only on prompt-related paths.
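
The gating and cost answers above could translate into steps like the following fragment. The input names cache-path, max-concurrency, and fail-on-threshold are given as I understand them from the action's documentation and should be verified against its action.yml before use; the cache key and the numeric values are arbitrary examples.

```yaml
# Fragment of a job's steps list; verify input names against the action's action.yml
- uses: actions/cache@v4
  with:
    path: ~/.cache/promptfoo              # assumed cache directory
    key: ${{ runner.os }}-promptfoo-v1    # bump the suffix to invalidate the cache
    restore-keys: |
      ${{ runner.os }}-promptfoo-
- uses: promptfoo/promptfoo-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    config: promptfooconfig.yaml
    cache-path: ~/.cache/promptfoo        # reuse cached provider responses
    max-concurrency: 2                    # example value: fewer parallel API calls
    fail-on-threshold: 90                 # example value: fail below a 90% success rate
```

Combined with a paths filter on the pull_request trigger, this limits both how often evals run and how many provider calls each run makes.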


Source & Thanks

GitHub: https://github.com/promptfoo/promptfoo-action
Owner avatar: https://avatars.githubusercontent.com/u/137907881?v=4
License (SPDX): MIT
GitHub stars (verified via api.github.com/repos/promptfoo/promptfoo-action): 65
GitHub forks (verified via api.github.com/repos/promptfoo/promptfoo-action): 31
