Promptfoo — Test & Red-Team LLM Apps
Promptfoo is a CLI for evaluating prompts, comparing models, and red-teaming AI apps. 18.9K+ GitHub stars. Side-by-side comparison, vulnerability scanning, CI/CD. MIT.
What it is
Promptfoo is a CLI tool for evaluating prompts, comparing LLM model outputs side-by-side, and red-teaming AI applications for security vulnerabilities. It runs test suites against your prompts with configurable assertions, generates comparison tables across models, and scans for prompt injection, jailbreaks, and other vulnerabilities.
Promptfoo is designed for AI engineers and teams building LLM-powered applications who need systematic testing and security evaluation before production deployment.
How it saves time or tokens
Manually testing prompts across models is tedious and inconsistent. Promptfoo automates the process: define test cases once, run them across multiple models, and compare results in a structured view. The red-teaming feature automatically generates adversarial inputs to find vulnerabilities you would not think to test manually. CI/CD integration means prompt quality is validated on every code change.
How to use
- Install Promptfoo:
npm install -g promptfoo
- Initialize an evaluation config:
promptfoo init
- Run evaluation and view results:
promptfoo eval
promptfoo view
- Red-team scan for vulnerabilities:
promptfoo redteam run
Example
A promptfoo configuration for comparing models:
# promptfooconfig.yaml
prompts:
- 'Summarize this text in 2 sentences: {{text}}'
providers:
- openai:gpt-4
- anthropic:claude-sonnet-4-20250514
- ollama:llama3
tests:
- vars:
text: 'The quick brown fox jumps over the lazy dog. The fox was very quick.'
assert:
- type: contains
value: 'fox'
- type: llm-rubric
value: 'The summary should be exactly 2 sentences'
- type: max-tokens
value: 50
Run promptfoo eval to see a side-by-side comparison table of all three models with pass/fail assertions.
Related on TokRepo
- Testing tools — Browse AI testing and evaluation tools
- Security tools — Explore AI security tools
Common pitfalls
- Writing assertions that are too strict. LLM outputs are non-deterministic. Use llm-rubric for semantic evaluation instead of exact string matching.
- Not running red-team scans before production. Prompt injection and jailbreak vulnerabilities are common in LLM applications. Run promptfoo redteam to discover them before attackers do.
- Testing only happy paths. Include edge cases, long inputs, multilingual text, and adversarial inputs in your test suite for comprehensive coverage.
- Starting with an overly complex configuration instead of defaults. Begin with the minimal setup, verify it works, then customize incrementally. This approach catches configuration errors early and keeps troubleshooting straightforward.
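The first pitfall above can be sketched as a config fragment: an exact-match assertion that breaks on any rewording, next to a rubric-based one that survives paraphrasing. This is a minimal sketch; the rubric wording and test text are illustrative, though equals and llm-rubric are both real promptfoo assertion types.

```yaml
tests:
  - vars:
      text: 'The quick brown fox jumps over the lazy dog.'
    assert:
      # Brittle: fails whenever the model rephrases the summary
      - type: equals
        value: 'A quick fox jumped over a lazy dog.'
      # More robust: semantic check graded by an LLM judge
      - type: llm-rubric
        value: 'The summary mentions a fox jumping over a dog'
```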
For teams evaluating this tool, the time saved over ad-hoc manual prompt testing typically justifies adoption quickly. Well-maintained documentation and an active community mean most common questions have already been answered, shortening the learning curve and reducing the tokens spent explaining basic usage to AI assistants.
Frequently Asked Questions
What is red-teaming?
Red-teaming is the process of testing an AI application with adversarial inputs to find vulnerabilities like prompt injection, jailbreaks, data leakage, and harmful output generation. Promptfoo automates this by generating attack inputs and evaluating the application's responses.
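A red-team run is driven by the same YAML config file. The sketch below follows the shape described in the promptfoo red-team docs; the purpose string is a placeholder, and the exact plugin and strategy names available depend on your promptfoo version.

```yaml
# promptfooconfig.yaml (red-team section) -- illustrative sketch
targets:
  - id: openai:gpt-4
redteam:
  # Describes the app so generated attacks are relevant (placeholder text)
  purpose: 'Customer support bot for a retail store'
  plugins:
    - harmful            # harmful-content probes
    - pii                # attempts to extract personal data
  strategies:
    - jailbreak          # iterative jailbreak attempts
    - prompt-injection   # injected instructions in user input
```

With this in place, promptfoo redteam run generates the adversarial test cases and evaluates the target's responses.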
Which model providers does Promptfoo support?
Promptfoo supports OpenAI, Anthropic, Google, Mistral, Cohere, Ollama, and any OpenAI-compatible API. You configure providers in the YAML config file.
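A provider entry can be a bare id string, as in the example above, or an object with per-model options. The config block below follows promptfoo's documented provider shape; exact option names (temperature, max_tokens) vary by provider, so treat this as a sketch.

```yaml
providers:
  # Shorthand: just the provider id
  - openai:gpt-4
  # Expanded form with per-model options
  - id: anthropic:claude-sonnet-4-20250514
    config:
      temperature: 0     # reduce randomness so test runs are more repeatable
      max_tokens: 256
```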
Can Promptfoo run in CI/CD pipelines?
Yes. Promptfoo is designed for CI/CD integration. Run promptfoo eval in your pipeline and configure it to fail the build if assertions do not pass. This ensures prompt quality is validated on every code change.
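A CI job could look like the following GitHub Actions sketch. The workflow name and the secret name are placeholders; the assumption here is that promptfoo eval exits non-zero when assertions fail, which fails the job.

```yaml
# .github/workflows/prompt-eval.yml -- illustrative sketch
name: prompt-eval
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g promptfoo
      # Assumed behavior: non-zero exit on assertion failure fails the build
      - run: promptfoo eval
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```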
How do I compare model outputs side by side?
Run promptfoo view after an evaluation to open a web UI showing a side-by-side comparison table. Each row is a test case, each column is a model, and cells show the output with pass/fail indicators for assertions.
Is Promptfoo free?
Yes. Promptfoo is open source under the MIT license. The CLI, evaluation engine, and red-teaming tools are all free. There is an optional cloud service for team collaboration.
Citations (3)
- Promptfoo GitHub — Promptfoo is a CLI for LLM evaluation and red-teaming
- Promptfoo Documentation — Prompt evaluation and model comparison
- Promptfoo Red Team — LLM red-teaming and vulnerability scanning
Source & Thanks
Created by Promptfoo. Licensed under MIT. promptfoo/promptfoo — 18,900+ GitHub stars
Related Assets
Kornia — Differentiable Computer Vision Library for PyTorch
Kornia is a differentiable computer vision library built on PyTorch that provides GPU-accelerated implementations of classical vision algorithms including geometric transforms, color conversions, filtering, feature detection, and augmentations, all with full autograd support for end-to-end learning.
AlphaFold — AI-Powered 3D Protein Structure Prediction
AlphaFold by Google DeepMind predicts three-dimensional protein structures from amino acid sequences with atomic-level accuracy, enabling breakthroughs in drug discovery, enzyme engineering, and structural biology research.
Flash Attention — Fast Memory-Efficient Exact Attention for Transformers
Flash Attention is a CUDA kernel library that computes exact scaled dot-product attention 2-4x faster and with up to 20x less memory than standard implementations by using IO-aware tiling to minimize GPU memory reads and writes.