Esta página se muestra en inglés. Una traducción al español está en curso.

PromptsMar 31, 2026·2 min de lectura

Promptfoo — Test & Red-Team LLM Apps

Promptfoo is a CLI for evaluating prompts, comparing models, and red-teaming AI apps. 18.9K+ GitHub stars. Side-by-side comparison, vulnerability scanning, CI/CD. MIT.

Script Depot · Community

Listo para agents

Instalación lista para agent

Este activo puede instalarse después de elegir el runtime, revisar el plan y ejecutar el comando correspondiente.

Native · 96/100Política: permitir

Superficie agent

Cualquier agent MCP/CLI

Tipo

Prompt

Instalación

Single

Confianza

Confianza: Established

Entrada

Promptfoo — Test & Red-Team LLM Apps

Comando de instalación directa

npx -y tokrepo@latest install 42c43368-a482-4fad-b23d-d80e0530377b --target codex

Ejecutar después de confirmar el plan con dry-run.

TL;DR

Promptfoo evaluates prompts, compares models side-by-side, and red-teams AI applications for vulnerabilities via CLI.

§01

What it is

Promptfoo is a CLI tool for evaluating prompts, comparing LLM model outputs side-by-side, and red-teaming AI applications for security vulnerabilities. It runs test suites against your prompts with configurable assertions, generates comparison tables across models, and scans for prompt injection, jailbreaks, and other vulnerabilities.

Promptfoo is designed for AI engineers and teams building LLM-powered applications who need systematic testing and security evaluation before production deployment.

§02

How it saves time or tokens

Manually testing prompts across models is tedious and inconsistent. Promptfoo automates the process: define test cases once, run them across multiple models, and compare results in a structured view. The red-teaming feature automatically generates adversarial inputs to find vulnerabilities you would not think to test manually. CI/CD integration means prompt quality is validated on every code change.

§03

How to use

Install Promptfoo:

npm install -g promptfoo

Initialize an evaluation config:

promptfoo init

Run evaluation and view results:

promptfoo eval
promptfoo view

Red-team scan for vulnerabilities:

promptfoo redteam run

§04

Example

A promptfoo configuration for comparing models:

# promptfooconfig.yaml
prompts:
  - 'Summarize this text in 2 sentences: {{text}}'

providers:
  - openai:gpt-4
  - anthropic:claude-sonnet-4-20250514
  - ollama:llama3

tests:
  - vars:
      text: 'The quick brown fox jumps over the lazy dog. The fox was very quick.'
    assert:
      - type: contains
        value: 'fox'
      - type: llm-rubric
        value: 'The summary should be exactly 2 sentences'
      - type: max-tokens
        value: 50

Run promptfoo eval to see a side-by-side comparison table of all three models with pass/fail assertions.

§05

Related on TokRepo

Testing tools — Browse AI testing and evaluation tools
Security tools — Explore AI security tools

§06

Common pitfalls

Writing assertions that are too strict. LLM outputs are non-deterministic. Use llm-rubric for semantic evaluation instead of exact string matching.
Not running red-team scans before production. Prompt injection and jailbreak vulnerabilities are common in LLM applications. Run promptfoo redteam to discover them before attackers do.
Testing only happy paths. Include edge cases, long inputs, multilingual text, and adversarial inputs in your test suite for comprehensive coverage.
Starting with an overly complex configuration instead of defaults. Begin with the minimal setup, verify it works, then customize incrementally. This approach catches configuration errors early and keeps troubleshooting straightforward.

For teams evaluating this tool, the time saved on initial setup alone justifies the adoption. The well-documented API and active community mean most common questions have already been answered, reducing the learning curve and the number of tokens spent explaining basic usage to AI assistants.

Preguntas frecuentes

What is red-teaming in the context of LLM apps?+

Red-teaming is the process of testing an AI application with adversarial inputs to find vulnerabilities like prompt injection, jailbreaks, data leakage, and harmful output generation. Promptfoo automates this by generating attack inputs and evaluating the application's responses.

Which LLM providers does Promptfoo support?+

Promptfoo supports OpenAI, Anthropic, Google, Mistral, Cohere, Ollama, and any OpenAI-compatible API. You configure providers in the YAML config file.

Can I run Promptfoo in CI/CD?+

Yes. Promptfoo is designed for CI/CD integration. Run promptfoo eval in your pipeline and configure it to fail the build if assertions do not pass. This ensures prompt quality is validated on every code change.

How does the comparison view work?+

Run promptfoo view after an evaluation to open a web UI showing a side-by-side comparison table. Each row is a test case, each column is a model, and cells show the output with pass/fail indicators for assertions.

Is Promptfoo free?+

Yes. Promptfoo is open source under the MIT license. The CLI, evaluation engine, and red-teaming tools are all free. There is an optional cloud service for team collaboration.

Referencias (3)

Promptfoo GitHub— Promptfoo is a CLI for LLM evaluation and red-teaming
Promptfoo Documentation— Prompt evaluation and model comparison
Promptfoo Red Team— LLM red-teaming and vulnerability scanning

Relacionados en TokRepo

Testing tools Security tools Coding tools

🙏

Fuente y agradecimientos

Created by Promptfoo. Licensed under MIT. promptfoo/promptfoo — 18,900+ GitHub stars

Discusión

Inicia sesión para unirte a la discusión.

Aún no hay comentarios. Sé el primero en compartir tus ideas.

Activos relacionados

PromptFlow — Build and Test LLM Apps

PromptFlow is a CLI + framework for building and testing LLM flows. Install `promptflow` + `promptflow-tools`, then run `pf flow init` and `pf flow test`.

PromptsCLI Tools

Agent Toolkit

Promptfoo — LLM Eval & Red-Team Testing Framework

Open-source framework for evaluating and red-teaming LLM applications. Test prompts across models, detect jailbreaks, measure quality, and catch regressions. 5,000+ GitHub stars.

Prompts

Agent Toolkit

Prompt Flow — Build, Test & Deploy LLM Pipelines

Prompt Flow by Microsoft provides a visual editor and CLI for building LLM application workflows with built-in evaluation, tracing, and CI/CD integration for production deployment.

Prompts

AI Open Source

promptfoo-action — Run Prompt Evals in GitHub CI

Add promptfoo-action to GitHub Actions to run prompt/agent evals on PRs or pushes, cache results, and comment a before/after report for safer iteration.

PromptsCLI Tools

Script Depot