Scripts · Apr 6, 2026 · 2 min read

Promptfoo — LLM Eval & Red-Team Testing Framework

Open-source framework for evaluating and red-teaming LLM applications. Test prompts across models, detect jailbreaks, measure quality, and catch regressions. 5,000+ GitHub stars.

TL;DR
Open-source framework for testing prompts across models, detecting jailbreaks, and catching LLM quality regressions.
§01

What it is

Promptfoo is an open-source CLI and library for systematically evaluating LLM outputs. It runs your prompts against multiple models, scores the outputs with configurable assertions (exact match, contains, LLM-graded rubrics, semantic similarity), and surfaces regressions. The red-teaming module generates adversarial inputs to test jailbreak resistance and safety guardrails.

Promptfoo targets AI engineers, product teams, and security researchers who need repeatable LLM testing. It replaces manual prompt testing with automated evaluation pipelines that run in CI/CD.

§02

How it saves time or tokens

Manual prompt testing means running inputs one by one and eyeballing outputs. Promptfoo automates this with batch evaluation and structured scoring. You define test cases once and re-run them every time you change a prompt, model, or system configuration. The comparison view shows side-by-side results across models, making it obvious which performs better.
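
A minimal sketch of that workflow: put two prompt variants in one config, and each eval run scores both against the same test cases side by side (prompt wording and test values here are illustrative):

# promptfooconfig.yaml -- comparing two prompt variants (illustrative)
prompts:
  - 'Summarize this support ticket: {{ticket}}'
  - 'Summarize this support ticket in one sentence: {{ticket}}'

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      ticket: 'Customer reports login failures after the 2.3 update.'
    assert:
      - type: contains
        value: 'login'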

§03

How to use

  1. Install Promptfoo via npm.
  2. Create a configuration file with prompts, providers, and test cases.
  3. Run the eval and open the results viewer.

# Install the CLI globally
npm install -g promptfoo

# Initialize a config
promptfoo init

# Run evaluation
promptfoo eval

# Open results in browser
promptfoo view
§04

Example

# promptfooconfig.yaml
prompts:
  - 'Summarize this article in 3 bullet points: {{article}}'

providers:
  - openai:gpt-4o
  - anthropic:claude-sonnet-4-20250514

tests:
  - vars:
      article: 'The Federal Reserve held interest rates steady...'
    assert:
      - type: contains
        value: 'interest rate'
      - type: llm-rubric
        value: 'Output should contain exactly 3 bullet points'
      - type: cost
        threshold: 0.01
§05

Common pitfalls

  • LLM-graded assertions (llm-rubric) consume additional tokens for the grading call. Use deterministic assertions (contains, regex) where possible and reserve LLM grading for subjective quality checks.
  • Running evaluations against many providers in parallel can hit rate limits. Configure concurrency limits in the config file or on the command line (see the sketch after this list).
  • Red-team tests may produce harmful content in outputs. Run red-team evaluations in isolated environments and restrict access to the results.
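
One way to cap parallelism, assuming your promptfoo version supports the evaluateOptions.maxConcurrency config key and the -j/--max-concurrency CLI flag (both appear in current docs, but verify against your release):

# promptfooconfig.yaml -- cap parallel provider calls
evaluateOptions:
  maxConcurrency: 4

# or per run, on the command line:
promptfoo eval -j 4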

Frequently Asked Questions

Can Promptfoo test any LLM provider?

Yes. Promptfoo supports OpenAI, Anthropic, Google, AWS Bedrock, Azure, Ollama, and any OpenAI-compatible endpoint. You can also define custom providers using scripts. This makes it easy to compare outputs across different models and providers.
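
As a sketch, a providers block mixing hosted models with a local OpenAI-compatible server might look like this (the endpoint URL is a placeholder, and apiBaseUrl is the key promptfoo's docs use for OpenAI-compatible endpoints; verify for your version):

# promptfooconfig.yaml excerpt (endpoint URL is a placeholder)
providers:
  - openai:gpt-4o
  - anthropic:claude-sonnet-4-20250514
  - id: openai:chat:local-model      # any OpenAI-compatible server
    config:
      apiBaseUrl: 'http://localhost:8080/v1'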

How does the red-team feature work?

Promptfoo's red-team module generates adversarial inputs designed to trigger jailbreaks, prompt injection, and safety bypass attempts. It runs these inputs against your application and scores whether the model maintained its safety guardrails. Results highlight specific vulnerabilities.
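
A hedged sketch of the workflow, assuming your release ships the promptfoo redteam subcommands (check promptfoo redteam --help for the exact set):

# Scaffold a red-team config interactively
promptfoo redteam init

# Generate adversarial test cases and run them against your target
promptfoo redteam run

# Inspect flagged vulnerabilities in the browser
promptfoo view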

Can I run Promptfoo in CI/CD?

Yes. Promptfoo has a CLI that exits with a non-zero code if any test assertion fails. You can run it as a step in GitHub Actions, GitLab CI, or any pipeline. The JSON output format enables integration with reporting tools.
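
For example, a minimal GitHub Actions step might look like this (workflow structure and secret names are illustrative, not an official template):

# .github/workflows/llm-tests.yml (illustrative)
name: llm-tests
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}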

How does Promptfoo compare to LangSmith for evaluation?

Promptfoo is open source, runs locally, and focuses on batch evaluation with assertions. LangSmith is a SaaS platform with tracing, monitoring, and annotation features. Promptfoo is better for CI/CD-integrated testing; LangSmith is better for production observability and human annotation workflows.

Does Promptfoo support custom scoring functions?

Yes. You can write custom assertion functions in JavaScript or Python that receive the LLM output and return a pass/fail result with a score. This lets you implement domain-specific quality checks beyond the built-in assertion types.
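
As a minimal Python sketch: point an assertion at a file, and have the file expose a get_assert(output, context) hook that returns a pass/score/reason dict (the file name is hypothetical; treat the exact hook signature as an assumption for your version):

# score_check.py (hypothetical file name)
def get_assert(output: str, context) -> dict:
    """Pass if the output stays under 50 words; score rewards brevity."""
    words = len(output.split())
    return {
        "pass": words <= 50,
        "score": max(0.0, 1.0 - words / 100),
        "reason": f"{words} words",
    }

# referenced from promptfooconfig.yaml as:
#   assert:
#     - type: python
#       value: file://score_check.py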


Source & Thanks

Created by Promptfoo. Licensed under MIT.

promptfoo — ⭐ 5,000+

Thanks for bringing test-driven development to AI applications.
