Workflows · Apr 2, 2026 · 2 min read

Agenta — Open-Source LLMOps Platform

Prompt playground, evaluation, and observability in one platform. Compare prompts, run evals, trace production calls. 4K+ stars.

TL;DR
Agenta combines prompt playground, evaluation, and observability in one open-source LLMOps platform.
§01

What it is

Agenta is an open-source LLMOps platform that combines prompt engineering, evaluation, and production observability in a single tool. It provides a visual playground for testing prompts across models, a framework for running automated evaluations, and tracing for monitoring production LLM calls. The platform works with OpenAI, Anthropic, and any OpenAI-compatible API.

Agenta targets AI engineering teams who need to iterate on prompts systematically, compare model performance with quantitative metrics, and monitor production LLM applications without stitching together separate tools for each concern.

§02

How it saves time or tokens

Agenta's side-by-side prompt comparison lets you test variations against the same inputs simultaneously. Instead of running prompts sequentially and manually comparing outputs, you see results side by side with latency and cost metrics. The evaluation framework automates quality checks, reducing the manual review burden.
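
The comparison loop Agenta automates can be sketched in plain Python. Everything here is illustrative rather than Agenta's API: call_model is a hypothetical stand-in for a real LLM call, and the cost figures are made up.

```python
import time

# Hypothetical stand-in for a real LLM call; returns (output, cost_usd).
def call_model(prompt_template: str, text: str) -> tuple[str, float]:
    prompt = prompt_template.format(text=text)
    return f"summary of {len(prompt)} chars", len(prompt) * 1e-6

def compare_variants(variants: dict[str, str], inputs: list[str]) -> list[dict]:
    """Run every prompt variant against the same inputs,
    recording output, latency, and cost for side-by-side review."""
    rows = []
    for name, template in variants.items():
        for text in inputs:
            start = time.perf_counter()
            output, cost = call_model(template, text)
            rows.append({
                "variant": name,
                "input": text,
                "output": output,
                "latency_s": time.perf_counter() - start,
                "cost_usd": cost,
            })
    return rows

rows = compare_variants(
    {"v1": "Summarize: {text}", "v2": "Summarize concisely: {text}"},
    ["The quick brown fox jumps over the lazy dog."],
)
```

Each row carries the same fields the playground shows per variant, which is what makes the comparison quantitative rather than eyeballed.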

Production tracing captures every LLM call with inputs, outputs, latency, and cost. When issues arise, you trace them to the specific prompt version and input that caused the problem, rather than guessing.

§03

How to use

  1. Deploy Agenta via Docker Compose: run docker compose up from the repository root, then open the web UI in your browser at localhost.
  2. Create an application in the playground. Write your prompt template, select a model, and test with sample inputs.
  3. Set up evaluations with test datasets. Define evaluation criteria (exact match, LLM-as-judge, custom metrics) and run batch evaluations to compare prompt versions quantitatively.
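
To make step 3 concrete, here is a minimal sketch of how exact-match and custom evaluators might look. The function signatures are illustrative, not Agenta's evaluator interface; keyword_coverage is a made-up custom metric.

```python
def exact_match(output: str, expected: str) -> float:
    """Score 1.0 only when the output matches the reference exactly (after trimming)."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def keyword_coverage(output: str, expected: str) -> float:
    """Custom metric: fraction of reference keywords that appear in the output."""
    keywords = set(expected.lower().split())
    if not keywords:
        return 0.0
    found = sum(1 for k in keywords if k in output.lower())
    return found / len(keywords)

def run_eval(dataset: list[dict], candidate, evaluators: dict) -> dict:
    """Average each evaluator's score over the dataset, as a batch evaluation would."""
    scores = {name: 0.0 for name in evaluators}
    for row in dataset:
        output = candidate(row["input"])
        for name, fn in evaluators.items():
            scores[name] += fn(output, row["expected"])
    return {name: total / len(dataset) for name, total in scores.items()}
```

Combining several evaluators in one run, as here, is what lets you compare prompt versions on more than a single number.
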
§04

Example

import agenta as ag

# Define a prompt variant
@ag.entrypoint
async def summarize(text: str) -> str:
    response = await ag.llm.chat.completions.create(
        model='gpt-4o',
        messages=[
            {'role': 'system', 'content': 'Summarize the following text concisely.'},
            {'role': 'user', 'content': text}
        ],
        temperature=ag.FloatParam(0.3, 0, 1)
    )
    return response.choices[0].message.content

The ag.FloatParam wrapper exposes temperature in the UI as an adjustable parameter (defaulting to 0.3, bounded between 0 and 1), so it can be tuned without code changes. Each variant is tracked with version history.

§05

Common pitfalls

  • Self-hosted Agenta requires Docker and reasonable resources (4GB+ RAM). The managed cloud version avoids infrastructure management but has usage limits on the free tier.
  • Evaluation datasets need to be representative of production traffic. Running evals on toy examples gives misleading results about prompt quality.
  • Tracing adds minimal overhead but stores all inputs and outputs. For applications processing sensitive data, configure data retention policies and redaction rules.
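
A minimal redaction pass along the lines the last bullet suggests could look like this. The patterns and field names are illustrative, not Agenta configuration:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Mask common PII patterns before a trace payload is stored."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

trace = {"input": "Contact alice@example.com about 123-45-6789", "output": "ok"}
safe_trace = {k: redact(v) for k, v in trace.items()}
```

Running redaction before storage, rather than at query time, keeps sensitive values out of the trace store entirely.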

Frequently Asked Questions

How does Agenta compare to LangSmith?

Both provide prompt management and observability. Agenta is fully open source and self-hostable, while LangSmith is a commercial product from LangChain. Agenta is framework-agnostic (works without LangChain), while LangSmith has deeper integration with the LangChain ecosystem.

Can I use Agenta with local LLMs?

Yes. Agenta works with any OpenAI-compatible API. You can connect it to Ollama, vLLM, or any local model server that exposes an OpenAI-compatible endpoint for prompt testing and evaluation.
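
"OpenAI-compatible" concretely means the server accepts the OpenAI chat-completions request shape. A sketch of such a request body, using the usual Ollama defaults (host, port, path, and model name may differ in your setup):

```python
import json

# Typical Ollama defaults; adjust host, port, and model for your setup.
BASE_URL = "http://localhost:11434/v1"

payload = {
    "model": "llama3",
    "messages": [
        {"role": "system", "content": "Summarize the following text concisely."},
        {"role": "user", "content": "Agenta is an open-source LLMOps platform."},
    ],
    "temperature": 0.3,
}
body = json.dumps(payload)
# POST this body to f"{BASE_URL}/chat/completions" with any HTTP client;
# a compatible server answers in the OpenAI chat-completions response schema.
```

Because the request and response shapes match, the same prompt tests and evaluations run unchanged against hosted or local models.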

What evaluation methods does Agenta support?

Agenta supports exact match, regex match, LLM-as-judge (using another LLM to score outputs), custom Python evaluators, and human evaluation workflows. You can combine multiple evaluators in a single evaluation run.

Does Agenta support prompt version control?

Yes. Every prompt change is tracked as a version. You can compare any two versions side by side, see evaluation scores for each version, and roll back to a previous version in one click.

Can multiple team members collaborate in Agenta?

Yes. Agenta supports team workspaces where multiple users can create and test prompt variants, run evaluations, and review production traces. Role-based access control is available in the managed cloud version.

Source & Thanks

Created by Agenta AI. Licensed under Apache-2.0.
