# Latitude — Build, Evaluate, Monitor AI Agents
## The Problem
Building AI agents is easy. Making them work reliably in production is hard. You need to see what prompts are sent, what responses come back, how tool calls behave, and whether quality is improving or degrading over time.
## The Solution
Latitude gives you full visibility into your AI pipeline with tools to evaluate and improve agent performance.
## Key Features
| Feature | Description |
|---|---|
| Observability | Capture prompts, I/O, tool calls, latency, costs |
| Prompt Playground | Iterate on prompts with instant feedback |
| Datasets | Curate test data for consistent evaluation |
| Evaluations | LLM-as-judge, custom metrics, automated grading |
| Experiments | Compare performance across models and providers |
| Annotations | Label and cluster issues in agent responses |
| Guards | Automated evaluation checks before responses ship |
## Integration

```typescript
import { Latitude } from "@latitude-data/sdk";

const latitude = new Latitude("your-api-key");

// Log a prompt-response pair
await latitude.log({
  prompt: "Summarize this document...",
  response: "The document discusses...",
  model: "claude-sonnet-4-20250514",
  duration_ms: 1200,
  tokens: { input: 500, output: 150 }
});
```

## Evaluation Example
```typescript
// Run an LLM-as-judge evaluation
const result = await latitude.evaluate({
  input: userQuery,
  output: agentResponse,
  criteria: ["relevance", "accuracy", "helpfulness"],
  judge_model: "gpt-4o"
});
```
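An evaluation like the one above can also feed a guard, the pre-ship check listed in the features table. Below is a minimal sketch of that pattern; the per-criterion score shape, the `[0, 1]` range, and the threshold are illustrative assumptions, not the actual return type of the Latitude SDK:

```typescript
// Hypothetical guard: gate a response on per-criterion evaluation scores.
// The Scores shape and 0.7 threshold are assumptions for illustration.
type Scores = Record<string, number>;

function passesGuard(scores: Scores, threshold: number): boolean {
  // Ship only if every criterion clears the threshold.
  return Object.values(scores).every((s) => s >= threshold);
}

const scores: Scores = { relevance: 0.92, accuracy: 0.88, helpfulness: 0.81 };

if (passesGuard(scores, 0.7)) {
  console.log("Guard passed: ship the response");
} else {
  console.log("Guard failed: fall back or escalate");
}
// → "Guard passed: ship the response"
```

Requiring every criterion to clear the bar (rather than averaging) keeps one strong score from masking a failing one.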
## FAQ

**Q: What is Latitude?**
A: An open-source platform for building, evaluating, and monitoring AI agents in production. It provides observability, prompt management, LLM evaluations, and experiment comparison.

**Q: Is Latitude free?**
A: The self-hosted version is free under LGPL-3.0. Latitude Cloud has a free tier for smaller projects.

**Q: How is Latitude different from LangFuse?**
A: Latitude focuses on the full agent engineering lifecycle, from prompt iteration to evaluation to monitoring, with built-in LLM-as-judge capabilities and experiment comparison.