Apr 9, 2026 · 2 min read

Latitude — AI Agent Engineering Platform

An open-source platform for building, evaluating, and monitoring AI agents in production: observability, a prompt playground, LLM-as-judge evals, and experiment comparison. LGPL-3.0 licensed, 4,000+ GitHub stars.

TL;DR
Latitude provides observability, prompt playground, and LLM-as-judge evals for AI agents.
§01

What it is

Latitude is an open-source platform for building, evaluating, and monitoring AI agents in production. It provides observability dashboards, a prompt playground for iterating on prompts, LLM-as-judge evaluations for automated quality scoring, and experiment comparison for A/B testing different agent configurations.

Latitude targets engineering teams deploying AI agents who need visibility into agent behavior, quality metrics, and a systematic way to improve prompts.

§02

How it saves time or tokens

Latitude's prompt playground lets you test prompt variations side-by-side without deploying. You iterate faster because you see results immediately, avoiding the deploy-test-revert cycle.

The LLM-as-judge evaluation automates quality assessment. Instead of manually reviewing agent outputs, Latitude scores them against criteria you define, catching regressions early.
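The judge pattern itself is easy to sketch: embed a rubric in a judge prompt, send it to any model, and parse the reply into a numeric score. A minimal TypeScript sketch follows; the rubric wording, 1-5 score range, and JSON reply format are illustrative assumptions, not Latitude's built-in evaluators, and the actual judge-model call is left abstract so any provider client can be plugged in:

```typescript
// Minimal LLM-as-judge sketch: build a rubric prompt for the judge model,
// then parse the judge's JSON verdict into a typed score.
// (Hypothetical helper code, not part of the Latitude SDK.)

interface JudgeVerdict {
  score: number;     // 1 (poor) .. 5 (excellent)
  reasoning: string; // judge's one-sentence justification
}

function buildJudgePrompt(criterion: string, question: string, answer: string): string {
  return [
    `You are grading an AI agent's answer against this criterion: ${criterion}.`,
    `Question: ${question}`,
    `Answer: ${answer}`,
    `Reply with JSON only: {"score": <1-5>, "reasoning": "<one sentence>"}`,
  ].join('\n');
}

function parseVerdict(raw: string): JudgeVerdict {
  const parsed = JSON.parse(raw);
  if (typeof parsed.score !== 'number' || parsed.score < 1 || parsed.score > 5) {
    throw new Error(`judge returned an out-of-range score: ${parsed.score}`);
  }
  return { score: parsed.score, reasoning: String(parsed.reasoning ?? '') };
}
```

Parsing defensively matters here: judge models occasionally return malformed or out-of-range scores, and a regression pipeline should reject those rather than average them in.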

§03

How to use

  1. Deploy Latitude with Docker: follow the self-hosting guide in the repository
  2. Connect your LLM providers (OpenAI, Anthropic, etc.)
  3. Create prompts in the playground and test with different inputs
  4. Set up evaluations to automatically score agent outputs
§04

Example

// Latitude SDK: run a prompt and log the result for evaluation
import { Latitude } from '@latitude-data/sdk';

const latitude = new Latitude('your-api-key');

// Run a prompt
const result = await latitude.prompts.run('customer-support-agent', {
  parameters: {
    customer_query: 'How do I reset my password?',
    context: 'User has a Pro account created in 2024'
  }
});

// Log the result for evaluation
await latitude.logs.create({
  prompt: 'customer-support-agent',
  response: result.text,
  metadata: { category: 'account-management' }
});
§05

Common pitfalls

  • Self-hosting requires PostgreSQL and Redis; ensure these are provisioned before deploying Latitude
  • LLM-as-judge evaluations consume additional tokens; budget for evaluation costs alongside agent inference costs
  • Prompt versioning in Latitude is separate from Git; establish a workflow to keep both in sync
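The second pitfall above can be made concrete with back-of-the-envelope arithmetic: every judged response adds one extra LLM call. A rough TypeScript estimator, where all token counts and per-million-token prices are placeholder assumptions (check your provider's current rates):

```typescript
// Rough eval-cost estimator: each agent response that gets judged triggers a
// second LLM call (rubric + question + answer in, a short verdict out).
// Prices are placeholder assumptions in USD per 1M tokens, not real rates.

interface CostParams {
  responsesPerDay: number;
  judgePromptTokens: number;  // rubric + question + answer fed to the judge
  judgeOutputTokens: number;  // the judge's short verdict
  inputPricePer1M: number;    // USD per 1M input tokens
  outputPricePer1M: number;   // USD per 1M output tokens
}

function dailyEvalCostUSD(p: CostParams): number {
  const inputCost = (p.responsesPerDay * p.judgePromptTokens / 1_000_000) * p.inputPricePer1M;
  const outputCost = (p.responsesPerDay * p.judgeOutputTokens / 1_000_000) * p.outputPricePer1M;
  return inputCost + outputCost;
}
```

With 10,000 judged responses a day, ~800 prompt tokens and ~60 output tokens per verdict, and placeholder rates of $3/$15 per million input/output tokens, this works out to about $33/day of evaluation spend on top of agent inference, which is why sampling a fraction of traffic for judging is a common compromise.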

Frequently Asked Questions

How does Latitude compare to LangSmith?

Both provide observability and evaluation for LLM applications. Latitude is open source (LGPL-3.0) and self-hosted. LangSmith is a managed service by LangChain. Latitude emphasizes prompt engineering workflows; LangSmith emphasizes trace-level debugging.

What does LLM-as-judge mean?

LLM-as-judge uses a language model to evaluate the output of another model. You define evaluation criteria (accuracy, helpfulness, safety), and the judge LLM scores each response. This automates quality assessment at scale.

Can I use Latitude with any LLM provider?

Yes. Latitude supports OpenAI, Anthropic, Google, and other providers. You configure API keys in the platform, and prompts can target any configured provider.

Does Latitude support team collaboration?

Yes. Latitude provides multi-user access, prompt sharing, and experiment comparison. Team members can iterate on prompts, review evaluation results, and approve changes before deploying to production.

Is Latitude production-ready?

Latitude is used in production environments. The LGPL-3.0 license allows commercial use. Self-hosting gives you control over data and uptime. The project has active development and community support.

Citations (3)
  • Latitude GitHub — Latitude is an open-source AI agent engineering platform with 4,000+ stars
  • arXiv — LLM-as-judge evaluation methodology
  • Latitude License — LGPL-3.0 open-source license

Source & Thanks

Created by Latitude. Licensed under LGPL-3.0.

latitude-llm — ⭐ 4,000+

Thanks to the Latitude team for making AI agent engineering more transparent and reliable.
