Configs · Apr 8, 2026 · 2 min read

Helicone — LLM Observability and Prompt Management

Open-source LLM observability platform. One-line proxy integration for request logging, cost tracking, caching, rate limiting, and prompt versioning across all providers.

AI · Open Source · Community
Quick Use

Use it first, then decide how deep to go

Copy the block below first: point your client at the Helicone proxy and add your auth header, and every request is logged.

# Just change the base URL — no SDK needed
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://oai.helicone.ai/v1",
    default_headers={"Helicone-Auth": "Bearer hlc-..."},
)

# All calls are now logged and tracked
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

What is Helicone?

Helicone is an open-source LLM observability platform that works as a proxy between your app and LLM providers. With a one-line base URL change (no SDK needed), you get request logging, cost tracking, latency metrics, caching, rate limiting, and prompt versioning — for any LLM provider.

Answer-Ready: Helicone is an open-source LLM observability platform. One-line proxy integration (change base URL, no SDK) for request logging, cost tracking, caching, rate limiting, and prompt versioning across OpenAI, Anthropic, and all providers. 5k+ GitHub stars.

Best for: Teams running LLM apps in production who need observability without code changes. Works with: OpenAI, Anthropic, Google, Azure, any OpenAI-compatible API. Setup time: Under 1 minute.

Core Features

1. Zero-SDK Integration

Just change the base URL:

# OpenAI
client = OpenAI(base_url="https://oai.helicone.ai/v1")

# Anthropic
client = Anthropic(base_url="https://anthropic.helicone.ai")

# Azure OpenAI
client = AzureOpenAI(azure_endpoint="https://oai.helicone.ai")

2. Request Dashboard

Real-time dashboard showing:

  • All requests with input/output
  • Latency percentiles (p50, p95, p99)
  • Token usage per model
  • Cost breakdown per user/feature
  • Error rates and patterns
  • Geographic distribution
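The latency percentiles in the dashboard can be reproduced locally from raw latency samples. A minimal sketch using the nearest-rank method (the sample data is made up):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ranked = sorted(samples)
    # Nearest-rank method: ceil(p/100 * n), 1-indexed
    k = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[k - 1]

latencies_ms = [88, 95, 97, 105, 120, 133, 150, 310, 410, 2050]

for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
```

Note how a single slow outlier dominates p95/p99 while leaving p50 untouched, which is exactly why dashboards report all three.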

3. Cost Tracking

Dashboard view:
  Today:     $42.50 (1,250 requests)
  This week: $285.30 (8,700 requests)
  By model:
    gpt-4o:        $180 (63%)
    claude-sonnet:  $85 (30%)
    gpt-4o-mini:    $20 (7%)
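A breakdown like the one above is just a grouped sum over per-request costs. A quick sketch with made-up request records (not Helicone's internals):

```python
from collections import defaultdict

# Hypothetical logged requests: (model, cost in USD)
requests = [
    ("gpt-4o", 0.012), ("gpt-4o", 0.009),
    ("claude-sonnet", 0.007), ("gpt-4o-mini", 0.001),
]

# Group costs by model
by_model = defaultdict(float)
for model, cost in requests:
    by_model[model] += cost

total = sum(by_model.values())
for model, cost in sorted(by_model.items(), key=lambda kv: -kv[1]):
    print(f"{model:15s} ${cost:.3f} ({cost / total:.0%})")
```

The dashboard does the same aggregation, keyed on whatever dimension you choose (model, user, feature).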

4. Caching

# Enable caching with a header
client = OpenAI(
    base_url="https://oai.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": "Bearer hlc-...",
        "Helicone-Cache-Enabled": "true",
    },
)
# Identical requests return cached results instantly
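Conceptually, the proxy-side cache keys on a hash of the request payload, so byte-identical requests never reach the provider twice. A local sketch of the same idea (illustrative only, not Helicone's implementation):

```python
import hashlib
import json

cache = {}

def cached_call(payload, backend):
    """Return the cached response for an identical payload, else call the backend."""
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key not in cache:
        cache[key] = backend(payload)
    return cache[key]

calls = []
def fake_backend(payload):
    # Stand-in for the real provider call
    calls.append(payload)
    return {"echo": payload["messages"][0]["content"]}

req = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}
cached_call(req, fake_backend)
cached_call(req, fake_backend)   # identical request: served from cache
print(len(calls))                # backend was hit only once
```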

5. Rate Limiting

headers = {
    "Helicone-RateLimit-Policy": "10;w=60",  # 10 requests per 60 seconds
}
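The policy string `10;w=60` means at most 10 requests in any 60-second window. A sliding-window sketch of that rule (illustrative, not Helicone's code):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `quota` requests in the past `window` seconds."""
    def __init__(self, quota, window):
        self.quota, self.window = quota, window
        self.hits = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) < self.quota:
            self.hits.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(quota=10, window=60)
results = [limiter.allow(now=0) for _ in range(12)]
print(results.count(True))   # 10 allowed, 2 rejected
```

Once the window rolls past, capacity frees up again: a call at `now=61` is allowed.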

6. Custom Properties

headers = {
    "Helicone-Property-User": "user-123",
    "Helicone-Property-Feature": "chat",
    "Helicone-Property-Environment": "production",
}
# Filter and group by these properties in the dashboard
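Properties can also vary per request by building the header dict at call time. A small sketch (the `helicone_headers` helper is hypothetical; the header names are the ones documented above):

```python
def helicone_headers(user=None, feature=None, environment=None):
    """Build Helicone custom-property headers, skipping unset values."""
    props = {
        "Helicone-Property-User": user,
        "Helicone-Property-Feature": feature,
        "Helicone-Property-Environment": environment,
    }
    return {k: v for k, v in props.items() if v is not None}

headers = helicone_headers(user="user-123", feature="chat")
print(headers)
```

With the OpenAI Python SDK, a dict like this can be passed per call via `extra_headers=` on `chat.completions.create(...)` rather than baked into the client.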

7. Prompt Versioning

headers = {
    "Helicone-Prompt-Id": "customer-support-v3",
}
# Track performance per prompt version

Self-Hosting

git clone https://github.com/Helicone/helicone
docker compose up -d
# Dashboard at http://localhost:3000

FAQ

Q: Does it add latency? A: Helicone proxy adds < 50ms. Requests are logged asynchronously.

Q: Is my data safe? A: Self-host for full data control. Cloud version is SOC 2 Type II compliant.

Q: Can I use it with Anthropic Claude? A: Yes. Change the base URL to https://anthropic.helicone.ai.


Source & Thanks

Created by Helicone. Licensed under Apache 2.0.

Helicone/helicone — 5k+ stars
