Introduction
TensorZero is an open-source platform that combines an LLM gateway, structured observability, evaluation pipelines, and prompt optimization into one unified system. Built in Rust for high throughput and low latency, it helps engineering teams move from ad-hoc LLM usage to a structured, data-driven approach to improving AI features over time.
What TensorZero Does
- Routes LLM requests through a unified gateway with structured input/output schemas
- Collects inference data and feedback for observability and evaluation
- Supports A/B testing and experimentation across models and prompt variants
- Enables automated prompt optimization using collected production data
- Integrates with OpenAI, Anthropic, AWS Bedrock, and other LLM providers
Architecture Overview
TensorZero defines LLM interactions as typed functions with JSON schemas for inputs and outputs. The gateway processes requests, applies routing rules (round-robin, A/B split, or custom logic), and forwards them to configured providers. All inferences and associated feedback are stored in ClickHouse for low-latency analytical queries. An optimization layer uses this data to fine-tune prompts or select better model variants.
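The A/B-split routing mentioned above amounts to weighted random selection over configured variants. A minimal Python sketch of that idea (the variant names and weights here are hypothetical; TensorZero's actual routing runs inside the Rust gateway):

```python
import random

# Hypothetical variants for one function, each with an A/B traffic weight.
VARIANTS = {
    "gpt-4o-baseline": 0.8,
    "claude-experimental": 0.2,
}

def select_variant(variants: dict[str, float], rng: random.Random) -> str:
    """Pick a variant name with probability proportional to its weight."""
    names = list(variants)
    weights = [variants[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)
picks = [select_variant(VARIANTS, rng) for _ in range(10_000)]
share = picks.count("gpt-4o-baseline") / len(picks)
print(f"baseline share = {share:.2f}")  # close to 0.8
```

Over many requests the observed traffic split converges to the configured weights, which is what makes downstream A/B comparisons statistically meaningful.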
Self-Hosting & Configuration
- Deploy via Docker Compose with the provided docker-compose.yml template
- Configure functions, variants, and providers in a TOML configuration file
- Point to a ClickHouse instance for inference and feedback storage
- Set provider API keys via environment variables (never in config files)
- Use the Python or TypeScript SDK to integrate with your application code
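A function-and-variants configuration along these lines might look as follows. This is an illustrative sketch only: the function name, variant names, and key names below are assumptions, so consult the TensorZero documentation for the exact schema.

```toml
# Illustrative sketch; exact key names may differ from the real schema.
[functions.draft_email]
type = "chat"

[functions.draft_email.variants.baseline]
type = "chat_completion"
model = "openai::gpt-4o-mini"
weight = 0.9

[functions.draft_email.variants.candidate]
type = "chat_completion"
model = "anthropic::claude-3-5-haiku"
weight = 0.1
```

Keeping prompts and routing weights in versioned configuration, rather than scattered through application code, is what lets new variants be rolled out as small config changes.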
Key Features
- Rust-based gateway adds sub-millisecond overhead per request
- Structured function schemas enforce type safety across LLM calls
- Built-in A/B testing framework with statistical significance tracking
- ClickHouse-backed observability for fast analytical queries over inference data
- Feedback collection API ties user outcomes back to specific inferences
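The statistical significance tracking mentioned above can be illustrated with a standard two-proportion z-test. This is a generic statistics sketch with made-up numbers, not TensorZero's internal implementation:

```python
from math import sqrt, erf

def two_proportion_z(success_a: int, n_a: int,
                     success_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical data: variant B succeeds 46% vs. 40% for A, 2,000 requests each.
z, p = two_proportion_z(800, 2000, 920, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05: significant difference
```

With feedback tied back to individual inferences, success counts like these can be computed per variant directly from the stored data.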
Comparison with Similar Tools
- LiteLLM — Proxy-focused with broad provider support; TensorZero adds evaluation and optimization
- Langfuse — Observability-focused; TensorZero integrates gateway and optimization in one system
- Helicone — LLM proxy with logging; TensorZero adds structured schemas and experimentation
- Portkey — Commercial gateway; TensorZero is fully open source with Rust performance
- Braintrust — Eval platform; TensorZero combines eval with production gateway routing
FAQ
Q: What LLM providers does TensorZero support? A: OpenAI, Anthropic, AWS Bedrock, Google AI, Azure OpenAI, Fireworks, Together, and any OpenAI-compatible endpoint.
Q: Why Rust instead of Python? A: Rust provides predictable low-latency performance critical for a gateway in the request path, with sub-millisecond overhead per inference.
Q: How does the optimization loop work? A: TensorZero collects inference-feedback pairs, then uses them to generate improved prompt variants or fine-tune models, which can be deployed as new A/B test variants.
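In simplified form, the scoring step of that loop amounts to aggregating feedback per variant and promoting the best performer. The records and field names below are hypothetical, purely to show the shape of the computation:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inference-feedback pairs: (variant name, feedback score in [0, 1]).
records = [
    ("baseline", 0.62), ("baseline", 0.58), ("baseline", 0.65),
    ("candidate", 0.74), ("candidate", 0.70), ("candidate", 0.69),
]

by_variant: dict[str, list[float]] = defaultdict(list)
for variant, score in records:
    by_variant[variant].append(score)

# Mean feedback score per variant; the winner becomes the next A/B baseline.
scores = {v: mean(s) for v, s in by_variant.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In practice the aggregation would run as an analytical query over ClickHouse rather than in application code, but the principle is the same: feedback drives which variant gets promoted.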
Q: Can I use TensorZero without ClickHouse? A: ClickHouse is the recommended and default storage backend. It provides the analytical query performance needed for evaluation and optimization workflows.