What is Turbopuffer?
Turbopuffer is a serverless vector database designed for AI search workloads. It stores embeddings and serves similarity queries with a stated median (p50) latency under 1 ms. Unlike self-hosted vector databases, Turbopuffer requires no infrastructure to run: all you need is an API key. You pay for queries and storage, and capacity scales automatically from zero to billions of vectors.
Answer-Ready: Turbopuffer is a serverless vector database for AI search. Sub-millisecond queries, automatic scaling, pay-per-query pricing. No infrastructure to manage. Supports filtering, hybrid search, and namespaces. Used by AI companies for production RAG. Backed by a16z.
Best for: AI teams building RAG or semantic search without managing infrastructure.
Works with: OpenAI embeddings, Cohere, any embedding model.
Setup time: Under 1 minute.
Core Features
1. Serverless (Zero Ops)
No clusters, no replicas, no shards. Create a namespace and start querying:
import turbopuffer as tpuf

ns = tpuf.Namespace("products")
ns.upsert(ids=[1], vectors=[[...]], attributes={"name": ["Widget"]})
# That's it. No provisioning.
2. Attribute Filtering
results = ns.query(
    vector=[...],
    top_k=10,
    filters={"category": ["electronics"], "price": {"$lt": 100}},
)
3. Hybrid Search
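Hybrid search needs some rule for merging a vector-similarity ranking with a BM25 ranking. Reciprocal rank fusion (RRF) is one widely used rule. The sketch here is a client-side illustration only and is not claimed to be how Turbopuffer combines the two signals internally; the function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample IDs are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked ID lists; each list is ordered best-first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more the nearer it sits to the top of a list.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result IDs from a vector query and a BM25 query.
vector_hits = [3, 1, 2]
bm25_hits = [1, 4, 3]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the front, while documents found by only one signal are kept but ranked lower.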
# Combine vector similarity with BM25 text search
results = ns.query(
    vector=[...],
    top_k=10,
    rank_by=["vector_distance", "bm25"],
)
4. Performance
| Metric | Value |
|---|---|
| Query latency (p50) | <1ms |
| Query latency (p99) | <10ms |
| Max vectors | Billions |
| Dimensions | Up to 4096 |
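
Latency figures like these generally vary with workload, so it can be worth computing p50 and p99 from your own measurements. A minimal sketch, assuming you have already collected per-query latencies in milliseconds; the `percentile` helper and the sample values are illustrative and are not part of any Turbopuffer tooling.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds, e.g. timed with time.perf_counter().
latencies_ms = [0.7, 0.8, 0.6, 0.9, 0.7, 1.1, 0.8, 0.7, 6.5, 0.8]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50}ms p99={p99}ms")
```

With only a handful of samples the p99 is simply the slowest query, so a meaningful tail estimate needs a much larger sample.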
Turbopuffer vs Alternatives
| Feature | Turbopuffer | Pinecone | Qdrant | Weaviate |
|---|---|---|---|---|
| Serverless | Yes | Yes (paid) | No | No |
| Pricing | Per query | Per pod/hour | Free (OSS) | Free (OSS) |
| Scale to zero | Yes | No | N/A | N/A |
| Self-hosted | No | No | Yes | Yes |
| Latency | <1ms | ~10ms | ~5ms | ~5ms |
FAQ
Q: How does pricing work? A: Pay per query and storage. No minimum spend. Scales to zero when not in use — ideal for variable workloads.
Q: Can I migrate from Pinecone? A: Yes, export vectors from Pinecone and upsert into Turbopuffer. The API is similar.
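
A migration along these lines mostly comes down to reshaping exported records into batches. The sketch assumes the export yields `(id, vector, attributes)` tuples and reuses the column-oriented `upsert` arguments from the Serverless example; `to_upsert_batches`, the batch size, and the sample records are hypothetical, and the Pinecone export step is not shown.

```python
from itertools import islice


def to_upsert_batches(records, batch_size=2):
    """Group (id, vector, attributes) records into column-oriented batches."""
    it = iter(records)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # Collect every attribute name seen in this chunk, in first-seen order.
        names = list(dict.fromkeys(name for _, _, attrs in chunk for name in attrs))
        yield {
            "ids": [rec_id for rec_id, _, _ in chunk],
            "vectors": [vector for _, vector, _ in chunk],
            # Missing attributes become None so every column has one entry per ID.
            "attributes": {n: [attrs.get(n) for _, _, attrs in chunk] for n in names},
        }


# Hypothetical records exported from another vector store.
exported = [
    (1, [0.1, 0.2], {"name": "Widget"}),
    (2, [0.3, 0.4], {"name": "Gadget", "category": "electronics"}),
    (3, [0.5, 0.6], {"name": "Gizmo"}),
]
batches = list(to_upsert_batches(exported))
# Each batch could then be sent as ns.upsert(**batch), assuming the client
# accepts None for missing attribute values.
print(batches[0])
```

A real migration would use a much larger batch size; 2 is used here only to keep the sample output short.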
Q: Does it support metadata filtering? A: Yes, filter on any attribute with comparison operators ($eq, $lt, $gt, $in, etc.).
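
To make the operator semantics concrete, here is a small local evaluator for filters written in the style of the Attribute Filtering example. It is an illustration in plain Python, not Turbopuffer's implementation: the `matches` helper is hypothetical, only the four operators named in the answer are handled, and reading a bare list as "value is one of these" is an assumption about that example.

```python
import operator

# Only the operators named in the FAQ answer; anything else raises KeyError.
OPERATORS = {
    "$eq": operator.eq,
    "$lt": operator.lt,
    "$gt": operator.gt,
    "$in": lambda value, options: value in options,
}


def matches(attributes, filters):
    """Return True when a record's attributes satisfy every filter clause."""
    for name, condition in filters.items():
        if name not in attributes:
            return False
        value = attributes[name]
        if isinstance(condition, dict):
            # {"$lt": 100} style: every listed operator must hold.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif isinstance(condition, list):
            # Assumed shorthand: a bare list means "value is one of these".
            if value not in condition:
                return False
        elif value != condition:
            return False
    return True


filters = {"category": ["electronics"], "price": {"$lt": 100}}
cheap = {"category": "electronics", "price": 40}
pricey = {"category": "electronics", "price": 250}
print(matches(cheap, filters), matches(pricey, filters))
```

Clauses on different attributes are combined with a logical AND, which is why the second record fails on price alone.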