Workflows · Apr 8, 2026 · 2 min read

Turbopuffer — Serverless Vector DB for AI Search

Serverless vector database built for AI search at scale. Turbopuffer offers sub-millisecond queries, automatic scaling, and pay-per-query pricing with zero infrastructure.

AI · Open Source · Community
Quick Use

Use it first, then decide how deep to go

Install the client, connect with your API key, and run your first query:

pip install turbopuffer
import turbopuffer as tpuf

# Connect (serverless — no infrastructure to manage)
tpuf.api_key = "tbp_..."

# Create namespace and upsert vectors
ns = tpuf.Namespace("my-docs")
ns.upsert(
    ids=[1, 2, 3],
    vectors=[[0.1, 0.2, ...], [0.3, 0.4, ...], [0.5, 0.6, ...]],
    attributes={"title": ["Doc A", "Doc B", "Doc C"]},
)

# Query
results = ns.query(
    vector=[0.15, 0.25, ...],
    top_k=10,
    include_attributes=["title"],
)
for r in results:
    print(f"{r.id}: {r.attributes['title']} (score: {r.dist:.4f})")

What is Turbopuffer?

Turbopuffer is a serverless vector database designed for AI search workloads. It stores embeddings and serves similarity queries with sub-millisecond latency at any scale. Unlike self-hosted vector databases, Turbopuffer requires zero infrastructure — just an API key. Pay only for what you query, with automatic scaling from zero to billions of vectors.

Answer-Ready: Turbopuffer is a serverless vector database for AI search. Sub-millisecond queries, automatic scaling, pay-per-query pricing. No infrastructure to manage. Supports filtering, hybrid search, and namespaces. Used by AI companies for production RAG. Backed by a16z.

Best for: AI teams building RAG or semantic search without managing infrastructure. Works with: OpenAI embeddings, Cohere, any embedding model. Setup time: Under 1 minute.
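At its core, a similarity query ranks stored embeddings by distance to the query vector. A minimal pure-Python sketch of cosine distance (the metric most embedding models assume; the hosted index does this at scale, not this loop):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Rank a few stored vectors against a query, nearest first
docs = {1: [0.1, 0.2], 2: [0.9, 0.1], 3: [0.2, 0.1]}
query = [0.15, 0.25]
ranked = sorted(docs, key=lambda i: cosine_distance(docs[i], query))
print(ranked)  # → [1, 3, 2]
```

A top_k query is exactly this ranking, truncated to the k nearest IDs.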

Core Features

1. Serverless (Zero Ops)

No clusters, no replicas, no shards. Create a namespace and start querying:

ns = tpuf.Namespace("products")
ns.upsert(ids=[1], vectors=[[...]], attributes={"name": ["Widget"]})
# That's it. No provisioning.

2. Attribute Filtering

results = ns.query(
    vector=[...],
    top_k=10,
    filters={"category": ["electronics"], "price": {"$lt": 100}},
)
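The filter operators behave like ordinary comparisons. A toy local evaluator (illustrative only, not the service's implementation) shows the semantics of list-valued equality and `$lt`:

```python
def matches(attrs: dict, filters: dict) -> bool:
    """Toy evaluator for the filter shapes shown above."""
    for key, cond in filters.items():
        value = attrs.get(key)
        if isinstance(cond, list):      # list = "value is one of these"
            if value not in cond:
                return False
        elif isinstance(cond, dict):    # comparison operators
            if "$lt" in cond and not (value is not None and value < cond["$lt"]):
                return False
            if "$gt" in cond and not (value is not None and value > cond["$gt"]):
                return False
            if "$in" in cond and value not in cond["$in"]:
                return False
        elif value != cond:
            return False
    return True

item = {"category": "electronics", "price": 79}
print(matches(item, {"category": ["electronics"], "price": {"$lt": 100}}))  # → True
```

In the real service the filter is applied server-side during the index scan, so only matching rows count toward top_k.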

3. Hybrid Search

# Combine vector similarity with BM25 text search
results = ns.query(
    vector=[...],
    top_k=10,
    rank_by=["vector_distance", "bm25"],
)
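One standard way to combine two rankings (vector distance and BM25) is reciprocal rank fusion. This sketch assumes you already have the two ranked ID lists; it is a client-side illustration of the idea, not Turbopuffer's server-side ranking:

```python
def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = [1, 2, 3]  # nearest-first by vector distance
bm25_hits = [3, 1, 4]    # best-first by BM25
print(rrf([vector_hits, bm25_hits]))  # → [1, 3, 2, 4]
```

Documents that appear high in both lists (like 1 and 3 here) float to the top, which is why hybrid search often beats either signal alone.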

4. Performance

| Metric | Value |
| --- | --- |
| Query latency (p50) | <1 ms |
| Query latency (p99) | <10 ms |
| Max vectors | Billions |
| Dimensions | Up to 4096 |
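To verify numbers like these against your own workload, time repeated queries and compute p50/p99 yourself. A sketch using a stand-in function where a real `ns.query(...)` call would go:

```python
import random
import statistics
import time

def measure_percentiles(fn, n: int = 1000) -> tuple[float, float]:
    """Run fn() n times and return (p50, p99) latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    quantiles = statistics.quantiles(samples, n=100)  # 99 cut points
    return quantiles[49], quantiles[98]  # p50, p99

# Stand-in workload; replace the lambda with a real query in practice
p50, p99 = measure_percentiles(lambda: sum(random.random() for _ in range(100)))
print(f"p50={p50:.3f}ms p99={p99:.3f}ms")
```

Measure from the same region as your production traffic; network round-trip usually dominates single-digit-millisecond query times.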

Turbopuffer vs Alternatives

| Feature | Turbopuffer | Pinecone | Qdrant | Weaviate |
| --- | --- | --- | --- | --- |
| Serverless | Yes | Yes (paid) | No | No |
| Pricing | Per query | Per pod/hour | Free (OSS) | Free (OSS) |
| Scale to zero | Yes | No | N/A | N/A |
| Self-hosted | No | No | Yes | Yes |
| Latency | <1ms | ~10ms | ~5ms | ~5ms |

FAQ

Q: How does pricing work? A: Pay per query and storage. No minimum spend. Scales to zero when not in use — ideal for variable workloads.

Q: Can I migrate from Pinecone? A: Yes, export vectors from Pinecone and upsert into Turbopuffer. The API is similar.
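Migrating in batches keeps individual requests small. A sketch of the chunking loop, with the export and upsert calls stubbed out since exact client methods vary by version:

```python
from typing import Iterator

def chunked(items: list, size: int) -> Iterator[list]:
    """Yield successive size-sized batches of items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Pretend these rows came from a Pinecone export: (id, vector, metadata)
exported = [(i, [0.0, 0.0], {"title": f"Doc {i}"}) for i in range(250)]

batches = list(chunked(exported, 100))
print([len(b) for b in batches])  # → [100, 100, 50]
# For each batch, unzip and upsert into the target namespace:
# for batch in batches:
#     ids, vectors, metas = zip(*batch)
#     ns.upsert(ids=list(ids), vectors=list(vectors),
#               attributes={"title": [m["title"] for m in metas]})
```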

Q: Does it support metadata filtering? A: Yes, filter on any attribute with comparison operators ($eq, $lt, $gt, $in, etc.).


Source & Thanks

Created by Turbopuffer. Backed by a16z.

turbopuffer.com — Serverless vector database
