Workflows · Apr 8, 2026 · 2 min read

Turbopuffer — Serverless Vector DB for AI Search

Turbopuffer is a serverless vector database built for AI search at scale. It offers sub-millisecond median query latency, automatic scaling, and pay-per-query pricing with no infrastructure to manage.

What is Turbopuffer?

Turbopuffer is a serverless vector database designed for AI search workloads. It stores embeddings and serves similarity queries, with a stated median (p50) latency under 1 ms. Unlike self-hosted vector databases, it requires no infrastructure, only an API key. You pay for queries and storage, and capacity scales automatically from zero to billions of vectors.
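To make "similarity query" concrete, here is a minimal brute-force sketch of what such a query computes: the stored vectors closest to a query vector. The function names and the choice of cosine similarity are assumptions for illustration. This is not Turbopuffer's implementation, and a production system would use an index rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query, vectors, k):
    """Return the k (id, score) pairs with the highest similarity to query.

    `vectors` maps an id to its embedding. Illustrative only: scans everything.
    """
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional embeddings (made up for the example).
stored = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.9, 0.1]}
nearest = top_k_similar([1.0, 0.0], stored, k=2)
```

A vector database returns the same kind of result (ids with scores) without the linear scan.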

Answer-Ready: Turbopuffer is a serverless vector database for AI search. Sub-millisecond queries, automatic scaling, pay-per-query pricing. No infrastructure to manage. Supports filtering, hybrid search, and namespaces. Used by AI companies for production RAG. Backed by a16z.

Best for: AI teams building RAG or semantic search without managing infrastructure.
Works with: OpenAI embeddings, Cohere, any embedding model.
Setup time: Under 1 minute.

Core Features

1. Serverless (Zero Ops)

No clusters, no replicas, no shards. Create a namespace and start querying:

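import turbopuffer as tpuf  # assumes the turbopuffer Python client, with an API key already configured

ns = tpuf.Namespace("products")
ns.upsert(ids=[1], vectors=[[...]], attributes={"name": ["Widget"]})
# That's it. No provisioning.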

2. Attribute Filtering

results = ns.query(
    vector=[...],
    top_k=10,
    filters={"category": ["electronics"], "price": {"$lt": 100}},
)
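To show what the filter in the query above selects, the following sketch evaluates the same filter shape against plain attribute dictionaries on the client side. The `matches` helper and the `COMPARATORS` table are hypothetical. Reading a list value as "any of these values" is an assumption, and the service's real filter semantics may differ.

```python
import operator

# Hypothetical operator table, named after the operators the FAQ lists.
COMPARATORS = {
    "$eq": operator.eq,
    "$lt": operator.lt,
    "$gt": operator.gt,
    "$in": lambda value, allowed: value in allowed,
}


def matches(attributes, filters):
    """Return True if one record's attributes satisfy every filter clause."""
    for field, condition in filters.items():
        if field not in attributes:
            return False
        value = attributes[field]
        if isinstance(condition, dict):
            # {"$lt": 100} style: every listed comparison must hold.
            for op_name, operand in condition.items():
                if not COMPARATORS[op_name](value, operand):
                    return False
        elif isinstance(condition, list):
            # ["electronics"] style: assumed to mean "value is one of these".
            if value not in condition:
                return False
        elif value != condition:
            return False
    return True


# Made-up records for the example.
rows = [
    {"category": "electronics", "price": 50},
    {"category": "electronics", "price": 150},
    {"category": "toys", "price": 20},
]
filters = {"category": ["electronics"], "price": {"$lt": 100}}
kept = [row for row in rows if matches(row, filters)]
```

Only the first record passes both clauses. The other two fail on price and on category respectively.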

3. Hybrid Search

# Combine vector similarity with BM25 text search
results = ns.query(
    vector=[...],
    top_k=10,
    rank_by=["vector_distance", "bm25"],
)
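The page does not say how the vector and BM25 rankings are combined. One common technique for merging two ranked lists is reciprocal rank fusion (RRF), sketched below as an illustration of the idea rather than a description of Turbopuffer's behavior. The function name and the constant `k=60` (a conventional default for RRF) are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Each id scores 1 / (k + rank) for every list it appears in, with rank
    starting at 1, and the scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Made-up result lists: one from vector similarity, one from BM25.
vector_ranking = ["a", "b", "c"]
bm25_ranking = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
```

Documents that rank well in both lists ("b" and "a" here) rise above documents that appear in only one.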

4. Performance

Metric               Value
Query latency (p50)  <1ms
Query latency (p99)  <10ms
Max vectors          Billions
Dimensions           Up to 4096
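The p50 and p99 rows in the table above are percentiles: the latency that 50% and 99% of queries stay at or below. A minimal sketch of computing them from your own measurements follows, using the nearest-rank method. The helper name and the sample values are made up for illustration and are not benchmark results.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical query latencies in milliseconds, not real measurements.
latencies_ms = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.2, 2.0, 3.5, 9.0]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

With only ten samples, p99 is simply the slowest one. Meaningful tail percentiles need many more measurements.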

Turbopuffer vs Alternatives

Feature        Turbopuffer  Pinecone      Qdrant      Weaviate
Serverless     Yes          Yes (paid)    No          No
Pricing        Per query    Per pod/hour  Free (OSS)  Free (OSS)
Scale to zero  Yes          No            N/A         N/A
Self-hosted    No           No            Yes         Yes
Latency        <1ms         ~10ms         ~5ms        ~5ms

FAQ

Q: How does pricing work? A: Pay per query and storage. No minimum spend. Scales to zero when not in use — ideal for variable workloads.

Q: Can I migrate from Pinecone? A: Yes, export vectors from Pinecone and upsert into Turbopuffer. The API is similar.
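For the migration path, the export usually arrives as one record per vector, while the upsert call shown under Serverless (Zero Ops) takes column-oriented arguments (`ids`, `vectors`, `attributes`). The sketch below reshapes records into batches of that form. The record layout (`id`, `vector`, `attributes` keys), the helper name, and the batch size are assumptions, and the export step itself is out of scope.

```python
def to_columnar_batches(records, batch_size):
    """Yield dicts with 'ids', 'vectors', and 'attributes' for each batch.

    `records` is a list of dicts with 'id', 'vector', and 'attributes' keys
    (an assumed export layout). Attributes missing on a record become None.
    """
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        names = sorted({name for record in chunk for name in record["attributes"]})
        yield {
            "ids": [record["id"] for record in chunk],
            "vectors": [record["vector"] for record in chunk],
            "attributes": {
                name: [record["attributes"].get(name) for record in chunk]
                for name in names
            },
        }


# Made-up exported records.
exported = [
    {"id": 1, "vector": [0.1, 0.2], "attributes": {"name": "Widget"}},
    {"id": 2, "vector": [0.3, 0.4], "attributes": {"name": "Gadget"}},
    {"id": 3, "vector": [0.5, 0.6], "attributes": {"name": "Gizmo"}},
]
batches = list(to_columnar_batches(exported, batch_size=2))
# Each batch could then be passed on, e.g. ns.upsert(**batch), assuming the
# upsert signature shown earlier on this page.
```

Batching keeps each request small. Pick a batch size that fits whatever request limits the service documents.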

Q: Does it support metadata filtering? A: Yes, filter on any attribute with comparison operators ($eq, $lt, $gt, $in, etc.).

Source and acknowledgments

Created by Turbopuffer. Backed by a16z.

turbopuffer.com — Serverless vector database
