Vector DB Showdown
Chroma, Weaviate, Pinecone, txtai, Qdrant MCP, plus the embedding APIs from Cohere and Together — pick by latency, cost, or RAG accuracy.
What's in this pack
This pack puts the seven dominant vector database options side by side so the decision becomes a 10-minute exercise instead of a week of evaluation. The choice space splits cleanly into three layers: self-hosted databases, managed databases, and embedding APIs (which often ship a basic vector store as a side effect).
| # | Asset | Tier | Best at |
|---|---|---|---|
| 1 | Chroma | Self-hosted | Single-node simplicity, fastest local prototyping |
| 2 | Weaviate | Self-hosted/managed | Hybrid search with built-in BM25 + vector |
| 3 | Pinecone | Managed only | Zero-ops scaling, predictable p95 |
| 4 | txtai | Self-hosted | Embed + search in one Python library |
| 5 | Qdrant MCP | Self-hosted | Native MCP server so agents query directly |
| 6 | Cohere embeddings | API | Best-in-class multilingual quality |
| 7 | Together embeddings | API | Cheapest token economics for batch jobs |
This pack intentionally covers only the DB layer: what stores the vectors and serves nearest-neighbor queries. The retrieve-and-generate orchestration on top (chunking, query rewriting, reranking) lives in the RAG Pipelines pack, so the two decisions stay independent.
Why pick deliberately
Most teams spend their first six months on the wrong vector DB and discover it only when something breaks. The two failure modes:
- Started with Pinecone, hit billing pain at scale. Pinecone's per-pod pricing makes sense at 1M vectors but starts looking expensive at 50M, and migrating off is a full export-and-reindex campaign.
- Started self-hosted, hit ops pain at scale. A team with one Chroma node accumulates a 30M-vector store, then discovers that single-node ANN indexes don't degrade gracefully: query latency creeps from 50ms to 800ms over a quarter.
Picking deliberately means looking at three axes:
- Recall vs latency at your vector count. ANN-Benchmarks publishes recall@10 vs QPS curves; Qdrant and Pinecone consistently lead at >10M vectors, Chroma is fine below 5M.
- Hybrid search needs. If your queries blend keyword filters with semantic similarity, Weaviate's hybrid mode and Qdrant's payload filters are the differentiators; bolting BM25 onto Chroma after the fact is painful. (A filtered-search sketch follows this list.)
- Operations posture. If your team is two engineers, Pinecone's "no servers to babysit" wins. If you're already running Postgres at scale, pgvector (in the Postgres for Agents pack) often beats every option here on total cost of ownership.
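To make the filtering axis concrete, here is a minimal filtered search using the qdrant-client Python SDK. The collection name, payload field, and vector size are illustrative placeholders, not part of this pack:

```python
# Minimal filtered vector search with qdrant-client.
# Collection name, payload field, and vector size are illustrative.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="docs",
    query_vector=[0.1] * 768,      # the embedded user query
    query_filter=Filter(           # payload filter applied alongside ANN search
        must=[FieldCondition(key="lang", match=MatchValue(value="en"))]
    ),
    limit=10,
)
for hit in hits:
    print(hit.id, hit.score)
```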
Install in one command
```sh
# Install the entire pack into the current project
tokrepo install pack/vector-db-showdown

# Or pick individual assets
tokrepo install qdrant-mcp
tokrepo install chroma
```
The TokRepo CLI installs Docker Compose snippets for self-hosted options, env-var templates for managed APIs, and benchmark scripts that load 100k vectors and measure p95 query latency on your hardware.
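For a sense of what those scripts measure, here is a rough, scaled-down sketch of a p95 measurement against a local in-memory Chroma instance (not the pack's actual script; it uses random vectors purely for brevity, which the pitfalls below warn against for real comparisons):

```python
# Rough sketch of a p95 query-latency measurement against Chroma.
# Random vectors keep the example short; benchmark with real
# embeddings from your domain before drawing conclusions.
import random
import time

import chromadb

client = chromadb.Client()                  # in-memory Chroma
coll = client.create_collection("bench")

dim, n, batch = 384, 10_000, 1_000
for start in range(0, n, batch):            # load vectors in batches
    ids = [str(i) for i in range(start, start + batch)]
    coll.add(
        ids=ids,
        embeddings=[[random.random() for _ in range(dim)] for _ in ids],
    )

latencies = []
for _ in range(200):                        # 200 timed queries
    q = [random.random() for _ in range(dim)]
    t0 = time.perf_counter()
    coll.query(query_embeddings=[q], n_results=10)
    latencies.append(time.perf_counter() - t0)

latencies.sort()
p95 = latencies[int(0.95 * len(latencies))]
print(f"p95 query latency: {p95 * 1000:.1f} ms")
```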
Common pitfalls
- Benchmarking with random vectors. Random vectors have flat distance distributions — every index looks equally fast. Always benchmark with real embeddings from your domain (Wikipedia dumps work as a public proxy).
- Picking the wrong distance metric. Cosine vs dot product vs L2 give different rankings on the same data. Match the metric the embedding model was trained for; OpenAI text-embedding-3 expects cosine, some open models expect dot product. (See the demo after this list.)
- Ignoring embedding-model lock-in. If you embed 100M docs with Cohere and want to switch to OpenAI, you re-embed everything. Some teams store embeddings from both models in parallel for a transition period.
- Treating "vector DB" as a complete RAG solution. None of these tools rerank, query-rewrite, or evaluate result quality. Pair with the RAG Pipelines pack and the LLM Eval pack.
- Underestimating filter cardinality. Pre-filtering by a high-cardinality field (e.g. user_id) before ANN search devastates recall on most engines. Either use post-filtering or build per-user indexes.
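A tiny demo of the metric pitfall, with made-up numbers: dot product and cosine rank the same two vectors in opposite order once their norms differ.

```python
# Dot product rewards large norms; cosine only measures angle.
import numpy as np

query = np.array([1.0, 0.0])
a = np.array([0.9, 0.1])   # nearly parallel to the query, small norm
b = np.array([2.0, 2.0])   # 45 degrees off, but a much larger norm

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

print("dot   :", float(query @ a), float(query @ b))  # 0.9 vs 2.0 -> b wins
print("cosine:", cosine(query, a), cosine(query, b))  # ~0.99 vs ~0.71 -> a wins
```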
When this pack alone isn't enough
If your dataset is small (<1M vectors) and you already have Postgres, pgvector beats every option here on operational simplicity — one fewer service to monitor. If your queries need geographic or graph constraints in addition to semantic similarity, look at Neo4j with GDS or OpenSearch with k-NN — different tradeoffs but cleaner for those shapes. And if you're operating at 1B+ vectors, you've outgrown this pack entirely; talk to vendors about Vespa or Milvus dedicated tiers.
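If you do go the pgvector route, a nearest-neighbor query looks roughly like the sketch below. It assumes Postgres with the pgvector extension and a `docs` table with an `embedding` vector column that you created yourself; the connection string and all names are illustrative.

```python
# Minimal pgvector query via psycopg 3. Table, column, and DSN
# are placeholders -- adapt to your own schema.
import psycopg

query_embedding = [0.12, -0.03, 0.44]  # stand-in for a real embedding

with psycopg.connect("postgresql://localhost/mydb") as conn:
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    rows = conn.execute(
        # <=> is pgvector's cosine-distance operator
        "SELECT id, body FROM docs ORDER BY embedding <=> %s::vector LIMIT 10",
        (vec,),
    ).fetchall()
    for doc_id, body in rows:
        print(doc_id, body[:80])
```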
Frequently asked questions
Are these vector DBs free to run?
Four of the seven: Chroma, Weaviate, txtai, and the Qdrant MCP server are all open-source under permissive licenses; you pay only for compute. Pinecone is managed-only with a free starter tier (100k vectors); Cohere and Together charge per million tokens for embedding calls. The pack documents both OSS and paid pricing so you can pick without surprises.
How does this differ from the rag-pipelines pack?
This pack is the storage layer — what holds the vectors. The rag-pipelines pack is the orchestration above it: chunking, query rewriting, retriever ensembling, and reranking. You pick a vector DB once and rarely change it; you tune RAG parameters constantly. Keeping them as separate packs lets you upgrade either independently.
Will this work with Claude Code or Cursor?
Yes. Qdrant ships an MCP server, so Claude Code can query the vector store as a tool (qdrant.search() from any agent prompt). Chroma and Weaviate have community MCP servers covered in the Modern CLI Toolbelt and MCP Server Stack packs, and Cursor connects to the same servers through its standard MCP integration.
What's the difference between Pinecone and Qdrant for production?
Pinecone is fully managed with predictable p95 latency and zero-ops scaling, but per-pod pricing rises sharply past 50M vectors. Qdrant runs anywhere — your laptop, Kubernetes, or Qdrant Cloud — and consistently leads ANN-Benchmarks for recall at high QPS. Pick Pinecone if your team is small and budget allows; pick Qdrant if you need self-host or are cost-sensitive at scale.
Any operational gotchas when migrating between vector DBs?
The vectors aren't portable across embedding models, but they are portable across DBs if the model stays the same. Most migrations break because teams tweaked the embedding pipeline (chunk size, model version) during the move. Lock the pipeline first, snapshot embeddings to S3, migrate the DB, and validate that sample queries return identical IDs; only then iterate on the pipeline again.
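A hedged sketch of that validation step, with `search_old` and `search_new` as placeholders for whichever client calls your old and new DBs expose (each taking a query vector and k, returning a list of document IDs):

```python
# Compare top-k results between the old and new vector DB.
def overlap_at_k(search_old, search_new, query_vectors, k=10):
    """Mean fraction of top-k IDs shared between the old and new DB."""
    scores = []
    for q in query_vectors:
        old_ids = set(search_old(q, k))
        new_ids = set(search_new(q, k))
        scores.append(len(old_ids & new_ids) / k)
    return sum(scores) / len(scores)

# With a frozen embedding pipeline, expect a value near 1.0; anything
# much lower usually means the pipeline changed mid-migration.
```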