Vector DB Showdown
Chroma, Weaviate, Pinecone, txtai, Qdrant MCP, plus the embedding APIs from Cohere and Together — pick by latency, cost, or RAG accuracy.
What's in this pack
This pack puts the seven dominant vector database options side by side so the decision becomes a 10-minute exercise instead of a week of evaluation. The choice space splits cleanly into three layers: self-hosted databases, managed databases, and embedding APIs (which often ship a basic vector store as a side effect).
| # | Asset | Tier | Best at |
|---|---|---|---|
| 1 | Chroma | Self-hosted | Single-node simplicity, fastest local prototyping |
| 2 | Weaviate | Self-hosted/managed | Hybrid search with built-in BM25 + vector |
| 3 | Pinecone | Managed only | Zero-ops scaling, predictable p95 |
| 4 | txtai | Self-hosted | Embed + search in one Python library |
| 5 | Qdrant MCP | Self-hosted | Native MCP server so agents query directly |
| 6 | Cohere embeddings | API | Best-in-class multilingual quality |
| 7 | Together embeddings | API | Cheapest token economics for batch jobs |
This pack intentionally covers only the DB layer: what stores the vectors and serves nearest-neighbor queries. The retrieve-and-generate orchestration on top (chunking, query rewriting, reranking) lives in the RAG Pipelines pack, so the two decisions stay independent.
Why pick deliberately
Most teams spend their first six months on the wrong vector DB and discover it only when something breaks. The two failure modes:
- Started with Pinecone, hit billing pain at scale. Pinecone's per-pod pricing makes sense at 1M vectors but starts looking expensive at 50M, and migrating off is a full export-and-reindex campaign.
- Started self-hosted, hit ops pain at scale. A team with one Chroma node accumulates a 30M-vector store, then discovers that single-node ANN indexes don't degrade gracefully: query latency creeps from 50ms to 800ms over a quarter.
Picking deliberately means looking at three axes:
- Recall vs latency at your vector count. ANN-Benchmarks publishes recall@10 vs QPS curves; Qdrant and Pinecone consistently lead at >10M vectors, Chroma is fine below 5M.
- Hybrid search needs. If your queries blend keyword filters with semantic similarity, Weaviate's hybrid mode and Qdrant's payload filters are the differentiators; bolting BM25 onto Chroma after the fact is painful. (A filtered-search sketch follows this list.)
- Operations posture. If your team is two engineers, Pinecone's "no servers to babysit" wins. If you're already running Postgres at scale, pgvector (in the Postgres for Agents pack) often beats every option here on total cost of ownership.
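To make the filtering axis concrete, here is a minimal filtered search using the qdrant-client Python SDK. The collection name, payload field, and vector size are illustrative placeholders, not part of this pack:

```python
# Minimal filtered vector search with qdrant-client.
# Collection name, payload field, and vector size are illustrative.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="docs",
    query_vector=[0.1] * 768,      # the embedded user query
    query_filter=Filter(           # payload filter applied alongside ANN search
        must=[FieldCondition(key="lang", match=MatchValue(value="en"))]
    ),
    limit=10,
)
for hit in hits:
    print(hit.id, hit.score)
```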
Install in one command
```sh
# Install the entire pack into the current project
tokrepo install pack/vector-db-showdown

# Or pick individual assets
tokrepo install qdrant-mcp
tokrepo install chroma
```
The TokRepo CLI installs Docker Compose snippets for self-hosted options, env-var templates for managed APIs, and benchmark scripts that load 100k vectors and measure p95 query latency on your hardware.
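For a sense of what those scripts measure, here is a rough, scaled-down sketch of a p95 measurement against a local in-memory Chroma instance (not the pack's actual script; it uses random vectors purely for brevity, which the pitfalls below warn against for real comparisons):

```python
# Rough sketch of a p95 query-latency measurement against Chroma.
# Random vectors keep the example short; benchmark with real
# embeddings from your domain before drawing conclusions.
import random
import time

import chromadb

client = chromadb.Client()                  # in-memory Chroma
coll = client.create_collection("bench")

dim, n, batch = 384, 10_000, 1_000
for start in range(0, n, batch):            # load vectors in batches
    ids = [str(i) for i in range(start, start + batch)]
    coll.add(
        ids=ids,
        embeddings=[[random.random() for _ in range(dim)] for _ in ids],
    )

latencies = []
for _ in range(200):                        # 200 timed queries
    q = [random.random() for _ in range(dim)]
    t0 = time.perf_counter()
    coll.query(query_embeddings=[q], n_results=10)
    latencies.append(time.perf_counter() - t0)

latencies.sort()
p95 = latencies[int(0.95 * len(latencies))]
print(f"p95 query latency: {p95 * 1000:.1f} ms")
```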
Common pitfalls
- Benchmarking with random vectors. Random vectors have flat distance distributions — every index looks equally fast. Always benchmark with real embeddings from your domain (Wikipedia dumps work as a public proxy).
- Picking the wrong distance metric. Cosine vs dot product vs L2 give different rankings on the same data. Match the metric the embedding model was trained for; OpenAI text-embedding-3 expects cosine, some open models expect dot product. (See the demo after this list.)
- Ignoring embedding-model lock-in. If you embed 100M docs with Cohere and want to switch to OpenAI, you re-embed everything. Some teams store embeddings from both models in parallel for a transition period.
- Treating "vector DB" as a complete RAG solution. None of these tools rerank, query-rewrite, or evaluate result quality. Pair with the RAG Pipelines pack and the LLM Eval pack.
- Underestimating filter cardinality. Pre-filtering by a high-cardinality field (e.g. user_id) before ANN search devastates recall on most engines. Either use post-filtering or build per-user indexes.
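A tiny demo of the metric pitfall, with made-up numbers: dot product and cosine rank the same two vectors in opposite order once their norms differ.

```python
# Dot product rewards large norms; cosine only measures angle.
import numpy as np

query = np.array([1.0, 0.0])
a = np.array([0.9, 0.1])   # nearly parallel to the query, small norm
b = np.array([2.0, 2.0])   # 45 degrees off, but a much larger norm

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

print("dot   :", float(query @ a), float(query @ b))  # 0.9 vs 2.0 -> b wins
print("cosine:", cosine(query, a), cosine(query, b))  # ~0.99 vs ~0.71 -> a wins
```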
When this pack alone isn't enough
If your dataset is small (<1M vectors) and you already have Postgres, pgvector beats every option here on operational simplicity — one fewer service to monitor. If your queries need geographic or graph constraints in addition to semantic similarity, look at Neo4j with GDS or OpenSearch with k-NN — different tradeoffs but cleaner for those shapes. And if you're operating at 1B+ vectors, you've outgrown this pack entirely; talk to vendors about Vespa or Milvus dedicated tiers.
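If you do go the pgvector route, a nearest-neighbor query looks roughly like the sketch below. It assumes Postgres with the pgvector extension and a `docs` table with an `embedding` vector column that you created yourself; the connection string and all names are illustrative.

```python
# Minimal pgvector query via psycopg 3. Table, column, and DSN
# are placeholders -- adapt to your own schema.
import psycopg

query_embedding = [0.12, -0.03, 0.44]  # stand-in for a real embedding

with psycopg.connect("postgresql://localhost/mydb") as conn:
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    rows = conn.execute(
        # <=> is pgvector's cosine-distance operator
        "SELECT id, body FROM docs ORDER BY embedding <=> %s::vector LIMIT 10",
        (vec,),
    ).fetchall()
    for doc_id, body in rows:
        print(doc_id, body[:80])
```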
Frequently asked questions
Are these vector DBs free to run?
Four of the seven: Chroma, Weaviate, txtai, and the Qdrant MCP server are all open-source under permissive licenses; you pay only for compute. Pinecone is managed-only with a free starter tier (100k vectors); Cohere and Together charge per million tokens for embedding calls. The pack documents both OSS and paid pricing so you can pick without surprises.
How does this differ from the rag-pipelines pack?
This pack is the storage layer — what holds the vectors. The rag-pipelines pack is the orchestration above it: chunking, query rewriting, retriever ensembling, and reranking. You pick a vector DB once and rarely change it; you tune RAG parameters constantly. Keeping them as separate packs lets you upgrade either independently.
Will this work with Claude Code or Cursor?
Yes. Qdrant ships an MCP server, so Claude Code can query the vector store as a tool (qdrant.search() from any agent prompt). Chroma and Weaviate have community MCP servers covered in the Modern CLI Toolbelt and MCP Server Stack packs, and Cursor connects to the same servers through its standard MCP integration.
What's the difference between Pinecone and Qdrant for production?
Pinecone is fully managed with predictable p95 latency and zero-ops scaling, but per-pod pricing rises sharply past 50M vectors. Qdrant runs anywhere — your laptop, Kubernetes, or Qdrant Cloud — and consistently leads ANN-Benchmarks for recall at high QPS. Pick Pinecone if your team is small and budget allows; pick Qdrant if you need self-host or are cost-sensitive at scale.
Any operational gotchas when migrating between vector DBs?
The vectors aren't portable across embedding models, but they are portable across DBs if the model stays the same. Most migrations break because teams tweaked the embedding pipeline (chunk size, model version) during the move. Lock the pipeline first, snapshot embeddings to S3, migrate the DB, and validate that sample queries return identical IDs; only then iterate on the pipeline again.
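A hedged sketch of that validation step, with `search_old` and `search_new` as placeholders for whichever client calls your old and new DBs expose (each taking a query vector and k, returning a list of document IDs):

```python
# Compare top-k results between the old and new vector DB.
def overlap_at_k(search_old, search_new, query_vectors, k=10):
    """Mean fraction of top-k IDs shared between the old and new DB."""
    scores = []
    for q in query_vectors:
        old_ids = set(search_old(q, k))
        new_ids = set(search_new(q, k))
        scores.append(len(old_ids & new_ids) / k)
    return sum(scores) / len(scores)

# With a frozen embedding pipeline, expect a value near 1.0; anything
# much lower usually means the pipeline changed mid-migration.
```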