Weaviate — Open-Source Vector Database at Scale
Weaviate is an open-source vector database for semantic search at scale, with 15.9K+ GitHub stars. It offers hybrid search (vector + BM25), built-in RAG, reranking, multi-tenancy, and horizontal scaling. BSD 3-Clause.
What it is
Weaviate is an open-source vector database designed for semantic search at scale. It stores data objects alongside their vector embeddings and supports hybrid search that combines vector similarity with BM25 keyword matching. Built-in modules handle RAG (retrieval-augmented generation), reranking, and multi-tenancy.
Weaviate targets teams building AI-powered search, recommendation engines, and RAG pipelines who need a database that understands meaning rather than just keywords. It scales horizontally and supports multi-tenant architectures for SaaS applications.
Why it saves time or tokens
Traditional keyword search misses semantically related results. Weaviate's vector search retrieves relevant documents even when the exact words differ, which means fewer retrieval misses and fewer follow-up queries. For RAG pipelines, better retrieval means the LLM receives more relevant context, so it can produce more accurate answers with fewer tokens wasted on irrelevant passages.
The built-in RAG module eliminates the need to build retrieval-then-generate pipelines manually. You send a query, Weaviate retrieves relevant objects, and a configured LLM generates an answer in one API call.
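The single-call flow described above might look roughly like this with the v4 Python client (`weaviate-client`). This is a sketch under assumptions: a local instance is running, an `Article` collection already exists, and a generative module is configured for it. The `build_task` helper and its prompt wording are hypothetical, not part of Weaviate.

```python
def build_task(question: str) -> str:
    """Hypothetical helper: phrase the instruction the LLM receives with the retrieved objects."""
    return f"Using only the articles provided, answer this question: {question}"


def answer(question: str, limit: int = 3) -> str:
    """Retrieve the closest articles and generate one answer over them in a single request."""
    # Imported lazily so build_task() can be used without the client installed.
    import weaviate

    client = weaviate.connect_to_local()
    try:
        # Assumption: 'Article' exists and has a generative module configured.
        articles = client.collections.get("Article")
        response = articles.generate.near_text(
            query=question,  # retrieval step: semantic search
            limit=limit,  # number of objects handed to the LLM
            grouped_task=build_task(question),  # generation step: one answer for the group
        )
        return response.generated or ""
    finally:
        client.close()
```

`grouped_task` asks for one answer across all retrieved objects; the client also accepts `single_prompt` when a separate generation per object is wanted.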
How to use
- Start Weaviate with Docker: docker compose up -d using the official compose file
- Define a schema (collection) with your data properties and vectorizer module
- Import data objects and query with nearText, nearVector, or hybrid search operators
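The second and third steps can be sketched with the v4 Python client as follows. It assumes a local instance started as in the first step, with the `text2vec-openai` module enabled and an OpenAI key supplied to Weaviate; any other vectorizer module would work the same way. The collection name, properties, and sample rows are made up for illustration.

```python
# Made-up sample data for the sketch.
SAMPLE_ARTICLES = [
    {"title": "Gradient descent explained", "body": "How optimizers minimize a loss function."},
    {"title": "Sourdough basics", "body": "Feeding a starter and shaping a loaf."},
]


def create_import_and_query() -> list[str]:
    """Create a collection, import objects, and run a nearText query against it."""
    # Imported lazily so the sample data can be inspected without the client installed.
    import weaviate
    from weaviate.classes.config import Configure, DataType, Property

    client = weaviate.connect_to_local()
    try:
        # Schema: properties plus the vectorizer module that will embed them.
        articles = client.collections.create(
            "Article",
            vectorizer_config=Configure.Vectorizer.text2vec_openai(),
            properties=[
                Property(name="title", data_type=DataType.TEXT),
                Property(name="body", data_type=DataType.TEXT),
            ],
        )
        # Import: Weaviate vectorizes each object on insert.
        articles.data.insert_many(SAMPLE_ARTICLES)
        # Query by meaning rather than by exact keywords.
        response = articles.query.near_text(query="training neural networks", limit=1)
        return [obj.properties["title"] for obj in response.objects]
    finally:
        client.close()
```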
Example
import weaviate

# Connect to a Weaviate instance running on localhost
client = weaviate.connect_to_local()
try:
    collection = client.collections.get('Article')
    results = collection.query.hybrid(
        query='machine learning optimization',
        limit=5,
        alpha=0.75,  # weight toward vector search
    )
    for obj in results.objects:
        print(obj.properties['title'])
finally:
    client.close()  # release the connection when done
This hybrid query blends vector similarity (75% weight) with BM25 keyword matching (25% weight) to find the most relevant articles.
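To make the alpha weighting concrete, here is a simplified pure-Python sketch of blending two score lists. It normalizes each list to the 0..1 range and takes a weighted sum; Weaviate's own fusion algorithms may differ in detail, so treat this as an illustration of the weighting idea only. The function names and scores are invented.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to the 0..1 range so the two lists are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(
    vector_scores: dict[str, float],
    bm25_scores: dict[str, float],
    alpha: float = 0.75,
) -> list[tuple[str, float]]:
    """Blend the two rankings: alpha weights vector scores, (1 - alpha) weights BM25."""
    vec, kw = min_max(vector_scores), min_max(bm25_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(vec) | set(kw)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    vec = {"a": 0.9, "b": 0.8, "c": 0.1}  # made-up similarity scores
    kw = {"a": 1.0, "b": 2.0, "c": 7.0}  # made-up BM25 scores
    print([doc for doc, _ in fuse(vec, kw, alpha=0.75)])  # ['a', 'b', 'c']
    print([doc for doc, _ in fuse(vec, kw, alpha=0.25)])  # ['c', 'b', 'a']
```

With these toy scores, moving alpha from 0.75 to 0.25 flips the ranking from the semantically closest document to the best keyword match.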
| Search Mode | When to Use |
|---|---|
| nearText | Pure semantic search by meaning |
| bm25 | Exact keyword matching |
| hybrid | Best of both, configurable alpha |
| nearVector | Search with a pre-computed vector |
Related on TokRepo
- AI tools for RAG — retrieval-augmented generation tools and frameworks
- AI tools for database — database tools for AI applications curated on TokRepo
Common pitfalls
- Choosing the wrong vectorizer module at schema creation time locks you into that embedding model; plan your vectorizer before importing data
- The hybrid search alpha parameter needs tuning per use case; 0.75 favors vectors, 0.25 favors keywords
- Multi-tenancy requires planning upfront; migrating a single-tenant Weaviate instance to multi-tenant is non-trivial
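For the multi-tenancy point, a sketch of enabling it at collection creation time with the v4 Python client. It assumes a local instance; the `Note` collection, the `tenant_names` helper, and the tenant naming scheme are hypothetical.

```python
def tenant_names(customers: list[str]) -> list[str]:
    """Hypothetical naming scheme: one tenant per customer."""
    return [f"tenant-{name.lower()}" for name in customers]


def create_multi_tenant_collection(customers: list[str]) -> None:
    """Create a collection with multi-tenancy enabled from the start."""
    # Imported lazily so tenant_names() can be used without the client installed.
    import weaviate
    from weaviate.classes.config import Configure, DataType, Property
    from weaviate.classes.tenants import Tenant

    client = weaviate.connect_to_local()
    try:
        notes = client.collections.create(
            "Note",
            # Multi-tenancy is switched on when the collection is created.
            multi_tenancy_config=Configure.multi_tenancy(enabled=True),
            vectorizer_config=Configure.Vectorizer.none(),
            properties=[Property(name="text", data_type=DataType.TEXT)],
        )
        names = tenant_names(customers)
        notes.tenants.create([Tenant(name=n) for n in names])
        # Reads and writes go through a tenant-scoped handle.
        first_tenant = notes.with_tenant(names[0])
        first_tenant.data.insert({"text": "Visible only inside this tenant"})
    finally:
        client.close()
```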
Frequently Asked Questions
What is hybrid search in Weaviate?
Hybrid search combines vector similarity search with BM25 keyword matching in a single query. The alpha parameter controls the weight: alpha=1 is pure vector search, alpha=0 is pure keyword search, and values in between blend both scores. This gives you semantic understanding without losing exact-match precision.
How does Weaviate's built-in RAG work?
Weaviate has a built-in generative module that chains retrieval and generation. You send a query with a generative prompt, Weaviate retrieves relevant objects via vector or hybrid search, then passes them to a configured LLM to generate an answer. This eliminates the need for a separate orchestration layer.
Can Weaviate scale horizontally?
Yes. Weaviate supports horizontal scaling by sharding data across multiple nodes. Each shard handles a portion of the data, and queries are distributed across shards. For read-heavy workloads, you can add replicas. The cluster coordinates queries and merges results transparently.
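The shard-then-merge idea can be illustrated with a small conceptual sketch. It is not Weaviate's actual routing or merge code: the hash function, the function names, and the `(distance, object_id)` hit format are all assumptions chosen for the example.

```python
import hashlib
import heapq


def shard_for(object_id: str, num_shards: int) -> int:
    """Pick a shard deterministically from a hash of the object ID."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def merge_top_k(
    per_shard_hits: list[list[tuple[float, str]]], k: int
) -> list[tuple[float, str]]:
    """Merge each shard's (distance, object_id) hits; smaller distance ranks first."""
    all_hits = (hit for hits in per_shard_hits for hit in hits)
    return heapq.nsmallest(k, all_hits)


if __name__ == "__main__":
    # Each inner list stands for one shard's local results (made-up values).
    hits = [[(0.1, "a"), (0.5, "b")], [(0.2, "c")]]
    print(merge_top_k(hits, k=2))  # [(0.1, 'a'), (0.2, 'c')]
```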
Which vectorizers does Weaviate support?
Weaviate supports OpenAI, Cohere, Hugging Face, Google PaLM, and custom vectorizer modules. You configure the vectorizer at the collection level. You can also bring your own vectors by inserting pre-computed embeddings directly, bypassing the vectorizer module entirely.
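A sketch of the bring-your-own-vectors path with the v4 Python client, assuming a local instance. The `toy_embedding` function is a hypothetical stand-in for a real embedding model and produces meaningless vectors; the `Doc` collection name is also made up.

```python
import math


def toy_embedding(text: str, dims: int = 4) -> list[float]:
    """Stand-in for a real embedding model: a deterministic, unit-length toy vector."""
    vec = [0.0] * dims
    for i, ch in enumerate(text):
        vec[i % dims] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def insert_and_search(texts: list[str], query: str) -> list[str]:
    """Store pre-computed vectors and search with a pre-computed query vector."""
    # Imported lazily so toy_embedding() can be used without the client installed.
    import weaviate
    from weaviate.classes.config import Configure, DataType, Property

    client = weaviate.connect_to_local()
    try:
        docs = client.collections.create(
            "Doc",
            # No vectorizer module: every vector is supplied by the caller.
            vectorizer_config=Configure.Vectorizer.none(),
            properties=[Property(name="text", data_type=DataType.TEXT)],
        )
        for text in texts:
            docs.data.insert(properties={"text": text}, vector=toy_embedding(text))
        response = docs.query.near_vector(near_vector=toy_embedding(query), limit=3)
        return [obj.properties["text"] for obj in response.objects]
    finally:
        client.close()
```

The query vector has to come from the same model, with the same dimensionality, as the stored vectors.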
How does Weaviate compare to Pinecone?
Weaviate is open-source and self-hostable, while Pinecone is a managed cloud service. Weaviate offers hybrid search, built-in RAG, and multi-tenancy out of the box. Pinecone focuses on managed vector search with minimal operational overhead. Choose Weaviate for control and flexibility; Pinecone for a fully managed experience.
Source & Thanks
Created by Weaviate. Licensed under BSD 3-Clause. weaviate/weaviate — 15,900+ GitHub stars