Skills · March 31, 2026 · 1 min read

Weaviate — Open-Source Vector Database at Scale

Weaviate is an open-source vector database for semantic search at scale, with 15.9K+ GitHub stars. It offers hybrid search (vector + BM25), built-in RAG, reranking, multi-tenancy, and horizontal scaling, and is licensed under BSD 3-Clause.

AI Open Source · Community

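Agent ready

Agent can install directly

This asset is installable: the agent first selects the current runtime, reviews the install plan, and then runs the matching command.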

Native · 98/100 · Policy: allowed

Agent entry point

Any MCP/CLI agent

Type

Skill

Install

Single

Trust

Trust level: Established

Entry

Weaviate — Open-Source Vector Database at Scale

Direct install command

npx -y tokrepo@latest install 492f7d14-9545-43b7-8f9c-626f895b912e --target codex

Confirm the install plan with a dry run first, then run this command.

TL;DR

Weaviate stores vectors alongside data and serves hybrid semantic plus keyword search at scale.

§01

What it is

Weaviate is an open-source vector database designed for semantic search at scale. It stores data objects alongside their vector embeddings and supports hybrid search that combines vector similarity with BM25 keyword matching. Built-in modules handle RAG (retrieval-augmented generation), reranking, and multi-tenancy.

Weaviate targets teams building AI-powered search, recommendation engines, and RAG pipelines who need a database that understands meaning rather than just keywords. It scales horizontally and supports multi-tenant architectures for SaaS applications.

§02

Why it saves time or tokens

Traditional keyword search misses semantically related results. Weaviate's vector search retrieves relevant documents even when the exact words differ, which means fewer retrieval misses and fewer follow-up queries. For RAG pipelines, better retrieval gives the LLM more relevant context, so answers tend to be more accurate and fewer tokens are spent on irrelevant passages.

The built-in RAG module eliminates the need to build retrieval-then-generate pipelines manually. You send a query, Weaviate retrieves relevant objects, and a configured LLM generates an answer in one API call.
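As a rough sketch of that single call, assuming the v4 Python client, a local instance, and an `Article` collection that already has a vectorizer and a generative module configured: the `build_grouped_task` helper and the prompt wording are hypothetical, and attribute names on the response may differ between client releases.

```python
def build_grouped_task(question: str) -> str:
    """Hypothetical helper: wrap a user question in a prompt for the LLM."""
    return (
        "Using only the retrieved articles, answer the question: "
        f"{question.strip()}"
    )


def answer_with_rag(question: str, limit: int = 3) -> str:
    """Retrieve objects and generate one answer over them in a single request.

    Assumes a local Weaviate instance and an 'Article' collection whose
    vectorizer and generative module are already configured.
    """
    import weaviate  # imported lazily so the helper above works without the client

    client = weaviate.connect_to_local()
    try:
        articles = client.collections.get("Article")
        response = articles.generate.near_text(
            query=question,
            limit=limit,
            grouped_task=build_grouped_task(question),
        )
        # 'generated' holds the answer produced over all retrieved objects.
        return response.generated or ""
    finally:
        client.close()
```

Calling `answer_with_rag("How does HNSW indexing work?")` would return the generated text, provided the instance and collection exist as assumed.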

§03

How to use

Start Weaviate with Docker by running docker compose up -d against the official compose file
Define a schema (collection) with your data properties and vectorizer module
Import data objects and query with nearText, nearVector, or hybrid search operators
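The steps above could look roughly like this with the v4 Python client, assuming the local Docker instance is running. To avoid needing an embedding API key, this sketch supplies its own vectors instead of a vectorizer module; the `Article` collection, the `title` property, and the toy 3-dimensional vectors are all illustrative, and newer client releases may prefer a different parameter name than `vectorizer_config`.

```python
# Toy rows with pre-computed 3-dimensional vectors (purely illustrative data).
ROWS = [
    {"title": "Gradient descent basics", "vector": [0.9, 0.1, 0.0]},
    {"title": "Sourdough starter guide", "vector": [0.0, 0.2, 0.9]},
]


def split_row(row: dict) -> tuple[dict, list[float]]:
    """Separate an input row into Weaviate properties and its vector."""
    properties = {k: v for k, v in row.items() if k != "vector"}
    return properties, row["vector"]


def load_and_query() -> list[str]:
    """Create a collection, import ROWS, and run one vector query."""
    import weaviate  # imported lazily; requires the weaviate-client package
    from weaviate.classes.config import Configure, DataType, Property

    client = weaviate.connect_to_local()
    try:
        # Fails if an 'Article' collection already exists on the instance.
        articles = client.collections.create(
            name="Article",
            properties=[Property(name="title", data_type=DataType.TEXT)],
            # No vectorizer module: vectors are supplied at import time.
            vectorizer_config=Configure.Vectorizer.none(),
        )
        for row in ROWS:
            properties, vector = split_row(row)
            articles.data.insert(properties=properties, vector=vector)

        result = articles.query.near_vector(near_vector=[1.0, 0.0, 0.0], limit=1)
        return [obj.properties["title"] for obj in result.objects]
    finally:
        client.close()
```

With a vectorizer module configured instead, the import step would omit `vector=` and queries could use text directly.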

§04

Example

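import weaviate

# Connect to a local instance (for example, the Docker setup above)
client = weaviate.connect_to_local()

try:
    # Assumes an 'Article' collection with a 'title' property already exists
    collection = client.collections.get('Article')

    results = collection.query.hybrid(
        query='machine learning optimization',
        limit=5,
        alpha=0.75  # weight toward vector search
    )

    for obj in results.objects:
        print(obj.properties['title'])
finally:
    client.close()  # always release the connection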

This hybrid query blends vector similarity (75% weight) with BM25 keyword matching (25% weight) to find the most relevant articles.
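To make the weighting concrete, here is a simplified sketch of score blending in plain Python. It is not Weaviate's actual fusion code (the server has its own fusion algorithms, which differ in detail); the document IDs and scores are made up, and min-max normalization is an assumption chosen to keep the arithmetic easy to follow.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def blend(vector_scores, bm25_scores, alpha=0.75):
    """Combine two score sets: alpha weights vectors, (1 - alpha) weights BM25."""
    vec, kw = normalize(vector_scores), normalize(bm25_scores)
    docs = set(vec) | set(kw)
    fused = {
        d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for four documents; higher is better in both sets.
vector_scores = {"a": 0.9, "b": 0.7, "c": 0.5}
bm25_scores = {"b": 8.0, "c": 10.0, "d": 2.0}

vector_leaning = blend(vector_scores, bm25_scores, alpha=0.75)
keyword_leaning = blend(vector_scores, bm25_scores, alpha=0.25)
```

With these numbers, document `a` (strong vector match, no keyword match) ranks first at alpha=0.75, while document `c` (strongest keyword match) ranks first at alpha=0.25.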

Search Mode	When to Use
nearText	Pure semantic search by meaning
bm25	Exact keyword matching
hybrid	Best of both, configurable alpha
nearVector	Search with a pre-computed vector
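In the v4 Python client, the modes in this table correspond to snake_case methods on `collection.query`. The dispatcher below is a hypothetical convenience wrapper, not part of the client; it only shows which call each row maps to.

```python
def run_search(collection, mode: str, *, text=None, vector=None, limit=5, alpha=0.75):
    """Dispatch a table row ("nearText", "bm25", ...) to the matching query call."""
    query = collection.query
    if mode == "nearText":
        # Semantic search; needs a vectorizer module on the collection.
        return query.near_text(query=text, limit=limit)
    if mode == "bm25":
        # Keyword search only.
        return query.bm25(query=text, limit=limit)
    if mode == "hybrid":
        # Blend of both, weighted by alpha.
        return query.hybrid(query=text, alpha=alpha, limit=limit)
    if mode == "nearVector":
        # Search with a vector computed outside Weaviate.
        return query.near_vector(near_vector=vector, limit=limit)
    raise ValueError(f"unknown search mode: {mode}")
```

For example, `run_search(articles, "hybrid", text="machine learning optimization")` would issue the same kind of query as the example above, given a collection handle named `articles`.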

§05

Related on TokRepo

AI tools for RAG — retrieval-augmented generation tools and frameworks
AI tools for database — database tools for AI applications curated on TokRepo

§06

Common pitfalls

Choosing the wrong vectorizer module at schema creation time locks you into that embedding model; plan your vectorizer before importing data
The hybrid search alpha parameter needs tuning per use case; 0.75 favors vector similarity, 0.25 favors keyword matching
Multi-tenancy requires planning upfront; migrating a single-tenant Weaviate instance to multi-tenant is non-trivial
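For the multi-tenancy point, a minimal sketch of enabling tenants at collection creation with the v4 Python client, assuming a local instance. The `CustomerDoc` collection, the `tenant-` naming scheme, and the sample object are hypothetical.

```python
def tenant_names(customers: list[str]) -> list[str]:
    """Hypothetical naming scheme: one tenant per customer, e.g. 'tenant-acme'."""
    return [f"tenant-{c.strip().lower()}" for c in customers]


def create_tenant_collection(customers: list[str]) -> None:
    """Create a multi-tenant collection and write one object for the first tenant."""
    import weaviate  # imported lazily; requires the weaviate-client package
    from weaviate.classes.config import Configure
    from weaviate.classes.tenants import Tenant

    names = tenant_names(customers)
    client = weaviate.connect_to_local()
    try:
        # Multi-tenancy is switched on when the collection is created.
        docs = client.collections.create(
            name="CustomerDoc",
            multi_tenancy_config=Configure.multi_tenancy(enabled=True),
        )
        docs.tenants.create([Tenant(name=n) for n in names])

        # Reads and writes are then scoped to a single tenant.
        first_tenant = docs.with_tenant(names[0])
        first_tenant.data.insert(properties={"title": "Onboarding notes"})
    finally:
        client.close()
```

Because the setting is part of the collection definition, deciding on it before the first import avoids a later migration.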

FAQ

What is hybrid search in Weaviate?

Hybrid search combines vector similarity search with BM25 keyword matching in a single query. The alpha parameter controls the weight: alpha=1 is pure vector search, alpha=0 is pure keyword search, and values in between blend both scores. This gives you semantic understanding without losing exact-match precision.

How does Weaviate handle RAG?

Weaviate has a built-in generative module that chains retrieval and generation. You send a query with a generative prompt, Weaviate retrieves relevant objects via vector or hybrid search, then passes them to a configured LLM to generate an answer. This eliminates the need for a separate orchestration layer.

Can Weaviate scale horizontally?

Yes. Weaviate supports horizontal scaling by sharding data across multiple nodes. Each shard handles a portion of the data and queries are distributed. For read-heavy workloads, you can add replicas. The cluster coordinates queries and merges results transparently.
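The scatter-gather idea can be illustrated in plain Python. This is a conceptual sketch only: the hash-based shard assignment and the precomputed relevance scores are assumptions for illustration and do not reproduce Weaviate's internal algorithm.

```python
import hashlib
import heapq


def shard_for(object_id: str, shard_count: int) -> int:
    """Map an object ID to a shard with a stable hash (illustrative scheme)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(objects: dict[str, float], shard_count: int) -> list[dict[str, float]]:
    """Split {object_id: relevance_score} across shards by hashed ID."""
    shards: list[dict[str, float]] = [{} for _ in range(shard_count)]
    for object_id, score in objects.items():
        shards[shard_for(object_id, shard_count)][object_id] = score
    return shards


def scatter_gather(shards: list[dict[str, float]], limit: int) -> list[tuple[str, float]]:
    """Ask each shard for its local top-k, then merge into a global top-k."""
    partial: list[tuple[str, float]] = []
    for shard in shards:
        partial.extend(heapq.nlargest(limit, shard.items(), key=lambda kv: kv[1]))
    return heapq.nlargest(limit, partial, key=lambda kv: kv[1])


# Pretend these scores are the relevance of each object to one query.
objects = {f"doc-{i}": float(i) for i in range(20)}
shards = distribute(objects, shard_count=3)
top = scatter_gather(shards, limit=3)
```

Each shard only needs to return its own best `limit` results, because any object in the global top-k is necessarily in the top-k of the shard that holds it.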

What embedding models does Weaviate support?

Weaviate supports OpenAI, Cohere, Hugging Face, Google PaLM, and custom vectorizer modules. You configure the vectorizer at the collection level. You can also bring your own vectors by inserting pre-computed embeddings directly, bypassing the vectorizer module entirely.

How does Weaviate compare to Pinecone?

Weaviate is open-source and self-hostable, while Pinecone is a managed cloud service. Weaviate offers hybrid search, built-in RAG, and multi-tenancy out of the box. Pinecone focuses on managed vector search with minimal operational overhead. Choose Weaviate for control and flexibility; Pinecone for a fully managed experience.

Sources (3)

Weaviate GitHub: Weaviate is an open-source vector database with hybrid search
Weaviate Docs: Hybrid search combines vector and BM25 keyword matching
Weaviate RAG Docs: Weaviate supports built-in RAG with generative modules

Source and thanks

Created by Weaviate. Licensed under BSD 3-Clause. weaviate/weaviate — 15,900+ GitHub stars

Related assets

NocoDB — Open Source No-Code Database Platform

Kepler.gl — Open Source Geospatial Data Visualization

Turbopuffer — Serverless Vector DB for AI Search

Verba — The Golden RAGtriever by Weaviate