Skills · March 31, 2026 · 1 min read

Chroma — Open-Source Vector Database for AI

Chroma is the open-source vector database and data infrastructure for AI applications. 27.1K+ GitHub stars. Simple 4-function API for embedding, storing, and querying documents. Supports Python, JavaS

AI Open Source · Community

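Agent ready

Installable directly by an agent

This asset is installable: the agent first selects the current runtime, checks the install plan, and then runs the matching command.

Native · 98/100 · Policy: allowed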

Agent entry point: any MCP/CLI agent

Type: Skill

Install: Single

Trust level: Established

Entry:

Chroma — Open-Source Vector Database for AI

Direct install command

npx -y tokrepo@latest install 04367306-be4a-4f46-854d-dd2b4d0d429e --target codex

Do a dry run first to confirm the install plan, then run this command.

TL;DR

Chroma provides a simple 4-function API to embed, store, and query documents by semantic similarity for AI applications.

§01

What it is

Chroma is an open-source vector database designed as the data infrastructure layer for AI applications. It provides a minimal API -- create a collection, add documents, query by similarity, and retrieve by ID -- that handles tokenization, embedding, and indexing automatically. Chroma supports Python, JavaScript/TypeScript, Go, and Rust clients.

Chroma is built for developers creating RAG pipelines, semantic search engines, recommendation systems, or any AI application that needs vector similarity search. It works well with Claude, OpenAI, and other LLM providers as the retrieval backend.

§02

How it saves time or tokens

Chroma eliminates the complexity of managing embeddings manually. You pass raw text documents to collection.add() and Chroma handles embedding generation, indexing, and storage. When querying, you pass a natural-language query and get back the most semantically similar documents. Because only those top matches, rather than the whole corpus, go into the prompt, you pass less context to the LLM, which directly cuts token consumption in RAG workflows. The simple API also means less code to write and fewer tokens spent generating database logic.

§03

How to use

Install the client: pip install chromadb (Python) or npm install chromadb (JavaScript).
Create a client and collection: client = chromadb.Client() then collection = client.create_collection('docs').
Add documents with collection.add() and query with collection.query() using natural language.

§04

Example

import chromadb

client = chromadb.Client()
collection = client.create_collection('my_docs')

# Add documents (Chroma embeds them automatically)
collection.add(
    documents=[
        'AI agents can browse the web autonomously',
        'Vector databases store embeddings for similarity search',
        'RAG retrieves relevant context before generating answers'
    ],
    ids=['doc1', 'doc2', 'doc3']
)

# Query by natural language
results = collection.query(
    query_texts=['How do retrieval systems work?'],
    n_results=2
)
print(results['documents'])
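
The query result is a dictionary of nested lists: one inner list per query text. The sketch below flattens the hits for a single query into records. `top_hits` is a hypothetical helper, not part of Chroma, and the `sample` dictionary is hand-written to mimic the result shape (the distances are made up, not real output).

```python
def top_hits(results, query_index=0):
    """Flatten one query's hits into a list of {id, document, distance} records."""
    ids = results["ids"][query_index]
    documents = results.get("documents")  # may be absent if not requested
    distances = results.get("distances")  # may be absent if not requested
    hits = []
    for i, doc_id in enumerate(ids):
        hits.append({
            "id": doc_id,
            "document": documents[query_index][i] if documents else None,
            "distance": distances[query_index][i] if distances else None,
        })
    return hits


# Hand-written stand-in for a collection.query() result (illustrative values only).
sample = {
    "ids": [["doc3", "doc2"]],
    "documents": [[
        "RAG retrieves relevant context before generating answers",
        "Vector databases store embeddings for similarity search",
    ]],
    "distances": [[0.41, 0.77]],
}

for hit in top_hits(sample):
    print(hit["id"], hit["distance"], hit["document"])
```

With the example above, `top_hits(results)` would give the two matches for the single query text, while `results['documents']` on its own prints a list containing one inner list.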

§05

Related on TokRepo

RAG tools -- retrieval-augmented generation tools and frameworks
AI tools for coding -- developer tools enhanced with AI capabilities

§06

Common pitfalls

Using the default in-memory client for production. For persistent storage, use chromadb.PersistentClient(path='./chroma_data') or connect to a Chroma server.
Not specifying an embedding function when your documents need a specific model. Chroma uses a default model, but you can pass a custom one via the embedding_function parameter.
Adding documents without meaningful IDs. Chroma uses IDs for deduplication and updates, so random UUIDs make it hard to update or delete specific documents later.
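
A minimal sketch of how the first and third pitfalls might be avoided together: a persistent client plus IDs derived from the document source, so re-ingesting the same file overwrites its chunks instead of duplicating them. The helper names (`stable_id`, `open_collection`, `upsert_chunks`) and the ID scheme are assumptions for illustration; only `PersistentClient`, `get_or_create_collection`, and `upsert` are Chroma calls.

```python
import hashlib


def stable_id(source: str, chunk_index: int) -> str:
    """Derive a repeatable ID from a source name and chunk position (hypothetical scheme)."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]
    return f"{digest}-{chunk_index}"


def open_collection(path: str = "./chroma_data", name: str = "docs"):
    """Open (or create) a collection that is stored on disk under `path`."""
    import chromadb  # imported lazily so stable_id works without Chroma installed

    client = chromadb.PersistentClient(path=path)
    return client.get_or_create_collection(name)


def upsert_chunks(collection, source: str, chunks: list[str]) -> list[str]:
    """Insert or overwrite the chunks of one source under their stable IDs."""
    ids = [stable_id(source, i) for i in range(len(chunks))]
    collection.upsert(ids=ids, documents=chunks)
    return ids
```

Calling `upsert_chunks(open_collection(), "guide.md", chunks)` twice with the same input should leave one copy of each chunk, because the IDs are identical on both runs.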

FAQ

What embedding models does Chroma support?

Chroma uses a default embedding model out of the box (the Sentence Transformers all-MiniLM-L6-v2 model). You can plug in OpenAI, Cohere, HuggingFace, or any custom embedding function by passing it when creating a collection. This makes Chroma model-agnostic.
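
As an alternative to passing an embedding function, add() and query() also accept precomputed vectors through the embeddings and query_embeddings parameters. The sketch below uses that route with a toy word-hashing embedder. `toy_embed` is a stand-in invented for illustration, not a real model and not part of Chroma; in practice the vectors would come from whichever model you choose.

```python
import hashlib
import math

DIM = 64  # toy dimensionality, chosen arbitrarily for this sketch


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hash each lowercase word into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def add_with_embeddings(collection, ids, documents):
    """Store documents together with vectors computed outside Chroma."""
    vectors = [toy_embed(doc) for doc in documents]
    collection.add(ids=ids, documents=documents, embeddings=vectors)


def query_with_embedding(collection, text, n_results=2):
    """Query with a vector from the same embedder that was used for add()."""
    return collection.query(query_embeddings=[toy_embed(text)], n_results=n_results)
```

Whichever route you take, documents and queries need to be embedded by the same model, otherwise the distances are not comparable.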

Can Chroma handle large-scale production workloads?

Chroma offers a client-server mode where the server runs separately and handles persistence, indexing, and queries. For large-scale deployments, Chroma Cloud provides managed infrastructure. The open-source server handles millions of embeddings on a single node.

How does Chroma compare to Pinecone or Weaviate?

Chroma focuses on simplicity with its 4-function API and runs locally with zero configuration. Pinecone is fully managed and cloud-only. Weaviate offers more features like GraphQL queries and hybrid search. Chroma is the easiest to get started with for prototyping.

Does Chroma support metadata filtering?

Yes. You can attach metadata dictionaries to documents and filter queries using where clauses. For example, filter by category, date range, or any custom field while still ranking by vector similarity.
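
A minimal sketch of such a filter, assuming each document was added with metadata like {'category': 'guide', 'year': 2025} through the metadatas argument of collection.add(). The field names and the choice to store the year as an integer are assumptions for illustration; `build_where` and `filtered_query` are hypothetical helpers, while `$and`, `$eq`, and `$gte` are Chroma filter operators.

```python
def build_where(category: str, min_year: int) -> dict:
    """Build a where clause: exact category match AND year >= min_year."""
    return {
        "$and": [
            {"category": {"$eq": category}},
            {"year": {"$gte": min_year}},
        ]
    }


def filtered_query(collection, text: str, where: dict, n_results: int = 3):
    """Rank by vector similarity, restricted to documents that match `where`."""
    return collection.query(query_texts=[text], n_results=n_results, where=where)
```

For example, `filtered_query(collection, 'How do retrieval systems work?', build_where('guide', 2024))` would rank only the documents tagged as guides from 2024 onward.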

Can I use Chroma with Claude or other LLMs?

Yes. Chroma is commonly used as the retrieval backend in RAG pipelines. You query Chroma for relevant documents, then pass those documents as context to Claude, GPT, or any LLM to generate grounded answers.
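
The retrieve-then-generate flow can be sketched as below. `generate` is a hypothetical callable that wraps whichever LLM client you use (it takes a prompt string and returns the model's text); the prompt wording and the helper names are assumptions, and only `collection.query` is a Chroma call.

```python
def build_prompt(question: str, documents: list[str]) -> str:
    """Assemble a prompt that contains only the retrieved documents as context."""
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(collection, generate, question: str, n_results: int = 3) -> str:
    """Retrieve the top matches from Chroma, then hand them to the LLM wrapper."""
    results = collection.query(query_texts=[question], n_results=n_results)
    documents = results["documents"][0]  # hits for the first (only) query text
    return generate(build_prompt(question, documents))
```

Keeping `generate` as a parameter leaves the retrieval step independent of the LLM provider, so the same function works with Claude, GPT, or a local model.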

Sources (3)

Chroma GitHub: Chroma is the open-source embedding database with 27K+ stars
Chroma Documentation: simple 4-function API (create, add, query, get)
Chroma Docs: supports Python, JavaScript, Go, and Rust clients

Source and acknowledgments

Created by Chroma. Licensed under Apache 2.0. chroma-core/chroma — 27,100+ GitHub stars



Related assets

NocoDB — Open Source No-Code Database Platform

Reactive Resume — AI-Powered Open-Source Resume Builder

Twenty — Open-Source AI CRM (Salesforce Alternative)

Kepler.gl — Open Source Geospatial Data Visualization