# Chroma — Open-Source Embedding Database for AI

> Lightweight open-source vector database that runs anywhere. Chroma provides in-memory, local-file, and client-server modes for embeddings, with zero-config LangChain integration.

## Install

```bash
pip install chromadb
```

## Quick Use

Copy the snippet below into your project:

```python
import chromadb

client = chromadb.Client()  # In-memory
# Or: client = chromadb.PersistentClient(path="./chroma_db")

collection = client.create_collection("docs")

# Add documents (auto-embedded with the default model)
collection.add(
    documents=["AI is transforming software", "Python is popular for ML"],
    ids=["doc1", "doc2"],
)

# Query with natural language
results = collection.query(query_texts=["machine learning tools"], n_results=2)
print(results["documents"])
```

## What is Chroma?

Chroma is a lightweight, open-source embedding database designed for AI applications. It handles embedding generation, storage, and retrieval in one package. Start with the in-memory client for prototyping, then switch to persistent storage for production — no infrastructure changes needed. First-class integrations with LangChain, LlamaIndex, and OpenAI.

**Answer-Ready**: Chroma is an open-source embedding database for AI. It auto-generates embeddings and stores and queries vectors with zero config. In-memory, local-file, or client-server modes. Native LangChain/LlamaIndex integration. The simplest path from prototype to production RAG. 16k+ GitHub stars.

**Best for**: Developers building RAG prototypes that need to scale.

**Works with**: LangChain, LlamaIndex, OpenAI, any embedding model.

**Setup time**: Under 1 minute.

## Core Features

### 1. Three Deployment Modes

```python
import chromadb

# In-memory (prototyping)
client = chromadb.Client()

# Local persistent (single-user)
client = chromadb.PersistentClient(path="./db")

# Client-server (production)
# Server: chroma run --path ./db --port 8000
client = chromadb.HttpClient(host="localhost", port=8000)
```

### 2. Auto-Embedding

```python
# Chroma embeds text automatically with the default model
collection.add(documents=["Hello world"], ids=["1"])

# Or bring your own embeddings
collection.add(
    embeddings=[[0.1, 0.2, 0.3, ...]],
    documents=["Hello world"],
    ids=["1"],
)

# Or use a custom embedding function
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(api_key="sk-...")
collection = client.create_collection("docs", embedding_function=openai_ef)
```

### 3. Metadata Filtering

```python
results = collection.query(
    query_texts=["AI tools"],
    n_results=5,
    where={"category": "development"},
    where_document={"$contains": "Python"},
)
```

### 4. LangChain Integration

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```

## Chroma vs Alternatives

| Feature | Chroma | Qdrant | Pinecone | FAISS |
|---------|--------|--------|----------|-------|
| Self-hosted | Yes | Yes | No | Yes |
| Auto-embedding | Yes | No | No | No |
| Zero config | Yes | Docker needed | Account needed | Code needed |
| Metadata filter | Yes | Advanced | Yes | No |
| Managed cloud | Yes | Yes | Yes | No |
| Best for | Prototyping → prod | Production scale | Managed scale | Research |

## FAQ

**Q: How does it scale?**

A: Client-server mode supports millions of embeddings. For billions, consider Qdrant or Pinecone.

**Q: Is auto-embedding good enough?**

A: The default model (all-MiniLM-L6-v2) is decent for English. For production, use OpenAI or Cohere embeddings.

**Q: Can I use it with Claude?**

A: Yes. Use Chroma to retrieve relevant documents as context for Claude queries, or store Claude's outputs as embeddings for later retrieval.

## Source & Thanks

> Created by [Chroma](https://github.com/chroma-core). Licensed under Apache 2.0.
> [chroma-core/chroma](https://github.com/chroma-core/chroma) — 16k+ stars

## Quick Use

```bash
pip install chromadb
```

Start a vector database in three lines of code, with automatic embedding.

## What is Chroma?

A lightweight, open-source embedding database: it auto-generates embeddings and starts with zero configuration. Three modes (in-memory, local file, client-server) and native LangChain integration.

**One-line summary**: An open-source embedding database with auto-embedding, zero config, and three deployment modes. Native LangChain/LlamaIndex integration and the simplest path from prototype to production. 16k+ stars.

**Best for**: Developers building RAG prototypes that need to scale smoothly.

## Core Features

### 1. Zero-Config Start

Ready to use right after `pip install`, with automatic embedding.

### 2. Three Modes

In-memory (prototyping) → local file (single-user) → client-server (production).

### 3. Metadata Filtering

Filter query results by metadata attributes and document content.

## FAQ

**Q: How far does it scale?**

A: Client-server mode handles millions of embeddings. For larger workloads, use Qdrant.

## Source & Thanks

> [chroma-core/chroma](https://github.com/chroma-core/chroma) — 16k+ stars, Apache 2.0

---

Source: https://tokrepo.com/en/workflows/1bcbfd04-3daf-4183-8481-0fb0e1eea289

Author: AI Open Source
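The FAQ's Claude pattern (query Chroma, then feed the hits to the model as context) can be sketched as below. The `build_prompt` helper and the prompt wording are illustrative, not part of Chroma's API; the stubbed dictionary mirrors the shape that `collection.query` returns, so the function also works on real query results.

```python
# Sketch: turn Chroma query results into an LLM prompt.
# In a real app, `results` comes from:
#   results = collection.query(query_texts=[question], n_results=3)

def build_prompt(question: str, results: dict) -> str:
    """Pack retrieved documents into a context-stuffed prompt."""
    # collection.query returns one inner list per query text,
    # so results["documents"][0] holds the hits for our single question.
    context = "\n".join(results["documents"][0])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Stubbed result in collection.query's return shape:
stub = {"documents": [["Chroma supports in-memory, local file, and client-server modes."]]}
print(build_prompt("Which modes does Chroma support?", stub))
```

The resulting string would then be sent to Claude (or any other LLM) through that model's own client library.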