# Chroma — Open-Source Embedding Database for AI

> Lightweight open-source vector database that runs anywhere. Chroma provides in-memory, local-file, and client-server modes for embeddings, with zero-config LangChain integration.

## Install

```bash
pip install chromadb
```

## Quick Use

Copy the snippet below into your project:

```python
import chromadb

client = chromadb.Client()  # In-memory
# Or: client = chromadb.PersistentClient(path="./chroma_db")

collection = client.create_collection("docs")

# Add documents (auto-embedded with the default model)
collection.add(
    documents=["AI is transforming software", "Python is popular for ML"],
    ids=["doc1", "doc2"],
)

# Query with natural language
results = collection.query(query_texts=["machine learning tools"], n_results=2)
print(results["documents"])
```

## What is Chroma?

Chroma is a lightweight, open-source embedding database designed for AI applications. It handles embedding generation, storage, and retrieval in one package. Start with the in-memory client for prototyping, then switch to persistent storage for production — no infrastructure changes needed. First-class integrations with LangChain, LlamaIndex, and OpenAI.

**Answer-Ready**: Chroma is an open-source embedding database for AI. It auto-generates embeddings and stores and queries vectors with zero config. In-memory, local-file, or client-server modes. Native LangChain/LlamaIndex integration. The simplest path from prototype to production RAG. 16k+ GitHub stars.

**Best for**: Developers building RAG prototypes that need to scale.

**Works with**: LangChain, LlamaIndex, OpenAI, any embedding model.

**Setup time**: Under 1 minute.

## Core Features

### 1. Three Deployment Modes

```python
import chromadb

# In-memory (prototyping)
client = chromadb.Client()

# Local persistent (single-user)
client = chromadb.PersistentClient(path="./db")

# Client-server (production)
# Server: chroma run --path ./db --port 8000
client = chromadb.HttpClient(host="localhost", port=8000)
```

### 2. Auto-Embedding

```python
# Chroma embeds text automatically with the default model
collection.add(documents=["Hello world"], ids=["1"])

# Or bring your own embeddings
collection.add(
    embeddings=[[0.1, 0.2, 0.3, ...]],
    documents=["Hello world"],
    ids=["1"],
)

# Or use a custom embedding function
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(api_key="sk-...")
collection = client.create_collection("docs", embedding_function=openai_ef)
```

### 3. Metadata Filtering

```python
results = collection.query(
    query_texts=["AI tools"],
    n_results=5,
    where={"category": "development"},
    where_document={"$contains": "Python"},
)
```

### 4. LangChain Integration

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```

## Chroma vs Alternatives

| Feature | Chroma | Qdrant | Pinecone | FAISS |
|---------|--------|--------|----------|-------|
| Self-hosted | Yes | Yes | No | Yes |
| Auto-embedding | Yes | No | No | No |
| Zero config | Yes | Docker needed | Account needed | Code needed |
| Metadata filter | Yes | Advanced | Yes | No |
| Managed cloud | Yes | Yes | Yes | No |
| Best for | Prototyping → prod | Production scale | Managed scale | Research |

## FAQ

**Q: How does it scale?**

A: Client-server mode supports millions of embeddings. For billions, consider Qdrant or Pinecone.

**Q: Is auto-embedding good enough?**

A: The default model (all-MiniLM-L6-v2) is decent for English. For production, use OpenAI or Cohere embeddings.

**Q: Can I use it with Claude?**

A: Yes. Use Chroma to retrieve relevant documents as context for Claude queries, or store Claude's outputs as embeddings for later retrieval.

## Source & Thanks

> Created by [Chroma](https://github.com/chroma-core). Licensed under Apache 2.0.
> [chroma-core/chroma](https://github.com/chroma-core/chroma) — 16k+ stars

## Quick Use

```bash
pip install chromadb
```

Start a vector database in three lines of code, with automatic embedding.

## What is Chroma?

A lightweight, open-source embedding database: it auto-generates embeddings and starts with zero configuration. Three modes (in-memory, local file, client-server) and native LangChain integration.

**One-line summary**: An open-source embedding database with auto-embedding, zero config, and three deployment modes. Native LangChain/LlamaIndex integration and the simplest path from prototype to production. 16k+ stars.

**Best for**: Developers building RAG prototypes that need to scale smoothly.

## Core Features

### 1. Zero-Config Start

Ready to use right after `pip install`, with automatic embedding.

### 2. Three Modes

In-memory (prototyping) → local file (single-user) → client-server (production).

### 3. Metadata Filtering

Filter query results by metadata attributes and document content.

## FAQ

**Q: How far does it scale?**

A: Client-server mode handles millions of embeddings. For larger workloads, use Qdrant.

## Source & Thanks

> [chroma-core/chroma](https://github.com/chroma-core/chroma) — 16k+ stars, Apache 2.0

---

Source: https://tokrepo.com/en/workflows/1bcbfd04-3daf-4183-8481-0fb0e1eea289

Author: AI Open Source
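The FAQ's Claude pattern (query Chroma, then feed the hits to the model as context) can be sketched as below. The `build_prompt` helper and the prompt wording are illustrative, not part of Chroma's API; the stubbed dictionary mirrors the shape that `collection.query` returns, so the function also works on real query results.

```python
# Sketch: turn Chroma query results into an LLM prompt.
# In a real app, `results` comes from:
#   results = collection.query(query_texts=[question], n_results=3)

def build_prompt(question: str, results: dict) -> str:
    """Pack retrieved documents into a context-stuffed prompt."""
    # collection.query returns one inner list per query text,
    # so results["documents"][0] holds the hits for our single question.
    context = "\n".join(results["documents"][0])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Stubbed result in collection.query's return shape:
stub = {"documents": [["Chroma supports in-memory, local file, and client-server modes."]]}
print(build_prompt("Which modes does Chroma support?", stub))
```

The resulting string would then be sent to Claude (or any other LLM) through that model's own client library.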