Chroma — Open-Source Vector Database for AI
Chroma is the open-source vector database and data infrastructure for AI applications. 27.1K+ GitHub stars. Simple 4-function API for embedding, storing, and querying documents. Supports Python, JavaS
Agent-ready install
This asset can be installed after you choose the runtime, review the plan, and run the corresponding command.
npx -y tokrepo@latest install 04367306-be4a-4f46-854d-dd2b4d0d429e --target codex
Run it after confirming the plan with a dry run.
What it is
Chroma is an open-source vector database designed as the data infrastructure layer for AI applications. It provides a minimal API -- create a collection, add documents, query by similarity, and retrieve by ID -- that handles tokenization, embedding, and indexing automatically. Chroma supports Python, JavaScript/TypeScript, Go, and Rust clients.
Chroma is built for developers creating RAG pipelines, semantic search engines, recommendation systems, or any AI application that needs vector similarity search. It works well with Claude, OpenAI, and other LLM providers as the retrieval backend.
How it saves time or tokens
Chroma eliminates the complexity of managing embeddings manually. You pass raw text documents to collection.add() and Chroma handles embedding generation, indexing, and storage. When querying, you pass a natural-language query and get back the most semantically similar documents. This reduces the amount of context you need to pass to an LLM, directly cutting token consumption in RAG workflows. The simple API means less code to write and fewer tokens spent generating database logic.
How to use
- Install the client: pip install chromadb (Python) or npm install chromadb (JavaScript).
- Create a client and collection: client = chromadb.Client() then collection = client.create_collection('docs').
- Add documents with collection.add() and query with collection.query() using natural language.
Example
import chromadb

# In-memory client: data is lost when the process exits
client = chromadb.Client()
collection = client.create_collection('my_docs')

# Add documents (Chroma embeds them automatically)
collection.add(
    documents=[
        'AI agents can browse the web autonomously',
        'Vector databases store embeddings for similarity search',
        'RAG retrieves relevant context before generating answers',
    ],
    ids=['doc1', 'doc2', 'doc3'],
)

# Query by natural language
results = collection.query(
    query_texts=['How do retrieval systems work?'],
    n_results=2,
)
print(results['documents'])
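The query result is a dictionary of parallel lists, with one inner list per query text. A minimal sketch of flattening the hits for one query into tuples might look like the following. `top_hits` is a hypothetical helper, not part of Chroma, and it assumes the result contains the `ids`, `documents`, and `distances` fields.

```python
def top_hits(results, query_index=0):
    """Return (id, document, distance) tuples for one query in a Chroma result.

    Hypothetical helper, not part of chromadb. Assumes `results` has the
    'ids', 'documents', and 'distances' keys, each holding one inner list
    per query text.
    """
    ids = results['ids'][query_index]
    documents = results['documents'][query_index]
    distances = results['distances'][query_index]
    return list(zip(ids, documents, distances))


# Usage with the `results` returned by collection.query():
#   for doc_id, text, distance in top_hits(results):
#       print(doc_id, round(distance, 3), text)
```

Smaller distances mean closer matches, so the first tuple is the most similar document.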
Related on TokRepo
- RAG tools -- retrieval-augmented generation tools and frameworks
- AI tools for coding -- developer tools enhanced with AI capabilities
Common pitfalls
- Using the default in-memory client for production. For persistent storage, use chromadb.PersistentClient(path='./chroma_data') or connect to a Chroma server.
- Not specifying an embedding function when your documents need a specific model. Chroma uses a default model, but you can pass a custom one via the embedding_function parameter.
- Adding documents without meaningful IDs. Chroma uses IDs for deduplication and updates, so random UUIDs make it hard to update or delete specific documents later.
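One way to act on the ID advice above is to derive each ID from where the text came from, so that re-ingesting the same source overwrites its old chunks instead of duplicating them. The sketch below is only an illustration: `chunk_id` and `upsert_chunks` are hypothetical helpers, and the `source:index` scheme and the `source` metadata key are assumed conventions. The only Chroma call it relies on is `collection.upsert()`, which adds new IDs and overwrites existing ones.

```python
def chunk_id(source, index):
    """Build a stable, readable ID such as 'guide.md:0' (hypothetical scheme)."""
    return f'{source}:{index}'


def upsert_chunks(collection, source, chunks):
    """Write text chunks under deterministic IDs and return those IDs.

    `collection` is expected to be a Chroma collection; only its upsert()
    method is used here.
    """
    chunks = list(chunks)
    if not chunks:
        return []  # nothing to write; skip the call entirely
    ids = [chunk_id(source, i) for i in range(len(chunks))]
    collection.upsert(
        ids=ids,
        documents=chunks,
        metadatas=[{'source': source} for _ in chunks],
    )
    return ids


# Usage with a persistent client, so data survives restarts:
#   import chromadb
#   client = chromadb.PersistentClient(path='./chroma_data')
#   docs = client.get_or_create_collection('docs')
#   upsert_chunks(docs, 'guide.md', ['First chunk', 'Second chunk'])
```

Because the IDs are predictable, a later update or delete can target a single source without scanning the collection.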
Frequently asked questions
Chroma uses a default embedding model out of the box (Sentence Transformers). You can plug in OpenAI, Cohere, HuggingFace, or any custom embedding function by passing it when creating a collection. This makes it model-agnostic.
Chroma offers a client-server mode where the server runs separately and handles persistence, indexing, and queries. For large-scale deployments, Chroma Cloud provides managed infrastructure. The open-source server handles millions of embeddings on a single node.
Chroma focuses on simplicity with its 4-function API and runs locally with zero configuration. Pinecone is fully managed and cloud-only. Weaviate offers more features like GraphQL queries and hybrid search. Chroma is the easiest to get started with for prototyping.
Chroma supports metadata filtering. You can attach metadata dictionaries to documents and filter queries using where clauses, for example by category, date range, or any custom field, while still ranking by vector similarity.
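A filtered query could be sketched as follows. The metadata fields `category` and `year` are hypothetical, and storing the date as an integer year is an assumption made so that the numeric range operators apply. `build_where` and `filtered_query` are illustrative helpers, not Chroma functions; the Chroma pieces used are the `where` argument of `collection.query()` and the `$and`, `$eq`, `$gte`, and `$lte` operators.

```python
def build_where(category, year_from, year_to):
    """Build a Chroma `where` filter: exact category match within a year range.

    'category' and 'year' are hypothetical metadata fields; use whatever
    keys were attached to the documents when they were added.
    """
    return {
        '$and': [
            {'category': {'$eq': category}},
            {'year': {'$gte': year_from}},
            {'year': {'$lte': year_to}},
        ]
    }


def filtered_query(collection, text, where, n_results=3):
    """Rank by vector similarity, restricted to documents matching `where`."""
    return collection.query(query_texts=[text], n_results=n_results, where=where)


# Usage, assuming documents were added with metadatas such as
# {'category': 'rag', 'year': 2024}:
#   results = filtered_query(collection, 'How does retrieval work?',
#                            build_where('rag', 2023, 2024))
```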
Chroma is commonly used as the retrieval backend in RAG pipelines. You query Chroma for relevant documents, then pass those documents as context to Claude, GPT, or any other LLM to generate grounded answers.
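The retrieval-then-generate step might be sketched like this. `build_prompt` is a hypothetical helper: the prompt wording and the `max_chars` budget are illustrative choices, not something Chroma prescribes. It takes the dictionary returned by `collection.query()` for a single query text and keeps only as many hits as fit the budget.

```python
def build_prompt(question, results, max_chars=2000):
    """Assemble an LLM prompt from the top Chroma hits for a single question.

    Hypothetical helper: expects `results['documents']` to hold one inner
    list of document strings, ordered from most to least similar.
    """
    context_parts = []
    used = 0
    for doc in results['documents'][0]:  # hits for the first (only) query
        if used + len(doc) > max_chars:
            break  # stop before exceeding the context budget
        context_parts.append(f'- {doc}')
        used += len(doc)
    context = '\n'.join(context_parts)
    return (
        'Answer the question using only the context below.\n\n'
        f'Context:\n{context}\n\n'
        f'Question: {question}'
    )


# Usage:
#   results = collection.query(query_texts=[question], n_results=3)
#   prompt = build_prompt(question, results)
#   # send `prompt` to the LLM client of your choice
```

Capping the context this way is what keeps token usage bounded: only the most similar documents reach the model, rather than the whole corpus.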
Source and acknowledgments
Created by Chroma. Licensed under Apache 2.0. chroma-core/chroma — 27,100+ GitHub stars