Kotaemon — Open-Source RAG Document Chat
Clean, open-source RAG tool for chatting with your documents. Supports PDF, DOCX, and web pages, with multiple LLM providers, cited answers, and multi-user access. Self-hostable. 25K+ stars.
Agent-ready install
This asset can be installed after you choose the runtime, review the plan, and run the corresponding command.
npx -y tokrepo@latest install b0f93b10-3339-4ca0-ad20-d6335a3d7785 --target codex
Run the command after confirming the plan with a dry run.
What it is
Kotaemon is an open-source RAG (retrieval-augmented generation) tool that lets you chat with your documents. Upload PDFs, DOCX files, or web pages, and ask questions in natural language. Kotaemon retrieves relevant passages and generates answers with citations pointing back to the source documents. It supports multiple LLM providers and can be self-hosted.
It targets researchers, analysts, and knowledge workers who need to extract information from large document collections without reading everything manually.
How it saves time or tokens
Kotaemon handles the full RAG pipeline internally: document parsing, chunking, embedding, vector storage, retrieval, and answer generation with citations. Instead of building this stack from individual components, you run a single application. The citation feature is particularly valuable -- you can verify every answer against the source document.
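The pipeline stages listed above can be illustrated with a toy sketch. This is not Kotaemon's implementation: the function names (`chunk_text`, `embed`, `build_index`, `retrieve`) are hypothetical, the "embedding" is a plain bag-of-words count standing in for a real embedding model, the "vector store" is an in-memory list, and the LLM answer-generation step is left out. It only shows how keeping a page number next to each chunk makes citations possible.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(pages):
    """pages maps page number -> text. Returns a list of (page, chunk, vector)."""
    index = []
    for page, text in pages.items():
        for chunk in chunk_text(text):
            index.append((page, chunk, embed(chunk)))
    return index


def retrieve(index, question, top_k=2):
    """Return the top_k (page, chunk) pairs most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(page, chunk) for page, chunk, _ in ranked[:top_k]]


# Made-up sample pages, for illustration only.
pages = {
    12: "Currency fluctuation in Asian markets is a major risk to revenue.",
    15: "Supply chain disruption from single-source suppliers remains a concern.",
    23: "The company opened two new offices this year.",
}
index = build_index(pages)
hits = retrieve(index, "What currency risk does the report mention?", top_k=1)
for page, chunk in hits:
    print(f"[page {page}] {chunk}")
```

In a real deployment the retrieved passages and their page labels would be passed to the configured LLM, which is where the cited answer comes from.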
How to use
- Install and run:
pip install kotaemon
python -m kotaemon
Or with Docker:
docker run -p 7860:7860 ghcr.io/cinnamon/kotaemon:latest
- Open http://localhost:7860.
- Configure your LLM provider (OpenAI, Anthropic, Ollama) in Settings.
- Upload documents and start asking questions.
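If you start Kotaemon from a script (for example, right after the Docker command above), the web UI may take a while to come up. A minimal sketch of a readiness check follows, using only the Python standard library. The helper name `wait_for_ui` and the timeout values are assumptions for illustration; only the URL comes from the steps above.

```python
import time
import urllib.request


def wait_for_ui(url="http://localhost:7860", timeout_s=60.0, interval_s=2.0):
    """Poll url until it answers with HTTP 200 or timeout_s elapses.

    Returns True if the UI responded, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # urllib.error.URLError (connection refused, timeouts, HTTP errors)
            # is a subclass of OSError, so this covers "not up yet".
            pass
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    if wait_for_ui():
        print("UI is reachable; open it in your browser.")
    else:
        print("UI did not respond in time; check the container or process logs.")
```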
Example
User: What are the main risks identified in the annual report?
Kotaemon: The report identifies three main risks:
1. Currency fluctuation exposure in Asian markets [page 12]
2. Supply chain disruption from single-source dependencies [page 15]
3. Regulatory changes in data privacy requirements [page 23]
[Click citations to view source passages]
Each answer includes clickable citations that link to the exact source passages.
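If you copy an answer out of the UI and want to list the pages to check, the citation markers can be pulled out with a regular expression. This sketch assumes the markers look exactly like the `[page N]` labels in the example above; the actual format in your Kotaemon version may differ, and `cited_pages` is a hypothetical helper, not part of Kotaemon.

```python
import re

# Assumed marker format, taken from the example above: "[page 12]".
CITATION = re.compile(r"\[page (\d+)\]")


def cited_pages(answer):
    """Return the page numbers cited in an answer, in order of appearance."""
    return [int(n) for n in CITATION.findall(answer)]


answer = """The report identifies three main risks:
1. Currency fluctuation exposure in Asian markets [page 12]
2. Supply chain disruption from single-source dependencies [page 15]
3. Regulatory changes in data privacy requirements [page 23]"""

pages = cited_pages(answer)
print(pages)
```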
Related on TokRepo
- AI tools for RAG -- RAG tools and frameworks
- AI tools for documents -- document processing tools
Common pitfalls
- Document parsing quality varies by file type. PDFs with complex layouts (multi-column, tables, scanned images) may not parse correctly. Pre-process problematic PDFs with an OCR tool for better results.
- Embedding model choice affects retrieval quality. The default embedding model works for general text. For specialized domains (legal, medical), consider a domain-specific embedding model.
- Large document collections increase storage and retrieval latency. For hundreds of documents, ensure adequate disk space and consider using a more performant vector store backend.
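To get a rough feel for the storage point above, the raw size of the embedding vectors can be estimated as documents × chunks per document × dimensions × bytes per value. The numbers below are assumptions for illustration (200 chunks per document, 1536-dimensional float32 vectors); the estimate ignores index overhead, metadata, and the stored document text, so treat it as a lower bound.

```python
def estimate_vector_storage_mb(num_docs, chunks_per_doc, dims, bytes_per_value=4):
    """Estimate raw embedding storage in MiB (vectors only, no index overhead)."""
    vectors = num_docs * chunks_per_doc
    return vectors * dims * bytes_per_value / (1024 ** 2)


# Assumed workload: 500 documents, 200 chunks each, 1536-dim float32 embeddings.
size = estimate_vector_storage_mb(500, 200, 1536)
print(f"{size:.1f} MiB")
```

Under these assumptions the vectors alone take a little under 600 MiB, which is why disk space and the choice of vector store start to matter as a collection grows.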
Frequently asked questions
What file formats does Kotaemon support?
Kotaemon supports PDF, DOCX, TXT, Markdown, and web pages (via URL). PDFs are the primary use case and receive the most parsing attention. Other formats are converted to text before processing, and complex formatting in DOCX files is simplified during ingestion.
Can Kotaemon run with local models?
Yes. Kotaemon supports Ollama and other local LLM providers. Both the chat model and the embedding model can run locally, so no data leaves your machine, which suits sensitive documents. Answer quality depends on the capability of the local model.
How do citations work?
When Kotaemon generates an answer, it includes references to the specific document passages it used. Each citation links to the source document and highlights the relevant passage, so you can verify the answer and read the original context. Citations are a core feature, not an add-on.
Does Kotaemon support multiple users?
Yes. Kotaemon supports multi-user access with separate accounts and document collections. Each user can upload their own documents and keep private conversations. An admin can manage users and configure global settings. For team deployments, use the Docker version with persistent storage.
How does Kotaemon compare to AnythingLLM?
Both are RAG applications for document chat. Kotaemon focuses on clean document understanding with strong citation support, and has a more polished document experience with better PDF handling. AnythingLLM is broader and offers more flexibility through its agents, plugin system, and workspace features.
Source and acknowledgments
Created by Cinnamon. Licensed under Apache 2.0. Cinnamon/kotaemon — 25,000+ GitHub stars