# Cherry Studio Knowledge Base — Local RAG with 50+ Formats

> Cherry Studio Knowledge Base ingests PDFs, Office docs, and Markdown into a local vector index. Query offline, BYOK any LLM. Data stays on your machine.

## Quick Use

1. Download Cherry Studio from cherry-ai.com
2. Settings → Models → add an embedding model (Ollama `nomic-embed-text` or OpenAI `text-embedding-3-small`)
3. Sidebar → Knowledge → New, then drag and drop your docs

---

## Intro

Cherry Studio Knowledge Base lets the desktop app ingest 50+ file formats into a local vector index — PDFs, Word docs, Markdown, EPUB, even web bookmarks. Query offline using your choice of LLM (OpenAI / Claude / Ollama / etc.), with retrieval running locally.

- Best for: privacy-conscious users who want personal RAG without sending docs to a cloud service.
- Works with: Cherry Studio 1.4+ on macOS / Windows / Linux.
- Setup time: 5 minutes.

---

### Build a knowledge base

1. Download Cherry Studio from cherry-ai.com
2. Settings → Models → add an embedding model (Ollama: `nomic-embed-text`, OpenAI: `text-embedding-3-small`, Voyage AI, etc.)
3. Sidebar → Knowledge → New Knowledge Base
4. Name it, pick the embedding model, set the chunk size (default 1000)
5. Drag and drop files or paste a folder

### Supported formats

| Category | Formats |
|---|---|
| Documents | PDF, DOCX, DOC, RTF, ODT, EPUB |
| Office | XLSX, CSV, PPTX |
| Code | All text-based source (PY, JS, TS, GO, …) |
| Web | URL list (auto-fetches and chunks) |
| Markdown | MD, MDX |
| Notebook | IPYNB |
| Plain | TXT, LOG |

### Query the knowledge base in chat

Enable the knowledge base toggle in any chat. Cherry Studio retrieves the top-k most relevant chunks per query and prepends them to the LLM prompt with citations.
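Conceptually, this retrieval step is a cosine-similarity top-k search followed by prompt assembly. The sketch below illustrates that idea only — it is not Cherry Studio's actual code, and the embeddings are toy vectors standing in for a real embedding model:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, top_k=6, threshold=0.6):
    """Return up to top_k (score, text) pairs above the similarity floor."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored = [(s, t) for s, t in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

def build_prompt(question: str, hits) -> str:
    """Prepend retrieved chunks, with citation markers, to the user question."""
    context = "\n".join(f"[{i + 1}] {text}" for i, (_, text) in enumerate(hits))
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy index: (chunk text, pretend embedding vector)
chunks = [
    ("Cherry Studio stores the index locally.", [0.9, 0.1, 0.0]),
    ("Bananas are yellow.",                     [0.0, 0.1, 0.9]),
]
hits = retrieve([1.0, 0.0, 0.0], chunks)          # only the first chunk passes 0.6
print(build_prompt("Where is the index stored?", hits))
```

The `threshold` filter mirrors the 0.6 cosine floor in the retrieval settings: off-topic chunks score near zero and are dropped before the prompt is assembled.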
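The chunk size chosen in step 4 controls how documents are split before embedding; adjacent chunks typically share some overlap so that sentences cut at a boundary still appear whole in at least one chunk. A minimal sketch of fixed-size chunking with overlap, using the defaults mentioned in this guide (1000 chars, 200 overlap) — `chunk_text` is a hypothetical helper, not Cherry Studio's actual splitter:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap` chars."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap          # advance 800 chars per chunk by default
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break                        # last chunk reached the end of the text
    return chunks

# 2500 chars of varied text → chunks start at 0, 800, 1600 → 3 chunks
doc = "".join(chr(65 + i % 26) for i in range(2500))
chunks = chunk_text(doc)
print(len(chunks))       # 3
print(len(chunks[0]))    # 1000
```

Note the early `break`: once a chunk reaches the end of the text, no trailing chunk is emitted that would be fully contained in the previous one's overlap.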
### Configure retrieval

```
Knowledge Base → ⚙ Settings:
  Chunk size: 1000 chars
  Chunk overlap: 200 chars
  Top-K: 6 chunks per query
  Rerank: optional (BGE Reranker via Ollama)
  Threshold: 0.6 (cosine similarity floor)
```

### Sync vs local-only

- **Local-only** (default): Vector store on disk under `~/Library/Application Support/CherryStudio/...`
- **Sync** (optional): Push the index to S3-compatible storage (R2, MinIO) for cross-device sync, encrypted with a passphrase only you hold

### When to use Cherry Studio KB vs a hosted RAG

| Cherry Studio KB | Pinecone Assistant / similar hosted |
|---|---|
| Personal docs, sensitive content | Multi-user team docs |
| Offline access | Always-online |
| One-time payment for the app + your LLM costs | Per-query subscription |
| Limited to a single device (or DIY sync) | Cross-device by default |

---

### FAQ

**Q: Is Cherry Studio free?**
A: Yes — Cherry Studio is open-source under Apache-2.0. The app is free; you bring your own LLM API keys and pay only for inference. Local Ollama models are fully free.

**Q: Can it handle large PDFs?**
A: Yes — large PDFs are split at the configured chunk size. A 500-page PDF takes roughly a minute to embed locally with Ollama and produces a few thousand chunks. Search is fast (cosine similarity on a local FAISS-style index).

**Q: Does the knowledge base work with images?**
A: Mostly text-only for now. PDFs with images are indexed via their embedded text layer; image-only pages with no text layer yield no text content. Image search is on the roadmap but not stable in 1.4.

---

## Source & Thanks

> Built by [kangfenmao](https://github.com/kangfenmao). Licensed under Apache-2.0.
>
> [CherryHQ/cherry-studio](https://github.com/CherryHQ/cherry-studio) — ⭐ 18,000+

---

Source: https://tokrepo.com/en/workflows/cherry-studio-knowledge-base-local-rag-with-50-formats
Author: Cherry Studio