# Cherry Studio Knowledge Base — Local RAG with 50+ Formats

> Cherry Studio Knowledge Base ingests PDFs, Office docs, and Markdown into a local vector index. Query offline, BYOK any LLM. Data stays on your machine.

## Quick Use

1. Download Cherry Studio from cherry-ai.com
2. Settings → Models → add an embedding model (Ollama `nomic-embed-text` or OpenAI `text-embedding-3-small`)
3. Sidebar → Knowledge → New, then drag and drop your docs

---

## Intro

Cherry Studio Knowledge Base lets the desktop app ingest 50+ file formats into a local vector index — PDFs, Word docs, Markdown, EPUB, even web bookmarks. Query offline using your choice of LLM (OpenAI / Claude / Ollama / etc.), with retrieval running locally.

- Best for: privacy-conscious users who want personal RAG without sending docs to a cloud service.
- Works with: Cherry Studio 1.4+ on macOS / Windows / Linux.
- Setup time: 5 minutes.

---

### Build a knowledge base

1. Download Cherry Studio from cherry-ai.com
2. Settings → Models → add an embedding model (Ollama: `nomic-embed-text`, OpenAI: `text-embedding-3-small`, Voyage AI, etc.)
3. Sidebar → Knowledge → New Knowledge Base
4. Name it, pick the embedding model, set the chunk size (default 1000)
5. Drag and drop files or paste a folder

### Supported formats

| Category | Formats |
|---|---|
| Documents | PDF, DOCX, DOC, RTF, ODT, EPUB |
| Office | XLSX, CSV, PPTX |
| Code | All text-based source (PY, JS, TS, GO, …) |
| Web | URL list (auto-fetches and chunks) |
| Markdown | MD, MDX |
| Notebook | IPYNB |
| Plain | TXT, LOG |

### Query the knowledge base in chat

Enable the knowledge base toggle in any chat. Cherry Studio retrieves the top-k most relevant chunks per query and prepends them to the LLM prompt with citations.
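Conceptually, this retrieval step is a cosine-similarity top-k search followed by prompt assembly. The sketch below illustrates that idea only — it is not Cherry Studio's actual code, and the embeddings are toy vectors standing in for a real embedding model:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, chunks, top_k=6, threshold=0.6):
    """Return up to top_k (score, text) pairs above the similarity floor."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored = [(s, t) for s, t in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

def build_prompt(question: str, hits) -> str:
    """Prepend retrieved chunks, with citation markers, to the user question."""
    context = "\n".join(f"[{i + 1}] {text}" for i, (_, text) in enumerate(hits))
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy index: (chunk text, pretend embedding vector)
chunks = [
    ("Cherry Studio stores the index locally.", [0.9, 0.1, 0.0]),
    ("Bananas are yellow.",                     [0.0, 0.1, 0.9]),
]
hits = retrieve([1.0, 0.0, 0.0], chunks)          # only the first chunk passes 0.6
print(build_prompt("Where is the index stored?", hits))
```

The `threshold` filter mirrors the 0.6 cosine floor in the retrieval settings: off-topic chunks score near zero and are dropped before the prompt is assembled.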
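The chunk size chosen in step 4 controls how documents are split before embedding; adjacent chunks typically share some overlap so that sentences cut at a boundary still appear whole in at least one chunk. A minimal sketch of fixed-size chunking with overlap, using the defaults mentioned in this guide (1000 chars, 200 overlap) — `chunk_text` is a hypothetical helper, not Cherry Studio's actual splitter:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap` chars."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap          # advance 800 chars per chunk by default
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break                        # last chunk reached the end of the text
    return chunks

# 2500 chars of varied text → chunks start at 0, 800, 1600 → 3 chunks
doc = "".join(chr(65 + i % 26) for i in range(2500))
chunks = chunk_text(doc)
print(len(chunks))       # 3
print(len(chunks[0]))    # 1000
```

Note the early `break`: once a chunk reaches the end of the text, no trailing chunk is emitted that would be fully contained in the previous one's overlap.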
### Configure retrieval

```
Knowledge Base → ⚙ Settings:
  Chunk size: 1000 chars
  Chunk overlap: 200 chars
  Top-K: 6 chunks per query
  Rerank: optional (BGE Reranker via Ollama)
  Threshold: 0.6 (cosine similarity floor)
```

### Sync vs local-only

- **Local-only** (default): Vector store on disk under `~/Library/Application Support/CherryStudio/...`
- **Sync** (optional): Push the index to S3-compatible storage (R2, MinIO) for cross-device sync, encrypted with a passphrase only you hold

### When to use Cherry Studio KB vs a hosted RAG

| Cherry Studio KB | Pinecone Assistant / similar hosted |
|---|---|
| Personal docs, sensitive content | Multi-user team docs |
| Offline access | Always-online |
| One-time payment for the app + your LLM costs | Per-query subscription |
| Limited to a single device (or DIY sync) | Cross-device by default |

---

### FAQ

**Q: Is Cherry Studio free?**
A: Yes — Cherry Studio is open-source under Apache-2.0. The app is free; you bring your own LLM API keys and pay only for inference. Local Ollama models are fully free.

**Q: Can it handle large PDFs?**
A: Yes — large PDFs are split at the configured chunk size. A 500-page PDF takes roughly a minute to embed locally with Ollama and produces a few thousand chunks. Search is fast (cosine similarity on a local FAISS-style index).

**Q: Does the knowledge base work with images?**
A: Mostly text-only for now. PDFs with images are indexed via their embedded text layer; image-only pages with no text layer yield no text content. Image search is on the roadmap but not stable in 1.4.

---

## Source & Thanks

> Built by [kangfenmao](https://github.com/kangfenmao). Licensed under Apache-2.0.
>
> [CherryHQ/cherry-studio](https://github.com/CherryHQ/cherry-studio) — ⭐ 18,000+

---

Source: https://tokrepo.com/en/workflows/cherry-studio-knowledge-base-local-rag-with-50-formats
Author: Cherry Studio