Base de Conocimiento Personal — RAG sobre tus Notas, Diario y PDFs
Diez picks open-source para apuntar una IA a tus notas, diario y archivo de PDFs — de forma privada. App de notas, puente MCP/agent, indexador local, OCR, gestor de citas. El 'segundo cerebro' que de verdad te pertenece.
What's in this pack
This is the stack you build when you want an AI to read your notes — daily journal, meeting notes, half-finished essays, the PDFs you've been hoarding since grad school — and answer questions about them without uploading any of it to a vendor's cloud.
It's a different problem from chatting with a frontier model. There the goal is reasoning power; here the goal is recall over your private corpus. You don't need GPT-5 — you need plumbing: a notes app that stores plain Markdown, an indexer that can read every format you've ever saved, an MCP or agent bridge that lets the AI query the vault, and a chat layer to ask questions.
This pack is not the same as our local-first-ai pack, which assembles a general private AI rig (Ollama, Open WebUI, ComfyUI, image gen, transcription). This one is narrowly scoped to the personal RAG layer: where your notes live, how they get indexed, how an agent reaches them.
Install in this order
- Logseq — outliner-style, privacy-first notes app. Local Markdown files, optional encrypted sync, daily journal as a first-class concept. Start here if you don't already have a notes app you trust. The reason to pick Logseq over Notion or Bear: every note is a plain file on your disk that any tool downstream can read.
- Obsidian Agent Client — if you're already an Obsidian user, this plugin brings ACP-based agent integrations (Claude Code, Codex, Gemini) directly into the editor. Your vault becomes the context window for the agent. Install via BRAT or releases.
- Obsidian MCP Tools — the other Obsidian path: instead of bringing agents into Obsidian, expose Obsidian to Claude Desktop as a local MCP server. Claude can now query your vault, search notes, run dataview queries. Pick this if Claude Desktop is your daily driver.
- joplin-mcp — same idea for Joplin users. MCP server that gives any Claude-Desktop-compatible client read/write access to your Joplin notes. Joplin's optional E2EE plus local-only MCP means the notes never leave the device.
- Cherry Studio Knowledge Base — the all-in-one option. A desktop AI client with a local RAG engine that ingests 50+ formats (PDF, Markdown, DOCX, EPUB, web pages, even Notion exports), embeds them locally, and serves them to whichever LLM you wire up. If you want one app, not five, start here.
- Trilium Notes — hierarchical KB with scripting, attribute search, and a true tree structure. Picks up where Logseq's flat-graph model leaves off. The right call if you think in outlines and want every note to be queryable as structured data.
- Blinko — self-hosted personal AI note-taking with RAG already baked in. You don't wire up an indexer; the app comes with one. Closer to "a private NotebookLM" than "Obsidian with plugins". Trade-off: less customizable, more turn-key.
- Zotero — research source manager. The bridge between your notes and the academic / PDF world. Auto-extracts metadata from papers, builds a searchable library, exposes a local API that downstream RAG tools (Cherry Studio, custom scripts) can index.
- Paperless-ngx — self-hosted document management with OCR. The piece most knowledge-base setups forget: every paper bank statement, contract, and receipt scanned, OCR'd, tagged, indexed. Now your AI can answer "what was my electric bill in March 2024?" without you opening a single PDF.
- Memos — lightweight self-hosted note-taking. Twitter-style short captures, tagged and searchable. The journaling-by-microblog layer that catches the thoughts too small to belong in Logseq but too important to lose.
How they fit together
┌────────────────────────────────────────────────────────┐
│ Your private corpus (everything stays on disk) │
└────────────────────────────────────────────────────────┘
│ │ │ │
Logseq Obsidian Joplin Trilium
Memos (or Blinko, all-in-one)
│ │ │ │
│ ├─ Obsidian Agent Client (ACP in editor)
│ ├─ Obsidian MCP Tools ──► Claude Desktop
│ │
│ └─ joplin-mcp ──► Claude Desktop
│
└──► Cherry Studio Knowledge Base ──┐
│
Zotero (papers) ─────────────────────► ├──► Local LLM / Claude / GPT
Paperless-ngx (scanned PDFs) ─────► │ (your choice of model)
▼
Chat with your KB
The pattern: a notes app holds the raw text, a bridge layer (MCP server, agent plugin, or built-in RAG engine) makes it queryable, and an LLM client does the asking. You don't need every tool in this pack — pick the row matching the notes app you already use, then add Zotero and Paperless-ngx if your knowledge isn't all in Markdown.
Tradeoffs you'll hit
- MCP bridge vs in-app agent vs all-in-one — Three architectures, all valid. MCP bridge (Obsidian MCP Tools, joplin-mcp) keeps your existing notes app and lets Claude Desktop reach in; best for power users with one favorite client. In-app agent (Obsidian Agent Client, Blinko) puts the AI inside the editor; best when you want the answer next to the source. All-in-one (Cherry Studio, Blinko) bundles indexer + chat + model in one app; best when you don't want to maintain three tools.
- Embedding quality vs setup pain — Cheap path: use the notes app's built-in indexer (Blinko, Cherry Studio). Better-quality path: BGE-M3 or nomic-embed-text via Ollama, then point a custom RAG pipeline at your vault. Most people overestimate how much retrieval quality matters for personal notes — the corpus is small and you usually remember roughly where the answer lives. Start simple.
- Frontier model vs local model for the chat layer — RAG with Claude 4.5 or GPT-5 over your notes gives the best synthesis quality. RAG with a local Llama 3.1 8B over your notes keeps the journal 100% private but synthesis is noticeably weaker. Hybrid is fine: route the retrieval locally (so the embeddings of your notes never leave the machine), then send only the top-3 chunks plus your question to a frontier model.
- PDFs are the silent killer — Markdown notes index in seconds. Scanned PDFs need OCR (Paperless-ngx), academic PDFs need layout-aware extraction (Zotero handles citation metadata; for full-text RAG you may want GROBID or unstructured.io on top). Plan a separate pass for any non-text source.
Common pitfalls
- Indexing everything at once — A 10,000-file vault embedded against a poor chunking strategy gives you 10,000 useless 4-line snippets. Start with one sub-folder (say, your meeting notes from this year), measure retrieval quality, then expand.
- Daily-journal noise drowns out actual knowledge — If you also dump 2 KB of daily standup notes into the same vault, retrieval will surface yesterday's todos every time. Separate the corpora: daily journal in Logseq, evergreen notes in Obsidian, and only index the evergreen vault.
- MCP servers asking for too much — MCP gives the AI a lot of power. Read each server's permissions before installing. joplin-mcp and Obsidian MCP Tools both default to read+write; start with read-only until you trust the workflow.
- OCR quality on bad scans — Paperless-ngx is good but not magic. Phone-camera receipts at 30 degrees come back as gibberish. Use a flatbed scanner or the iOS Notes scan feature (auto-flattens) for anything you'll actually want to retrieve.
- Forgetting to back up the vault — The whole point is that this is your second brain. Encrypt and back it up to your own storage (Syncthing, a private rsync target, Joplin's server). Don't trust a single laptop with a decade of journal entries.
10 recursos listos para instalar
Preguntas frecuentes
I already use Obsidian — which of these do I actually need?
If Obsidian is your home, you have two paths and both are in this pack. Path A: install Obsidian Agent Client and bring the agent into the editor — best when you want the AI answer to live next to the note you're writing. Path B: install Obsidian MCP Tools and let Claude Desktop (or any MCP client) query your vault from outside — best when you want a separate chat surface and the vault is just the knowledge source. Most heavy users eventually run both. Skip the other notes apps in this pack; they're for people not already on Obsidian.
Is this actually private if I'm using Claude or GPT as the chat layer?
Partially. The local indexer (Cherry Studio's RAG engine, the MCP server, the notes file system) keeps your full corpus on disk. When you ask a question, only the top-K retrieved chunks plus your prompt go to the model provider. That's vastly less data than uploading your whole vault to ChatGPT — but it's not zero. For full privacy, route the chat layer to a local model via Ollama (see our local-first-ai pack). The realistic middle path: use Claude/GPT for synthesis, and just never put truly sensitive content (medical, legal, personal) in the indexed folders.
How does this differ from the existing local-first-ai pack?
local-first-ai is a full personal AI rig: chat (Open WebUI), code (Continue), image gen (ComfyUI), transcription (Faster Whisper), plus Khoj and Joplin as the knowledge layer. This pack is narrowly the personal-RAG slice and goes deeper there: multiple notes-app options (Logseq, Obsidian, Trilium, Blinko, Memos), the MCP bridges that let Claude Desktop reach your vault (Obsidian MCP Tools, joplin-mcp), document scanning (Paperless-ngx), and academic research (Zotero). No model runner — you bring your own from local-first-ai, or you point Cherry Studio at any API.
Can I use this with PDFs and scanned paper documents, not just Markdown?
Yes, that's why Paperless-ngx and Zotero are in the pack. Paperless-ngx runs OCR on scanned receipts, contracts, statements, and tax docs, then exposes a searchable index. Zotero handles academic PDFs, extracts metadata, and stores full text. Cherry Studio Knowledge Base can ingest both formats directly. For more exotic formats (EPUB, DOCX, web archives) Cherry Studio handles 50+ types out of the box. The pattern: every source format eventually becomes text, every text becomes embeddings, every embedding becomes searchable. PDFs are just the slowest first step.
What's the minimum viable setup if I don't want to install all ten tools?
Three tools, in order: (1) the notes app you'll actually use daily — pick Logseq if starting fresh, otherwise keep Obsidian or Joplin; (2) the bridge layer matching that app — Obsidian Agent Client or Obsidian MCP Tools for Obsidian, joplin-mcp for Joplin, Cherry Studio if you want a single app that does everything; (3) Paperless-ngx if you have a stack of paper documents you want searchable. That's the smallest working personal RAG. Add Zotero only if you're a researcher, Memos only if microblog-style capture fits your brain, Trilium or Blinko only if you outgrow your current notes app.
12 packs · 80+ recursos seleccionados
Explora todos los packs curados en la página principal
Volver a todos los packs