[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"pack-detail-local-first-ai-en":3,"seo:pack:local-first-ai:en":85},{"code":4,"message":5,"data":6},200,"Operation successful",{"pack":7},{"slug":8,"icon":9,"tone":10,"status":11,"status_label":12,"title":13,"description":14,"items":15,"install_cmd":84},"local-first-ai","🔒","#1E40AF","new","New · this week","Local-First AI — Your Data Never Leaves the Laptop","Nine open-source picks for a full AI workflow — chat, RAG over personal docs, coding, transcription, image gen — running entirely on your machine. No OpenAI keys, no token bills, no journal entries in someone's training set.",[16,28,36,43,51,57,64,71,77],{"id":17,"uuid":18,"slug":19,"title":20,"description":21,"author_name":22,"view_count":23,"vote_count":24,"lang_type":25,"type":26,"type_label":27},162,"0eefb7ad-754e-4f35-8967-586ebf4c2a6a","ollama-run-llms-locally-0eefb7ad","Ollama — Run LLMs Locally","Run large language models locally on your machine. Supports Llama 3, Mistral, Gemma, Phi, and dozens more. One-command install, OpenAI-compatible API.","Script Depot",197,0,"en","skill","Skill",{"id":29,"uuid":30,"slug":31,"title":32,"description":33,"author_name":34,"view_count":35,"vote_count":24,"lang_type":25,"type":26,"type_label":27},274,"f493abd9-0870-49b3-a04b-719ee2a5df0f","gpt4all-run-llms-privately-your-desktop-f493abd9","GPT4All — Run LLMs Privately on Your Desktop","GPT4All runs large language models privately on everyday desktops and laptops without GPUs or API calls. 77.2K+ GitHub stars. Desktop app + Python SDK, LocalDocs for private data. MIT licensed.","AI Open Source",225,{"id":37,"uuid":38,"slug":39,"title":40,"description":41,"author_name":22,"view_count":42,"vote_count":24,"lang_type":25,"type":26,"type_label":27},218,"5d37ffb8-d351-4fb1-8665-bef4db25b275","open-webui-self-hosted-ai-chat-interface-5d37ffb8","Open WebUI — Self-Hosted AI Chat Interface","User-friendly, self-hosted AI chat interface. 
Supports Ollama, OpenAI, Anthropic, and any OpenAI-compatible API. RAG, web search, voice, image gen, and plugins. 129K+ stars.",208,{"id":44,"uuid":45,"slug":46,"title":47,"description":48,"author_name":49,"view_count":50,"vote_count":24,"lang_type":25,"type":26,"type_label":27},163,"8040c0e5-69f3-446b-bfa2-9800b79fcf08","continue-open-source-ai-code-assistant-8040c0e5","Continue — Open-Source AI Code Assistant","Open-source AI code assistant for VS Code and JetBrains. Tab autocomplete, chat, inline editing with any model — OpenAI, Anthropic, Ollama, or self-hosted.","Continue",222,{"id":52,"uuid":53,"slug":54,"title":55,"description":56,"author_name":34,"view_count":17,"vote_count":24,"lang_type":25,"type":26,"type_label":27},323,"4cbd3b7b-5251-4a16-a4ef-d7c1f9600d52","khoj-your-ai-second-brain-4cbd3b7b","Khoj — Your AI Second Brain","Khoj is a personal AI app for chat, search, and knowledge management. 33.8K+ stars. Multi-LLM, docs, Obsidian, WhatsApp, custom agents. AGPL-3.0.",{"id":58,"uuid":59,"slug":60,"title":61,"description":62,"author_name":22,"view_count":63,"vote_count":24,"lang_type":25,"type":26,"type_label":27},270,"24576b2c-a9d1-4f7a-9696-b1e5c50a17f3","faster-whisper-4x-faster-speech-text-24576b2c","Faster Whisper — 4x Faster Speech-to-Text","Faster Whisper is a reimplementation of OpenAI Whisper using CTranslate2, up to 4x faster with less memory. 21.8K+ GitHub stars. GPU\u002FCPU, 8-bit quantization, word timestamps, VAD. 
MIT licensed.",202,{"id":65,"uuid":66,"slug":67,"title":68,"description":69,"author_name":34,"view_count":70,"vote_count":24,"lang_type":25,"type":26,"type_label":27},2101,"3270e558-4080-11f1-9bc6-00163e2b0d79","meetily-privacy-first-ai-meeting-assistant-local-3270e558","Meetily — Privacy-First AI Meeting Assistant with Local Transcription","An open-source, self-hosted AI meeting assistant that provides real-time transcription, speaker diarization, and local summarization using Whisper and Ollama, with no cloud dependency.",146,{"id":42,"uuid":72,"slug":73,"title":74,"description":75,"author_name":34,"view_count":76,"vote_count":24,"lang_type":25,"type":26,"type_label":27},"02888d06-d950-42f4-bc45-960c1f604ee4","comfyui-node-based-ai-image-generation-02888d06","ComfyUI — Node-Based AI Image Generation","The most powerful modular AI image generation GUI with a node\u002Fgraph editor. Supports Stable Diffusion, Flux, SDXL, ControlNet, and 1000+ custom nodes. 107K+ stars.",195,{"id":78,"uuid":79,"slug":80,"title":81,"description":82,"author_name":22,"view_count":83,"vote_count":24,"lang_type":25,"type":26,"type_label":27},1110,"42403801-364b-11f1-9bc6-00163e2b0d79","joplin-privacy-focused-open-source-note-taking-app-42403801","Joplin — Privacy-Focused Open-Source Note Taking App","Joplin is a privacy-focused note taking app with sync capabilities for Windows, macOS, Linux, Android, and iOS. Markdown-based, end-to-end encrypted sync, supports Nextcloud, Dropbox, OneDrive, S3, and WebDAV. 
The open-source alternative to Evernote.",151,"tokrepo install pack\u002Flocal-first-ai",{"pageType":86,"pageKey":8,"locale":25,"title":87,"metaDescription":88,"h1":89,"tldr":90,"bodyMarkdown":91,"faq":92,"schema":108,"internalLinks":114,"citations":127,"wordCount":140,"generatedAt":141},"pack","Local-First AI — 9 Open-Source Tools That Keep Your Data on Your Laptop","Ollama, GPT4All, Open WebUI, Continue, Khoj, Faster Whisper, Meetily, ComfyUI, Joplin — a full AI stack that runs offline. Chat, RAG over your docs, coding, transcription, image gen. No OpenAI key, no token bills, no journal entries in someone's training set.","Local-First AI — A Full Private AI Workflow on One Laptop","Nine open-source tools to run chat, code, RAG over your own documents, transcription, and image generation entirely on your own machine. Model runner first, then chat UI, then specialized layers — every byte stays local.","## What's in this pack\n\nThis is the rig you build when you've decided your journal, your client recordings, and your half-written code are not going into someone else's training set. Every tool here is **open-source**, **actively maintained**, and runs **with no outbound network call required** once the models are downloaded.\n\nThe motivation is rarely just privacy in the abstract. It's three concrete things stacked: (1) the **monthly token bill** that scales with how curious you are, (2) **terms of service that change**, and (3) the dawning realization that you've been pasting your entire inbox into a chat window owned by a company that openly indexes it. A local stack fixes all three permanently.\n\nThis pack is **not the same** as our `self-hosted-ai` pack — that one is for shipping a SaaS on your own metal (Tabby, Onyx, LibreChat, n8n). This one is for **individuals** who want a private AI on a personal machine, including non-developer tools like meeting transcription and a notes app.\n\n## Install in this order\n\n1. **Ollama** — model runner. Start here. 
Single command (`curl -fsSL https:\u002F\u002Follama.com\u002Finstall.sh | sh`), pulls models with `ollama pull llama3.1`, exposes an OpenAI-compatible API on `localhost:11434`. Everything downstream points at this.\n2. **GPT4All** — alternative model runner with a GUI. If you don't live in a terminal, install this instead of (or alongside) Ollama. Same job, friendlier surface for non-devs.\n3. **Open WebUI** — the local ChatGPT replacement. Talks to Ollama out of the box, supports multi-turn chat, RAG over uploaded files, web search plugins. This is where 80% of \"I just want to ask the AI something\" happens.\n4. **Continue** — local coding assistant for VS Code and JetBrains. Configure it to call your local Ollama model instead of Copilot's servers. Inline edits, chat, refactor — all on-device. Slower than Copilot, but your private repo never leaves the machine.\n5. **Khoj** — AI second brain. Indexes your Markdown notes, PDFs, org-mode, even Notion exports, then lets you chat with them via a local LLM. This is the RAG layer for your *life*, not your codebase.\n6. **Faster Whisper** — speech-to-text. Up to 4x faster than vanilla Whisper at the same accuracy, and runs on CPU or GPU. Drop audio in, get a transcript out. Foundation for the next tool.\n7. **Meetily** — privacy-first meeting assistant. Records, transcribes via Whisper locally, summarizes via your local LLM. Zoom\u002FMeet recordings never touch a cloud.\n8. **ComfyUI** — local image generation via Stable Diffusion. Node-based, fast on Apple Silicon and CUDA, runs SDXL \u002F Flux \u002F SD3 models pulled from Hugging Face. No prompt logging, no content policy, no usage cap.\n9. **Joplin** — privacy-focused note app with optional end-to-end encryption. Where you keep the source material your local AI reads. 
Markdown, plugins, syncs between devices via your own storage.\n\n## How they fit together\n\n```\n        ┌─────────────────────────────────────┐\n        │   Your laptop (no outbound calls)   │\n        └─────────────────────────────────────┘\n                       │\n  ┌────────────────────┴────────────────────┐\n  │                                          │\nOllama \u002F GPT4All  ◄──── OpenAI-compatible API ────┐\n  (model runner)                                  │\n  │                                                │\n  ├─► Open WebUI  ─── chat in browser              │\n  │                                                │\n  ├─► Continue    ─── code in VS Code              │\n  │                                                │\n  ├─► Khoj        ─── chat with your notes ◄── Joplin\n  │                                                │\n  └─► Meetily     ─── meeting summary ◄── Faster Whisper\n                                                   │\nComfyUI ── standalone (its own model runtime) ─────┘\n```\n\nThe trick is that **all four client tools (Open WebUI, Continue, Khoj, Meetily), plus anything else you wire up, point at the single Ollama endpoint**. You download a model once. Every app reuses it. Disk and RAM are the budgets to watch, not API quota.\n\n## Tradeoffs you'll hit\n\n- **Cloud quality vs local quality** — Be honest: GPT-5 \u002F Claude 4.5 still beat any 8B-quant local model at frontier reasoning, long-context, and code generation on unfamiliar codebases. Local wins on **privacy, latency for short prompts, cost at volume, and offline use**. The right mental model is \"local for 80% of daily work, cloud for the hard 20%\" — not \"local replaces cloud\".\n- **Apple Silicon vs NVIDIA** — Apple Silicon M2\u002FM3\u002FM4 with 32 GB+ RAM runs 13B models comfortably via Metal\u002FMPS. NVIDIA with 16 GB+ VRAM is faster on bigger models but louder, hotter, more expensive. 
Most of this pack runs well on a $2K Mac; ComfyUI and 70B models start asking for a real GPU.\n- **Quantized vs full precision** — Most Ollama models default to Q4_K_M (4-bit quantization). You lose maybe 2-3% accuracy for 4x less RAM. Always start quantized. Only go full precision if you can measure a quality gap that matters to you.\n\n## Common pitfalls\n\n- **RAM blow-ups** — running Open WebUI + Continue + Khoj simultaneously, each holding a model in memory, will OOM a 16 GB machine. Configure Ollama with `OLLAMA_MAX_LOADED_MODELS=1` and let it page models in and out.\n- **Model files are huge** — Llama 3.1 70B is 40 GB on disk. Plan storage before you `ollama pull` everything that looks interesting. Keep a kill list.\n- **MPS vs CUDA confusion** — most install guides assume NVIDIA. On Apple Silicon, check for the `-metal` or `mps` variant of each tool. ComfyUI in particular needs the right Python wheel.\n- **\"Actually I do need cloud for X\"** — be at peace with it. Routing your frontier-difficulty queries to Claude\u002FGPT through a privacy-aware client (LibreChat with logging off, or just the API with `Bearer` and no organization ID) is a sane hybrid.\n- **Voice assistant ambition** — Meetily + Faster Whisper handle batch transcription beautifully. Real-time conversational voice (sub-500ms latency, interruption) is still genuinely hard locally. Don't promise that to yourself in week one.",[93,96,99,102,105],{"q":94,"a":95},"Is local AI really private if I'm pulling models from Hugging Face \u002F Ollama?","Yes — the model download is a one-time fetch of weights. Once the file is on disk, the model runs entirely offline. No prompt, no document, no transcript is ever sent to Hugging Face or Ollama servers. Verify with Little Snitch or `lsof -i` if you want proof. 
The trust boundary is the open-source model itself, not the distribution channel.",{"q":97,"a":98},"What hardware do I actually need for this stack?","Comfortable: Apple Silicon Mac with 32 GB unified RAM, or a Windows\u002FLinux box with an NVIDIA GPU with 16 GB+ VRAM. Minimum viable: 16 GB RAM Mac runs 7-8B models and Faster Whisper fine but you'll juggle one model at a time. ComfyUI (image gen) is the most demanding piece; everything else is reasonable on a 4-year-old laptop.",{"q":100,"a":101},"How does this differ from the existing self-hosted-ai pack on TokRepo?","self-hosted-ai is dev-infra-focused: Tabby (coding server), Onyx (RAG-as-a-service), LibreChat (multi-user chat), n8n (workflow automation). It's what you deploy to a server when you want to give your team a private ChatGPT. This pack is the individual angle: Open WebUI for personal chat, Khoj for personal notes RAG, Meetily for your own meetings, ComfyUI for your image gen. Different problem, no overlapping picks.",{"q":103,"a":104},"What about Llama 3 \u002F Mistral \u002F Qwen — which model should I actually pull first?","For chat and general use: `llama3.1:8b-instruct-q4_K_M` (4.7 GB, fast, surprisingly good). For code in Continue: `qwen2.5-coder:7b` (4.7 GB, better at code than Llama for size). For RAG via Khoj: same Llama 3.1 8B works. Skip 70B until you've measured that 8B is actually failing you on real tasks — most people don't need it.",{"q":106,"a":107},"Can I still use Claude or GPT for the hard problems?","Absolutely, and you should. The point of this stack isn't fundamentalism — it's that the *default* should be local. When you hit a problem where 70B-quant clearly fails (deep code refactor across a strange repo, frontier-level reasoning, exotic language), route that one query to a frontier model. 
Hybrid is the realistic endpoint; pure-local for everything is a hobbyist trap.",{"@context":109,"@type":110,"name":111,"description":112,"numberOfItems":113,"inLanguage":25},"https:\u002F\u002Fschema.org","ItemList","Local-First AI","Nine open-source tools for a full private AI workflow that runs entirely on your own laptop — chat, coding, RAG over personal documents, transcription, and image generation.",9,[115,119,123],{"url":116,"anchor":117,"reason":118},"\u002Fen\u002Flocal-llm","Local LLM runners compared","Deeper comparison of Ollama, llama.cpp, LM Studio and friends",{"url":120,"anchor":121,"reason":122},"\u002Fen\u002Fai-memory","AI memory and personal knowledge layer","Khoj is the entry point — the full pack covers Mem0, Zep, and on-device alternatives",{"url":124,"anchor":125,"reason":126},"\u002Fen\u002Ffeatured","Featured assets on TokRepo","These nine tools live alongside the broader curated catalog",[128,132,136],{"claim":129,"source_name":130,"source_url":131},"Ollama provides one-command local LLM installation and an OpenAI-compatible API","Ollama official site","https:\u002F\u002Follama.com\u002F",{"claim":133,"source_name":134,"source_url":135},"Open WebUI is a self-hosted chat interface that talks to local model runners","Open WebUI","https:\u002F\u002Fopenwebui.com\u002F",{"claim":137,"source_name":138,"source_url":139},"Faster Whisper is a reimplementation of OpenAI Whisper using CTranslate2 for 4x speedup","faster-whisper GitHub","https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper",920,"2026-05-22T00:00:00Z"]