TOKREPO · ARSENAL
Stable

Self-Hosted AI

Tabby, Onyx, LibreChat, and an n8n starter kit — keep your data on your own metal.

6 assets

What's in this pack

This pack collects the six self-hosted AI assets that consistently show up when teams move off SaaS for compliance, cost, or sovereignty reasons. Three are coding/chat replacements (Tabby, LibreChat, Onyx). Three are infrastructure pieces (n8n AI starter kit, local STT, model gateway).

 #  Asset                Type                 What it replaces
 1  Tabby                self-hosted service  GitHub Copilot
 2  Onyx                 self-hosted service  Glean / enterprise ChatGPT
 3  LibreChat            self-hosted UI       ChatGPT for the team
 4  n8n AI starter kit   docker-compose       Zapier with AI nodes
 5  Whisper STT (local)  service              Otter / Rev / cloud STT
 6  Local model gateway  service              LiteLLM with local-first routing

Why this matters

The default 2026 AI stack assumes you're fine sending your code, chats, and customer data to OpenAI / Anthropic / Google. For most consumer apps that's fine. For regulated industries (health, finance, legal), gov work, or any team where your IP is the product, it's a non-starter. This pack is the assembled answer: a stack you can run on a single workstation or a small Kubernetes cluster that gives you Copilot-equivalent dev tools, ChatGPT-equivalent chat, and enterprise-search-equivalent retrieval — entirely on your own hardware.

The three headline replacements:

  • Tabby is the Copilot stand-in. Self-host it, point your IDE at it, and you get inline code completion backed by whatever local model you load (DeepSeek-Coder, Qwen-Coder, etc.). On a single RTX 3090 you can get close to Copilot-level quality in most languages.
  • Onyx (formerly Danswer) is the enterprise-search stand-in. Connect it to your Confluence, Notion, GitHub, Slack, and it builds an internal ChatGPT that answers questions from your docs. Vector + keyword hybrid search, with citations.
  • LibreChat is the team-ChatGPT stand-in. Multi-user, multi-model (works with local Ollama or cloud APIs as a fallback), conversation history, prompt library. The default UI when you want to give your team "a ChatGPT" without paying per seat.

The three infrastructure pieces fill in the gaps. The n8n starter kit gives you a Docker Compose file for n8n + Postgres + Qdrant + a local model — workflow automation on your own metal. Local Whisper means meeting transcripts and voice notes never leave your network. The model gateway routes between local and cloud models so you can fall back to Claude only when local can't answer.
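The local-first routing the gateway does can be sketched as a LiteLLM-style proxy config. Everything here is illustrative rather than what the pack ships: the model names, the Ollama port, and the Anthropic model ID are all placeholder assumptions.

```yaml
# Hypothetical LiteLLM proxy config: serve a local Ollama model as "chat",
# keep a cloud model as an explicit, last-resort fallback.
model_list:
  - model_name: chat                  # what clients request
    litellm_params:
      model: ollama/qwen2.5:32b       # local model served by Ollama
      api_base: http://localhost:11434
  - model_name: chat-cloud            # explicit cloud escape hatch
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

litellm_settings:
  fallbacks:
    - chat: ["chat-cloud"]            # only route to cloud when local errors
```

Clients keep asking for "chat"; the escape hatch to the cloud is a deliberate, named choice rather than a silent default.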

Install in one command

# Install the entire pack
tokrepo install pack/self-hosted-ai

# Or pick the piece you actually need
tokrepo install tabby
tokrepo install onyx
tokrepo install librechat
tokrepo install n8n-ai-starter-kit

The TokRepo CLI installs the docker-compose files, environment templates, and the rule files / subagents for your AI tool that explain when to invoke the local stack vs the cloud. Run docker compose up -d after install and the services are reachable on localhost.
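Once docker compose up -d returns, a quick reachability probe beats tailing four sets of logs. A minimal sketch: the ports are each project's common defaults (Tabby 8080, LibreChat 3080, Onyx 3000, n8n 5678) and may differ from what your compose files publish.

```shell
#!/bin/sh
# Probe each service's HTTP port. Ports are assumed defaults; adjust them
# to whatever your compose files actually expose.
services="tabby:8080 librechat:3080 onyx:3000 n8n:5678"
for entry in $services; do
  name=${entry%%:*}
  port=${entry##*:}
  if curl -fsS --max-time 2 "http://localhost:${port}/" >/dev/null 2>&1; then
    echo "${name}: up on port ${port}"
  else
    echo "${name}: NOT reachable on port ${port}"
  fi
done
```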

Common pitfalls

  • Don't run a 70B model on 16GB of VRAM. Match model size to your GPU. Tabby's DeepSeek-Coder-6.7B fits on a 12GB card and is plenty for completion. For chat, Qwen2.5-32B in 4-bit is a sweet spot if you have 24GB.
  • Onyx connectors silently rate-limit. When you point Onyx at a 50k-page Confluence, the initial sync takes hours and some connectors will pause. Watch the logs; don't trust the UI's progress bar in the first 24 hours.
  • n8n + AI workflows leak credentials. The starter kit ships with default Postgres credentials in plaintext. Change them, and bind n8n behind Cloudflare Tunnel or a reverse proxy with auth before exposing it.
  • LibreChat permissions are flat by default. Out of the box every user can see every conversation. Configure RBAC and per-user model whitelisting before you onboard a team.
  • Backups aren't automatic. Self-hosted = self-backup. Schedule mongodump for LibreChat's Mongo and pg_dump for Onyx's Postgres, snapshot the Tabby model cache, and budget storage at 3× your active dataset for restore points.
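The backup bullet above translates directly into a couple of cron entries. A sketch only: container names, database users, and paths are placeholder assumptions, and note the % escaping that crontab requires inside $(date ...).

```cron
# Hypothetical crontab: nightly DB dumps at 02:00/02:30, weekly model-cache tar.
# Container, user, and database names are placeholders from a typical setup.
0 2 * * *  docker exec chat-mongodb mongodump --archive --gzip --db LibreChat > /backups/librechat-$(date +\%F).archive
30 2 * * * docker exec onyx-postgres pg_dump -U postgres onyx | gzip > /backups/onyx-$(date +\%F).sql.gz
0 3 * * 0  tar -czf /backups/tabby-models-$(date +\%F).tar.gz $HOME/.tabby/models
```

Test a restore once before you trust any of it; a dump you've never restored is not a backup.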

Relationship to other packs

This pack pairs naturally with two others. MCP Server Stack gives you the protocol-level connectors (filesystem, browser, database MCP servers) that route through your local model gateway — so even Claude Code can call your local services. LLM Observability matters more here than on cloud APIs because you own the failure surface; Langfuse self-hosted is in that pack and integrates cleanly with Onyx and LibreChat.

If you're starting from zero, install order: 1) LibreChat (immediate user value), 2) Tabby (developer value), 3) Onyx (org-wide search), 4) n8n + gateway when you start building automations on top.

What's inside

6 assets in this pack

Script #01
Tabby — Self-Hosted AI Coding Assistant

Self-hosted AI code completion and chat assistant. Privacy-first alternative to GitHub Copilot. Supports 20+ models, repo-aware context, and IDE integrations. 33K+ stars.

by TokRepo Curated
$ tokrepo install tabby-self-hosted-ai-coding-assistant-1a1d4061
Script #02
whisper.cpp — Local Speech-to-Text in Pure C/C++

High-performance port of OpenAI Whisper in C/C++. No Python, no GPU required. Runs on CPU, Apple Silicon, CUDA, and even Raspberry Pi. Real-time transcription.

by Script Depot
$ tokrepo install whisper-cpp-local-speech-text-pure-c-c-e1fd7c46
Config #03
Onyx — Self-Hosted AI Chat with 40+ Connectors

Onyx (formerly Danswer) is a self-hosted AI chat with RAG, custom agents, and 40+ knowledge connectors. 20.4K+ stars. Enterprise search. MIT.

by AI Open Source
$ tokrepo install onyx-self-hosted-ai-chat-40-connectors-210679a0
Config #04
LibreChat — Self-Hosted Multi-AI Chat Platform

LibreChat is a self-hosted AI chat platform unifying Claude, OpenAI, Google, AWS in one interface. 35.1K+ GitHub stars. Agents, MCP, code interpreter, multi-user auth. MIT.

by AI Open Source
$ tokrepo install librechat-self-hosted-multi-ai-chat-platform-850494fb
Config #05
Self-Hosted AI Starter Kit — Local AI with n8n

Docker Compose template by n8n that bootstraps a complete local AI environment with n8n workflow automation, Ollama LLMs, Qdrant vector database, and PostgreSQL. 14,500+ stars.

by AI Open Source
$ tokrepo install self-hosted-ai-starter-kit-local-ai-n8n-92d3cc62
Script #06
Typebot — Visual AI Chatbot Builder You Can Self-Host

Build advanced chatbots visually with 34+ blocks. Embed anywhere, collect results in real-time. OpenAI integration, custom themes, analytics. Self-hostable. 9,800+ stars.

by AI Open Source
$ tokrepo install typebot-visual-ai-chatbot-builder-you-can-self-host-f05a11a5
FAQ

Frequently asked questions

Is Tabby free?

Yes, Tabby is open-source under Apache 2.0 with a free self-hosted Community edition. There's a paid Enterprise tier for SSO, audit logs, and SLAs, but the Community edition is fully featured for individual and small-team use. You only pay for the GPU you run it on. Same model for Onyx, LibreChat, and n8n — all OSS with optional paid tiers.

Will this work with Cursor or Codex CLI instead of Claude Code?

The self-hosted services are tool-agnostic — Tabby exposes a Copilot-compatible API that any IDE supporting Copilot can hit (VS Code, JetBrains, Vim). LibreChat is a web UI so it's tool-independent. The TokRepo CLI installs the AI-tool-specific config (Cursor rules, AGENTS.md, Claude Code subagents) that tells your agent the local services exist.
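On the editor side, Tabby's client plugins read the server location from a shared agent config. A sketch assuming the default port and the usual client config path; verify both against your Tabby version, and the token is a placeholder you copy from the Tabby web UI.

```toml
# ~/.tabby-client/agent/config.toml, read by the VS Code, JetBrains,
# and Vim plugins. Endpoint assumes Tabby's default port 8080.
[server]
endpoint = "http://localhost:8080"
token = "your-auth-token"   # placeholder: copy the real token from the Tabby UI
```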

How does Tabby compare to Cursor with a local model?

Cursor's local-model support is limited to specific endpoints; Tabby is purpose-built for self-hosted code completion with telemetry, model warmup, and a real backend. If you want IDE-agnostic, multi-team self-hosted Copilot, Tabby wins. If you specifically want Cursor's UX with a local model behind it, see the local model gateway in this pack — it can act as a Cursor-compatible endpoint.

What's the difference vs the MCP Server Stack pack?

MCP Server Stack is about protocol-level connectors so AI tools can read your filesystem, browser, database. Self-Hosted AI is about replacing the cloud LLM/UI/IDE assistant entirely with services on your own hardware. They're complementary: the MCP servers can be configured to route through your local model gateway, giving you a fully on-prem agent stack.

When should I NOT self-host?

When latency matters more than sovereignty (real-time voice, or sub-300 ms code completion, is hard to hit with a small local model), when your usage is too low to justify a GPU ($100/mo of API calls is cheaper than a 4090 rig amortized over 3 years), or when you don't have the ops capacity to handle backups, model upgrades, and the inevitable 2 a.m. OOM. Self-hosting is real ops work; budget for it.
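The GPU-amortization comparison above is worth running with your own numbers. A back-of-envelope sketch; every figure below is a placeholder assumption, not a quoted price:

```shell
#!/bin/sh
# Rough self-hosting cost in USD/month: a 4090 plus the workstation
# around it, straight-line amortized over 3 years, plus electricity.
rig_cost=3200        # GPU + chassis, CPU, RAM, storage (assumed)
months=36            # amortization window
power=25             # rough electricity per month (assumed)
monthly=$(( rig_cost / months + power ))
echo "self-host: ~${monthly} USD/month"   # compare against your API bill
```

Ops time isn't in that number, and for small teams it's usually the larger cost.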

MORE FROM THE ARSENAL

12 packs · 80+ hand-picked assets

Browse every curated bundle on the home page
