Team Documentation RAG Pack
Ten picks for a team building a shared knowledge base across Confluence, Notion, Slack, and the internal wiki: Onyx as the Glean-style answer layer, Airweave as connector glue, Docmost / PandaWiki as self-hosted destinations, the official Notion / Atlassian / Slack MCPs, Qdrant as the shared vector store, Casbin for per-source access control, and Haystack as the retrieval framework.
What this pack solves
Your team's institutional memory lives in five places: Confluence pages no one updated since the last reorg, Notion docs three PMs each maintain a fork of, Slack threads where the real answer is buried, a README in a 2023 repo, and one engineer's head. New hires waste their first month asking the same questions. The knowledge base is technically there — it just isn't searchable the way humans actually ask.
The goal of this pack is one URL where any teammate asks a plain-English question and gets a cited answer pulled from every source they have permission to read. That category has a paid leader: Glean, at ~$40+/seat/month, which ships your corpus to its cloud. This pack is the open-source equivalent: self-hostable, able to run on a single mid-size VM for teams up to a few hundred, with the LLM API at the answer layer as the only paid line item (swappable for a local model).
Install in this order
- Onyx — start here. The closest open-source equivalent to Glean: AI chat front-end with 40+ built-in connectors (Confluence, Notion, Slack, Drive, GitHub, Jira), hybrid-search RAG, SSO, RBAC, per-document permission propagation. One `curl ... | bash` and you have an answer engine on `localhost:3000`. If Onyx covers your sources, you may be done after step 1.
- Airweave — connector glue when Onyx's built-ins miss a source. 50+ integrations exposed as MCP and REST endpoints. Useful upstream of Onyx when one connector framework needs to feed both Onyx and other agents.
- Notion MCP — if your canonical docs live in Notion, the MCP gives Claude Code, Cursor, and any MCP client direct read/write to the same workspace Onyx indexes. Engineers ask the wiki from inside their editor.
- MCP Atlassian — same idea for Confluence + Jira: read pages, query issues, post comments from agent context. Notion + Atlassian MCPs cover the 80% of teams whose docs live in one of those two homes.
- slack-mcp-server — Slack is where real-time decisions happen and they almost never get backported to the wiki. Indexing it is messy (permissions, ephemeral channels, DMs that should never be indexed) but unavoidable if you want "why Postgres over MySQL?" to find the actual thread. OAuth + stealth mode handles auth; scope channels explicitly.
- Docmost — destination wiki when you want to consolidate. Open-source, real-time collaborative, page-tree structure, AGPL. Author new content here; Onyx indexes legacy Confluence/Notion in parallel. Migration stays gradual.
- PandaWiki — alternative destination with AI-first authoring ("ask the wiki to draft this page"). Pick PandaWiki when you want the writing surface itself LLM-assisted; pick Docmost for a traditional Confluence-style editor.
- Qdrant — shared vector store. Onyx ships its own embedded index, fine until you outgrow it or want the same vectors queryable from a custom agent. Qdrant adds production-grade payload filters, hybrid search, snapshots, horizontal scale.
- Casbin — per-source access control. The hardest team-RAG problem isn't retrieval quality — it's keeping the C-suite Confluence space out of an IC's answer. Casbin is the policy framework: roles and source rules in one config, enforced at retrieval across Onyx, Airweave, and any custom agent.
- Haystack — underlying retrieval framework when Onyx isn't flexible enough. Most teams won't need it; if you do, you'll know — branching logic, custom rerankers, hybrid graph+vector. Haystack lets you assemble those and call from a Slack bot or custom UI.
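The Slack item in the list above says to scope channels explicitly. Here is a minimal sketch of that scoping rule, assuming a hypothetical `Channel` record with `name`, `is_private`, and `is_im` fields. The field names are illustrative and are not taken from slack-mcp-server or Onyx configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    is_private: bool = False
    is_im: bool = False  # True for direct messages


def channels_to_index(channels, opt_out):
    """Keep public channels only, minus an explicit opt-out list."""
    return [
        c.name
        for c in channels
        if not c.is_private and not c.is_im and c.name not in opt_out
    ]


channels = [
    Channel("eng-architecture"),
    Channel("hr-private", is_private=True),
    Channel("dm-alice-bob", is_im=True),
    Channel("random"),
]
selected = channels_to_index(channels, opt_out={"random"})
print(selected)
```

The filter is default-deny for anything private, so a newly created private channel is never indexed by accident.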
How the stack fits together
[ Confluence ]  [ Notion ]  [ Slack ]  [ Drive ]  [ GitHub ]  [ Internal wiki ]
      │             │           │          │          │               │
      └─── MCP ─────┴─── Airweave / Onyx connectors ──┴───────────────┘
                                      │
                                      ▼
                              [ chunk + embed ]
                                      │
                                      ▼
                                 [ Qdrant ]  ◀── shared vector store
                                      │
                                      ▼
             ┌─── Casbin policy gate (per-user, per-source) ───┐
             │                                                 │
             ▼                                                 ▼
     [ Onyx chat UI ]                                [ Slack bot / agent ]
             │                                                 │
             └──────────── LLM (cloud or self-host) ───────────┘
Authoring side (parallel, optional):
  Docmost or PandaWiki  ◀── new canonical pages, also indexed
The critical insight: Onyx alone covers most of this graph. Steps 2-10 are additions you reach for when you hit a specific limit — missing connector (Airweave), in-editor agent access (the three MCPs), content-sprawl target (Docmost / PandaWiki), a permission model Onyx's RBAC can't express (Casbin). Don't install all ten on day one. Install Onyx against your three loudest sources, run it a week, and the missing pieces will tell you which of the rest you actually need.
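To make the chunk + embed stage of the graph concrete, here is a rough sketch of word-window chunking with overlap. The function name `chunk_words` and the default sizes are assumptions chosen for illustration. Onyx and Haystack ship their own chunkers, which may work differently.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_words(sample, size=4, overlap=1)
print(pieces)
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of embedding some words twice.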
Tradeoffs you'll hit
- Onyx vs Glean — Glean has better polish, a slicker Slack bot, and zero ops. Onyx has the same connector breadth, runs in your VPC, costs $0/seat, and leaves the LLM bill yours to control. Pick Glean if security review will reject self-host or your team won't tolerate one more service. Pick Onyx otherwise.
- Docmost vs PandaWiki — Docmost is the traditional editor, near-zero retraining from Confluence. PandaWiki bakes AI authoring in ("draft this page from the linked Jira ticket") — great for greenfield, unfamiliar to wiki veterans. Don't run both as canonical; pick one.
- Embedded vector store vs Qdrant — Onyx's built-in index is fine up to ~1M chunks. Past that, query latency and re-index time bite. Qdrant adds one service for stable performance at 10M+ chunks, plus same vectors queryable from custom agents without going through Onyx's API.
- Casbin vs Onyx native RBAC — Onyx native covers per-user, per-document, per-source. Casbin adds policy-as-code for the messy ones: "contractors can read engineering, not HR", "security space C-suite only except during incident response". Most teams start native and migrate to Casbin after the first permission incident.
- MCP servers vs Onyx connectors — Onyx connectors are batch-sync (index every N minutes, query the vector store). MCPs are real-time (agent calls Notion's API on demand). Onyx for heavy retrieval (one answer pulled from 50,000 pages), MCP for precise lookup (this specific Jira ticket, now). Run both layers; they don't conflict.
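The policy-as-code idea in the Casbin item above can be sketched in plain Python. This is not Casbin's API or its policy file format. `Rule`, `POLICY`, and `can_read` are hypothetical names, and the deny-overrides behavior is an assumption chosen to match the "contractors can read engineering, not HR" example.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    role: str
    source: str
    effect: str  # "allow" or "deny"


# Hypothetical policy table; in practice this would live in version control.
POLICY = [
    Rule("employee", "engineering", "allow"),
    Rule("employee", "hr", "allow"),
    Rule("contractor", "engineering", "allow"),
    Rule("contractor", "hr", "deny"),
]


def can_read(roles, source, policy=POLICY):
    """Deny overrides allow; no matching rule means no access."""
    effects = {r.effect for r in policy if r.role in roles and r.source == source}
    return "allow" in effects and "deny" not in effects


print(can_read({"contractor"}, "engineering"))
print(can_read({"contractor"}, "hr"))
```

Keeping the rules as data rather than scattered `if` statements is what makes a permission change reviewable like any other change.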
Common pitfalls
- Indexing Slack DMs. Don't. Scope explicitly at connector setup: public channels only, with an opt-out list. One private DM in the wrong answer and trust is gone for a year.
- No deduplication across connectors. Same RFC lives in a Confluence page, a Notion mirror, a GitHub markdown. Without dedup the engine cites three versions of the same doc. Onyx has content-hash dedup — turn it on before the first big sync.
- Embedding the entire corpus before sampling. Full sync of Confluence + Notion + Slack on day one burns $200-500 in OpenAI embedding bills before you've answered a single question. Start with one space / one channel; iterate quality at small scale; expand only after you trust the answers.
- Permissions inferred from a stale identity provider. People who left six months ago but are still in the IdP will have their docs indexed under their identity. Sync the IdP first, audit groups, only then turn on the answer engine.
- One LLM for embedding and generation. Different jobs. Cheap embedding model (`text-embedding-3-small`, local `bge-large`) for the corpus; a strong reasoning model (Claude Sonnet, GPT-4-class) only at answer time.
- No feedback loop. Most answer engines ship thumbs up/down. Wire it to a weekly review: bad answers usually trace to one missing source, one wrong chunk boundary, one bad permission rule. Ten minutes of triage a week separates a trusted KB from a toy.
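The deduplication pitfall above comes down to hashing normalized content. The following is a minimal sketch of that idea, not Onyx's implementation. `content_hash` and `dedupe` are hypothetical helpers, and the normalization (collapse whitespace, lowercase) is an assumption that only catches near-verbatim copies.

```python
import hashlib
import re


def content_hash(text):
    """Hash text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(docs):
    """docs: iterable of (source, text). Keep the first copy per hash."""
    seen = {}
    for source, text in docs:
        seen.setdefault(content_hash(text), (source, text))
    return list(seen.values())


docs = [
    ("confluence", "RFC 12: Use Postgres for the billing service."),
    ("notion", "RFC 12:  use Postgres for the billing service.\n"),
    ("github", "rfc 12: use postgres for the billing service."),
    ("confluence", "RFC 13: Adopt feature flags."),
]
unique = dedupe(docs)
print([source for source, _ in unique])
```

Because the first copy wins, the order in which connectors are synced decides which source gets cited.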
Frequently asked questions
How does this stack actually compare to Glean — am I really getting feature parity?
On the answer layer, yes: Onyx ships hybrid-search RAG, custom agents, deep research, and 40+ connectors covering the same sources Glean does. Where Glean wins is polish — the in-Slack bot, the universal browser sidebar, the analytics dashboards. Where this pack wins is total cost (Glean is ~$40-50/seat/month, this stack is the LLM bill plus one VM), data residency (everything stays in your VPC), and extensibility (the MCPs let any agent in the team's editor query the same corpus). For a 50-person team the math is roughly $25K/year saved vs Glean, less the ~10-20 hours/month of ops time.
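The 50-person figure above can be checked with a few lines of arithmetic. The seat prices and ops hours come from the text. The helper name `annual_seat_cost` is made up for this sketch, and the LLM and VM bills are left out because the text gives no numbers for them.

```python
def annual_seat_cost(seats, per_seat_month):
    """Yearly license cost for a per-seat, per-month price."""
    return seats * per_seat_month * 12


seats = 50
low = annual_seat_cost(seats, 40)   # lower bound of the quoted price range
high = annual_seat_cost(seats, 50)  # upper bound of the quoted price range
ops_hours_per_year = (10 * 12, 20 * 12)  # from ~10-20 hours/month

print(f"Glean licenses: ${low:,} to ${high:,} per year")
print(f"Self-host ops time: {ops_hours_per_year[0]} to {ops_hours_per_year[1]} hours per year")
```

The quoted ~$25K sits near the low end of this range. The real saving depends on what those ops hours and the LLM bill cost your team.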
Do I need all ten tools, or is Onyx enough?
Onyx alone is enough to start. The other nine address specific failure modes: Airweave for missing connectors, the three MCPs for in-editor agent access, Docmost / PandaWiki when you're consolidating away from Confluence, Qdrant when your corpus crosses ~1M chunks, Casbin when permission policy gets messy, Haystack when retrieval needs branching custom logic. Install Onyx in week one, point it at your three loudest sources, and the missing pieces will tell you which others you need.
What about access control — how do I keep the C-suite Confluence space out of an IC's answer?
Three layers. First, source-level: when you configure a connector in Onyx, scope it to channels / spaces / collections an IC role can see. Second, per-document: Onyx and Airweave both propagate source-system permissions (Confluence ACLs, Notion sharing, Slack channel membership) so a doc the user can't read in the source can't appear in an answer. Third, policy: Casbin sits in front of the retrieval API for the cases the first two layers don't cover — contractor groups, time-bounded access, role exceptions. Most teams need all three eventually; start with the first two.
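The second layer, per-document permission propagation, amounts to filtering retrieved chunks against the asking user's groups before anything reaches the LLM. Here is a minimal sketch under assumed names (`Chunk`, `allowed_groups`, `visible_chunks`). It illustrates the check itself, not how Onyx or Airweave store ACLs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    # Hypothetical shape: ACL groups are copied from the source system at sync time.
    doc_id: str
    text: str
    allowed_groups: frozenset


def visible_chunks(chunks, user_groups):
    """Drop every chunk whose ACL shares no group with the user."""
    groups = frozenset(user_groups)
    return [c for c in chunks if c.allowed_groups & groups]


retrieved = [
    Chunk("eng-42", "Deploy runbook", frozenset({"engineering"})),
    Chunk("exec-7", "Board deck notes", frozenset({"c-suite"})),
    Chunk("all-1", "Holiday calendar", frozenset({"engineering", "c-suite", "hr"})),
]
answerable = visible_chunks(retrieved, user_groups={"engineering"})
print([c.doc_id for c in answerable])
```

A chunk with an empty ACL is dropped for everyone, which is the safe default when a sync fails to capture permissions.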
Where does the LLM run — is my corpus going to OpenAI?
Your call. Onyx supports any OpenAI-compatible endpoint plus local options via Ollama / vLLM / LiteLLM. The embedding model and the answer model are configured independently. A common setup: a local embedding model (free, runs on a single GPU) for the corpus, and a cloud reasoning model (Claude / GPT-4) only at answer time with the retrieved chunks as context — the cloud sees the retrieved excerpts, not your full corpus. For the strictest data residency, run a local 70B-class model for both.
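To illustrate the point that the cloud model sees only retrieved excerpts, here is a sketch of answer-time prompt assembly. `build_prompt`, the prompt wording, and the `max_chars` budget are all assumptions. No real LLM client is called, and this is not how Onyx builds its prompts.

```python
def build_prompt(question, retrieved, max_chars=2000):
    """Assemble a prompt from (source, excerpt) pairs within a size budget."""
    parts = []
    used = 0
    for i, (source, excerpt) in enumerate(retrieved, start=1):
        block = f"[{i}] ({source}) {excerpt}"
        if used + len(block) > max_chars:
            break  # stop before exceeding the context budget
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return (
        "Answer using only the numbered excerpts and cite them.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


retrieved = [
    ("slack #eng-architecture", "We chose Postgres for JSONB support."),
    ("confluence RFC 12", "MySQL was rejected after the replication test."),
]
prompt = build_prompt("Why Postgres over MySQL?", retrieved)
print(prompt)
```

Whatever string this function returns is the full extent of what leaves your network at answer time, so it is a natural place to log or audit.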
How do I keep the index fresh — does Slack get re-synced every hour?
Onyx schedules per-connector. The defaults are reasonable: Slack channels poll every 10 minutes (events API where available), Confluence and Notion re-sync every hour, GitHub on webhook. You can tune each one; the tradeoff is sync cost vs staleness. For Slack specifically, treat it as eventually-consistent: a question asked thirty seconds after the answer was posted may miss. For Confluence and Notion, hourly is plenty — those sources change slowly. Set up a Sentry/Loki alert on connector failures so you find out before users do.
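The connector-failure alert suggested above needs a staleness rule. One possible sketch flags any connector whose last successful sync is older than a multiple of its interval. The intervals mirror the defaults quoted in the text. `SYNC_INTERVALS`, `stale_connectors`, and the `grace` factor are hypothetical and not part of Onyx.

```python
from datetime import datetime, timedelta, timezone

# Intervals as quoted in the text; adjust to your own connector settings.
SYNC_INTERVALS = {
    "slack": timedelta(minutes=10),
    "confluence": timedelta(hours=1),
    "notion": timedelta(hours=1),
}


def stale_connectors(last_success, now, grace=2):
    """Return connectors with no success within `grace` sync intervals."""
    stale = []
    for name, interval in SYNC_INTERVALS.items():
        last = last_success.get(name)
        if last is None or now - last > interval * grace:
            stale.append(name)
    return stale


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
last_success = {
    "slack": now - timedelta(minutes=30),       # three intervals behind
    "confluence": now - timedelta(minutes=90),  # within two intervals
    # "notion" has never synced successfully
}
alerts = stale_connectors(last_success, now)
print(alerts)
```

The output of a check like this can be forwarded to whichever alerting tool you already run.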