RAG Pipelines
Quivr, RAGFlow, and GraphRAG, plus production best practices. Skip the bad first version of your retrieval architecture.
What's in this pack
Most teams ship their first RAG demo in a weekend and then spend six months untangling why it gives subtly wrong answers. This pack collects the eight assets that get you past that wall: three production-grade engines, three retrieval/indexing patterns, and two evaluation tools.
| # | Asset | Layer | Why it's here |
|---|---|---|---|
| 1 | Quivr | full-stack RAG | the "second brain" reference implementation, MIT-licensed |
| 2 | RAGFlow | full-stack RAG | deep document parsing — beats LangChain for tables/forms |
| 3 | GraphRAG | retrieval | Microsoft's knowledge-graph approach for multi-hop questions |
| 4 | Chunking patterns | indexing | semantic vs fixed-size vs recursive — when each wins |
| 5 | Hybrid search | retrieval | BM25 + dense vectors, with reranking |
| 6 | Cross-encoder reranker | retrieval | the single biggest precision lift you can drop in |
| 7 | RAG eval harness | observability | golden-set + LLM-as-judge for nightly regression |
| 8 | Citation enforcement | guardrails | refuse-to-answer when retrieval below threshold |
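Assets #5 and #6 are where most of the retrieval precision comes from. As a taste of #5: hybrid search runs BM25 and dense retrieval in parallel, then fuses the two ranked lists. A minimal sketch of reciprocal rank fusion, the standard fusion method (the document IDs below are toy data):

```python
# Reciprocal rank fusion (RRF): merge a lexical and a dense ranking into one.
# k=60 is the conventional constant from the original RRF paper.
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

bm25_top = ["doc3", "doc1", "doc7"]   # lexical hits: exact terms, IDs, acronyms
dense_top = ["doc1", "doc9", "doc3"]  # semantic hits: paraphrases, synonyms
print(rrf_merge([bm25_top, dense_top]))  # docs found by both rankings rise first
```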
Why this matters
Vector search alone gets you ~70% of demo quality. The last 30% — the part users actually notice — comes from the non-vector layers: how you chunk, how you rerank, how you decide when retrieval failed and the LLM should refuse rather than hallucinate.
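That last decision, asset #8's refuse-to-answer guardrail, can start as a simple gate on the best retrieval score. A minimal sketch; the 0.45 threshold is an assumption you would calibrate against your own golden set:

```python
# Retrieval-confidence gate: build a grounded prompt, or refuse outright.
MIN_SCORE = 0.45  # illustrative; tune per embedding model on a golden set

def grounded_prompt(query: str, hits: list[tuple[str, float]]) -> str | None:
    """hits: (chunk_text, similarity) pairs sorted descending.
    Returns a prompt for the LLM, or None to signal refuse-to-answer."""
    if not hits or hits[0][1] < MIN_SCORE:
        return None  # retrieval failed; refusing beats hallucinating
    context = "\n\n".join(chunk for chunk, _ in hits[:5])
    return f"Answer only from the context below, citing chunks.\n\n{context}\n\nQ: {query}"
```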
Three failure modes show up in every RAG audit we run:
- Chunking destroys context. A naïve 512-token split breaks tables in half and orphans headings. RAGFlow's layout-aware parser solves this; pure-LangChain pipelines don't (a recursive-split sketch follows this list).
- Top-k retrieval returns near-duplicates. Cosine similarity loves to surface 5 paraphrases of the same paragraph. A cross-encoder rerank step (BGE-reranker, Cohere Rerank) cuts duplicate payloads by 60%+ on most corpora (see the reranker sketch after this list).
- No multi-hop reasoning. Single-shot vector lookup can't answer "compare X across years 2022, 2023, and 2024." GraphRAG builds a knowledge graph at index time so traversal-based answers become possible.
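For the first failure mode, the cheapest mitigation is pattern #4's recursive option: instead of cutting blindly at a fixed size, back off through progressively finer separators so paragraphs and sentences stay intact. A toy pure-Python sketch (sizes and separators are illustrative; it does not replace layout-aware parsing for tables):

```python
def recursive_split(text: str, max_chars: int = 2000,
                    seps: tuple[str, ...] = ("\n\n", "\n", ". ")) -> list[str]:
    """Split on the coarsest separator that keeps every chunk under max_chars."""
    if len(text) <= max_chars:
        return [text]
    if not seps:  # nothing left to split on: hard cut (the naive failure mode)
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    parts: list[str] = []
    buf = ""
    for piece in text.split(seps[0]):
        candidate = buf + seps[0] + piece if buf else piece
        if len(candidate) <= max_chars:
            buf = candidate  # still fits: keep accumulating into this chunk
        else:
            if buf:
                parts.extend(recursive_split(buf, max_chars, seps[1:]))
            buf = piece
    if buf:
        parts.extend(recursive_split(buf, max_chars, seps[1:]))
    return parts

print(recursive_split("one. two. three.\n\nfour. five.", max_chars=12))
```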
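For the second failure mode, a minimal rerank step using the sentence-transformers CrossEncoder with a BGE reranker (model choice and top_n are illustrative defaults):

```python
# Cross-encoder rerank: score (query, passage) pairs jointly, keep the best few.
# Requires: pip install sentence-transformers
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-base")  # any cross-encoder works here

def rerank(query: str, candidates: list[str], top_n: int = 5) -> list[str]:
    """Take the top-50 vector hits in, return the top_n most relevant."""
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in ranked[:top_n]]
```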
Install in one command
```bash
# Install the entire pack
tokrepo install pack/rag-pipelines

# Or pick the engine you want to start with
tokrepo install quivr
tokrepo install ragflow
tokrepo install graphrag
```
The TokRepo CLI normalizes setup files across the eight supported AI tools, so the engines come pre-configured to slot into your existing Claude Code, Cursor, or Codex CLI project.
Common pitfalls
- Treating RAG as "embed everything." The cheapest precision win is not indexing low-signal pages. Audit your corpus first; remove duplicates, navigation chrome, and outdated versions.
- Skipping the rerank step. Adding a cross-encoder rerank on top-50 → top-5 typically lifts answer correctness by 15-25 points on RAG benchmarks. Skipping it to "save latency" is almost always wrong.
- No eval harness. If you can't run a golden-set regression, you can't tell whether your last prompt change made things better or worse. Build the eval before you scale the corpus (a recall sketch follows this list).
- Storing chunks without parent context. Always keep a pointer back to the source document and adjacent chunks; let the LLM expand if it needs more context.
- Picking a vector DB before knowing your scale. Pinecone makes sense at 100M+ vectors; below 10M, Qdrant or Chroma on a single VM is faster, cheaper, and easier to debug.
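The recall leg of that eval, sketched minimally; the golden-set format and the retriever's signature are assumptions, so adapt them to your pipeline:

```python
# Nightly golden-set regression: the retrieval-recall leg of the harness.
def retrieval_recall(golden: list[dict], retrieve_ids, k: int = 5) -> float:
    """golden: [{"question": str, "gold_chunk_id": str}, ...] from real queries.
    retrieve_ids(question, k) -> top-k chunk IDs from your pipeline (assumed)."""
    hits = sum(ex["gold_chunk_id"] in retrieve_ids(ex["question"], k) for ex in golden)
    return hits / len(golden)

# Stubbed retriever for illustration; swap in the real one for the nightly run.
golden = [{"question": "What is X?", "gold_chunk_id": "c42"}]
assert retrieval_recall(golden, lambda q, k: ["c42", "c7"]) == 1.0
```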
When this pack alone isn't enough
If your bottleneck is ingest quality (PDFs, scans, multi-column layouts), pair this with the Document AI Pipeline pack — Surya/Docling/MinerU clean up the source before chunking. If your bottleneck is evaluation, layer the LLM Eval & Guardrails pack on top: DeepEval, Ragas, and Promptfoo plug into the eval harness here.
For storage, this pack is vector-store agnostic: see the Vector DB Showdown pack to choose between Chroma, Weaviate, Pinecone, Qdrant, or txtai based on your latency, cost, and accuracy targets.
Frequently asked questions
Are these RAG engines free?
Quivr, RAGFlow, and GraphRAG are all open-source under permissive licenses (Apache 2.0 / MIT). You self-host. The only paid components you might add are the embedding API (OpenAI, Cohere, Voyage) and a managed vector DB if you don't want to run your own. A laptop-scale demo costs nothing; a 10M-doc production deployment is dominated by the embedding bill, not the engine.
How does GraphRAG compare to vanilla RAG?
Vanilla RAG retrieves top-k chunks by vector similarity and stuffs them in the prompt — great for single-hop questions like "what is X." GraphRAG builds an entity-relationship graph at index time, so it can answer multi-hop questions like "how did X's role change across these documents." The trade-off: indexing is 5-10x more expensive and slower. Use GraphRAG when your queries are analytical, vanilla RAG when they're factual lookups.
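A toy illustration of the difference, not GraphRAG's actual pipeline (which adds LLM-driven entity extraction and community summarization); the entities and relations below are invented:

```python
import networkx as nx

g = nx.MultiDiGraph()  # parallel edges let one entity pair carry many facts
# Triples extracted at index time; in GraphRAG an LLM does this extraction.
g.add_edge("Alice", "Acme", relation="CTO_of", year=2022, source="doc_01")
g.add_edge("Alice", "Acme", relation="CEO_of", year=2024, source="doc_07")

# "How did Alice's role at Acme change?" becomes a traversal, not a top-k lookup.
roles = sorted((d["year"], d["relation"], d["source"])
               for _, _, d in g.out_edges("Alice", data=True))
print(roles)  # [(2022, 'CTO_of', 'doc_01'), (2024, 'CEO_of', 'doc_07')]
```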
Will this work with Cursor or Codex CLI?
Yes — these are server-side engines, not editor extensions. You run RAGFlow or Quivr as a service, then any AI coding tool that can call HTTP can query it. The TokRepo install drops the docker-compose and config files into your project so the same setup works across Claude Code, Cursor, Codex CLI, Cline, and the rest. The retrieval API is identical.
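The exact endpoint and payload differ per engine (check the RAGFlow or Quivr API docs); the generic pattern looks like this, with the URL and JSON shape as placeholders:

```python
import requests

# Placeholder endpoint and payload shape; substitute your engine's actual API.
resp = requests.post(
    "http://localhost:8000/api/retrieval",
    json={"query": "How do we rotate signing keys?", "top_k": 5},
    timeout=30,
)
resp.raise_for_status()
for chunk in resp.json().get("chunks", []):  # response shape is an assumption
    print(chunk)
```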
What's the difference between this pack and the Vector DB Showdown pack?
Vector DB Showdown answers "where do my embeddings live" — Chroma, Qdrant, Pinecone, Weaviate, etc. RAG Pipelines answers "how do I retrieve and rerank from that storage to produce a correct answer." You pick one from each. Most production setups are Qdrant or pgvector underneath, with RAGFlow or a custom pipeline on top.
How do I know if my RAG is actually working?
Build a golden set of 50-200 question-answer pairs from real user queries. Run them nightly. Track three numbers: retrieval recall (did the right chunk appear in top-k), answer correctness (LLM-as-judge against the gold answer), and citation faithfulness (did the answer cite a real retrieved chunk). Without these three, you're flying blind. Pack 28 (LLM Eval & Guardrails) ships the harness.