TOKREPO · ARSENAL
Stable

RAG Pipelines

Quivr, RAGFlow, GraphRAG, plus production best practices. Skip the bad first draft of your retrieval architecture.

8 assets

What's in this pack

Most teams ship their first RAG demo in a weekend and then spend six months untangling why it gives subtly wrong answers. This pack collects the eight assets that get you past that wall: three production-grade engines, three retrieval/indexing patterns, and two evaluation tools.

#  Asset · Layer · Why it's here
1  Quivr · full-stack RAG · the "second brain" reference implementation, Apache-2.0-licensed
2  RAGFlow · full-stack RAG · deep document parsing; beats LangChain for tables and forms
3  GraphRAG · retrieval · Microsoft's knowledge-graph approach for multi-hop questions
4  Chunking patterns · indexing · semantic vs fixed-size vs recursive, and when each wins
5  Hybrid search · retrieval · BM25 + dense vectors, with reranking
6  Cross-encoder reranker · retrieval · the single biggest precision lift you can drop in
7  RAG eval harness · observability · golden-set + LLM-as-judge for nightly regression
8  Citation enforcement · guardrails · refuse to answer when retrieval confidence falls below threshold

Why this matters

Vector search alone gets you ~70% of demo quality. The last 30% — the part users actually notice — comes from the non-vector layers: how you chunk, how you rerank, how you decide when retrieval failed and the LLM should refuse rather than hallucinate.
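
The refusal decision itself can be as small as a score threshold. A minimal sketch, assuming your retriever returns similarity scores normalized to [0, 1]; the cutoff value and field names are illustrative and need tuning on your own corpus:

# Refuse rather than hallucinate when even the best retrieved chunk is weak.
MIN_SCORE = 0.45  # illustrative; tune against your golden set

def answer_or_refuse(question, hits, generate):
    if not hits or max(h["score"] for h in hits) < MIN_SCORE:
        return "I couldn't find this in the indexed documents."
    context = "\n\n".join(h["text"] for h in hits)
    return generate(question, context)  # generate() is whatever LLM call your pipeline already makes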

Three failure modes show up in every RAG audit we run:

  1. Chunking destroys context. A naïve 512-token split breaks tables in half and orphans headings. RAGFlow's layout-aware parser solves this; pure-LangChain pipelines don't.
  2. Top-k retrieval returns near-duplicates. Cosine similarity loves to surface 5 paraphrases of the same paragraph. A cross-encoder rerank step (BGE-reranker, Cohere Rerank) cuts near-duplicate payload by 60%+ on most corpora; see the sketch after this list.
  3. No multi-hop reasoning. Single-shot vector lookup can't answer "compare X across years 2022, 2023, and 2024." GraphRAG builds a knowledge graph at index time so traversal-based answers become possible.
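
The rerank step from failure mode 2 is only a few lines. A minimal sketch, assuming the sentence-transformers library and the open BGE reranker checkpoint; swap in Cohere Rerank or another cross-encoder the same way:

from sentence_transformers import CrossEncoder

# Open cross-encoder checkpoint (assumes the model is available from Hugging Face or a local cache).
reranker = CrossEncoder("BAAI/bge-reranker-base", max_length=512)

def rerank(query, candidates, keep=5):
    # Score every (query, chunk) pair jointly; this is the step a bi-encoder retriever cannot do.
    scores = reranker.predict([(query, c["text"]) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in ranked[:keep]]

# Usage: feed it the top-50 vector hits, keep the 5 best for the prompt.
# top5 = rerank("how did revenue change across 2022-2024?", vector_hits_top50)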

Install in one command

# Install the entire pack
tokrepo install pack/rag-pipelines

# Or pick the engine you want to start with
tokrepo install quivr
tokrepo install ragflow
tokrepo install graphrag

The TokRepo CLI normalizes setup files across the eight supported AI tools, so the engines come pre-configured to slot into your existing Claude Code, Cursor, or Codex CLI project.

Common pitfalls

  • Treating RAG as "embed everything." The cheapest precision win is not indexing low-signal pages. Audit your corpus first; remove duplicates, navigation chrome, and outdated versions.
  • Skipping the rerank step. Adding a cross-encoder rerank on top-50 → top-5 typically lifts answer-correctness by 15-25 points on RAG benchmarks. Skipping it to "save latency" is almost always wrong.
  • No eval harness. If you can't run a golden-set regression, you can't tell whether your last prompt change made things better or worse. Build the eval before you scale the corpus.
  • Storing chunks without parent context. Always keep a pointer back to the source document and adjacent chunks, and let the LLM expand when it needs more context; see the sketch after this list.
  • Picking a vector DB before knowing your scale. Pinecone makes sense at 100M+ vectors; below 10M, Qdrant or Chroma on a single VM is faster, cheaper, and easier to debug.
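
One way to keep the parent-context pointer from the pitfall above, a sketch assuming chunks are stored as plain dict payloads next to their vectors; the field names are illustrative, not part of any engine in this pack:

# Record enough metadata per chunk to climb back to the source document and its neighbours.
def make_chunk_records(doc_id, chunks):
    records = []
    for i, text in enumerate(chunks):
        records.append({
            "id": f"{doc_id}:{i}",
            "text": text,
            "doc_id": doc_id,  # pointer back to the source document
            "prev_id": f"{doc_id}:{i - 1}" if i > 0 else None,
            "next_id": f"{doc_id}:{i + 1}" if i < len(chunks) - 1 else None,
        })
    return records

def expand(record, store):
    # Pull the neighbouring chunks when the retrieved hit alone lacks context.
    neighbours = [store.get(record["prev_id"]), record, store.get(record["next_id"])]
    return "\n".join(r["text"] for r in neighbours if r)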

When this pack alone isn't enough

If your bottleneck is ingest quality (PDFs, scans, multi-column layouts), pair this with the Document AI Pipeline pack — Surya/Docling/MinerU clean up the source before chunking. If your bottleneck is evaluation, layer the LLM Eval & Guardrails pack on top: DeepEval, Ragas, and Promptfoo plug into the eval harness here.

For storage: this pack is engine-agnostic — see the Vector DB Showdown pack to choose between Chroma, Weaviate, Pinecone, Qdrant, or txtai based on your latency, cost, and accuracy targets.

INSTALL · ONE COMMAND
$ tokrepo install pack/rag-pipelines
hand it to your agent — or paste it in your terminal
What's inside

8 assets in this pack

Script#01
Quivr — Opinionated RAG Framework for Any LLM

Quivr is an opinionated RAG framework supporting any LLM, multiple file types, and customizable retrieval. 39.1K+ stars. Apache 2.0.

by Script Depot·139 views
$ tokrepo install quivr-opinionated-rag-framework-any-llm-96223597
Script#02
RAGFlow — Deep Document Understanding RAG Engine

Open-source RAG engine with deep document understanding. Parses complex PDFs, tables, images. Agent-powered Q&A with citations. Multi-model. 77K+ stars.

by Script Depot·121 views
$ tokrepo install ragflow-deep-document-understanding-rag-engine-7785d7a8
Skill#03
GraphRAG — Knowledge Graph RAG by Microsoft

Build knowledge graphs from documents for smarter RAG. Local and global search over entity relationships. By Microsoft Research. 31K+ stars.

by Microsoft AI·124 views
$ tokrepo install graphrag-knowledge-graph-rag-microsoft-ac77668d
Script#04
Kotaemon — Open-Source RAG Document Chat

Clean, open-source RAG tool for chatting with your documents. Supports PDF, DOCX, web pages. Multi-model, citation, and multi-user. Self-hostable. 25K+ stars.

by Script Depot·103 views
$ tokrepo install kotaemon-open-source-rag-document-chat-b0f93b10
Config#05
Verba — The Golden RAGtriever by Weaviate

Verba is an open-source RAG (Retrieval-Augmented Generation) chatbot from the Weaviate team. Drop in PDFs, web pages, or notes; pick a model (OpenAI, Ollama, Anthropic); and get a polished chat UI with semantic search built in.

by AI Open Source·99 views
$ tokrepo install verba-golden-ragtriever-weaviate-e0e719be
Prompt#06
RAG Best Practices — Production Pipeline Guide 2026

Comprehensive guide to building production RAG pipelines. Covers chunking strategies, embedding models, vector databases, retrieval techniques, evaluation, and common pitfalls with code examples.

by Prompt Lab·98 views
$ tokrepo install rag-best-practices-production-pipeline-guide-2026-7ded33e8
MCP#07
Tavily — Search API Built for AI Agents & RAG

Search API designed specifically for AI agents and RAG pipelines. Returns clean, LLM-ready results with content extraction, no HTML parsing needed. Official MCP server available. 5,000+ stars.

by MCP Hub·100 views
$ tokrepo install tavily-search-api-built-ai-agents-rag-f73611a0
Script#08
Haystack — AI Orchestration for Search & RAG

Open-source AI orchestration framework by deepset. Build production RAG pipelines, semantic search, and agent workflows with modular components. 25K+ GitHub stars.

by Script Depot·82 views
$ tokrepo install haystack-ai-orchestration-search-rag-761bd107
FAQ

Frequently asked questions

Are these RAG engines free?

Quivr, RAGFlow, and GraphRAG are all open-source under permissive licenses (Apache 2.0 / MIT). You self-host. The only paid components you might add are the embedding API (OpenAI, Cohere, Voyage) and a managed vector DB if you don't want to run your own. A laptop-scale demo costs nothing; a 10M-doc production deployment is dominated by the embedding bill, not the engine.

How does GraphRAG compare to vanilla RAG?

Vanilla RAG retrieves top-k chunks by vector similarity and stuffs them in the prompt — great for single-hop questions like "what is X." GraphRAG builds an entity-relationship graph at index time, so it can answer multi-hop questions like "how did X's role change across these documents." The trade-off: indexing is 5-10x more expensive and slower. Use GraphRAG when your queries are analytical, vanilla RAG when they're factual lookups.

Will this work with Cursor or Codex CLI?

Yes — these are server-side engines, not editor extensions. You run RAGFlow or Quivr as a service, then any AI coding tool that can call HTTP can query it. The TokRepo install drops the docker-compose and config files into your project so the same setup works across Claude Code, Cursor, Codex CLI, Cline, and the rest. The retrieval API is identical.
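
In practice "call HTTP" looks the same from every tool. A sketch of the client side only; the endpoint path, port, and payload fields below are hypothetical placeholders, so check the API docs of whichever engine you deploy:

import requests

# Hypothetical retrieval endpoint; RAGFlow and Quivr each document their own real routes.
RAG_URL = "http://localhost:9380/api/retrieve"

def ask(question, top_k=5):
    resp = requests.post(RAG_URL, json={"query": question, "top_k": top_k}, timeout=30)
    resp.raise_for_status()
    return resp.json()  # retrieved chunks plus citations; exact shape depends on the engine

# Any editor, agent, or shell script that can POST JSON consumes this the same way.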

What's the difference between this pack and the Vector DB Showdown pack?

Vector DB Showdown answers "where do my embeddings live" — Chroma, Qdrant, Pinecone, Weaviate, etc. RAG Pipelines answers "how do I retrieve and rerank from that storage to produce a correct answer." You pick one from each. Most production setups are Qdrant or pgvector underneath, with RAGFlow or a custom pipeline on top.

How do I know if my RAG is actually working?

Build a golden set of 50-200 question-answer pairs from real user queries. Run them nightly. Track three numbers: retrieval recall (did the right chunk appear in top-k), answer correctness (LLM-as-judge against the gold answer), and citation faithfulness (did the answer cite a real retrieved chunk). Without these three, you're flying blind. Pack 28 (LLM Eval & Guardrails) ships the harness.
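
A bare-bones version of the retrieval-recall half of that loop, assuming the golden set lives in a JSONL file and retrieve() is whatever query function your engine exposes; the LLM-as-judge and faithfulness checks bolt onto the same loop:

import json

def retrieval_recall(golden_path, retrieve, k=5):
    # One JSON object per line with "question" and "gold_chunk_id" (illustrative schema).
    hits = total = 0
    with open(golden_path) as f:
        for line in f:
            case = json.loads(line)
            retrieved_ids = [c["id"] for c in retrieve(case["question"], top_k=k)]
            hits += case["gold_chunk_id"] in retrieved_ids
            total += 1
    return hits / total

# Nightly gate: fail the run when recall regresses below your baseline.
# assert retrieval_recall("golden.jsonl", retrieve) >= 0.85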

MORE FROM THE ARSENAL

12 packs · 80+ hand-picked assets

Browse every curated bundle on the home page

Back to all packs