Practical Notes
- Quant: install is a single command (pip install flashrag-dev --pre), and index building is runnable via python -m ... scripts.
- Quant: start with one corpus and run at least 3 retrieval configs (dense, sparse, hybrid) to establish baselines.
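To compare the three baselines on equal footing, it helps to score them all with one metric function. The following is a minimal sketch, assuming each retrieval config has already produced a ranked list of document ids per question. The names recall_at_k and compare_baselines are illustrative helpers, not part of FlashRAG.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant doc ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def compare_baselines(runs, gold, k=5):
    """Mean recall@k per retrieval config.

    runs: {config_name: {question_id: [ranked doc ids]}}  (hypothetical layout)
    gold: {question_id: [relevant doc ids]}
    """
    scores = {}
    for name, per_question in runs.items():
        values = [
            recall_at_k(per_question.get(qid, []), relevant, k)
            for qid, relevant in gold.items()
        ]
        scores[name] = sum(values) / len(values) if values else 0.0
    return scores
```

Scoring every config against the same gold set and the same k keeps the dense, sparse, and hybrid numbers directly comparable.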
A repeatable RAG experiment loop
FlashRAG is most useful when you treat retrieval work like experiments:
- Fix your corpus snapshot (version it).
- Build indexes with explicit parameters (batch size, pooling, FAISS type).
- Evaluate with a stable question set and record results per run.
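The loop above can be sketched with the standard library alone. This is one possible approach, not FlashRAG's own tooling: it assumes the corpus is a single file, fingerprints it with SHA-256 so the snapshot is identifiable, and appends one JSON line per run. The function names and the index_params keys are hypothetical.

```python
import hashlib
import json
import time


def corpus_fingerprint(path, chunk_size=1 << 20):
    """SHA-256 of the corpus file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_run(log_path, corpus_path, index_params, metrics):
    """Append one run (corpus hash, index parameters, metrics) to a JSONL log."""
    entry = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "corpus_sha256": corpus_fingerprint(corpus_path),
        "index_params": index_params,  # e.g. batch size, pooling, FAISS type
        "metrics": metrics,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry
```

If two runs report different corpus hashes, their metrics are not comparable, which is the point of fixing the snapshot first.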
Practical guardrails
- Keep your first index small enough to rebuild in minutes; scale later.
- If you add optional dependencies (faiss, pyserini), write them into your environment file so teammates reproduce the same results.
- Don’t mix “model upgrades” and “retrieval changes” in the same run; change one variable at a time.
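The "one variable at a time" rule can be checked mechanically before a run starts. Below is a minimal sketch, assuming each run is described by a flat dict of settings; the helper names and the example keys are hypothetical.

```python
_MISSING = object()  # sentinel so a key that is absent differs from a key set to None


def changed_keys(previous, current):
    """Sorted list of setting names whose values differ between two run configs."""
    keys = set(previous) | set(current)
    return sorted(
        k for k in keys
        if previous.get(k, _MISSING) != current.get(k, _MISSING)
    )


def check_single_change(previous, current):
    """Raise if more than one setting changed; otherwise return the changed keys."""
    diff = changed_keys(previous, current)
    if len(diff) > 1:
        raise ValueError(
            f"Runs differ in {len(diff)} settings {diff}; change one at a time."
        )
    return diff
```

For example, comparing {"generator": "model-a", "retriever": "bm25"} with {"generator": "model-b", "retriever": "dense"} raises, because both the model and the retrieval method changed.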
FAQ
Q: Is this only for dense retrieval? A: No. The README covers dense and sparse (BM25) index builds and different backends.
Q: Why is faiss installed via conda sometimes? A: The README notes pip incompatibilities and provides conda install commands.
Q: What should I do first? A: Build a tiny index from the sample corpus format, then run one evaluation loop before scaling up.