# FlashRAG — Efficient RAG Research Toolkit

> FlashRAG is a Python toolkit for RAG experiments: install `flashrag-dev`, build dense/sparse indexes, and iterate on retrieval configs.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

1. Install (Python 3.10+ per README):
   ```bash
   pip install flashrag-dev --pre
   ```
2. Build a dense index (example command from README; adjust paths):
   ```bash
   python -m flashrag.retriever.index_builder \
     --retrieval_method e5 \
     --model_path /model/e5-base-v2/ \
     --corpus_path indexes/sample_corpus.jsonl \
     --save_dir indexes/ \
     --use_fp16 --max_length 512 --batch_size 256 --pooling_method mean --faiss_type Flat
   ```
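
The `--corpus_path` argument above points at a JSONL file. Below is a minimal sketch for producing a tiny one, assuming each line is a JSON object with `id` and `contents` keys (verify the exact schema against the README). The helper name `write_corpus` is hypothetical and not part of FlashRAG.

```python
import json
from pathlib import Path


def write_corpus(docs, path):
    """Write documents to a JSONL file, one JSON object per line.

    Assumption: the index builder expects `id` and `contents` keys;
    check the FlashRAG README before relying on this layout.
    """
    path = Path(path)
    # Create the target directory (e.g. indexes/) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for i, text in enumerate(docs):
            record = {"id": str(i), "contents": text}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

Calling `write_corpus(["first passage", "second passage"], "indexes/sample_corpus.jsonl")` would produce a two-line file at the path used in the command above.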

## Intro

FlashRAG targets RAG research and iteration: after installing the `flashrag-dev` pre-release, you can build dense and sparse retrieval indexes and compare retrieval configurations with reproducible scripts.

- **Best for:** RAG teams who want a research-friendly toolkit to benchmark retrieval methods and index builds
- **Works with:** Python 3.10+; optional deps (vLLM, sentence-transformers, pyserini, faiss via conda) per README
- **Setup time:** 25–60 minutes
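
Because several dependencies are optional, it can help to check which ones are present before starting a run. The sketch below uses only the standard library; the import names (`vllm`, `sentence_transformers`, `pyserini`, `faiss`) are the usual module names for those packages, and `missing_optional` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names for the optional dependencies listed above.
OPTIONAL_MODULES = ["vllm", "sentence_transformers", "pyserini", "faiss"]


def missing_optional(modules=OPTIONAL_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


print("missing optional deps:", missing_optional())
```

An empty list means every optional module is importable; anything listed still needs installing per the README.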

## Practical Notes

- Installation is a single command (`pip install flashrag-dev --pre`), and index builds run as `python -m ...` module invocations.
- Start with one corpus and run at least **3** retrieval configs (dense, sparse, hybrid) to establish baselines before expanding.
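
One way to turn the three configs into comparable baselines is to score each with the same metric on the same questions. The sketch below computes mean recall@k in plain Python; it does not use FlashRAG's own evaluation code, and the rankings and gold labels are toy values invented for illustration, not real results.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall(runs, gold, k):
    """Average recall@k over all questions for one retrieval config.

    runs: {question_id: [doc_id, ...]} ranked best-first
    gold: {question_id: {doc_id, ...}} relevant documents
    """
    scores = [recall_at_k(runs.get(q, []), rel, k) for q, rel in gold.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): replace with output from your own retrievers.
gold = {"q1": {"d1", "d4"}, "q2": {"d2"}}
configs = {
    "dense": {"q1": ["d1", "d3", "d4"], "q2": ["d5", "d2", "d1"]},
    "sparse": {"q1": ["d4", "d2", "d5"], "q2": ["d2", "d3", "d1"]},
    "hybrid": {"q1": ["d1", "d4", "d3"], "q2": ["d2", "d5", "d3"]},
}

for name, runs in configs.items():
    print(f"{name}: recall@2 = {mean_recall(runs, gold, k=2):.2f}")
```

Keeping `gold` and `k` fixed across configs is what makes the numbers comparable.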

## A repeatable RAG experiment loop

FlashRAG is most useful when you treat retrieval work as a series of experiments:

1. **Fix your corpus snapshot** (version it).
2. **Build indexes** with explicit parameters (batch size, pooling, FAISS type).
3. **Evaluate** with a stable question set and record results per run.
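
The three steps above can be sketched with the standard library alone: fingerprint the corpus file so each result is tied to an exact snapshot, then append one record per run. The function names, record fields, and log layout are assumptions for illustration and are not part of FlashRAG.

```python
import hashlib
import json
import time
from pathlib import Path


def corpus_fingerprint(path, chunk_size=1 << 20):
    """SHA-256 of the corpus file, read in chunks to handle large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_run(log_path, corpus_path, index_params, metrics):
    """Append one experiment record as a JSON line and return it."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "corpus_sha256": corpus_fingerprint(corpus_path),
        "index_params": index_params,  # e.g. batch size, pooling, FAISS type
        "metrics": metrics,            # scores from your fixed question set
    }
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

If two runs report different `corpus_sha256` values, their metrics were measured on different snapshots and should not be compared directly.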

## Practical guardrails

- Keep your first index small enough to rebuild in minutes; scale later.
- If you add optional dependencies (faiss, pyserini), record them in your environment file so teammates can reproduce your results.
- Don’t mix “model upgrades” and “retrieval changes” in the same run; change one variable at a time.
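
The one-variable-at-a-time rule can be enforced mechanically by diffing a candidate run's config against the baseline before launching it. This is a minimal sketch; the helper names and the config keys in the example are hypothetical and are not FlashRAG settings.

```python
def changed_keys(baseline, candidate):
    """Return the sorted config keys whose values differ between two runs."""
    missing = object()  # sentinel so a key present on one side counts as changed
    keys = set(baseline) | set(candidate)
    return sorted(
        k for k in keys if baseline.get(k, missing) != candidate.get(k, missing)
    )


def check_single_change(baseline, candidate):
    """Raise ValueError if the candidate changes more than one variable."""
    diff = changed_keys(baseline, candidate)
    if len(diff) > 1:
        raise ValueError(f"run changes {len(diff)} variables at once: {diff}")
    return diff


# Hypothetical run configs; the keys are illustrative only.
baseline = {"generator": "model-a", "retrieval_method": "e5", "topk": 5}
candidate = {"generator": "model-a", "retrieval_method": "bm25", "topk": 5}
print(check_single_change(baseline, candidate))
```

Changing both `generator` and `retrieval_method` in `candidate` would raise, which is the signal to split the change into two runs.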

## FAQ

**Q: Is this only for dense retrieval?**
A: No. The README covers dense and sparse (BM25) index builds and different backends.

**Q: Why is faiss installed via conda sometimes?**
A: The README notes pip incompatibilities and provides conda install commands.

**Q: What should I do first?**
A: Build a tiny index from the sample corpus format, then run one evaluation loop before scaling up.

## Source & Thanks

> Source: https://github.com/RUC-NLPIR/FlashRAG
> License: MIT
> GitHub stars: 3,484 · forks: 301

---
Source: https://tokrepo.com/en/workflows/flashrag-efficient-rag-research-toolkit
Author: AI Open Source