# Example RAG App — FastAPI + Langfuse

> A reference RAG app with FastAPI + Typer CLI, local Docker infra, LiteLLM (100+ providers), and Langfuse observability, built to teach best practices.

## Install

Copy the content below into your project:

## Quick Use

```bash
git clone https://github.com/ajac-zero/example-rag-app.git
cd example-rag-app
uv tool install just
just scaffold

# Interactive CLI
uv run cli
```

## Intro

A reference RAG app with FastAPI + Typer CLI, local Docker infra, LiteLLM (100+ providers), and Langfuse observability, built to teach best practices.

- **Best for:** teams that want a clean, testable RAG template with local infra and observability
- **Works with:** Python + uv; Docker Compose; FastAPI; Typer; LiteLLM; Langfuse; Qdrant; Redis
- **Setup time:** 25–60 minutes

## Practical Notes

- Per the README: uses LiteLLM as a proxy to call 100+ providers through the OpenAI library.
- Local-first infra: `just scaffold` spins up the dependent microservices with `docker compose`.
- The dev loop includes Ruff lint/format, Mypy type checks, and unit/integration/e2e tests via `just test`.

## Main

Use this repo as a checklist for "production-shaped" RAG:

1. **Infrastructure as code (local first).** Bring up the vector DB, cache, and observability stack with one command so every teammate can reproduce issues.
2. **Separation of concerns.** Keep ingestion/indexing separate from serving; make the serving API stateless where possible.
3. **Observe retrieval, not just the model.** Log the query, retrieved docs, chunk sizes, and latency per stage (retrieve → rerank → generate).
4. **Treat tests as guardrails.** Start with unit tests for prompt templates and retrieval filters; add integration tests once the infra is stable.

The most common failure mode is "retrieval drift": the index changes but prompts/tests don't. Pin your ingest config and re-run evals whenever you change chunking or filters.
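The per-stage latency logging in point 3 can be sketched with a small stdlib timer: a context manager records each stage's wall-clock time into a dict that could then be attached to a Langfuse trace. The stage names and `sleep` placeholders are illustrative, not the repo's actual pipeline:

```python
import time
from contextlib import contextmanager

# Collected per request; in the real app this would be attached to a
# Langfuse trace alongside the query and the retrieved docs.
latencies_ms: dict[str, float] = {}

@contextmanager
def stage(name: str):
    """Record wall-clock latency for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        latencies_ms[name] = (time.perf_counter() - start) * 1000

# Hypothetical pipeline stages; the real retrieve/rerank/generate calls go inside.
with stage("retrieve"):
    time.sleep(0.01)
with stage("rerank"):
    time.sleep(0.01)
with stage("generate"):
    time.sleep(0.02)

print(latencies_ms)  # e.g. {'retrieve': 10.3, 'rerank': 10.1, 'generate': 20.4}
```

Because the timer wraps each stage separately, a slow request immediately shows *which* stage regressed instead of one opaque end-to-end number.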
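The "unit tests for prompt templates" in point 4 can start as simply as asserting that the rendered prompt contains every retrieved chunk and ends with the user question. `build_prompt` here is a hypothetical template function, not one from the repo:

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical RAG prompt template: numbered context, then the question."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Guardrail-style checks (run under pytest or as plain asserts).
prompt = build_prompt("What port does Qdrant use?", ["Qdrant listens on 6333."])
assert "Qdrant listens on 6333." in prompt            # every chunk survives templating
assert prompt.endswith("What port does Qdrant use?")  # the question comes last
assert "[1]" in prompt                                # chunks stay numbered for citations
```

Tests like these catch template regressions (a dropped chunk, a reordered section) without needing any infra, which is why they come before integration tests.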
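One way to catch the "retrieval drift" failure mode above is to fingerprint the ingest config and fail fast when serving or evals run against a different one. The config fields and the pinned-hash workflow are illustrative assumptions, not taken from the repo:

```python
import hashlib
import json

def config_fingerprint(config: dict) -> str:
    """Stable short hash of an ingest config (key order doesn't matter)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

# Hash recorded when the index was built and evals last passed.
ingest_config = {"chunk_size": 512, "chunk_overlap": 64, "filter": "docs-only"}
pinned = config_fingerprint(ingest_config)

# Later, before serving or re-running evals:
current = config_fingerprint({"chunk_size": 512, "chunk_overlap": 64, "filter": "docs-only"})
assert current == pinned, "ingest config drifted; re-run evals before deploying"

# Changing the chunking produces a different fingerprint:
drifted = config_fingerprint({"chunk_size": 256, "chunk_overlap": 64, "filter": "docs-only"})
print(pinned != drifted)  # True
```

Storing the fingerprint next to the index (or in the eval report) turns "did anyone change chunking?" from a code-review question into a startup check.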
### FAQ

**Q: Do I need an LLM framework?**
A: No. The README highlights that it avoids heavy frameworks and talks to the OpenAI API directly (with LiteLLM as a provider proxy).

**Q: Where do I start?**
A: Run `just scaffold`, then `uv run cli`. Once it works, add your own ingest pipeline or adapt the included one.

**Q: How do I keep costs under control?**
A: Track token usage and retrieval payload size; then tighten chunking, dedupe context, and add caching where it matters.

## Source & Thanks

> Source: https://github.com/ajac-zero/example-rag-app
> License: MIT
> GitHub stars: 159 · forks: 24

---

Source: https://tokrepo.com/en/workflows/example-rag-app-fastapi-langfuse
Author: AI Open Source