# Example RAG App — FastAPI + Langfuse

> A reference RAG app with FastAPI + Typer CLI, local Docker infra, LiteLLM (100+ providers), and Langfuse observability, built to teach best practices.

## Install

Copy the content below into your project:

## Quick Use

```bash
git clone https://github.com/ajac-zero/example-rag-app.git
cd example-rag-app
uv tool install just
just scaffold

# Interactive CLI
uv run cli
```

## Intro

A reference RAG app with FastAPI + Typer CLI, local Docker infra, LiteLLM (100+ providers), and Langfuse observability, built to teach best practices.

- **Best for:** teams that want a clean, testable RAG template with local infra and observability
- **Works with:** Python + uv; Docker Compose; FastAPI; Typer; LiteLLM; Langfuse; Qdrant; Redis
- **Setup time:** 25–60 minutes

## Practical Notes

- Per the README: uses LiteLLM as a proxy to call 100+ providers through the OpenAI library.
- Local-first infra: `just scaffold` spins up the dependent microservices with `docker compose`.
- The dev loop includes Ruff lint/format, Mypy type checks, and unit/integration/e2e tests via `just test`.

## Main

Use this repo as a checklist for "production-shaped" RAG:

1. **Infrastructure as code (local first).** Bring up the vector DB, cache, and observability stack with one command so every teammate can reproduce issues.
2. **Separation of concerns.** Keep ingestion/indexing separate from serving; make the serving API stateless where possible.
3. **Observe retrieval, not just the model.** Log the query, retrieved docs, chunk sizes, and latency per stage (retrieve → rerank → generate).
4. **Treat tests as guardrails.** Start with unit tests for prompt templates and retrieval filters; add integration tests once the infra is stable.

The most common failure mode is "retrieval drift": the index changes but prompts/tests don't. Pin your ingest config and re-run evals whenever you change chunking or filters.
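The per-stage latency logging in point 3 can be sketched with a small stdlib timer: a context manager records each stage's wall-clock time into a dict that could then be attached to a Langfuse trace. The stage names and `sleep` placeholders are illustrative, not the repo's actual pipeline:

```python
import time
from contextlib import contextmanager

# Collected per request; in the real app this would be attached to a
# Langfuse trace alongside the query and the retrieved docs.
latencies_ms: dict[str, float] = {}

@contextmanager
def stage(name: str):
    """Record wall-clock latency for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        latencies_ms[name] = (time.perf_counter() - start) * 1000

# Hypothetical pipeline stages; the real retrieve/rerank/generate calls go inside.
with stage("retrieve"):
    time.sleep(0.01)
with stage("rerank"):
    time.sleep(0.01)
with stage("generate"):
    time.sleep(0.02)

print(latencies_ms)  # e.g. {'retrieve': 10.3, 'rerank': 10.1, 'generate': 20.4}
```

Because the timer wraps each stage separately, a slow request immediately shows *which* stage regressed instead of one opaque end-to-end number.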
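The "unit tests for prompt templates" in point 4 can start as simply as asserting that the rendered prompt contains every retrieved chunk and ends with the user question. `build_prompt` here is a hypothetical template function, not one from the repo:

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical RAG prompt template: numbered context, then the question."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Guardrail-style checks (run under pytest or as plain asserts).
prompt = build_prompt("What port does Qdrant use?", ["Qdrant listens on 6333."])
assert "Qdrant listens on 6333." in prompt            # every chunk survives templating
assert prompt.endswith("What port does Qdrant use?")  # the question comes last
assert "[1]" in prompt                                # chunks stay numbered for citations
```

Tests like these catch template regressions (a dropped chunk, a reordered section) without needing any infra, which is why they come before integration tests.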
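One way to catch the "retrieval drift" failure mode above is to fingerprint the ingest config and fail fast when serving or evals run against a different one. The config fields and the pinned-hash workflow are illustrative assumptions, not taken from the repo:

```python
import hashlib
import json

def config_fingerprint(config: dict) -> str:
    """Stable short hash of an ingest config (key order doesn't matter)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

# Hash recorded when the index was built and evals last passed.
ingest_config = {"chunk_size": 512, "chunk_overlap": 64, "filter": "docs-only"}
pinned = config_fingerprint(ingest_config)

# Later, before serving or re-running evals:
current = config_fingerprint({"chunk_size": 512, "chunk_overlap": 64, "filter": "docs-only"})
assert current == pinned, "ingest config drifted; re-run evals before deploying"

# Changing the chunking produces a different fingerprint:
drifted = config_fingerprint({"chunk_size": 256, "chunk_overlap": 64, "filter": "docs-only"})
print(pinned != drifted)  # True
```

Storing the fingerprint next to the index (or in the eval report) turns "did anyone change chunking?" from a code-review question into a startup check.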
### FAQ

**Q: Do I need an LLM framework?**
A: No. The README highlights that it avoids heavy frameworks and talks to the OpenAI API directly (with LiteLLM as a provider proxy).

**Q: Where do I start?**
A: Run `just scaffold`, then `uv run cli`. Once it works, add your own ingest pipeline or adapt the included one.

**Q: How do I keep costs under control?**
A: Track token usage and retrieval payload size; then tighten chunking, dedupe context, and add caching where it matters.

## Source & Thanks

> Source: https://github.com/ajac-zero/example-rag-app
> License: MIT
> GitHub stars: 159 · forks: 24

---

Source: https://tokrepo.com/en/workflows/example-rag-app-fastapi-langfuse
Author: AI Open Source