Workflows · May 12, 2026 · 2 min read

Example RAG App — FastAPI + Langfuse

A reference RAG app with FastAPI + Typer CLI, local Docker infra, LiteLLM (100+ providers), and Langfuse observability—built to teach best practices.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, metadata JSON, an adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Native · 94/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Kind: CLI
Install: Manual
Trust: Established
Entrypoint: just scaffold
Universal CLI install command: npx tokrepo install f8fc50fa-3d93-5f58-9a49-51927df86907
Intro

  • Best for: teams that want a clean, testable RAG template with local infra and observability
  • Works with: Python + uv; Docker Compose; FastAPI; Typer; LiteLLM; Langfuse; Qdrant; Redis
  • Setup time: 25–60 minutes

Practical Notes

  • Per the README: uses LiteLLM as a proxy to call 100+ providers through the OpenAI client library (a minimal sketch follows this list).
  • Local-first infra: just scaffold spins up microservices with docker compose.
  • Dev loop includes Ruff lint/format, Mypy type checks, and unit/integration/e2e tests via just test.
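
To see the proxy pattern end to end, here is a minimal sketch that points the official OpenAI client at a locally running LiteLLM proxy. The base URL and model alias are assumptions: LiteLLM's proxy listens on port 4000 by default, and the model name must match an alias in your proxy config.

```python
# Minimal sketch: calling a LiteLLM-proxied provider through the OpenAI client.
# Assumes the proxy is up locally on its default port (4000) and that the
# model alias "gpt-4o-mini" is defined in your LiteLLM proxy config.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # the LiteLLM proxy, not api.openai.com
    api_key="sk-anything",             # the proxy enforces its own keys
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # resolved by LiteLLM to the configured provider
    messages=[{"role": "user", "content": "Say hello through the proxy."}],
)
print(response.choices[0].message.content)
```

Swapping providers then becomes a proxy-config change rather than an application change, which is the main reason to route through LiteLLM at all.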

Main

Use this repo as a checklist for “production-shaped” RAG:

  1. Infrastructure as code (local first). Bring up vector DB + cache + observability with one command so every teammate can reproduce issues.
  2. Separation of concerns. Keep ingestion/indexing separate from serving; make the serving API stateless where possible.
  3. Observe retrieval, not just the model. Log the query, retrieved docs, chunk sizes, and latency per stage (retrieve → rerank → generate); see the sketch after this list.
  4. Treat tests as guardrails. Start with unit tests for prompt templates and retrieval filters; add integration tests once infra is stable.
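
To make point 3 concrete, here is a framework-free sketch of per-stage logging. The stage names, log schema, and placeholder pipeline steps are illustrative assumptions, not the repo's actual code (which routes this through Langfuse):

```python
# Illustrative sketch: time and log each RAG stage separately so slow
# retrieval is distinguishable from slow generation. The schema is an
# assumption; the reference repo records this via Langfuse instead.
import json
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rag")

@contextmanager
def stage(name: str, **fields):
    """Log one pipeline stage (retrieve / rerank / generate) with its latency."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        log.info(json.dumps({"stage": name, "ms": round(elapsed_ms, 1), **fields}))

# Usage inside a request handler:
query = "what is retrieval drift?"
with stage("retrieve", query=query, top_k=5):
    docs = ["...chunk 1...", "...chunk 2..."]  # placeholder for a vector search
with stage("generate", n_docs=len(docs), context_chars=sum(map(len, docs))):
    answer = "..."  # placeholder for the LLM call
```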

The most common failure mode is “retrieval drift”: the index changes but prompts/tests don’t. Pin your ingest config and re-run evals when you change chunking or filters.
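
One cheap guardrail here is a regression test that pins the ingest config and a few known-good retrievals, so a chunking or filter change fails loudly in CI. A hypothetical pytest sketch follows; load_ingest_config, retrieve, the config fields, and the expected document are all stand-ins for your own code:

```python
# Hypothetical drift guardrails. `load_ingest_config` and `retrieve` are
# placeholders for the app's real functions; replace the stub bodies.
from dataclasses import dataclass

PINNED_INGEST_CONFIG = {"chunk_size": 512, "chunk_overlap": 64, "splitter": "recursive"}

@dataclass
class Hit:
    source: str
    score: float

def load_ingest_config() -> dict:
    return dict(PINNED_INGEST_CONFIG)  # placeholder: read your real ingest config

def retrieve(query: str, top_k: int) -> list[Hit]:
    return [Hit(source="docs/security/key-rotation.md", score=0.91)]  # placeholder

def test_ingest_config_is_pinned():
    # Fails loudly when chunking or filter settings drift from the pinned values.
    assert load_ingest_config() == PINNED_INGEST_CONFIG

def test_canonical_query_still_hits_expected_doc():
    # A known-good query must keep retrieving its known-good document.
    hits = retrieve("how do I rotate API keys?", top_k=5)
    assert "docs/security/key-rotation.md" in {h.source for h in hits}
```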

FAQ

Q: Do I need an LLM framework? A: No. The README notes that it avoids heavy frameworks and talks to the OpenAI API directly, with LiteLLM acting as a provider proxy.

Q: Where do I start? A: Run just scaffold, then uv run cli. Once it works, add your own ingest pipeline or adapt the included one.

Q: How do I keep costs under control? A: Track token usage and retrieval payload size; then tighten chunking, dedupe context, and add caching where it matters.
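
As a sketch of the dedupe-and-budget idea: drop duplicate chunks, then trim the remaining context to a token budget before prompting. The chars/4 estimate and the budget value are rough assumptions; a real tokenizer (e.g. tiktoken) is more accurate:

```python
# Rough sketch: dedupe retrieved chunks and cap the context at a token budget.
# The chars/4 heuristic and the 2000-token budget are assumptions to tune.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, ~4 characters per token

def build_context(chunks: list[str], token_budget: int = 2000) -> str:
    seen: set[str] = set()
    kept: list[str] = []
    used = 0
    for chunk in chunks:  # chunks assumed sorted by relevance, best first
        key = chunk.strip().lower()
        if key in seen:  # skip exact duplicates from overlapping documents
            continue
        cost = estimate_tokens(chunk)
        if used + cost > token_budget:
            break  # stop before blowing the budget
        seen.add(key)
        kept.append(chunk)
        used += cost
    return "\n\n".join(kept)
```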


Source & Thanks

Source: https://github.com/ajac-zero/example-rag-app · License: MIT · GitHub stars: 159 · Forks: 24
