Workflows · May 12, 2026 · 2 min read

Example RAG App — FastAPI + Langfuse

A reference RAG app with FastAPI + Typer CLI, local Docker infra, LiteLLM (100+ providers), and Langfuse observability—built to teach best practices.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents judge fit, risk, and next actions.

Native · 94/100 · Policy: allow
Agent surface: any MCP/CLI agent
Type: CLI
Installation: manual
Trust: Established
Entry point: just scaffold
Universal CLI command: npx tokrepo install f8fc50fa-3d93-5f58-9a49-51927df86907
Introduction

  • Best for: teams that want a clean, testable RAG template with local infra and observability
  • Works with: Python + uv; Docker Compose; FastAPI; Typer; LiteLLM; Langfuse; Qdrant; Redis
  • Setup time: 25–60 minutes

Practical Notes

  • Per README: uses LiteLLM as a proxy to call 100+ providers via the OpenAI library.
  • Local-first infra: just scaffold spins up microservices with docker compose.
  • Dev loop includes Ruff lint/format, Mypy type checks, and unit/integration/e2e tests via just test.
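The LiteLLM note above means the app speaks the OpenAI wire format and lets the proxy route to any provider. A minimal sketch of that request shape, assuming a hypothetical local proxy endpoint and model name (the repo's compose file defines the real ones):

```python
# Sketch: the JSON payload an OpenAI-compatible endpoint (such as a local
# LiteLLM proxy) accepts. The endpoint URL and model name are assumptions,
# not taken from the repo.
import json

def chat_payload(model: str, prompt: str) -> str:
    body = {
        "model": model,  # LiteLLM routes this name to the configured provider
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

payload = chat_payload("gpt-4o-mini", "What is RAG?")
# POST this to e.g. http://localhost:4000/v1/chat/completions (hypothetical port)
print(payload)
```

Because every provider sits behind the same payload shape, swapping models is a config change, not a code change.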

Main

Use this repo as a checklist for “production-shaped” RAG:

  1. Infrastructure as code (local first). Bring up vector DB + cache + observability with one command so every teammate can reproduce issues.
  2. Separation of concerns. Keep ingestion/indexing separate from serving; make the serving API stateless where possible.
  3. Observe retrieval, not just the model. Log: query, retrieved docs, chunk sizes, and latency per stage (retrieve → rerank → generate).
  4. Treat tests as guardrails. Start with unit tests for prompt templates and retrieval filters; add integration tests once infra is stable.

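Point 3 above, per-stage latency, can be sketched with a small context manager; the stage functions here are stubs standing in for Qdrant, a reranker, and the LLM:

```python
# Sketch: timing each RAG stage (retrieve -> rerank -> generate) so a trace
# shows where latency comes from. The stage bodies are illustrative stubs,
# not the repo's real retrieval code.
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def stage(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start  # seconds per stage

with stage("retrieve"):
    docs = ["doc-1", "doc-2"]   # stand-in for a vector search
with stage("rerank"):
    docs = sorted(docs)         # stand-in for a reranker
with stage("generate"):
    answer = f"Answer based on {len(docs)} docs"

print({name: round(seconds, 4) for name, seconds in timings.items()})
```

In the real app these timings would be attached to the Langfuse trace alongside the query and retrieved documents, so slow retrieval is distinguishable from slow generation.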
The most common failure mode is “retrieval drift”: the index changes but prompts/tests don’t. Pin your ingest config and re-run evals when you change chunking or filters.
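One way to make that pinning concrete is to fingerprint the ingest config and refuse to compare eval results across different index builds; the field names below are illustrative, not taken from the repo:

```python
# Sketch: pin the ingest config by hashing it, so evals can detect
# "retrieval drift" when chunking or filters change. Config fields here
# are hypothetical examples.
import hashlib
import json

ingest_config = {"chunk_size": 512, "chunk_overlap": 64, "filters": ["lang:en"]}

def config_fingerprint(cfg: dict) -> str:
    # Canonical JSON (sorted keys) -> stable hash across runs and machines.
    blob = json.dumps(cfg, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

PINNED = config_fingerprint(ingest_config)

# At eval time: fail loudly if the index was built with a different config.
assert config_fingerprint(ingest_config) == PINNED, "ingest config changed: re-run evals"
print(PINNED)
```

Storing the fingerprint next to each eval run makes "which index was this measured against?" answerable later.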

FAQ

Q: Do I need an LLM framework? A: No—README highlights it avoids heavy frameworks and talks to the OpenAI API directly (with LiteLLM as a provider proxy).

Q: Where do I start? A: Run just scaffold, then uv run cli. Once it works, add your own ingest pipeline or adapt the included one.

Q: How do I keep costs under control? A: Track token usage and retrieval payload size; then tighten chunking, dedupe context, and add caching where it matters.
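A minimal sketch of the "dedupe context and cap payload size" part of that answer; the ~4 chars/token estimate is a rough heuristic, not the model's real tokenizer:

```python
# Sketch: trim retrieval payloads before generation by dropping
# near-duplicate chunks and capping a rough token budget. The 4-chars-per-
# token estimate is a crude heuristic, not a real tokenizer.
def build_context(chunks: list[str], max_tokens: int = 1000) -> list[str]:
    seen: set[str] = set()
    picked: list[str] = []
    budget = max_tokens
    for chunk in chunks:
        key = " ".join(chunk.lower().split())  # normalize case/whitespace
        if key in seen:
            continue                           # drop duplicate context
        est_tokens = len(chunk) // 4           # crude token estimate
        if est_tokens > budget:
            break                              # budget exhausted
        seen.add(key)
        picked.append(chunk)
        budget -= est_tokens
    return picked

print(build_context(["RAG adds retrieval.", "rag adds retrieval.", "Qdrant stores vectors."]))
```

For accurate accounting, swap the heuristic for the provider's tokenizer and log the counts per request, so tightening chunking shows up directly in the cost metrics.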


Source and acknowledgements

Source: https://github.com/ajac-zero/example-rag-app · License: MIT · GitHub stars: 159 · forks: 24

