Workflows · May 12, 2026 · 2 min read

Example RAG App — FastAPI + Langfuse

A reference RAG app with FastAPI + Typer CLI, local Docker infra, LiteLLM (100+ providers), and Langfuse observability—built to teach best practices.

Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and raw content so agents can assess compatibility, risk, and next steps.

Native · 94/100 · Policy: allow
Agent surface
Any MCP/CLI agent
Type
CLI
Install
Manual
Trust
Established
Entry point
just scaffold
Universal CLI command
npx tokrepo install f8fc50fa-3d93-5f58-9a49-51927df86907
Introduction

  • Best for: teams that want a clean, testable RAG template with local infra and observability
  • Works with: Python + uv; Docker Compose; FastAPI; Typer; LiteLLM; Langfuse; Qdrant; Redis
  • Setup time: 25–60 minutes

Practical Notes

  • Per README: uses LiteLLM as a proxy to call 100+ providers via the OpenAI library.
  • Local-first infra: just scaffold spins up microservices with docker compose.
  • Dev loop includes Ruff lint/format, Mypy type checks, and unit/integration/e2e tests via just test.
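Per the README note above, LiteLLM exposes an OpenAI-compatible endpoint, so any OpenAI-style client can talk to it. A minimal sketch of building such a request with the standard library only; the proxy URL, port, and model alias are assumptions, so check your docker compose mapping and LiteLLM config for the real values:

```python
import json

# Hypothetical proxy URL; adjust to your local docker compose setup.
LITELLM_URL = "http://localhost:4000/v1/chat/completions"

def build_chat_request(model: str, user_message: str) -> tuple[str, bytes]:
    """Build an OpenAI-compatible chat request body for the LiteLLM proxy."""
    payload = {
        "model": model,  # LiteLLM routes this alias to a configured provider
        "messages": [{"role": "user", "content": user_message}],
    }
    return LITELLM_URL, json.dumps(payload).encode("utf-8")

url, body = build_chat_request("gpt-4o-mini", "What is RAG?")
# POST `body` to `url` with urllib.request, httpx, or the official openai client.
```

In practice you would use the `openai` client pointed at the proxy's base URL; the point is that no provider-specific SDK is needed.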

Main

Use this repo as a checklist for “production-shaped” RAG:

  1. Infrastructure as code (local first). Bring up vector DB + cache + observability with one command so every teammate can reproduce issues.
  2. Separation of concerns. Keep ingestion/indexing separate from serving; make the serving API stateless where possible.
  3. Observe retrieval, not just the model. Log: query, retrieved docs, chunk sizes, and latency per stage (retrieve → rerank → generate).
  4. Treat tests as guardrails. Start with unit tests for prompt templates and retrieval filters; add integration tests once infra is stable.
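Point 3 above can be sketched with a tiny timing wrapper; the stage names match the retrieve → rerank → generate pipeline, and the stand-in bodies are placeholders for real vector-DB, reranker, and model calls:

```python
import time
from contextlib import contextmanager

# Per-stage latency log; in the real app this would feed Langfuse spans.
timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

with timed("retrieve"):
    docs = ["chunk-1", "chunk-2"]  # stand-in for a vector DB query
with timed("rerank"):
    docs = sorted(docs)            # stand-in for a reranker call
with timed("generate"):
    answer = f"answer built from {len(docs)} chunks"  # stand-in for the LLM call
```

Logging the retrieved docs and chunk sizes alongside these timings is what makes retrieval problems debuggable, not just model problems.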

The most common failure mode is “retrieval drift”: the index changes but prompts/tests don’t. Pin your ingest config and re-run evals when you change chunking or filters.
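One lightweight way to "pin" the ingest config is to fingerprint it and store the hash next to your eval results, so any chunking or filter change forces an eval re-run. A sketch with illustrative field names (the repo does not prescribe this exact scheme):

```python
import hashlib
import json

# Illustrative chunking/filter settings; use whatever your ingest pipeline reads.
ingest_config = {"chunk_size": 512, "chunk_overlap": 64, "filters": ["lang:en"]}

def config_fingerprint(cfg: dict) -> str:
    """Stable short hash of the ingest config, for pairing with eval runs."""
    canonical = json.dumps(cfg, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]

pinned = config_fingerprint(ingest_config)
changed = config_fingerprint({**ingest_config, "chunk_size": 256})
# pinned != changed: a different chunking config yields a different fingerprint.
```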

FAQ

Q: Do I need an LLM framework? A: No — the README notes that it avoids heavy frameworks and talks to the OpenAI API directly (with LiteLLM as a provider proxy).

Q: Where do I start? A: Run just scaffold, then uv run cli. Once it works, add your own ingest pipeline or adapt the included one.

Q: How do I keep costs under control? A: Track token usage and retrieval payload size; then tighten chunking, dedupe context, and add caching where it matters.
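The cost-control advice above can be sketched in a few lines. Note the 4-characters-per-token estimate is a rough heuristic, not a tokenizer, and the budget number is illustrative:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 chars per token heuristic)."""
    return max(1, len(text) // 4)

def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Drop near-duplicate retrieved chunks before building the context."""
    seen: set[str] = set()
    out = []
    for chunk in chunks:
        key = chunk.strip().lower()
        if key not in seen:
            seen.add(key)
            out.append(chunk)
    return out

chunks = ["RAG basics", "rag basics ", "Vector search"]
context = "\n".join(dedupe_chunks(chunks))
budget_ok = estimate_tokens(context) < 2000  # illustrative budget
```

A real tokenizer (e.g. the one for your target model) and a semantic-similarity dedupe would tighten both estimates, but this shape of check is cheap enough to run on every request.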


Source and acknowledgements

Source: https://github.com/ajac-zero/example-rag-app · License: MIT · GitHub stars: 159 · forks: 24
