## Practical Notes
- Per the README: uses LiteLLM as a proxy to reach 100+ providers through the standard OpenAI client library (see the sketch after this list).
- Local-first infra: `just scaffold` spins up the microservices with `docker compose`.
- Dev loop includes Ruff lint/format, Mypy type checks, and unit/integration/e2e tests via `just test`.
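The upshot of the LiteLLM setup is that application code only ever speaks the OpenAI wire format. A minimal sketch, assuming a LiteLLM proxy listening on localhost:4000 (its default) with a model alias configured there; the port, key, and alias below are assumptions, not the repo's actual config:

```python
# Sketch: calling a LiteLLM proxy through the standard OpenAI client.
# Assumes a proxy on localhost:4000 with a "gpt-4o-mini" alias configured;
# adjust base_url, key, and model to your own setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # point the client at the proxy, not api.openai.com
    api_key="sk-anything",             # the proxy enforces its own keys (or none)
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # resolved by the proxy to whichever provider backs this alias
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```

Swapping providers then becomes a proxy-config change, with no edits to application code.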
## Main
Use this repo as a checklist for “production-shaped” RAG:
- Infrastructure as code (local first). Bring up vector DB + cache + observability with one command so every teammate can reproduce issues.
- Separation of concerns. Keep ingestion/indexing separate from serving; make the serving API stateless where possible (a minimal sketch follows this list).
- Observe retrieval, not just the model. Log the query, retrieved docs, chunk sizes, and latency per stage (retrieve → rerank → generate); see the timing sketch below.
- Treat tests as guardrails. Start with unit tests for prompt templates and retrieval filters (see the test sketch below); add integration tests once infra is stable.
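On statelessness: one way to keep the serving API stateless is to hold no index or session state in the API process at all, so every request goes to the vector DB. A hypothetical sketch; `search`, `generate`, and the route are stand-ins, not this repo's actual API:

```python
# Sketch: a stateless serving endpoint. All durable state (the index) lives
# in the vector DB, so the API process can be scaled or restarted freely.
# `search` and `generate` are hypothetical stubs, not this repo's code.
from dataclasses import dataclass
from fastapi import FastAPI
from pydantic import BaseModel

@dataclass
class Doc:
    id: str
    text: str

def search(question: str, top_k: int) -> list[Doc]:
    return []  # query the vector DB here; nothing is cached in-process

def generate(question: str, docs: list[Doc]) -> str:
    return ""  # one LLM call built from the question + retrieved docs

app = FastAPI()

class Query(BaseModel):
    question: str
    top_k: int = 5

@app.post("/ask")
def ask(q: Query) -> dict:
    docs = search(q.question, q.top_k)
    return {"answer": generate(q.question, docs), "sources": [d.id for d in docs]}
```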
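For per-stage observability, a small timing helper is enough to start. The sketch below logs latency around each stage; the stage names and logged fields mirror the checklist above, but the logger setup is illustrative, not the repo's instrumentation:

```python
# Sketch: log latency per pipeline stage (retrieve -> rerank -> generate).
# Adapt the fields to whatever structured-logging setup the service uses.
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("rag")

@contextmanager
def stage(name: str, **fields):
    start = time.perf_counter()
    try:
        yield
    finally:
        ms = (time.perf_counter() - start) * 1000
        log.info("stage=%s latency_ms=%.1f %s", name, ms,
                 " ".join(f"{k}={v}" for k, v in fields.items()))

# Usage inside the serving path:
query = "how do I rotate API keys?"
with stage("retrieve", query=query):
    docs = ["chunk-a", "chunk-b"]  # stand-in for the vector DB call
with stage("rerank", candidates=len(docs)):
    docs = docs[:1]
with stage("generate", context_chunks=len(docs)):
    answer = "..."                 # stand-in for the LLM call
```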
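A unit test for a prompt template or a retrieval filter needs no running infrastructure, which is why they come first. A sketch; `build_prompt` and `passes_filter` are hypothetical stand-ins for the repo's own helpers:

```python
# Sketch: guardrail-style unit tests. Both functions are hypothetical
# stand-ins; the point is that these tests run with no infra at all.
def build_prompt(question: str, chunks: list[str]) -> str:
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"

def passes_filter(chunk: str, min_len: int = 20) -> bool:
    return len(chunk.strip()) >= min_len

def test_prompt_includes_every_chunk():
    chunks = ["alpha fact", "beta fact"]
    prompt = build_prompt("q?", chunks)
    assert all(c in prompt for c in chunks)

def test_filter_drops_near_empty_chunks():
    assert not passes_filter("   \n ")
    assert passes_filter("a chunk long enough to carry meaning")
```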
The most common failure mode is “retrieval drift”: the index changes but prompts/tests don’t. Pin your ingest config and re-run evals when you change chunking or filters.
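One cheap way to catch drift is to fingerprint the ingest config and store the hash alongside the index, then fail loudly when they diverge. A sketch under assumed field names (the config keys and stored hash are illustrative):

```python
# Sketch: pin the ingest config by hashing it and recording the hash with
# the index. If chunking or filters change, the fingerprint changes, and
# evals must be re-run before serving. Field names are illustrative.
import hashlib
import json

ingest_config = {
    "chunk_size": 512,
    "chunk_overlap": 64,
    "filters": ["drop_boilerplate", "min_len_20"],
    "embedding_model": "text-embedding-3-small",
}

def config_fingerprint(cfg: dict) -> str:
    canonical = json.dumps(cfg, sort_keys=True)  # stable across key order
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

stored = "3f2a9c81d0be"  # fingerprint recorded when the index was built
current = config_fingerprint(ingest_config)
if current != stored:
    raise RuntimeError(
        f"ingest config drifted ({stored} -> {current}): "
        "re-run evals before serving against this index"
    )
```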
## FAQ
Q: Do I need an LLM framework?
A: No. The README highlights that it avoids heavy frameworks and talks to the OpenAI API directly (with LiteLLM as a provider proxy).
Q: Where do I start?
A: Run `just scaffold`, then `uv run cli`. Once that works, add your own ingest pipeline or adapt the included one.
Q: How do I keep costs under control?
A: Track token usage and retrieval payload size; then tighten chunking, dedupe context, and add caching where it matters.
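Deduping context before it reaches the prompt is often the cheapest win. A minimal sketch; the exact-match dedupe and the rough token estimate are simplifications, and a real setup would use the model's actual tokenizer (e.g. tiktoken):

```python
# Sketch: trim the retrieval payload before it hits the prompt.
# Exact-match dedupe plus a crude token estimate; swap in your model's
# real tokenizer for accurate counts.
def dedupe_chunks(chunks: list[str]) -> list[str]:
    seen: set[str] = set()
    out = []
    for c in chunks:
        key = " ".join(c.split()).lower()  # normalize whitespace/case
        if key not in seen:
            seen.add(key)
            out.append(c)
    return out

def rough_tokens(text: str) -> int:
    return len(text) // 4  # ~4 chars per token is a common rule of thumb

chunks = ["Alpha  fact.", "alpha fact.", "Beta fact."]
kept = dedupe_chunks(chunks)
print(kept, sum(rough_tokens(c) for c in kept))  # duplicates dropped, tokens tallied
```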