Practical Notes
- Organized into products, tutorials, surveys, benchmarks, and paper sections (see README table of contents).
- Use one benchmark to define your acceptance bar (latency, recall, token budget), then pick an approach.
- Keep a “memory regression set”: 20–50 queries the agent previously answered correctly, re-run to catch drift whenever you change memory policy.
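A regression set can be as simple as (query, expected-fact) pairs re-checked after every policy change. A minimal sketch, where `answer` is a hypothetical stand-in for your agent's query function:

```python
# Re-run a small set of queries the agent used to answer correctly and
# report any whose expected fact has drifted out of the answer.

def run_regression(answer, cases):
    """Return the (query, expected) pairs whose expected fact is missing."""
    failures = []
    for query, expected in cases:
        if expected.lower() not in answer(query).lower():
            failures.append((query, expected))
    return failures

# Usage with a stubbed agent (contents are illustrative):
cases = [
    ("What DB does the project use?", "postgres"),
    ("Who owns the billing service?", "team-payments"),
]
stub = {
    "What DB does the project use?": "We use Postgres 15.",
    "Who owns the billing service?": "Unknown.",
}
failures = run_regression(lambda q: stub.get(q, ""), cases)
# The billing-owner fact has drifted: it shows up in `failures`.
```

Keeping the set small (20–50 cases) keeps it cheap enough to run on every memory-policy change.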
Main
A selection workflow that actually works:
- Define what “memory” means for your agent: project facts, user preferences, tool state, or long transcripts.
- Decide your constraint triangle: latency, privacy, token budget.
- Pick a baseline approach (summaries + retrieval, vector store, graph/wiki, or hybrid).
- Evaluate on one benchmark + your own domain tasks, then iterate.
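The acceptance bar from the steps above can be made concrete as a small check you run against benchmark results. A sketch, with the field names and thresholds as illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AcceptanceBar:
    """Constraint triangle expressed as hard limits (values are examples)."""
    max_p95_latency_ms: float
    min_recall: float
    max_tokens_per_query: int

def passes(bar, p95_latency_ms, recall, tokens_per_query):
    """True only if the measured run clears every limit at once."""
    return (p95_latency_ms <= bar.max_p95_latency_ms
            and recall >= bar.min_recall
            and tokens_per_query <= bar.max_tokens_per_query)

bar = AcceptanceBar(max_p95_latency_ms=300, min_recall=0.8,
                    max_tokens_per_query=1500)
passes(bar, 250, 0.85, 1200)  # clears the bar
passes(bar, 250, 0.70, 1200)  # fails: recall below the bar
```

Iterating then means changing one memory policy at a time and re-checking the same bar.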
The key is avoiding “infinite context”. Good memory systems are selective: they store high-signal facts and can justify why a memory was retrieved.
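Selectivity plus justification can be sketched as a retrieval function that filters on a signal score and returns a human-readable reason alongside each hit. The scoring rule and `signal` field are illustrative assumptions, not a standard API:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    source: str
    signal: float  # assumed importance score in [0, 1]

def retrieve(memories, query_terms, k=3, min_signal=0.5):
    """Return up to k high-signal memories, each with a justification."""
    scored = []
    for m in memories:
        hits = [t for t in query_terms if t in m.text.lower()]
        if hits and m.signal >= min_signal:
            why = f"matched {hits} (source: {m.source}, signal={m.signal})"
            scored.append((len(hits) * m.signal, m, why))
    scored.sort(key=lambda s: -s[0])
    return [(m, why) for _, m, why in scored[:k]]

memories = [
    Memory("The API uses OAuth2 bearer tokens", "docs/auth.md", 0.9),
    Memory("Team lunch is on Fridays", "chat log", 0.2),
]
results = retrieve(memories, ["oauth2", "tokens"])
# Only the high-signal auth fact is returned, with its justification.
```

Returning the `why` string makes retrieval auditable, which is what lets you debug precision problems later.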
FAQ
Q: Is vector search enough? A: Sometimes. For coding agents, you often need hybrid memory: durable facts + searchable artifacts + updated summaries.
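The three-part split for coding agents can be sketched as one object holding all three stores. The class and method names are illustrative, not a standard interface:

```python
class HybridMemory:
    """Toy hybrid memory: durable keyed facts, searchable artifacts,
    and a rolling summary of the transcript."""

    def __init__(self):
        self.facts = {}        # durable facts, keyed for exact lookup
        self.artifacts = []    # searchable blobs: code, logs, docs
        self.summary = ""      # updated summary of long transcripts

    def remember_fact(self, key, value):
        self.facts[key] = value

    def add_artifact(self, text):
        self.artifacts.append(text)

    def update_summary(self, text):
        self.summary = text

    def search_artifacts(self, term):
        return [a for a in self.artifacts if term.lower() in a.lower()]

mem = HybridMemory()
mem.remember_fact("db", "postgres 15")
mem.add_artifact("Traceback: KeyError in billing.py")
mem.update_summary("User is migrating the billing service's database.")
mem.search_artifacts("keyerror")  # finds the stored traceback
```

In practice the artifact search would be vector or hybrid (BM25 + embedding) search; substring matching here just marks where that component plugs in.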
Q: What’s the first metric to watch? A: Retrieval precision: how often retrieved items actually help the answer. Low precision is the fastest way to waste tokens.
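Retrieval precision is straightforward to compute once you label which retrieved items actually helped (e.g. by human review of your regression set):

```python
def retrieval_precision(retrieved, helpful):
    """Fraction of retrieved items that actually helped the answer."""
    if not retrieved:
        return 0.0
    return sum(1 for item in retrieved if item in helpful) / len(retrieved)

# 4 items retrieved, 2 were actually useful -> precision 0.5
retrieval_precision(["a", "b", "c", "d"], helpful={"a", "c"})
```

Tracking this per query over the regression set shows exactly which policy changes start wasting tokens.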
Q: How do I prevent stale memory? A: Attach timestamps and sources; re-validate critical facts periodically and prune memories that don’t get used.
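A minimal pruning sketch, assuming each memory carries a source and a last-used timestamp; the time-based policy and field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class TimedMemory:
    text: str
    source: str
    last_used_at: float  # seconds since epoch

def prune(memories, now, max_unused_s):
    """Illustrative policy: drop memories not retrieved within max_unused_s."""
    return [m for m in memories if now - m.last_used_at <= max_unused_s]

kept = prune(
    [TimedMemory("db is postgres", "docs", last_used_at=950.0),
     TimedMemory("old staging URL", "chat", last_used_at=100.0)],
    now=1000.0,
    max_unused_s=200.0,
)
# Only the recently used fact survives; the stale one is pruned.
```

Critical facts (credentials, ownership, compliance) should additionally be re-validated against their source before the agent relies on them, not just aged out.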