# Memory Layer for Agents
Mem0, Zep, Cognee, and the patterns to make agents remember across sessions — without baking everything into the prompt.
## What's in this pack
This pack collects the seven memory-layer assets that show up in every agent that needs to remember things between sessions without re-pasting them into the prompt every time. Three are the canonical libraries. Four are pattern templates that wrap them — patterns Anthropic and OpenAI both surface in their long-running-agent guides.
| # | Asset | Type | What it gives you |
|---|---|---|---|
| 1 | Mem0 | library | Auto-extract & update user facts, drop-in API |
| 2 | Zep | service | Temporal knowledge graph, long-term memory |
| 3 | Cognee | library | Graph + vector hybrid memory pipeline |
| 4 | Episodic-summary pattern | template | Compress long sessions into summary memories |
| 5 | Working-memory scratchpad | template | Inter-step state without prompt bloat |
| 6 | User-fact extractor | template | Pull stable facts from chat into a memory store |
| 7 | Cross-session recall | template | "What did we decide last week?" pattern |
## Why this matters
The default Claude / GPT-4 / Gemini setup has zero memory. Every conversation starts fresh. Most apps fake memory by stuffing previous turns into the system prompt — that works for a while, then your context window blows up, your bill triples, and the model loses the plot. Memory layers solve this by storing facts outside the prompt and only injecting the relevant ones per turn.
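The mechanics are easy to sketch. In the toy version below, keyword overlap stands in for the embedding similarity a real memory layer would use, and none of the names come from Mem0, Zep, or Cognee:

```python
# Minimal sketch: memories live outside the prompt, and only the few
# relevant ones are injected per turn. Illustrative only.

def relevance(memory: str, query: str) -> int:
    """Crude relevance score: count of shared words (real systems use embeddings)."""
    return len(set(memory.lower().split()) & set(query.lower().split()))

def build_prompt(memories: list[str], user_turn: str, top_k: int = 2) -> str:
    """Inject only the top_k most relevant memories, never the whole store."""
    ranked = sorted(memories, key=lambda m: relevance(m, user_turn), reverse=True)
    context = "\n".join(f"- {m}" for m in ranked[:top_k])
    return f"Known facts about the user:\n{context}\n\nUser: {user_turn}"

memories = [
    "User is vegetarian",
    "User works in Berlin",
    "User prefers Python over JavaScript",
]
prompt = build_prompt(memories, "Suggest a vegetarian restaurant in Berlin")
```

The prompt stays a fixed size no matter how many memories accumulate, which is the whole point: storage grows, the per-turn context doesn't.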
The three libraries each pick a different bet:
- Mem0 is the easiest. One `mem0.add(messages, user_id=...)` call and the library extracts what's worth remembering. Best for chatbot-style apps with a clear user identity.
- Zep is the production option. Runs as a service, gives you a temporal knowledge graph (memories with timestamps and relationships), and supports multi-tenancy. Best when you need audit trails or memory shared across an org.
- Cognee is the graph-native bet. It models memory as a knowledge graph from day one — useful if your domain is research, code, or anything with strong entity relationships.
The four patterns aren't libraries — they're prompt templates and small adapters that work with any of the three. They're the difference between "I installed Mem0" and "memory actually works in my app."
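As a taste of what those templates cover, here is a plain-Python sketch of the working-memory scratchpad (pattern #5): the agent writes intermediate state between steps, and only the most recent entries get rendered back into the prompt. The `Scratchpad` class is a hypothetical illustration, not part of any of the three libraries:

```python
# Working-memory scratchpad sketch: inter-step state lives here,
# not in the prompt. Only a bounded tail is injected per step.

class Scratchpad:
    def __init__(self, max_entries: int = 5):
        self.entries: list[tuple[str, str]] = []
        self.max_entries = max_entries

    def write(self, key: str, value: str) -> None:
        """Record an intermediate result; nothing goes to the prompt yet."""
        self.entries.append((key, value))

    def render(self) -> str:
        """Only the most recent entries survive into the next prompt."""
        recent = self.entries[-self.max_entries:]
        return "\n".join(f"{k}: {v}" for k, v in recent)

pad = Scratchpad(max_entries=2)
pad.write("step1", "fetched 3 candidate flights")
pad.write("step2", "filtered to 1 nonstop option")
pad.write("step3", "awaiting user confirmation")
rendered = pad.render()  # step1 has already been dropped
```

The cap on rendered entries is what keeps a 40-step agent run from dragging 40 steps of state into every model call.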
## Install in one command

```shell
# Install the entire pack
tokrepo install pack/agent-memory-layer

# Or install one library at a time
tokrepo install mem0
tokrepo install zep
tokrepo install cognee
```
The TokRepo CLI normalizes file placement: Claude Code subagents into `.claude/agents/`, Cursor rules into `.cursor/rules/`, `AGENTS.md` entries for Codex CLI. The library installs are pip/npm — TokRepo just wires them into your AI tool's config so the agent knows the memory layer exists.
## Common pitfalls
- Don't store everything. Memory cost scales with what you write, not what you retrieve. Use a fact extractor (pattern #6) to filter — only durable facts about the user/project belong in long-term memory.
- Don't skip the recency bias. Pure vector recall pulls semantically-similar but stale memories. Zep's temporal graph and Mem0's update-in-place both fix this; if you roll your own, weight by recency or you'll keep retrieving 6-month-old context.
- Don't share user IDs across tenants. All three libraries support per-user namespaces. Use them. Memory leakage between users is a much worse incident than no memory at all.
- Token-budget the recall step. Even with a memory layer, you can blow your context window if you set `top_k=50` for retrieval. Start at `top_k=5` and tune up only if recall is missing.
- Reconcile on conflict. If the user says "I'm vegetarian" in March and "I'm vegan" in May, you need an update strategy. Mem0 handles this automatically; Zep gives you the conflict surface; Cognee leaves it to you.
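If you do roll your own recall, the recency weighting from the second pitfall can be as simple as an exponential decay on memory age. The half-life and similarity numbers below are illustrative stand-ins for embedding cosine scores:

```python
# Recency-weighted recall sketch: semantic similarity is discounted by
# an exponential age decay so stale memories lose to fresh ones.

def recency_weighted(similarity: float, age_days: float,
                     half_life_days: float = 30.0) -> float:
    """Halve a memory's effective score every half_life_days of age."""
    decay = 0.5 ** (age_days / half_life_days)
    return similarity * decay

# A 6-month-old memory with high raw similarity loses to a fresh,
# slightly less similar one:
stale = recency_weighted(similarity=0.92, age_days=180)
fresh = recency_weighted(similarity=0.75, age_days=2)
```

Tune the half-life to your domain: user preferences decay slowly, project state decays fast.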
## Common misconceptions
"RAG and memory are the same thing." They're not. RAG retrieves from a static corpus (docs, codebase). Memory writes new entries based on what the user/agent said and retrieves them later. RAG is read-only; memory is read-write. The patterns in pack/rag-pipelines are different from this pack on purpose.
"I can just use the conversation history." For a 5-turn session, sure. For an app where the same user comes back next week, no — you'd have to feed every prior turn into the prompt forever. Memory extracts the facts and discards the chat.
"Mem0 vs Zep is a hard choice." Most teams use Mem0 first because it's a 5-minute setup, then graduate to Zep when they need multi-tenant or audit. The two are similar enough that migration is a weekend, not a quarter.
## Frequently asked questions
### Is Mem0 free?
The Mem0 OSS library is MIT-licensed and free to self-host. They also have a managed cloud option with usage-based pricing if you don't want to run the embedding/vector store yourself. Zep has the same OSS + cloud model. Cognee is fully OSS with no managed option as of mid-2026 — you run it yourself.
### Will this work in Cursor / Codex CLI / Windsurf?
The libraries are language-level (Python / Node) so they work with any agent framework, not just Claude Code. The TokRepo CLI installs the right config files for each AI tool. Codex CLI users should pair the memory layer with AGENTS.md instructions; Cursor users embed it in the rule set.
### How does Mem0 compare to Zep?

Mem0 is library-first — you import it and call `.add()`/`.search()` inline. Zep is service-first — you run a server (Docker), it owns the graph, and your app calls the API. Mem0 wins on time-to-first-memory; Zep wins on multi-tenant, audit, and explicit relationship modeling. Pick Mem0 for prototypes, Zep when you have ops support.
### What's the difference vs the RAG Pipelines pack?
RAG retrieves from a fixed corpus (your docs, your codebase). Memory writes new facts as the agent runs and retrieves them later. RAG is read-only; memory is read-write and accumulates. Most production agents need both: RAG for static knowledge, memory for the user-specific stuff.
### When should I NOT add a memory layer?
When sessions are stateless and short — single-shot tasks like 'summarize this PDF' don't benefit from memory and the layer adds latency. Also skip it for purely factual lookup (use RAG instead). Memory layers are worth their cost when the same user comes back, the agent is multi-step, or both.