Mem0 — Memory Layer for AI Applications
Add persistent, personalized memory to AI agents and assistants. Mem0 stores user preferences, past interactions, and learned context across sessions.
Ready-to-run agent install
This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.
npx -y tokrepo@latest install 96da1f40-1823-4d87-a84f-7d8269edeb24 --target codexRun after dry-run confirms the install plan.
The context-window problem, solved at the storage layer
A fresh LLM session knows nothing about you. Every conversation starts from zero — your name, your stack, your preferences, all re-introduced every time. Mem0 inserts a persistence layer between your application and the LLM so that conversational state, user facts, and long-term context survive sessions.
The 10-line integration
from mem0 import Memory
m = Memory()
# Remember something
m.add("Alice prefers TypeScript over JavaScript", user_id="alice")
m.add("Alice's project uses PostgreSQL and Redis", user_id="alice")
# Retrieve for the next LLM call
hits = m.search("what database does alice use?", user_id="alice")
# [{'memory': 'Uses PostgreSQL and Redis', 'score': 0.95}]
That's it. Mem0 handles extraction, deduplication, embedding, and semantic retrieval under the hood. Supports OpenAI, Claude, Gemini, and any LiteLLM-compatible model.
Architecture in 30 seconds
- Fact extraction — an LLM extracts atomic facts from each interaction (e.g., "user prefers X").
- Dedup & update — new facts are compared to existing memory; duplicates collapse, contradictions trigger updates.
- Storage — facts stored as text + embedding in your vector DB (Qdrant, Pinecone, pgvector, Chroma).
- Retrieval — at query time, the top-k relevant facts are pulled and injected into the LLM prompt.
Self-hosted vs cloud
| Mode | Setup | Cost | Best for |
|---|---|---|---|
| Open-source self-hosted | pip install mem0ai + your vector DB | $0 | Privacy-sensitive apps |
| Mem0 Platform (SaaS) | API key, managed infra | $19+/mo | Startups, prototypes |
| Enterprise | Custom deployment | Talk to sales | Regulated industries |
Benchmarks from the Mem0 team show 26% accuracy improvement on LOCOMO long-conversation benchmark vs OpenAI's built-in memory, and 91% lower latency than re-feeding full history every call.
Real production use cases
- Personal AI assistants — Perplexity-style AI remembers your research topics across days.
- Customer support bots — agent recalls ticket history without SQL queries.
- Voice assistants — continuity across phone calls.
- Gaming NPCs — characters remember past player interactions.
Integration with popular frameworks
First-class support for LangChain, LangGraph, CrewAI, AutoGen, LiveKit, Vercel AI SDK, and the OpenAI Agents SDK.
Common pitfalls
- Over-memorization — if you add every message, retrieval gets noisy. Use
addselectively or rely on the auto-extraction heuristics. - Vector DB cold start — Qdrant/pgvector need index warm-up. First query after idle can take 500ms+.
- Cost control — fact extraction uses an LLM call; budget accordingly. Claude Haiku or GPT-4o-mini are cost-effective for the extraction step.
- User ID scoping — memories are keyed by
user_id. Always pass it; default scoping leaks memories across users.
Frequently Asked Questions
Mem0 is LLM-agnostic and self-hostable. It works with Claude, Gemini, Llama, and any vector DB. On the LOCOMO long-conversation benchmark, Mem0 scored 26% higher accuracy and 91% lower latency than OpenAI's built-in memory feature.
Mem0 supports Qdrant, Pinecone, pgvector on PostgreSQL, Chroma, Weaviate, and Milvus. Qdrant is the default. You configure the vector store via environment variables or the Memory constructor.
Yes. Mem0 uses an LLM to extract atomic facts from conversations before storing. Claude Haiku or GPT-4o-mini are most cost-effective for this step. Total extraction cost averages $0.0002 per interaction.
No. The core retrieval mechanism is semantic search over embeddings. However, you can combine Mem0 with filters and metadata for hybrid retrieval.
Yes. Mem0 has 26K+ GitHub stars, Apache 2.0 license, and is used in production by Y Combinator-backed companies. The hosted Mem0 Platform offers SLA-backed managed infra for teams that prefer SaaS.
Citations (3)
- Mem0 Benchmarks— Mem0 scores 26% higher accuracy on LOCOMO benchmark vs OpenAI memory
- Mem0 Official Docs— First-class integration with LangGraph, CrewAI, AutoGen, LiveKit
- Mem0 GitHub— 26K+ GitHub stars as of 2026
Source & Thanks
- GitHub: mem0ai/mem0 (25k+ stars)
- Docs: docs.mem0.ai
Discussion
Related Assets
Wax — Single-File Memory Layer for AI Agents
Wax stores documents, embeddings, and knowledge in one portable `.wax` file, giving AI agents a local memory layer without extra servers.
Zep — Long-Term Memory for AI Agents and Assistants
Production memory layer for AI assistants. Zep stores conversation history, extracts facts, builds knowledge graphs, and provides temporal-aware retrieval for LLMs.
Memori — Agent-Native Memory Infrastructure
Memori is an Apache-2.0 memory layer that captures what agents do (not just say) and plugs into existing stacks via Python/Node SDKs and a cloud option.
Wax — On-Device Memory Layer + MCP Server
Single-file memory layer for AI agents (Swift) plus an MCP server for local-first RAG. Good for private, on-device workflows on Apple Silicon.