
mem0 — Long-term Memory for AI Agents (2026 Guide)

mem0 is an open-source memory layer that extracts, stores, and retrieves user-specific facts across conversations. Drop it in next to your OpenAI or Claude calls and your chatbot stops forgetting who it is talking to.

Why mem0

mem0 solves a specific pain — stuffing entire chat history into every LLM call burns tokens and breaks at 50K+ messages. Instead, mem0 runs a small extraction pipeline after each turn, pulls out durable facts ("user prefers TypeScript over Python"), stores them with embeddings, and retrieves only the top-k relevant memories on the next prompt.

The architecture is minimal on purpose. A memory is just an embedded fact plus metadata (user_id, agent_id, run_id, timestamp). No session abstractions, no elaborate graph schema. That restraint is the reason mem0 reaches production faster than most alternatives — you can wrap an existing OpenAI client in ~10 lines.

What you trade: no temporal reasoning ("what did the user prefer last quarter?") and no multi-hop relationships ("friend-of-friend"). If those matter, look at Graphiti or a hybrid setup. For 80% of personalization and chatbot use cases, mem0 is the right default in 2026.

Quick Start — 10 Lines with OpenAI

Run once — memory.add() ships facts to the vector store in the background. Second turn, memory.search() retrieves them automatically. With no external config, mem0 uses an in-memory Qdrant and OpenAI embeddings. For production, swap in a persistent backend (Qdrant, Chroma, PGVector) via a single config dict.

# pip install mem0ai openai
import os
from openai import OpenAI
from mem0 import Memory

os.environ["OPENAI_API_KEY"] = "sk-..."

memory = Memory()
oai = OpenAI()
user_id = "william"

def chat(message: str) -> str:
    # Pull relevant memories for this user
    hits = memory.search(query=message, user_id=user_id, limit=5)
    context = "\n".join(m["memory"] for m in hits["results"])

    resp = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"You know about the user:\n{context}"},
            {"role": "user", "content": message},
        ],
    )
    answer = resp.choices[0].message.content

    # Persist the new turn
    memory.add([
        {"role": "user", "content": message},
        {"role": "assistant", "content": answer},
    ], user_id=user_id)
    return answer

print(chat("I prefer TypeScript for frontend work"))
print(chat("What language should I use for my next React project?"))
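Swapping the default store for a persistent backend is a config change, not a code change. A minimal sketch, assuming mem0's config-dict schema (`vector_store` provider block) and a Qdrant instance you run yourself — host, port, and collection name below are placeholders:

```python
# Persistent Qdrant backend instead of the default local store.
# Keys follow mem0's config schema; connection values are placeholders
# for your own Qdrant deployment.
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "user_memories",
            "host": "localhost",
            "port": 6333,
        },
    },
}

# Then build the memory client from it:
# memory = Memory.from_config(config)
```

The rest of the quick start is unchanged — `add()` and `search()` calls are identical regardless of which backend sits underneath.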

Key Features

Automatic fact extraction

An LLM pass after each turn identifies durable facts worth remembering ("user is on Mac", "prefers terse responses") and ignores conversational filler. No manual tagging.

Hybrid search (vector + keyword)

Retrieves memories by semantic similarity plus exact keyword boost — catches both fuzzy "similar" matches and exact entity mentions like project names.

Scoped by user / agent / run

Every memory carries user_id, agent_id, and run_id. The same vector store can serve multi-tenant apps without cross-tenant bleed — query scope is a single where clause.
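The "single where clause" amounts to a predicate over those metadata keys. A self-contained illustration of the idea (not mem0's internals — just the scoping logic in plain Python):

```python
# Minimal sketch of metadata scoping (illustrative, not mem0 internals).
memories = [
    {"memory": "prefers TypeScript", "user_id": "william", "agent_id": "coach"},
    {"memory": "is on Mac",           "user_id": "william", "agent_id": "support"},
    {"memory": "likes terse replies", "user_id": "alice",   "agent_id": "coach"},
]

def scoped(items, **scope):
    """Return only memories whose metadata matches every given scope key."""
    return [m for m in items if all(m.get(k) == v for k, v in scope.items())]

hits = scoped(memories, user_id="william", agent_id="coach")
# Only william's coach-agent memories survive the filter.
```

In mem0 itself, the same narrowing happens by passing `user_id` (and optionally `agent_id` / `run_id`) to `search()` and `add()`, as in the quick start.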

Pluggable stores & LLMs

Vector backends: Qdrant, Chroma, Pinecone, PGVector, Weaviate. LLMs: OpenAI, Anthropic, Gemini, Ollama, AWS Bedrock. Swap via config, not code.
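Swapping the extraction LLM follows the same config-dict pattern as the store swap. A sketch, assuming mem0's `llm` provider block — the model name is just an example of a small, cheap extractor; check current docs for supported values:

```python
# Point fact extraction at Anthropic instead of the OpenAI default.
# Keys follow mem0's config schema; the model name is an example.
config = {
    "llm": {
        "provider": "anthropic",
        "config": {"model": "claude-3-5-haiku-20241022"},
    },
}

# Then: memory = Memory.from_config(config)
```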

Graph memory (optional)

mem0 v0.1+ adds an optional Neo4j graph layer that extracts entity relationships alongside facts — a middle ground before committing to full graph-first systems.
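Enabling the graph layer is another config block alongside the vector store. A sketch assuming mem0's `graph_store` schema and a local Neo4j — connection values are placeholders:

```python
# Optional graph layer: entity relationships extracted alongside facts.
# Keys follow mem0's config schema; connection values are placeholders.
config = {
    "graph_store": {
        "provider": "neo4j",
        "config": {
            "url": "neo4j://localhost:7687",
            "username": "neo4j",
            "password": "secret",
        },
    },
}

# Then: memory = Memory.from_config(config)
```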

Managed cloud

mem0 Platform (paid) handles storage, scaling, and a web UI for inspecting what your agent remembers. Same SDK, add api_key.

mem0 vs other memory layers

Numbers drawn from each project’s README benchmarks and issue threads (Q1 2026). Retrieval latency measured against a 10K-memory corpus with OpenAI embeddings.

| | Storage Model | Retrieval Latency | LLM Support | License | Self-host |
|---|---|---|---|---|---|
| mem0 | Vector (+ optional graph) | ~40-80ms | OpenAI, Claude, Gemini, Ollama, Bedrock | Apache 2.0 | Yes |
| Zep | Hybrid (vector + summary + graph) | ~60-120ms | OpenAI, Claude, local via Ollama | Apache 2.0 (community) + managed | Yes |
| Letta (MemGPT) | Paged OS (working + archival) | ~100-200ms | OpenAI, Claude, local | Apache 2.0 | Yes |
| Graphiti | Temporal graph (Neo4j/FalkorDB) | ~80-150ms | OpenAI, Claude, Gemini | Apache 2.0 | Yes |

Use Cases

01. Chatbot personalization

Consumer-facing assistants (coach, tutor, companion) that need to recall preferences, goals, and prior context without re-asking. mem0 scales from MVP to millions of users on a single Qdrant cluster.

02. Customer-support copilots

Support agents that remember each customer’s account history, past tickets, and known environment details — freeing the human agent from re-reading transcripts.

03. Dev-tool agents with long context

Coding assistants that accumulate knowledge of your project — "which test runner", "which lint config", "preferred import style" — over many sessions without polluting the immediate prompt.

Pricing & License

Open source: Apache 2.0 — free to self-host with your own vector DB and LLM keys. Source is on GitHub.

mem0 Platform (managed): Free tier for dev, then usage-based. Paid plans add SSO, audit logs, custom embedders, and guaranteed latency SLAs. See mem0.ai/pricing for current tiers.

Cost reality check: the hidden cost is the extraction LLM call after every turn. On gpt-4o-mini this is ~$0.0002 per user turn — negligible at 10K DAU, meaningful at 10M. Swap to a smaller/local extractor for high-volume scenarios.
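The per-turn figure back-of-envelopes as follows — a sketch assuming ~500 prompt tokens and ~100 completion tokens per extraction call at gpt-4o-mini list prices ($0.15/M input, $0.60/M output; verify current pricing), and 20 turns per user per day:

```python
# Back-of-envelope extraction cost (assumed token counts and list prices).
IN_PRICE = 0.15 / 1_000_000    # $/input token (gpt-4o-mini, assumed)
OUT_PRICE = 0.60 / 1_000_000   # $/output token (assumed)
TOKENS_IN, TOKENS_OUT = 500, 100  # rough size of one extraction call

per_turn = TOKENS_IN * IN_PRICE + TOKENS_OUT * OUT_PRICE  # ~$0.000135

for dau in (10_000, 10_000_000):
    monthly = per_turn * dau * 20 * 30  # 20 turns/day, 30 days
    print(f"{dau:>10,} DAU -> ${monthly:,.0f}/month")
```

Under these assumptions, 10K DAU costs on the order of $800/month while 10M DAU runs to roughly $800K/month — which is the "negligible vs. meaningful" gap the paragraph above describes.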


Frequently Asked Questions

How is mem0 different from a vector database?

A vector DB (Qdrant, Pinecone) stores and searches embeddings — you write all the logic around extraction, deduplication, and scoping. mem0 is the layer above: it decides WHAT to remember, how to scope it, and how to retrieve it. Internally mem0 uses a vector DB, so you keep the performance characteristics you expect.

Does mem0 work with Claude or only OpenAI?

Not only OpenAI — mem0 also supports Anthropic Claude (Opus, Sonnet, Haiku), Google Gemini, AWS Bedrock, and local models via Ollama. You pick one LLM for extraction (can be small/cheap) and another for your app (can be the best available). Configure both via the Memory(config={...}) constructor.

Can I self-host mem0?

Yes — Apache 2.0 licensed. Typical self-host stack: mem0 SDK + Qdrant (or Chroma/PGVector) + your own LLM key. Takes ~15 minutes with Docker Compose from the official quickstart. Managed mem0 Platform is optional and only needed if you want the web UI and multi-user org features.

How does mem0 decide what to remember?

After each conversation turn, mem0 runs a lightweight LLM call with a fact-extraction prompt that pulls durable user-specific facts and filters out conversational noise. Extracted facts are deduplicated against existing memories (LLM-judged similarity) so you don’t end up with 20 copies of "user likes TypeScript".

Can I use mem0 with LangChain or LlamaIndex?

Yes. mem0 ships official adapters for LangChain (as a Memory class), LlamaIndex (as a Node postprocessor), and CrewAI (as a shared agent memory). For the Claude Agent SDK or Vercel AI SDK, use the Python/JS SDK directly — it’s just a function call.

mem0 vs ChatGPT memory — are they related?

No — ChatGPT memory is a closed feature inside OpenAI’s consumer product. mem0 is an open-source library you run in your own app, with your own models and storage. Use ChatGPT memory if you’re using chatgpt.com; use mem0 if you’re building your own product.
