Why Choose It
Zep’s differentiator is the session abstraction. Where mem0 thinks in facts, Zep thinks in sessions: a conversation has a beginning, a middle that gets summarized, and a tail that stays verbatim. When you fetch memory, Zep returns a summary of the old + the latest messages + relevant long-term facts — already formatted for prompt injection.
Under the hood it runs a hybrid retrieval pipeline: dense vector search for semantic similarity, BM25 for exact terms, and a small entity graph for "who’s who" resolution. That extra machinery costs ~20ms per call but noticeably improves recall on real chat corpora — especially when users reference specific project names or past entities.
Zep ships as both managed and self-hosted. The managed service (Zep Cloud) includes a web UI where you can inspect every stored memory, which turns out to be critical for debugging agent behavior. Self-host when you have data-residency needs; use Cloud when time-to-production matters more than per-token cost.
Quick Start — Python SDK
The key API is memory.get(session_id).context — it returns a single pre-formatted string containing the running summary, extracted user facts, and the last N messages. Drop it into your system prompt and the rest of your LLM code stays unchanged.
```python
# pip install zep-python openai
from zep_python.client import Zep
from openai import OpenAI

zep = Zep(api_key="z_...")  # or base_url="http://localhost:8000" for self-host
oai = OpenAI()

user_id, session_id = "william", "s_2026_04_14"
zep.user.add(user_id=user_id, email="william@example.com")
zep.memory.add_session(session_id=session_id, user_id=user_id)

def chat(message: str) -> str:
    zep.memory.add(session_id=session_id,
                   messages=[{"role": "user", "content": message}])
    # Zep returns pre-formatted context: summary + facts + recent messages
    ctx = zep.memory.get(session_id=session_id).context
    resp = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": ctx},
            {"role": "user", "content": message},
        ],
    )
    answer = resp.choices[0].message.content
    zep.memory.add(session_id=session_id,
                   messages=[{"role": "assistant", "content": answer}])
    return answer

print(chat("I'm planning a trip to Tokyo in May"))
print(chat("What should I pack for that trip?"))
```

Core Capabilities
Automatic session summarization
Zep runs a summarizer in the background as sessions grow. Once a session passes ~20 messages, older turns are collapsed into a running summary — context window never balloons.
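The pattern can be sketched in a few lines. This is illustrative, not Zep's actual implementation: the threshold, tail size, and `summarize()` helper are all assumptions for demonstration.

```python
THRESHOLD = 20  # roughly where Zep starts collapsing older turns
KEEP_TAIL = 6   # recent turns kept verbatim

def summarize(summary: str, old_turns: list) -> str:
    # Stand-in for an LLM summarization call.
    prefix = summary + " | " if summary else ""
    return prefix + f"{len(old_turns)} turns condensed"

def add_turn(state: dict, turn: str) -> None:
    state["messages"].append(turn)
    if len(state["messages"]) > THRESHOLD:
        old = state["messages"][:-KEEP_TAIL]
        state["summary"] = summarize(state["summary"], old)
        state["messages"] = state["messages"][-KEEP_TAIL:]

state = {"summary": "", "messages": []}
for i in range(25):
    add_turn(state, f"turn {i}")
# The live window stays bounded no matter how long the session runs.
```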
Hybrid search (vector + BM25 + graph)
Every memory is indexed three ways. Queries blend all three signals, which measurably outperforms vector-only retrieval on real chat data where exact term matches matter.
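Conceptually, blended ranking looks like this. The weights and scores below are made-up illustrations, not Zep's internals:

```python
# Each candidate memory carries three scores: dense-vector similarity, BM25,
# and graph proximity. The final ranking blends them; weights are assumptions.

def blend(vec: float, bm25: float, graph: float,
          weights=(0.5, 0.3, 0.2)) -> float:
    return weights[0] * vec + weights[1] * bm25 + weights[2] * graph

candidates = {
    "mem_a": blend(vec=0.91, bm25=0.10, graph=0.0),  # semantically close only
    "mem_b": blend(vec=0.55, bm25=0.95, graph=0.8),  # exact term + entity hit
}
ranked = sorted(candidates, key=candidates.get, reverse=True)
# An exact-term + entity match can outrank a purely semantic neighbor.
```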
Knowledge graph extraction
Zep’s graph service extracts entities and relationships from conversations. Ask "who did the user mention working with" and the graph returns them directly — no LLM hallucination.
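A toy illustration of why this avoids hallucination: the answer is read off stored edges rather than generated. The entity and relation names here are made up:

```python
# Relations extracted from chat are stored as (subject, relation, object)
# edges; an entity question becomes a direct traversal.

edges = [
    ("user", "works_with", "Dana"),
    ("user", "works_with", "Priya"),
    ("user", "visited", "Tokyo"),
]

def who_does_user_work_with(edges):
    return [obj for subj, rel, obj in edges
            if subj == "user" and rel == "works_with"]

coworkers = who_does_user_work_with(edges)
```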
Fact extraction with dedup
Long-term user facts are extracted automatically and deduped against prior memories. Inspect and edit them in the Zep UI if your agent remembers something wrong.
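The dedup step can be sketched as a similarity gate. The threshold, embeddings, and cosine helper are illustrative assumptions, not Zep's pipeline:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def add_fact(store, fact, emb, threshold=0.9):
    # Skip storage if any existing fact is semantically near-identical.
    for _, existing in store:
        if cosine(emb, existing) >= threshold:
            return False
    store.append((fact, emb))
    return True

store = []
add_fact(store, "User lives in Tokyo", [1.0, 0.0])
added = add_fact(store, "User is based in Tokyo", [0.98, 0.05])  # near-duplicate
```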
Low-latency SDK (10-30ms p99 managed)
The managed service sits geographically close to major LLM providers. Hot-path reads use a pre-computed session context object — single DB round trip.
Self-host option
Full stack (API, worker, Postgres, NATS) runs via docker-compose. Apache 2.0 licensed. Same SDK, just point base_url at your cluster.
Comparison
| Tool | Session Model | Summarization | Graph Support | Deployment |
|---|---|---|---|---|
| Zep (this) | First-class sessions + users | Built-in (rolling) | Yes (native entity graph) | Managed + self-host |
| mem0 | Facts only (no session concept) | No | Optional Neo4j plugin | SDK + optional platform |
| Letta | Agent state (not sessions) | Agent-driven paging | No | Self-host + cloud |
| LangMem | LangChain thread-based | Opt-in | No | SDK only |
Use Cases
01. Production customer support
Sessions map naturally to conversations. Summarization keeps month-long customer relationships queryable without ballooning token cost. Zep’s UI gives support engineers a window into what the bot "knows" about a customer.
02. Multi-agent teams sharing context
User-level facts are scoped to a user, not a session — so a handoff from sales bot to onboarding bot can share everything known about the user while keeping session histories separate.
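A minimal sketch of that scoping model (the data structures are illustrative, not Zep's schema): facts hang off the user, transcripts hang off sessions, so a new session sees the user's facts but not another session's raw history.

```python
user_facts = {"william": ["Prefers annual billing", "Team of 12"]}

sessions = {
    "sales_2026_04":      {"user": "william", "messages": ["sales transcript"]},
    "onboarding_2026_04": {"user": "william", "messages": []},
}

def build_context(session_id):
    # User-level facts are shared; session messages stay isolated.
    s = sessions[session_id]
    return {"facts": user_facts[s["user"]], "messages": s["messages"]}

ctx = build_context("onboarding_2026_04")
# The onboarding bot inherits everything known about the user,
# with an empty session history of its own.
```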
03. Analytics-heavy assistants
When agents need to answer "show me everyone who mentioned feature X", the graph layer lets you traverse entity relationships directly, not fuzz-match across 500K embeddings.
Pricing & Licensing
Zep Community Edition: Apache 2.0, self-host. Includes the full API, hybrid search, summarization, and graph. Run on your own Postgres + infra.
Zep Cloud: Free dev tier, then pay-as-you-go. Paid plans add the web UI, team management, SOC 2 reporting, and scale-out. Current pricing on getzep.com/pricing.
What you actually pay for: summarization LLM calls. Zep bills managed LLM use through their platform; self-host uses your own OpenAI/Claude key. Expect ~$0.0003 per added message turn on cheap models.
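A back-of-envelope check using the ~$0.0003/turn figure above; the volume numbers are made-up inputs, not benchmarks:

```python
cost_per_turn = 0.0003        # quoted rough cost per added message turn
turns_per_session = 30        # assumed average session length
sessions_per_month = 10_000   # assumed monthly volume

monthly = cost_per_turn * turns_per_session * sessions_per_month
print(f"${monthly:,.2f}/month")  # -> $90.00/month
```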
Related TokRepo Assets
Graphiti — Real-Time Knowledge Graphs for AI Agents
Build real-time knowledge graphs for AI agents by Zep. Temporal awareness, entity extraction, community detection, and hybrid search. Production-ready. 24K+ stars.
Zep — Long-Term Memory for AI Agents and Assistants
Production memory layer for AI assistants. Zep stores conversation history, extracts facts, builds knowledge graphs, and provides temporal-aware retrieval for LLMs.
FAQ
Zep vs mem0 — which should I pick?
Pick Zep when sessions are a first-class concept in your app (support, tutoring, booking) and you want summarization + graph + UI out of the box. Pick mem0 when you want a lighter-weight fact store and prefer to compose your own session logic.
Can Zep replace my vector database?
For memory yes — Zep stores embeddings internally. For general RAG over documents, no: keep a separate vector DB (Qdrant/Pinecone/Chroma). Zep is tuned for conversation memory, not arbitrary document corpora.
Does Zep work with local LLMs?
Yes. Self-hosted Zep supports Ollama, LiteLLM, and any OpenAI-compatible endpoint for summarization and extraction. The SDK is LLM-agnostic on the read path — it returns text/facts that you feed to whatever model you like.
How does Zep’s graph differ from Graphiti?
Zep’s graph is conversation-scoped: entities and relations mentioned in chat, extracted and updated as the session progresses. Graphiti is a temporal graph library — it tracks time-bounded validity of every edge. Use Zep for in-app memory; use Graphiti when you need to reason about "what was true when".
What’s the latency penalty of hybrid search?
Typically +10-20ms vs pure vector search on a 100K-memory corpus. Worth it for recall improvements on chat corpora (exact term matches, entity references). If you need sub-50ms p99, self-host close to your app.