Knowledge · Apr 7, 2026 · 2 min read

Zep — Long-Term Memory for AI Agents and Assistants

Production memory layer for AI assistants. Zep stores conversation history, extracts facts, builds knowledge graphs, and provides temporal-aware retrieval for LLMs.

TL;DR
Zep stores conversations, extracts facts, builds knowledge graphs, and gives LLMs temporal-aware retrieval.
§01

What it is

Zep is a production memory layer for AI assistants and agents. It goes beyond simple conversation logging by extracting structured facts from dialogue, building knowledge graphs of entities and relationships, and providing temporal-aware retrieval so LLMs can recall context from hours or months ago.

Zep is designed for developers building chatbots, copilots, and autonomous agents that need to remember user preferences, prior decisions, and evolving context across sessions.

§02

How it saves time or tokens

Without Zep, developers typically stuff entire conversation histories into prompts, burning tokens on irrelevant context. Zep's fact extraction and knowledge graph compress thousands of messages into structured memory that can be queried selectively. This reduces prompt size while improving answer relevance.

The temporal awareness layer means Zep can distinguish between what a user said last week versus what they said today, preventing stale information from polluting responses.
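To make the recency idea concrete, here is a minimal sketch of recency-weighted scoring. The half-life decay formula and the `half_life_days` parameter are our own illustrative assumptions, not Zep's actual ranking implementation:

```python
from datetime import datetime, timedelta

def recency_weighted(score: float, created_at: datetime, now: datetime,
                     half_life_days: float = 30.0) -> float:
    """Decay a relevance score by the age of the fact (illustrative only)."""
    age_days = (now - created_at).total_seconds() / 86400
    return score * 0.5 ** (age_days / half_life_days)

now = datetime(2026, 4, 7)
stale = recency_weighted(0.9, now - timedelta(days=60), now)  # said two months ago
fresh = recency_weighted(0.8, now - timedelta(days=1), now)   # said yesterday
# The fresher fact now outranks the stale one despite a lower raw similarity.
```

Any temporal-aware retriever needs some policy like this, whatever its exact formula: without decay, an old "I prefer Python" can permanently outrank a newer "I switched to Rust".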

§03

How to use

  1. Install Zep and start the server:
pip install zep-python
# Or run the Zep server via Docker (compose file from the getzep/zep repo)
docker compose up -d
  2. Create a session and add messages:
from zep_python import ZepClient, Memory, Message

client = ZepClient(base_url='http://localhost:8000')
session_id = 'user-123-session'

# Messages are wrapped in a Memory object before being stored
client.memory.add_memory(
    session_id,
    Memory(messages=[
        Message(role='user', content='I prefer Python over JavaScript'),
        Message(role='assistant', content='Noted. I will suggest Python solutions.'),
    ])
)
  3. Retrieve relevant memory for a new prompt:
from zep_python import MemorySearchPayload

memory = client.memory.get_memory(session_id)
facts = client.memory.search_memory(
    session_id, MemorySearchPayload(text='language preference')
)
  4. Inject the retrieved facts into your LLM prompt alongside the current user message.
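Step 4 can be sketched as a small helper. The `assemble_prompt` function and its layout are hypothetical, not part of the Zep API; the point is that retrieved facts arrive as plain strings you can splice into any prompt format:

```python
def assemble_prompt(facts, recent_messages, user_message):
    """Combine long-term facts (e.g. from a Zep search) with the current turn.
    All names here are illustrative, not part of the Zep client."""
    fact_block = "\n".join(f"- {f}" for f in facts)
    history = "\n".join(f"{m['role']}: {m['content']}" for m in recent_messages)
    return (
        "Relevant facts about this user:\n" + fact_block +
        "\n\nRecent conversation:\n" + history +
        "\nuser: " + user_message
    )

prompt = assemble_prompt(
    facts=["User prefers Python over JavaScript"],
    recent_messages=[{"role": "user", "content": "Can you help with a script?"}],
    user_message="Make it read a CSV file.",
)
```

Because the output is a single string, the same helper works unchanged with any provider's completion or chat API.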
§04

Example

# Retrieve structured facts instead of raw history
results = client.memory.search_memory(
    session_id,
    MemorySearchPayload(text='What programming language does the user prefer?'),
    limit=3,
)
for r in results:
    # Result attribute names can vary between client versions; adjust as needed
    print(f'Fact: {r.message.content} (relevance: {r.score:.2f})')

This returns only the relevant facts rather than the full conversation, keeping prompts lean.

§06

Common pitfalls

  • Treating Zep as a simple key-value store. Its value comes from fact extraction and knowledge graphs, not raw message storage. Use the search API, not just get.
  • Forgetting to set session boundaries. Without clear session IDs, memories from different users or conversations can bleed together.
  • Over-relying on Zep for real-time context. Zep excels at long-term memory; for the current conversation turn, pass recent messages directly in the prompt.
  • Failing to review community discussions and changelogs before upgrading. Breaking changes in major versions can disrupt existing workflows. Pin versions in production and test upgrades in staging first.

Frequently Asked Questions

How does Zep differ from simple vector store memory?

Vector stores retrieve similar text chunks. Zep goes further by extracting structured facts, building entity relationship graphs, and tracking temporal context. This means Zep can answer questions like 'what did the user say about budgets last month' rather than just finding semantically similar messages.

Can Zep work with any LLM provider?

Yes. Zep is LLM-agnostic. It stores and retrieves memory independently of your model choice. You can use Zep with OpenAI, Anthropic Claude, local models via Ollama, or any other provider. The memory retrieval results are plain text you inject into any prompt format.

Does Zep support knowledge graphs?

Yes. Zep automatically extracts entities and relationships from conversations and builds a knowledge graph. You can query this graph to understand connections between people, projects, preferences, and other concepts mentioned across sessions.

What is temporal-aware retrieval?

Temporal-aware retrieval means Zep tracks when information was mentioned and can prioritize recent facts over older ones. If a user changed their preference from Python to Rust last week, Zep surfaces the Rust preference rather than the outdated Python one.

How does Zep handle multi-user applications?

Zep uses session IDs to isolate memory per user or conversation. Each session has its own fact store and knowledge graph. You can also create user-level memory that persists across sessions for the same user by using consistent user identifiers.
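One simple convention for keeping sessions isolated (our own suggestion, not a Zep requirement) is to derive each session ID from a user ID plus a conversation ID, so memories from different users can never collide:

```python
def make_session_id(user_id: str, conversation_id: str) -> str:
    """Namespace memory per user and per conversation (illustrative convention)."""
    return f"{user_id}:{conversation_id}"

alice = make_session_id("alice", "support-2026-04-07")
bob = make_session_id("bob", "support-2026-04-07")
# Same conversation topic, distinct sessions: facts never bleed across users.
```

Using the bare `user_id` alone as the session ID would instead merge all of a user's conversations into one memory, which may or may not be what you want.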

Source & Thanks

Created by Zep AI. Licensed under Apache 2.0.

getzep/zep — 3k+ stars
