What is Letta?
Letta (formerly MemGPT) is a framework for building AI agents with persistent, long-term memory. It solves the context window limitation by implementing a tiered memory architecture — core memory (always in context), recall memory (conversation history), and archival memory (unlimited storage). The agent manages its own memory, deciding what to remember and forget.
Answer-Ready: Letta is an AI agent framework with persistent memory management. Uses tiered memory (core/recall/archival) to overcome context window limits. Formerly MemGPT. Agents self-manage memory across conversations. 12k+ GitHub stars.
Best for: Developers building stateful AI agents that need to remember across sessions. Works with: OpenAI, Anthropic, local models via Ollama. Setup time: Under 3 minutes.
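A minimal local setup might look like this, assuming a standard PyPI install (the package name is an assumption; check the project's docs if it differs for your platform):

```shell
# install the Python package, then start the local server
pip install letta
letta server
```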
Core Features
1. Tiered Memory Architecture
| Memory Tier | Purpose | Size |
|---|---|---|
| Core | Always in context, editable by agent | ~2K tokens |
| Recall | Searchable conversation history | Unlimited |
| Archival | Long-term knowledge storage | Unlimited |
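The three tiers in the table can be sketched as a plain Python model. This is purely illustrative of the layout, not Letta's actual internal classes; every name here is an assumption, and the substring search stands in for real vector retrieval:

```python
# Illustrative sketch of the tiered layout -- NOT Letta's real classes.
from dataclasses import dataclass, field

CORE_BUDGET_TOKENS = 2000  # core memory stays small so it always fits in context

@dataclass
class TieredMemory:
    core: dict = field(default_factory=dict)       # always in context, agent-editable
    recall: list = field(default_factory=list)     # searchable conversation history
    archival: list = field(default_factory=list)   # long-term knowledge storage

    def core_size(self) -> int:
        # crude token estimate: roughly 4 characters per token
        return sum(len(v) for v in self.core.values()) // 4

    def remember(self, fact: str) -> None:
        """Store a durable fact in archival memory."""
        self.archival.append(fact)

    def search_archival(self, query: str) -> list:
        """Naive substring search standing in for vector retrieval."""
        return [f for f in self.archival if query.lower() in f.lower()]

mem = TieredMemory(core={"persona": "helpful assistant", "user": "Sarah, PM"})
mem.recall.append("user: my meeting is at 3pm tomorrow")
mem.remember("Sarah's Q2 budget meeting is at 3pm tomorrow")
print(mem.search_archival("q2 budget"))
```

The point of the split is that only `core` competes for context-window space; `recall` and `archival` can grow without bound because they are retrieved on demand.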
2. Agent Self-Management
```python
# Agent decides what to save
agent.send_message("My meeting is at 3pm tomorrow with Sarah about the Q2 budget.")
# Agent automatically stores this in archival memory
```
3. Tool Use
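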
```python
from letta import tool

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    # Your search implementation goes here
    results = ...
    return results

agent = client.create_agent(tools=[search_web])
```
4. REST API Server
```shell
letta server --port 8283
```
The server exposes a full REST API for agent management:
- `POST /v1/agents` - create agent
- `POST /v1/agents/{id}/messages` - send message

Use Cases
| Use Case | How |
|---|---|
| Personal Assistant | Remember user preferences across sessions |
| Customer Support | Track customer history and context |
| Research Agent | Accumulate findings over long investigations |
| Coding Companion | Remember codebase context and decisions |
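Every row in the table reduces to the same pattern: state that survives between process runs. A stdlib sketch of that pattern, using a file-backed store (the file name and schema are illustrative, not Letta's storage format):

```python
import json
from pathlib import Path

STORE = Path("agent_memory.json")  # illustrative file-backed store

def load_memory() -> dict:
    """Load persisted memory, or start fresh on first run."""
    if STORE.exists():
        return json.loads(STORE.read_text())
    return {"preferences": {}, "history": []}

def save_memory(mem: dict) -> None:
    """Persist memory so the next session can pick it up."""
    STORE.write_text(json.dumps(mem, indent=2))

# Session 1: the agent learns a preference and persists it.
mem = load_memory()
mem["preferences"]["timezone"] = "US/Pacific"
save_memory(mem)

# Session 2 (a fresh process): the preference survives.
mem2 = load_memory()
print(mem2["preferences"]["timezone"])
```

Letta does this with a database-backed server rather than a JSON file, but the contract is the same: memory outlives the conversation.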
FAQ
Q: How does it differ from RAG? A: RAG retrieves from static documents. Letta agents actively manage their own memory — writing, updating, and deleting memories as conversations evolve.
Q: Can I use local models? A: Yes, supports Ollama, vLLM, and any OpenAI-compatible endpoint.
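"OpenAI-compatible endpoint" concretely means a base URL serving the `/v1` chat API. A sketch of the mapping involved; the URLs are each server's documented local defaults, but the config shape itself is an assumption, not Letta's schema:

```python
# Illustrative provider map -- the dict shape is an assumption, not Letta's
# config schema; the URLs are the default local endpoints for each server.
LOCAL_BACKENDS = {
    "ollama": "http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    "vllm": "http://localhost:8000/v1",     # vLLM's OpenAI-compatible server
}

def endpoint_for(provider: str) -> str:
    """Resolve a provider name to its OpenAI-compatible base URL."""
    try:
        return LOCAL_BACKENDS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None

print(endpoint_for("ollama"))
```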
Q: Is it production-ready? A: Yes. Letta Cloud offers managed hosting, and the self-hosted server supports Docker deployment.
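The self-hosted Docker deployment mentioned above might look like the following; the image name and port mapping are assumptions, so check the project's docs for the current image and required environment variables:

```shell
# run the Letta server in a container (image name assumed)
docker run -p 8283:8283 letta/letta:latest
```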