Letta — AI Agent Long-Term Memory Framework
Build AI agents with persistent memory using MemGPT architecture. Letta manages context windows automatically with tiered memory for stateful LLM applications.
What it is
Letta, formerly known as MemGPT, is an open-source framework for building AI agents that maintain persistent long-term memory across conversations. It addresses the fundamental context window limitation of LLMs by implementing a tiered memory architecture inspired by operating system virtual memory.
The framework is aimed at developers building stateful LLM applications -- chatbots that remember users, research agents that accumulate knowledge, and assistants that improve over time without losing context.
How it saves time or tokens
Without Letta, developers must manually manage context windows: truncating old messages, summarizing history, or re-injecting relevant facts. Letta automates this entirely. The agent decides what to store in core memory (always in context), recall memory (searchable conversation history), or archival memory (unlimited vector-indexed storage). This reduces wasted tokens on context management boilerplate and avoids the common failure mode where agents forget critical user preferences mid-conversation.
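The three tiers can be pictured with a minimal sketch. This is an illustration of the concept only, not Letta's internal implementation; the class name, eviction policy, and limits are invented for the example:

```python
# Conceptual model of tiered memory (illustration, not Letta internals).
class TieredMemory:
    def __init__(self, core_limit=2):
        self.core = []       # always injected into the LLM context
        self.recall = []     # searchable conversation history
        self.archival = []   # long-term store (vector-indexed in Letta)
        self.core_limit = core_limit

    def remember(self, fact):
        # Promote a fact to core memory; demote the oldest fact
        # to archival storage when core is full.
        self.core.append(fact)
        if len(self.core) > self.core_limit:
            self.archival.append(self.core.pop(0))

    def context(self):
        # The facts that would accompany every LLM call.
        return list(self.core)

mem = TieredMemory(core_limit=2)
mem.remember("favorite color: blue")
mem.remember("name: Ada")
mem.remember("occupation: researcher")  # evicts the oldest core fact
print(mem.context())  # ['name: Ada', 'occupation: researcher']
print(mem.archival)   # ['favorite color: blue']
```

In Letta the agent itself decides when to promote or demote facts via function calls rather than a fixed eviction rule.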
A basic setup costs an estimated 4,100 tokens, but the real savings come from eliminating repeated context injection across sessions.
How to use
- Install Letta and start the server:

```shell
pip install letta
letta server
```
- Create a client and configure an agent with memory:

```python
from letta import create_client

client = create_client()
agent = client.create_agent(
    name='my_agent',
    memory=client.create_block(
        'You are a helpful assistant.',
        label='system',
    ),
)
```
- Send messages and let the agent manage its own memory:

```python
response = agent.send_message(
    'Remember: my favorite color is blue.'
)
print(response.messages)
```
Example
```python
from letta import create_client

client = create_client()

# Create agent with tiered memory
agent = client.create_agent(
    name='research_assistant',
    memory=client.create_block(
        'You are a research assistant that remembers '
        'all papers the user has discussed.',
        label='system',
    ),
)

# Agent stores facts in archival memory automatically
agent.send_message('I just read the Attention Is All You Need paper.')
agent.send_message('What papers have I mentioned?')
```
Related on TokRepo
- AI memory frameworks compared -- Browse all memory solutions including Mem0, Zep, and MemGPT variants
- Letta deep-dive on TokRepo -- Dedicated page for Letta architecture and usage patterns
- Agent tools directory -- Other frameworks for building autonomous AI agents
Common pitfalls
- Running letta server without sufficient disk space for the SQLite-backed archival memory can cause silent failures on large datasets
- The tiered memory system works best with models that follow function-calling conventions; smaller open models may not reliably trigger memory operations
- Confusing Letta (the rebranded project) with the original MemGPT academic paper -- the API surface has changed substantially since the rename
Frequently Asked Questions
What is the relationship between Letta and MemGPT?
Letta is the rebranded and production-ready version of MemGPT. The original MemGPT was a research project demonstrating virtual memory for LLMs. Letta took that concept and built a full agent framework with a server, REST API, and multi-user support. The core idea of tiered memory remains the same, but the API and architecture have evolved significantly.
How does Letta's tiered memory work?
Letta implements three memory tiers: core memory (always present in the LLM context window), recall memory (searchable conversation history stored in a database), and archival memory (unlimited vector-indexed storage for long-term facts). The agent autonomously decides what to promote or demote between tiers.
Can I use Letta with LLM providers other than OpenAI?
Yes, Letta supports multiple LLM backends including OpenAI, Anthropic, and local models via endpoints compatible with the OpenAI API format. However, the memory management functions work most reliably with models that have strong function-calling capabilities.
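Those memory operations are exposed to the model as tools. A sketch in the OpenAI function-calling schema; the tool names below mirror MemGPT's published tool set but are illustrative, not copied from Letta's current definitions:

```python
# Hypothetical memory tools in the OpenAI function-calling format.
# Names are illustrative, not Letta's actual tool definitions.
memory_tools = [
    {
        "type": "function",
        "function": {
            "name": "archival_memory_insert",
            "description": "Store a fact in long-term archival memory.",
            "parameters": {
                "type": "object",
                "properties": {"content": {"type": "string"}},
                "required": ["content"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "archival_memory_search",
            "description": "Search archival memory by semantic query.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]

print([t["function"]["name"] for t in memory_tools])
```

A model without reliable function calling may emit these calls malformed or not at all, which is why smaller open models can fail to trigger memory operations.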
What happens when the context window fills up?
When the context window fills up, Letta automatically summarizes older messages and moves them to recall memory. Critical facts flagged by the agent are stored in archival memory for later retrieval. This process happens transparently without developer intervention.
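That overflow step can be approximated in a few lines. A sketch under the assumption of a simple message-count budget; real token counting and the LLM-generated summary are stubbed out, and this is not Letta's actual code:

```python
def evict_on_overflow(context, recall, budget, summarize):
    # Move the oldest half of the context to recall memory and
    # replace it with a summary once the budget is exceeded.
    if len(context) <= budget:
        return context
    cut = len(context) // 2
    evicted, kept = context[:cut], context[cut:]
    recall.extend(evicted)        # originals stay searchable later
    summary = summarize(evicted)  # in Letta, the LLM writes this
    return [summary] + kept

recall = []
ctx = [f"msg{i}" for i in range(6)]
ctx = evict_on_overflow(
    ctx, recall, budget=4,
    summarize=lambda msgs: f"[summary of {len(msgs)} msgs]",
)
print(ctx)     # ['[summary of 3 msgs]', 'msg3', 'msg4', 'msg5']
print(recall)  # ['msg0', 'msg1', 'msg2']
```

Keeping the evicted originals in recall is what lets the agent later answer questions about details the summary dropped.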
Is Letta suitable for production, multi-user deployments?
Letta provides a server mode with REST API endpoints, making it suitable for multi-user deployments. Each agent maintains its own isolated memory state. The server supports PostgreSQL as a backend for production workloads, replacing the default SQLite for better concurrency.
Citations (3)
- Letta GitHub -- Letta implements tiered memory architecture for AI agents
- MemGPT Paper -- MemGPT virtual memory concept for LLMs
- Letta Documentation -- Function calling enables agents to manage their own memory
Source & Thanks
Created by Letta Team. Licensed under Apache 2.0.
letta-ai/letta — 12k+ stars