Scripts · Apr 8, 2026 · 2 min read

Letta — AI Agent Long-Term Memory Framework

Build AI agents with persistent memory using MemGPT architecture. Letta manages context windows automatically with tiered memory for stateful LLM applications.

TL;DR
Letta gives AI agents tiered long-term memory so they manage their own context window automatically.
§01

What it is

Letta, formerly known as MemGPT, is an open-source framework for building AI agents that maintain persistent long-term memory across conversations. It addresses the fundamental context window limitation of LLMs by implementing a tiered memory architecture inspired by operating system virtual memory.

The framework is aimed at developers building stateful LLM applications -- chatbots that remember users, research agents that accumulate knowledge, and assistants that improve over time without losing context.

§02

How it saves time or tokens

Without Letta, developers must manually manage context windows: truncating old messages, summarizing history, or re-injecting relevant facts. Letta automates this entirely. The agent decides what to store in core memory (always in context), recall memory (searchable conversation history), or archival memory (unlimited vector-indexed storage). This reduces wasted tokens on context management boilerplate and avoids the common failure mode where agents forget critical user preferences mid-conversation.
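
The routing described above can be sketched in plain Python. This is a toy model of the three tiers, not Letta's actual classes:

```python
from collections import deque

class TieredMemory:
    """Toy sketch of Letta-style tiers; not Letta's real data structures."""

    def __init__(self, core_limit=5):
        self.core = {}          # always in context: key facts
        self.recall = deque()   # searchable conversation history
        self.archival = []      # unbounded long-term store
        self.core_limit = core_limit

    def remember(self, key, fact):
        # Promote a fact to core; demote the oldest core fact to archival if full.
        if key not in self.core and len(self.core) >= self.core_limit:
            old_key, old_fact = next(iter(self.core.items()))
            del self.core[old_key]
            self.archival.append((old_key, old_fact))
        self.core[key] = fact

    def log_message(self, message):
        self.recall.append(message)

    def search_recall(self, term):
        return [m for m in self.recall if term.lower() in m.lower()]

mem = TieredMemory()
mem.remember("favorite_color", "blue")
mem.log_message("User: my favorite color is blue.")
print(mem.search_recall("blue"))  # ['User: my favorite color is blue.']
```

In Letta the agent itself issues these promote/demote operations as tool calls, rather than the host code calling them directly.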

A basic setup runs an estimated 4,100 tokens, but the real savings come from eliminating repeated context injection across sessions.

§03

How to use

  1. Install Letta and start the server:
pip install letta
letta server
  2. Create a client and configure an agent with memory:
from letta import create_client

client = create_client()
agent = client.create_agent(
    name='my_agent',
    memory=client.create_block(
        'You are a helpful assistant.',
        label='system'
    ),
)
  3. Send messages and let the agent manage its own memory:
response = agent.send_message(
    'Remember: my favorite color is blue.'
)
print(response.messages)
§04

Example

from letta import create_client

client = create_client()

# Create agent with tiered memory
agent = client.create_agent(
    name='research_assistant',
    memory=client.create_block(
        'You are a research assistant that remembers '
        'all papers the user has discussed.',
        label='system'
    ),
)

# Agent stores facts in archival memory automatically
agent.send_message('I just read the Attention Is All You Need paper.')
agent.send_message('What papers have I mentioned?')
§05


Common pitfalls

  • Running letta server without sufficient disk space for the SQLite-backed archival memory can cause silent failures on large datasets
  • The tiered memory system works best with models that follow function-calling conventions; smaller open models may not reliably trigger memory operations
  • Confusing Letta (the rebranded project) with the original MemGPT academic paper -- the API surface has changed substantially since the rename
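
The second pitfall is concrete: memory writes only happen when the model emits a structured tool call. A sketch of such a tool schema in the OpenAI function-calling format, with the name modeled on MemGPT's archival-insert operation (treat the exact shape as illustrative, not Letta's actual tool definition):

```python
import json

# Illustrative tool schema in the OpenAI function-calling format. Letta's
# real memory tools differ in detail, but the mechanism is the same: the
# model must emit a structured call like this to write to archival memory.
archival_insert_tool = {
    "type": "function",
    "function": {
        "name": "archival_memory_insert",
        "description": "Store a fact in long-term archival memory.",
        "parameters": {
            "type": "object",
            "properties": {
                "content": {
                    "type": "string",
                    "description": "The fact to store.",
                },
            },
            "required": ["content"],
        },
    },
}
print(json.dumps(archival_insert_tool, indent=2))
```

A model with weak function calling tends to describe the memory write in prose instead of emitting this JSON, so nothing is actually stored.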

Frequently Asked Questions

What is the difference between Letta and MemGPT?

Letta is the rebranded and production-ready version of MemGPT. The original MemGPT was a research project demonstrating virtual memory for LLMs. Letta took that concept and built a full agent framework with a server, REST API, and multi-user support. The core idea of tiered memory remains the same, but the API and architecture have evolved significantly.

What memory tiers does Letta provide?

Letta implements three memory tiers: core memory (always present in the LLM context window), recall memory (searchable conversation history stored in a database), and archival memory (unlimited vector-indexed storage for long-term facts). The agent autonomously decides what to promote or demote between tiers.

Can Letta work with open-source LLMs?

Yes, Letta supports multiple LLM backends including OpenAI, Anthropic, and local models via endpoints compatible with the OpenAI API format. However, the memory management functions work most reliably with models that have strong function-calling capabilities.

How does Letta handle context window overflow?

When the context window fills up, Letta automatically summarizes older messages and moves them to recall memory. Critical facts flagged by the agent are stored in archival memory for later retrieval. This process happens transparently without developer intervention.
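
That eviction-and-summarize step can be sketched as follows. This is a toy model of the mechanism, not Letta's implementation; the summarize callback stands in for an LLM summarization call:

```python
def compact_context(messages, max_messages, summarize):
    """When the in-context message list exceeds max_messages, fold the
    oldest half into one summary and move the originals to recall."""
    recall = []
    if len(messages) > max_messages:
        half = len(messages) // 2
        evicted, messages = messages[:half], messages[half:]
        recall.extend(evicted)
        messages = [f"[summary] {summarize(evicted)}"] + messages
    return messages, recall

msgs = [f"msg {i}" for i in range(6)]
ctx, recall = compact_context(msgs, max_messages=4,
                              summarize=lambda ms: f"{len(ms)} older messages")
print(ctx)     # ['[summary] 3 older messages', 'msg 3', 'msg 4', 'msg 5']
print(recall)  # ['msg 0', 'msg 1', 'msg 2']
```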

Is Letta suitable for production multi-user applications?

Letta provides a server mode with REST API endpoints, making it suitable for multi-user deployments. Each agent maintains its own isolated memory state. The server supports PostgreSQL as a backend for production workloads, replacing the default SQLite for better concurrency.


Source & Thanks

Created by Letta Team. Licensed under Apache 2.0.

letta-ai/letta — 12k+ stars
