Letta — Agent Memory OS (formerly MemGPT)
Letta is a stateful agent framework built around MemGPT-style paged memory. Agents explicitly read from and write to their own memory via function calls — arguably the most transparent memory model among production frameworks.
Why Letta
Letta (the company founded by the authors of the open-source MemGPT paper) treats memory as an operating-system abstraction. The agent has a small "working memory" that sits in the LLM context, plus a larger "archival memory" it can page in and out via explicit function calls (archival_memory_search, archival_memory_insert, core_memory_replace).
What this buys you: auditability. Every memory change is a tool call in the trace, so you can see exactly what the agent chose to remember, forget, or rewrite. Compare to black-box summarization or automatic extraction — when something goes wrong, you can point at the exact tool call that did it.
What it costs you: more LLM turns. The agent makes extra calls to manage its own memory. For throughput-bound use cases (100s of concurrent users on cheap models) the overhead matters. For autonomous long-running agents where a single user’s session spans days or weeks, the transparency is worth it.
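The paging model described above can be sketched in a few lines of plain Python. This is an illustration of the idea, not Letta's implementation: a small in-context working memory, a larger off-context archival store, and a trace in which every memory change is an explicit, auditable call.

```python
# Illustrative sketch of MemGPT-style paged memory (not Letta's code):
# working memory stays in the LLM context, archival memory is paged in
# via explicit calls, and every change lands in an auditable trace.

class PagedMemory:
    def __init__(self, working_limit=2048):
        self.working = {}        # label -> text, always visible to the LLM
        self.archival = []       # off-context, searched on demand
        self.trace = []          # every memory operation is recorded here
        self.working_limit = working_limit

    def _log(self, tool, **args):
        self.trace.append({"tool": tool, "args": args})

    def core_memory_append(self, label, text):
        new = self.working.get(label, "") + text
        if len(new) > self.working_limit:
            raise ValueError("working memory block full: archive instead")
        self.working[label] = new
        self._log("core_memory_append", label=label, text=text)

    def archival_memory_insert(self, text):
        self.archival.append(text)
        self._log("archival_memory_insert", text=text)

    def archival_memory_search(self, query):
        self._log("archival_memory_search", query=query)
        return [t for t in self.archival if query.lower() in t.lower()]

mem = PagedMemory()
mem.core_memory_append("human", "Name: William. ")
mem.archival_memory_insert("User is building a project with Nuxt.")
hits = mem.archival_memory_search("nuxt")
print(hits)            # the paged-in archival match
print(len(mem.trace))  # 3: every read/write shows up in the trace
```

The trace is the point: when the agent "forgets" something, you can find the exact call that overwrote or skipped it, which is what black-box summarization can't give you.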
Quick Start — Letta Server + Python Client
Letta runs as a stateful server (FastAPI + Postgres). Agents persist across restarts; memory is durable. In the trace you’ll see tool calls like core_memory_append — that’s the agent deciding to update its own memory. No hidden extraction pass.
# pip install letta-client
# docker run -p 8283:8283 letta/letta:latest
from letta_client import Letta
client = Letta(base_url="http://localhost:8283")
agent = client.agents.create(
    name="assistant",
    memory_blocks=[
        {"label": "human", "value": "Name: William. Role: founder."},
        {"label": "persona", "value": "You are a terse, senior engineer."},
    ],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
)
resp = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "I'm using Nuxt for my new project."}],
)
for m in resp.messages:
    print(m.message_type, getattr(m, "content", None))
# Agent will autonomously call core_memory_append / archival_memory_insert
# so "Nuxt" ends up in either working or archival memory — visible in the trace
Key Features
Core + archival memory blocks
Core memory = in-context (always visible to the agent, limited to ~2KB per block). Archival = off-context, searchable via function calls. The split is the key MemGPT insight.
Self-directed memory updates
The agent chooses what to remember. core_memory_append, core_memory_replace, archival_memory_insert, archival_memory_search are tools exposed to the LLM — not automatic background processes.
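Roughly, "tools exposed to the LLM" means the memory operations are ordinary function-calling schemas in the model's tool list. The schemas below are illustrative (Letta's real ones differ in detail); two of the four are shown.

```python
# Illustrative function-calling schemas for memory tools (not Letta's
# exact definitions): memory edits are ordinary tool calls the model
# chooses to make, not background extraction.

MEMORY_TOOLS = [
    {
        "name": "core_memory_append",
        "description": "Append text to an in-context memory block.",
        "parameters": {
            "type": "object",
            "properties": {
                "label": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["label", "content"],
        },
    },
    {
        "name": "archival_memory_search",
        "description": "Search off-context archival memory.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]

names = [t["name"] for t in MEMORY_TOOLS]
print(names)
```

Because these are plain tool schemas, every memory decision shows up in the same trace as any other tool call.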
Stateful agent server
Agents persist in Postgres. Kill the process, restart, pick up exactly where you left off — including all memory blocks, conversation state, and in-flight tool calls.
Multi-agent with shared memory
Multiple agents can share memory blocks. Good for manager/worker patterns where a supervisor reads the same context as its subordinates but maintains its own persona block.
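The shared-block idea can be shown in plain Python (this is a conceptual sketch, not the Letta API): two agents hold references to the same block object, so a write by one is immediately visible to the other, while each keeps a private persona block.

```python
# Conceptual sketch of shared memory blocks (not the Letta API):
# the supervisor and worker reference the same block object, so
# updates are shared; persona blocks remain per-agent.

class Block:
    def __init__(self, label, value=""):
        self.label, self.value = label, value

class Agent:
    def __init__(self, name, blocks):
        self.name = name
        self.blocks = {b.label: b for b in blocks}  # shared by reference

shared = Block("task_state", "queue: empty")
worker = Agent("worker", [shared, Block("persona", "terse coder")])
boss = Agent("supervisor", [shared, Block("persona", "careful reviewer")])

worker.blocks["task_state"].value = "queue: 3 papers to summarize"
print(boss.blocks["task_state"].value)  # supervisor sees the worker's update
```

In Letta itself the block lives in Postgres and multiple agents attach to it, but the semantics are the same: one write, many readers, no message-passing.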
ADE (Agent Development Environment)
Letta’s web UI lets you inspect an agent’s memory blocks in real time, watch tool calls as they happen, and edit memory directly when debugging.
OpenAI-compatible API
Letta agents can be called via an OpenAI-style chat completions endpoint, making them drop-in replacements for stateless OpenAI calls in existing apps.
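As a hedged sketch, a drop-in call looks like a standard chat-completions request pointed at the Letta server. The exact path and the convention of passing the agent id in the model field are assumptions here; verify the endpoint shape against your Letta version's docs.

```python
# Hedged sketch: an OpenAI-style chat-completions payload aimed at a
# Letta server. The /v1/chat/completions path and agent-id-as-model
# convention are assumptions; check your server version.

import json

payload = {
    "model": "agent-00000000-0000-0000-0000-000000000000",  # hypothetical agent id
    "messages": [{"role": "user", "content": "What framework am I using?"}],
}
body = json.dumps(payload)

# requests.post("http://localhost:8283/v1/chat/completions",
#               data=body, headers={"Content-Type": "application/json"})

print(json.loads(body)["messages"][0]["content"])
```

Because the request body matches the OpenAI wire format, existing client code only needs its base URL changed.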
Comparison
| | Memory Control | State Persistence | Overhead | Best For |
|---|---|---|---|---|
| Letta | Agent-directed (explicit tools) | Postgres-backed, durable | High (extra turns) | Long-running autonomous agents |
| mem0 | Background extraction | Vector DB of your choice | Low (async pipeline) | Chatbot personalization |
| Zep | Service-managed summarization | Postgres-backed service | Low-medium | Session-based apps |
| MemGPT (research) | Same as Letta (its ancestor) | Research prototype | High | Research, prototyping |
Use Cases
01. Autonomous research agents
Long-running agents that ingest papers, maintain hypotheses in core memory, and archive sources. The explicit memory model is essential when an agent runs for hours — you need to audit what it "learned".
02. Personal assistant copilots
Assistants that build up a model of their user over weeks. Letta’s persistent Postgres state means restarts, migrations, and infra changes don’t wipe the relationship.
03. Multi-agent supervision
Supervisor agent reads shared memory blocks written by worker agents. The shared-block model is cleaner than message-passing for coordination scenarios.
Pricing & License
Letta OSS: Apache 2.0 — self-host via docker-compose with Postgres. No license cost. You pay for your own LLM and embedding API keys.
Letta Cloud: Hosted service with the ADE and managed Postgres. Usage-based pricing per active agent. See letta.com/pricing.
Cost model: Letta adds tool-call overhead. Budget ~2-3x the token cost of a stateless chat API for equivalent work, offset by not needing a separate context-stuffing pipeline.
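A back-of-envelope check of that 2-3x figure, with illustrative (not measured) numbers: each memory-management turn replays context, so a couple of extra turns roughly doubles the tokens of a stateless exchange.

```python
# Illustrative cost arithmetic for the ~2-3x claim. All numbers are
# made up for the example; substitute your own model rates and traces.

price_per_1k_tokens = 0.00015  # e.g. a cheap model's input rate, USD

stateless_tokens = 1_500       # one prompt + reply, no memory management
memory_turns = 2               # e.g. archival_memory_search + core_memory_append
tokens_per_memory_turn = 900   # context replayed on each extra turn

letta_tokens = stateless_tokens + memory_turns * tokens_per_memory_turn
overhead = letta_tokens / stateless_tokens
cost = letta_tokens / 1000 * price_per_1k_tokens

print(f"{overhead:.1f}x tokens")   # 2.2x tokens: inside the 2-3x budget
print(f"${cost:.6f} per exchange")
```

The offset mentioned above is real: those extra turns replace whatever pipeline you would otherwise run to stuff retrieved context into each prompt.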
Related Assets on TokRepo
Letta — Stateful AI Agents with Memory
Letta builds stateful AI agents that learn and self-improve with advanced memory. 21.8K+ stars. CLI, Python/TS SDKs, skills, subagents. Apache 2.0.
Letta — AI Agent Long-Term Memory Framework
Build AI agents with persistent memory using MemGPT architecture. Letta manages context windows automatically with tiered memory for stateful LLM applications.
Frequently Asked Questions
Letta vs MemGPT — what is the relationship?
MemGPT is the 2023 research paper (UC Berkeley) that introduced paged agent memory. Letta is the company and production-grade open-source framework built on those ideas. If you looked at memgpt.ai in 2023 and are wondering where the project went — you’re looking for Letta now.
Do I have to self-host Postgres?
For self-host, yes — Letta needs a Postgres instance. docker-compose handles it locally; in production use RDS/Supabase/Neon. If you don’t want to manage Postgres, Letta Cloud handles it for you.
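For a managed Postgres, the usual pattern is to point the server at an external connection string. A minimal sketch, assuming the LETTA_PG_URI environment variable and the hostname below are placeholders to verify against your Letta version's docs:

```shell
# Hedged example: run the Letta server against an external Postgres
# (RDS/Supabase/Neon) instead of the bundled docker-compose instance.
# LETTA_PG_URI and the connection string are illustrative placeholders.
docker run -p 8283:8283 \
  -e LETTA_PG_URI="postgresql://letta:secret@db.example.com:5432/letta" \
  letta/letta:latest
```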
Can Letta agents call external APIs?
Yes. Tools are regular Python functions. Register them via client.tools.create(source_code=...) and the agent can call your custom tools alongside its memory tools. Common patterns: web search, database lookups, Slack posts.
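A custom tool is just a typed Python function with a docstring. The function below is a stub with a hypothetical lookup table, there only to show the shape; the registration call (from the answer above) is commented out because it needs a running server.

```python
# Sketch of a custom tool: an ordinary typed Python function. The
# star-count table is a hypothetical stand-in for a real API call.

def get_repo_stars(repo: str) -> int:
    """Return the star count for a GitHub repo (stubbed for illustration)."""
    stars = {"letta-ai/letta": 21_800}  # hypothetical lookup table
    return stars.get(repo, 0)

# Registration against a running server, per the answer above:
# import inspect
# from letta_client import Letta
# client = Letta(base_url="http://localhost:8283")
# client.tools.create(source_code=inspect.getsource(get_repo_stars))

print(get_repo_stars("letta-ai/letta"))
```

Once registered, the agent can call it in the same turn-by-turn loop as its built-in memory tools.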
What LLMs work with Letta?
OpenAI, Anthropic Claude, Google Gemini, Groq, Ollama, and any OpenAI-compatible endpoint (including vLLM and LM Studio). Configure per-agent via the model field — you can mix (e.g., cheap model for memory ops, Claude for the primary response).
When is Letta overkill?
Simple chatbots with 10-turn sessions, or RAG systems where the knowledge lives in documents not conversation history. In those cases mem0 or Zep get you 80% of the way with 30% of the infrastructure.