Letta — Agent Memory OS (formerly MemGPT)


Letta is a stateful agent framework built around MemGPT's paged memory. Agents explicitly read and write their own memory through function calls, making it the most transparent memory model in production.

Why choose it

Letta (the company founded by the team behind the open-source MemGPT paper) treats memory as an operating-system abstraction. The agent has a small "working memory" that sits in the LLM context, plus a larger "archival memory" it can page in and out via explicit function calls (archival_memory_search, archival_memory_insert, core_memory_replace).

What this buys you: auditability. Every memory change is a tool call in the trace, so you can see exactly what the agent chose to remember, forget, or rewrite. Compare to black-box summarization or automatic extraction — when something goes wrong, you can point at the exact tool call that did it.

What it costs you: more LLM turns. The agent makes extra calls to manage its own memory. For throughput-bound use cases (100s of concurrent users on cheap models) the overhead matters. For autonomous long-running agents where a single user’s session spans days or weeks, the transparency is worth it.
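The paging-plus-audit-trail idea can be sketched in a few lines of plain Python. This is a toy model of the mechanism, not Letta's implementation: a small in-context working memory, a larger archival store, and every memory operation appended to a trace you can inspect afterwards.

```python
from dataclasses import dataclass, field

@dataclass
class PagedMemory:
    """Toy sketch of MemGPT-style paged memory (not Letta's actual code)."""
    working: dict[str, str] = field(default_factory=dict)  # in-context blocks
    archival: list[str] = field(default_factory=list)      # off-context store
    trace: list[str] = field(default_factory=list)         # audit log of tool calls

    def core_memory_replace(self, label: str, value: str) -> None:
        self.working[label] = value
        self.trace.append(f"core_memory_replace({label!r}, {value!r})")

    def archival_memory_insert(self, text: str) -> None:
        self.archival.append(text)
        self.trace.append(f"archival_memory_insert({text!r})")

    def archival_memory_search(self, query: str) -> list[str]:
        self.trace.append(f"archival_memory_search({query!r})")
        return [t for t in self.archival if query.lower() in t.lower()]

mem = PagedMemory()
mem.core_memory_replace("human", "Name: William. Prefers Nuxt.")
mem.archival_memory_insert("User started a new project with Nuxt.")
hits = mem.archival_memory_search("nuxt")
print(len(hits), len(mem.trace))  # every change is visible in mem.trace
```

The point of the sketch is the trace: "when something goes wrong, point at the exact tool call" falls out of logging each operation, which is what Letta's server does for you in the message stream.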

Quick Start — Letta Server + Python Client

Letta runs as a stateful server (FastAPI + Postgres). Agents persist across restarts; memory is durable. In the trace you’ll see tool calls like core_memory_append — that’s the agent deciding to update its own memory. No hidden extraction pass.

# pip install letta-client
# docker run -p 8283:8283 letta/letta:latest
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

agent = client.agents.create(
    name="assistant",
    memory_blocks=[
        {"label": "human", "value": "Name: William. Role: founder."},
        {"label": "persona", "value": "You are a terse, senior engineer."},
    ],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
)

resp = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "I'm using Nuxt for my new project."}],
)
for m in resp.messages:
    print(m.message_type, getattr(m, "content", None))

# Agent will autonomously call core_memory_append / archival_memory_insert
# so "Nuxt" ends up in either working or archival memory — visible in the trace

Core capabilities

Core + archival memory blocks

Core memory = in-context (always visible to the agent, limited to ~2KB per block). Archival = off-context, searchable via function calls. The split is the key MemGPT insight.
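A minimal sketch of that split, enforcing the per-block budget the text mentions (the 2 KB constant below is just the document's figure, used as an illustrative limit; this is not Letta's code):

```python
CORE_BLOCK_LIMIT = 2048  # bytes; illustrative limit taken from the text above

class CoreBlock:
    """Toy in-context memory block that rejects writes past its byte budget."""
    def __init__(self, label: str, value: str = ""):
        self.label, self.value = label, value

    def append(self, text: str) -> bool:
        candidate = (self.value + "\n" + text).strip()
        if len(candidate.encode("utf-8")) > CORE_BLOCK_LIMIT:
            return False  # caller should route this to archival memory instead
        self.value = candidate
        return True

human = CoreBlock("human", "Name: William. Role: founder.")
ok_small = human.append("Prefers terse answers.")
ok_big = human.append("x" * 4096)  # over budget: belongs in archival
print(ok_small, ok_big)
```

The failure path is the interesting part: when a core block is full, the agent must decide what to evict or archive, which is exactly the paging decision MemGPT made explicit.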

Self-directed memory updates

The agent chooses what to remember. core_memory_append, core_memory_replace, archival_memory_insert, archival_memory_search are tools exposed to the LLM — not automatic background processes.
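Because these are ordinary tools, they reach the LLM as function schemas like any other. A hedged sketch in the OpenAI function-calling format (the parameter names here are illustrative, not Letta's exact signatures):

```python
# Build OpenAI-style function schemas for the four memory tools.
# Parameter names are illustrative assumptions, not Letta's exact API.
def tool(name: str, desc: str, params: dict) -> dict:
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": desc,
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        },
    }

MEMORY_TOOLS = [
    tool("core_memory_append", "Append text to an in-context block.",
         {"label": {"type": "string"}, "content": {"type": "string"}}),
    tool("core_memory_replace", "Rewrite part of an in-context block.",
         {"label": {"type": "string"}, "old": {"type": "string"},
          "new": {"type": "string"}}),
    tool("archival_memory_insert", "Store a fact in archival memory.",
         {"content": {"type": "string"}}),
    tool("archival_memory_search", "Search archival memory.",
         {"query": {"type": "string"}}),
]
print([t["function"]["name"] for t in MEMORY_TOOLS])
```

Seen this way, "self-directed memory" is nothing exotic: the model manages memory with the same tool-calling machinery it uses for everything else, which is why every update shows up in the trace.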

Stateful agent server

Agents persist in Postgres. Kill the process, restart, pick up exactly where you left off — including all memory blocks, conversation state, and in-flight tool calls.

Multi-agent with shared memory

Multiple agents can share memory blocks. Good for manager/worker patterns where a supervisor reads the same context as its subordinates but maintains its own persona block.
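The manager/worker pattern reduces to two agents holding a reference to the same block object, each with a private persona. A toy sketch of that structure (not Letta's API; in the real client you would attach a shared block by id at agent creation):

```python
class Block:
    """Toy shared-memory block: one mutable value, many readers."""
    def __init__(self, label: str, value: str):
        self.label, self.value = label, value

class Agent:
    def __init__(self, name: str, blocks: list[Block]):
        self.name = name
        self.blocks = {b.label: b for b in blocks}

# One shared project block; each agent keeps its own persona block.
project = Block("project", "Sprint goal: ship memory docs.")
worker = Agent("worker", [project, Block("persona", "Terse engineer.")])
manager = Agent("manager", [project, Block("persona", "Planner.")])

worker.blocks["project"].value += " Status: draft done."
print(manager.blocks["project"].value)  # manager sees the worker's update
```

No message passing is needed: the supervisor reads the worker's status the moment it is written, while the persona blocks stay independent.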

ADE (Agent Development Environment)

Letta’s web UI lets you inspect an agent’s memory blocks in real time, watch tool calls as they happen, and edit memory directly when debugging.

OpenAI-compatible API

Letta agents can be called via an OpenAI-style chat completions endpoint, making them drop-in replacements for stateless OpenAI calls in existing apps.
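As a sketch of what "drop-in" means, here is a helper that builds a standard chat-completions request against such an endpoint. The /v1/chat/completions path and the agent-id-as-model convention are assumptions about a typical deployment; check your server's docs for the exact route.

```python
import json

def build_chat_request(base_url: str, agent_id: str, user_msg: str):
    """Assemble an OpenAI-style chat request aimed at a Letta server.
    Path and model-field convention are assumptions, not confirmed API."""
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    body = json.dumps({
        "model": agent_id,  # the stateful agent stands in for a model name
        "messages": [{"role": "user", "content": user_msg}],
    })
    return url, body

url, body = build_chat_request("http://localhost:8283", "agent-123", "hi")
print(url)
```

The payload is byte-for-byte what an existing OpenAI integration already sends, which is the whole appeal: swap the base URL, keep the client code.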

Comparison

|                   | Memory Control                  | State Persistence        | Overhead            | Best For                       |
| ----------------- | ------------------------------- | ------------------------ | ------------------- | ------------------------------ |
| Letta (this)      | Agent-directed (explicit tools) | Postgres-backed, durable | High (extra turns)  | Long-running autonomous agents |
| mem0              | Background extraction           | Vector DB of your choice | Low (async pipeline)| Chatbot personalization        |
| Zep               | Service-managed summarization   | Postgres-backed service  | Low-medium          | Session-based apps             |
| MemGPT (research) | Same as Letta (its ancestor)    | Research prototype       | High                | Research, prototyping          |

Real-world use cases

01. Autonomous research agents

Long-running agents that ingest papers, maintain hypotheses in core memory, and archive sources. The explicit memory model is essential when an agent runs for hours — you need to audit what it "learned".

02. Personal assistant copilots

Assistants that build up a model of their user over weeks. Letta’s persistent Postgres state means restarts, migrations, and infra changes don’t wipe the relationship.

03. Multi-agent supervision

A supervisor agent reads shared memory blocks written by worker agents. The shared-block model is cleaner than message-passing for coordination scenarios.

Pricing and licensing

Letta OSS: Apache 2.0 — self-host via docker-compose with Postgres. No license cost. You pay for your own LLM and embedding API keys.

Letta Cloud: Hosted service with the ADE and managed Postgres. Usage-based pricing per active agent. See letta.com/pricing.

Cost model: Letta adds tool-call overhead. Budget ~2-3x the token cost of a stateless chat API for equivalent work, offset by not needing a separate context-stuffing pipeline.
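That 2-3x figure turns into a quick back-of-envelope estimator. Everything below is placeholder arithmetic: the per-1K-token price and traffic numbers are made up for illustration, not real rates.

```python
def monthly_cost(tokens_per_turn: int, turns_per_day: int,
                 price_per_1k: float, overhead: float = 2.5) -> float:
    """Rough Letta budget: stateless token cost times a memory-tool overhead
    multiplier (2-3x per the text above; all prices are placeholders)."""
    daily = tokens_per_turn * turns_per_day * overhead * price_per_1k / 1000
    return round(daily * 30, 2)

# e.g. 1,500 tokens/turn, 200 turns/day, placeholder $0.0006 per 1K tokens
print(monthly_cost(1500, 200, 0.0006))
```

Run the same numbers with overhead=1.0 to see the stateless baseline, and remember the offset the text mentions: whatever your separate context-stuffing pipeline costs today comes off this figure.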

Related TokRepo assets

FAQ

Letta vs MemGPT — what is the relationship?

MemGPT is the 2023 research paper (UC Berkeley) that introduced paged agent memory. Letta is the company and production-grade open-source framework built on those ideas. If you looked at memgpt.ai in 2023 and are wondering where the project went — you’re looking for Letta now.

Do I have to self-host Postgres?

For self-host, yes — Letta needs a Postgres instance. docker-compose handles it locally; in production use RDS/Supabase/Neon. If you don’t want to manage Postgres, Letta Cloud handles it for you.

Can Letta agents call external APIs?

Yes. Tools are regular Python functions. Register them via client.tools.create(source_code=...) and the agent can call your custom tools alongside its memory tools. Common patterns: web search, database lookups, Slack posts.
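A sketch of the workflow the answer describes: write the tool as a Python source string, sanity-check it locally, then hand it to the server. Only the local check runs here; the registration step (commented out) needs a live Letta server.

```python
# Tool body shipped to the server as source. get_weather is a hypothetical
# example tool, a stub standing in for a real API call.
tool_source = '''
def get_weather(city: str) -> str:
    """Return a canned weather report for a city (stub for a real API)."""
    return f"Weather in {city}: sunny"
'''

# Local sanity check before registering with the server.
ns: dict = {}
exec(tool_source, ns)
print(ns["get_weather"]("Berlin"))

# Against a live server (per the answer above; requires letta-client):
# from letta_client import Letta
# client = Letta(base_url="http://localhost:8283")
# client.tools.create(source_code=tool_source)
```

Exec-ing the source locally first catches syntax errors and bad signatures before the agent ever tries to call the tool mid-conversation.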

What LLMs work with Letta?

OpenAI, Anthropic Claude, Google Gemini, Groq, Ollama, and any OpenAI-compatible endpoint (including vLLM and LM Studio). Configure per-agent via the model field — you can mix (e.g., cheap model for memory ops, Claude for the primary response).

When is Letta overkill?

Simple chatbots with 10-turn sessions, or RAG systems where the knowledge lives in documents, not conversation history. In those cases mem0 or Zep get you 80% of the way with 30% of the infrastructure.

Similar tools