Vector Memory vs Graph Memory: How to Choose (2026)
A practical comparison of the two mainstream AI memory architectures: when to use vector embeddings, when to use knowledge graphs, and when to combine them.
Why it matters
Every AI memory system reduces to one or both of two primitives: vector memory (embed content, retrieve by semantic similarity) and graph memory (entities connected by typed edges, retrieved by traversal). Choosing between them is one of the highest-leverage decisions in agent architecture.
Vector memory wins on speed, simplicity, and fuzziness. Every major library supports it (Qdrant, Pinecone, pgvector), it handles free-form text natively, and sub-100ms retrieval over millions of memories is routine. It struggles with multi-hop reasoning ("friend of a friend who works on X") and temporal queries ("what was the user interested in last quarter?").
Graph memory wins on structure, relationships, and history. Questions like "who collaborated with whom on which project" are direct graph traversals. Graphiti-style temporal graphs add explicit history. The cost: higher operational complexity (a graph DB to run), harder extraction quality (LLM must produce well-typed edges), and slower cold-path queries on deep traversals.
Most production systems in 2026 are hybrid: vector store for everything, graph store for the 10-20% of queries where relationships matter. This page walks through when to pick which — and when to build both.
Decision framework
This flowchart is the short version of the decision. Every concrete library recommendation in it has its own page on TokRepo with code and cost trade-offs — click through when you need detail.
# 1. What kind of questions will your agent answer?
#
# (a) "Find me memories similar to X" → VECTOR
# (b) "Who is connected to whom via what relationship?" → GRAPH
# (c) "What was true about X at time T?" → GRAPH (temporal)
# (d) "Has this topic come up before?" → VECTOR
# (e) Mix of (a)+(b)+(c) → HYBRID
#
# 2. What does your data look like?
#
# Free-form chat, long-form content, docs → VECTOR first
# Structured entity-relationship patterns → GRAPH first
# Conversational with named entities → HYBRID
#
# 3. What is your operational budget?
#
# 1 dev, small app → VECTOR only (simpler ops)
# Team, production workload → HYBRID (graph for the 10% that needs it)
#
# 4. Concrete stack recommendations:
#
# MVP chatbot → mem0 (vector) + Qdrant
# Session chat → Zep (vector + built-in entity graph)
# Long-running → Letta (paged) OR Graphiti (temporal graph)
# Research agent → Graphiti + vector DB side-by-side
#
# 5. Don't optimize prematurely.
# Ship vector memory first. Add graph when you can point at a
# specific query your vector memory fails on. Hybrid is a
# complexity tax — pay it only when it buys accuracy or correctness.
Core capabilities
Vector: semantic similarity
Retrieves by distance in embedding space. Excellent for "find content like X" even when wording differs. Fails when the question is about relationships, not similarity.
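A minimal sketch of distance-based recall, using toy hand-rolled 3-d vectors and cosine similarity (in production the vectors come from an embedding model and live in a vector DB such as Qdrant; the memories and numbers here are illustrative):

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-d "embeddings"; real systems use 384- to 3072-d model outputs.
memories = {
    "user likes hiking in the Alps":   [0.9, 0.1, 0.0],
    "user asked about Python asyncio": [0.1, 0.9, 0.1],
    "user mentioned trail running":    [0.8, 0.2, 0.1],
}

def recall(query_vec, k=2):
    # Rank stored memories by similarity to the query vector, keep top-k.
    ranked = sorted(memories, key=lambda m: cosine(memories[m], query_vec),
                    reverse=True)
    return ranked[:k]

# A query vector "near" the outdoor memories retrieves them first,
# even though no keywords are shared.
print(recall([0.85, 0.15, 0.05]))
```

Note that nothing here knows *who* hikes with whom: similarity ranking has no notion of relationships, which is exactly the gap graph memory fills.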
Graph: typed relationships
Retrieves by traversing edges (friend_of, worked_on, lives_in). Excellent for multi-hop questions and structural queries. Requires the extractor to produce clean, typed edges.
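As a sketch of why relationship questions are direct traversals, here is the multi-hop query "friend of a friend who worked on X" over a plain list of typed triples (the entities and projects are made up; a real system would store this in Neo4j or FalkorDB and query with Cypher):

```python
# Typed edges as (subject, relation, object) triples.
edges = [
    ("alice", "friend_of", "bob"),
    ("bob",   "friend_of", "carol"),
    ("carol", "worked_on", "memory-compiler"),
    ("bob",   "worked_on", "chat-ui"),
]

def neighbors(node, relation):
    # One traversal hop: follow edges of one type out of `node`.
    return [o for s, r, o in edges if s == node and r == relation]

def friends_of_friends_working_on(person, project):
    # Two hops of friend_of, then filter by a worked_on edge.
    hop1 = neighbors(person, "friend_of")
    hop2 = {fof for f in hop1 for fof in neighbors(f, "friend_of")}
    return sorted(fof for fof in hop2 if project in neighbors(fof, "worked_on"))

print(friends_of_friends_working_on("alice", "memory-compiler"))  # ['carol']
```

The same question against a vector index would require the answer to already exist as a stored sentence; the graph composes it from two edges at query time.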
Vector: simpler ops
Vector DBs are a single service. Managed options (Pinecone, Qdrant Cloud) take minutes to provision. Debugging is straightforward — inspect embeddings and scores.
Graph: temporal reasoning
Graphiti-style bitemporal edges let you ask "what was true when?" — impossible in a pure vector model. Essential for regulated domains (healthcare, finance, compliance).
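A minimal sketch of the point-in-time query, with each fact carrying a validity interval (the field names and the half-open interval convention are illustrative assumptions; Graphiti's actual schema differs):

```python
from datetime import date

# Each fact carries a validity interval: valid_from inclusive,
# valid_to exclusive, None meaning "still current". Illustrative only.
facts = [
    {"s": "patient_7", "r": "prescribed", "o": "drug_A",
     "valid_from": date(2025, 11, 1), "valid_to": date(2026, 2, 1)},
    {"s": "patient_7", "r": "prescribed", "o": "drug_B",
     "valid_from": date(2026, 2, 1), "valid_to": None},
]

def as_of(subject, relation, t):
    # "What was true at time t?": keep edges whose interval covers t.
    return [f["o"] for f in facts
            if f["s"] == subject and f["r"] == relation
            and f["valid_from"] <= t
            and (f["valid_to"] is None or t < f["valid_to"])]

print(as_of("patient_7", "prescribed", date(2026, 1, 15)))  # ['drug_A']
print(as_of("patient_7", "prescribed", date(2026, 3, 1)))   # ['drug_B']
```

A pure vector store can only retrieve the sentence most similar to the question; it has no interval to filter on, which is why temporal edges matter in regulated domains.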
Hybrid: the production default
Most real systems store memory twice: once in a vector index for fast semantic recall, once in a graph for structural queries. Queries route by intent.
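A sketch of the routing layer, with a keyword classifier standing in for the tiny LLM call a real system would make, and string stubs standing in for the two stores:

```python
# Cues that suggest a relationship- or time-shaped question.
# In production this decision is a small LLM classification call.
RELATIONAL_CUES = ("who", "connected", "worked with", "related to", "when was")

def classify(query: str) -> str:
    # Route intent: relationship-shaped queries go to the graph,
    # everything else to the vector index.
    q = query.lower()
    return "graph" if any(cue in q for cue in RELATIONAL_CUES) else "vector"

def route(query: str) -> str:
    # Stubbed store calls; real code would hit Qdrant or a graph DB here.
    backend = classify(query)
    if backend == "graph":
        return f"graph-store: traverse for {query!r}"
    return f"vector-store: similarity search for {query!r}"

print(route("Who worked with Alice on the compiler?"))
print(route("Find notes similar to this bug report"))
```

Results from both paths are typically merged by a reranker before they reach the agent's context window.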
Cost asymmetry
Vector retrieval costs fractions of a cent at millions of memories. Graph construction costs more (LLM extraction per episode), but graph queries can be cheaper than multi-vector reranking for relationship-heavy workloads.
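A back-of-envelope write-cost comparison. All numbers below are placeholder assumptions (per-token prices, 200 tokens per memory, a 4x token overhead for the extraction pass); substitute your provider's actual rates before drawing conclusions:

```python
# Assumed prices, not real quotes: embedding at $0.02 / 1M tokens,
# extraction LLM at $0.50 / 1M tokens.
EMBED_PER_TOKEN = 0.02 / 1_000_000
EXTRACT_PER_TOKEN = 0.50 / 1_000_000
TOKENS_PER_MEMORY = 200

def vector_write_cost(n_memories):
    # Vector path: one embedding call per memory written.
    return n_memories * TOKENS_PER_MEMORY * EMBED_PER_TOKEN

def graph_write_cost(n_memories, extraction_overhead=4):
    # Graph path: each episode also pays an entity/edge extraction
    # pass, modeled here as 4x the episode's own tokens.
    return n_memories * TOKENS_PER_MEMORY * extraction_overhead * EXTRACT_PER_TOKEN

print(f"1M memories, vector: ${vector_write_cost(1_000_000):.2f}")  # $4.00
print(f"1M memories, graph:  ${graph_write_cost(1_000_000):.2f}")   # $400.00
```

The two-orders-of-magnitude gap on writes is why the usual advice is graph extraction only for the subset of episodes where relationships matter.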
Comparison
| | Semantic recall | Multi-hop relations | Temporal queries | Operational cost |
|---|---|---|---|---|
| Vector memory | Excellent | Poor | Poor (workarounds exist) | Low |
| Graph memory | Medium (depends on node embeddings) | Excellent | Excellent (with temporal edges) | Medium-High |
| Hybrid (vector + graph) | Excellent | Excellent | Excellent | Medium-High |
Real-world use cases
01. Consumer chatbot (pick vector)
mem0 or Zep gives you 90% of the perceived intelligence at minimal operational cost. Don’t add a graph until a user complains about a specific relationship-shaped question.
02. Healthcare / compliance (pick graph or hybrid)
Regulators ask "what was the patient on in January?" Bitemporal graph edges (Graphiti) are the standard way to answer. Add a vector layer for free-text note retrieval.
03. Engineering code intelligence (hybrid)
Vector over code chunks for "find similar code"; graph over symbols/imports for "what breaks if I rename this?" Neither alone is sufficient for production dev tools.
Pricing & licensing
Vector-only stacks: Qdrant/Chroma self-hosted is free; managed vector DBs start around $70/month for production. LLM embedding costs scale with memory write volume — typically fractions of a cent per memory.
Graph stacks: Neo4j AuraDB free tier covers prototypes; production starts around $65/month. LLM extraction costs are higher because every episode triggers an entity/edge extraction pass.
Hybrid stacks: pay both infra lines. The payoff is measured in agent accuracy on the subset of queries that benefit from graph traversal. Measure before committing.
Related TokRepo assets
Claude Memory Compiler — Evolving Knowledge Base
Auto-capture Claude Code sessions into a structured knowledge base. Hooks extract decisions and lessons, compiler organizes into cross-referenced articles. No vector DB needed. 365+ stars.
Chroma — Open-Source Embedding Database for AI
Lightweight open-source vector database that runs anywhere. Chroma provides in-memory, local file, and client-server modes for embeddings with zero-config LangChain integration.
Self-Hosted AI Starter Kit — Local AI with n8n
Docker Compose template by n8n that bootstraps a complete local AI environment with n8n workflow automation, Ollama LLMs, Qdrant vector database, and PostgreSQL. 14,500+ stars.
Weaviate — Open-Source Vector Database at Scale
Weaviate is an open-source vector database for semantic search at scale. 15.9K+ GitHub stars. Hybrid search (vector + BM25), built-in RAG, reranking, multi-tenancy, and horizontal scaling. BSD 3-Clause license.
FAQ
Do I really need a graph database for AI memory?
Usually not. Vector memory covers most chatbot and personalization use cases. Add a graph when you can point at specific questions your agent fails on because it can’t traverse relationships — not because graph databases are intellectually interesting.
Can I use a graph DB as my only memory store?
Yes, but you’ll want vector-search features on it. Neo4j has native vector indexes; FalkorDB is built on Redis with vector support. Pure-graph-no-vector works for structured domains but hurts on free-form chat recall.
How do vector + graph hybrids route queries?
Common pattern: classify the query intent with a tiny LLM call, then route to vector for "find content" questions and to graph for "find relationships" questions. Merge results in a reranker. Zep does this internally; Graphiti + vector DB side-by-side lets you do it explicitly.
What about GraphRAG — is that vector or graph?
Microsoft GraphRAG is both. It builds a graph from documents (entities + edges + community detection), then queries hierarchical summaries at multiple levels. It’s a retrieval architecture more than a memory system — best for large static knowledge bases, not conversational memory.
Which is more "future-proof"?
Neither — they’re complementary, not competing. The research trend is toward learned structured memory that combines both (think "neural graph stores"). For 2026, build hybrid if you need the power; build vector-only if you don’t. Either way, keep your memory layer abstracted enough to swap implementations.