Vector Memory vs Graph Memory: How to Choose (2026)
A practical comparison of the two mainstream AI memory architectures: when to use vector embeddings, when to use knowledge graphs, and when to combine them.
Why it matters
Every AI memory system reduces to one or both of two primitives: vector memory (embed content, retrieve by semantic similarity) and graph memory (entities connected by typed edges, retrieved by traversal). Choosing between them is one of the highest-leverage decisions in agent architecture.
Vector memory wins on speed, simplicity, and fuzziness. Every major library supports it (Qdrant, Pinecone, pgvector), it handles free-form text natively, and sub-100ms retrieval over millions of memories is routine. It struggles with multi-hop reasoning ("friend of a friend who works on X") and temporal queries ("what was the user interested in last quarter?").
Graph memory wins on structure, relationships, and history. Questions like "who collaborated with whom on which project" are direct graph traversals. Graphiti-style temporal graphs add explicit history. The cost: higher operational complexity (a graph DB to run), harder extraction quality (LLM must produce well-typed edges), and slower cold-path queries on deep traversals.
Most production systems in 2026 are hybrid: vector store for everything, graph store for the 10-20% of queries where relationships matter. This page walks through when to pick which — and when to build both.
Decision framework
This flowchart is the short version of the decision. Every concrete library recommendation in it has its own page on TokRepo with code and cost trade-offs — click through when you need detail.
# 1. What kind of questions will your agent answer?
#
# (a) "Find me memories similar to X" → VECTOR
# (b) "Who is connected to whom via what relationship?" → GRAPH
# (c) "What was true about X at time T?" → GRAPH (temporal)
# (d) "Has this topic come up before?" → VECTOR
# (e) Mix of (a)+(b)+(c) → HYBRID
#
# 2. What does your data look like?
#
# Free-form chat, long-form content, docs → VECTOR first
# Structured entity-relationship patterns → GRAPH first
# Conversational with named entities → HYBRID
#
# 3. What is your operational budget?
#
# 1 dev, small app → VECTOR only (simpler ops)
# Team, production workload → HYBRID (graph for the 10% that needs it)
#
# 4. Concrete stack recommendations:
#
# MVP chatbot → mem0 (vector) + Qdrant
# Session chat → Zep (vector + built-in entity graph)
# Long-running → Letta (paged) OR Graphiti (temporal graph)
# Research agent → Graphiti + vector DB side-by-side
#
# 5. Don't optimize prematurely.
# Ship vector memory first. Add graph when you can point at a
# specific query your vector memory fails on. Hybrid is a
# complexity tax — pay it only when it buys accuracy or correctness.
Core capabilities
Vector: semantic similarity
Retrieves by distance in embedding space. Excellent for "find content like X" even when wording differs. Fails when the question is about relationships, not similarity.
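A minimal sketch of distance-based recall, using toy hand-rolled 3-d vectors and cosine similarity (in production the vectors come from an embedding model and live in a vector DB such as Qdrant; the memories and numbers here are illustrative):

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-d "embeddings"; real systems use 384- to 3072-d model outputs.
memories = {
    "user likes hiking in the Alps":   [0.9, 0.1, 0.0],
    "user asked about Python asyncio": [0.1, 0.9, 0.1],
    "user mentioned trail running":    [0.8, 0.2, 0.1],
}

def recall(query_vec, k=2):
    # Rank stored memories by similarity to the query vector, keep top-k.
    ranked = sorted(memories, key=lambda m: cosine(memories[m], query_vec),
                    reverse=True)
    return ranked[:k]

# A query vector "near" the outdoor memories retrieves them first,
# even though no keywords are shared.
print(recall([0.85, 0.15, 0.05]))
```

Note that nothing here knows *who* hikes with whom: similarity ranking has no notion of relationships, which is exactly the gap graph memory fills.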
Graph: typed relationships
Retrieves by traversing edges (friend_of, worked_on, lives_in). Excellent for multi-hop questions and structural queries. Requires the extractor to produce clean, typed edges.
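As a sketch of why relationship questions are direct traversals, here is the multi-hop query "friend of a friend who worked on X" over a plain list of typed triples (the entities and projects are made up; a real system would store this in Neo4j or FalkorDB and query with Cypher):

```python
# Typed edges as (subject, relation, object) triples.
edges = [
    ("alice", "friend_of", "bob"),
    ("bob",   "friend_of", "carol"),
    ("carol", "worked_on", "memory-compiler"),
    ("bob",   "worked_on", "chat-ui"),
]

def neighbors(node, relation):
    # One traversal hop: follow edges of one type out of `node`.
    return [o for s, r, o in edges if s == node and r == relation]

def friends_of_friends_working_on(person, project):
    # Two hops of friend_of, then filter by a worked_on edge.
    hop1 = neighbors(person, "friend_of")
    hop2 = {fof for f in hop1 for fof in neighbors(f, "friend_of")}
    return sorted(fof for fof in hop2 if project in neighbors(fof, "worked_on"))

print(friends_of_friends_working_on("alice", "memory-compiler"))  # ['carol']
```

The same question against a vector index would require the answer to already exist as a stored sentence; the graph composes it from two edges at query time.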
Vector: simpler ops
Vector DBs are a single service. Managed options (Pinecone, Qdrant Cloud) take minutes to provision. Debugging is straightforward — inspect embeddings and scores.
Graph: temporal reasoning
Graphiti-style bitemporal edges let you ask "what was true when?" — impossible in a pure vector model. Essential for regulated domains (healthcare, finance, compliance).
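A minimal sketch of the point-in-time query, with each fact carrying a validity interval (the field names and the half-open interval convention are illustrative assumptions; Graphiti's actual schema differs):

```python
from datetime import date

# Each fact carries a validity interval: valid_from inclusive,
# valid_to exclusive, None meaning "still current". Illustrative only.
facts = [
    {"s": "patient_7", "r": "prescribed", "o": "drug_A",
     "valid_from": date(2025, 11, 1), "valid_to": date(2026, 2, 1)},
    {"s": "patient_7", "r": "prescribed", "o": "drug_B",
     "valid_from": date(2026, 2, 1), "valid_to": None},
]

def as_of(subject, relation, t):
    # "What was true at time t?": keep edges whose interval covers t.
    return [f["o"] for f in facts
            if f["s"] == subject and f["r"] == relation
            and f["valid_from"] <= t
            and (f["valid_to"] is None or t < f["valid_to"])]

print(as_of("patient_7", "prescribed", date(2026, 1, 15)))  # ['drug_A']
print(as_of("patient_7", "prescribed", date(2026, 3, 1)))   # ['drug_B']
```

A pure vector store can only retrieve the sentence most similar to the question; it has no interval to filter on, which is why temporal edges matter in regulated domains.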
Hybrid: the production default
Most real systems store memory twice: once in a vector index for fast semantic recall, once in a graph for structural queries. Queries route by intent.
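A sketch of the routing layer, with a keyword classifier standing in for the tiny LLM call a real system would make, and string stubs standing in for the two stores:

```python
# Cues that suggest a relationship- or time-shaped question.
# In production this decision is a small LLM classification call.
RELATIONAL_CUES = ("who", "connected", "worked with", "related to", "when was")

def classify(query: str) -> str:
    # Route intent: relationship-shaped queries go to the graph,
    # everything else to the vector index.
    q = query.lower()
    return "graph" if any(cue in q for cue in RELATIONAL_CUES) else "vector"

def route(query: str) -> str:
    # Stubbed store calls; real code would hit Qdrant or a graph DB here.
    backend = classify(query)
    if backend == "graph":
        return f"graph-store: traverse for {query!r}"
    return f"vector-store: similarity search for {query!r}"

print(route("Who worked with Alice on the compiler?"))
print(route("Find notes similar to this bug report"))
```

Results from both paths are typically merged by a reranker before they reach the agent's context window.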
Cost asymmetry
Vector retrieval costs fractions of a cent at millions of memories. Graph construction costs more (LLM extraction per episode), but graph queries can be cheaper than multi-vector reranking for relationship-heavy workloads.
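A back-of-envelope write-cost comparison. All numbers below are placeholder assumptions (per-token prices, 200 tokens per memory, a 4x token overhead for the extraction pass); substitute your provider's actual rates before drawing conclusions:

```python
# Assumed prices, not real quotes: embedding at $0.02 / 1M tokens,
# extraction LLM at $0.50 / 1M tokens.
EMBED_PER_TOKEN = 0.02 / 1_000_000
EXTRACT_PER_TOKEN = 0.50 / 1_000_000
TOKENS_PER_MEMORY = 200

def vector_write_cost(n_memories):
    # Vector path: one embedding call per memory written.
    return n_memories * TOKENS_PER_MEMORY * EMBED_PER_TOKEN

def graph_write_cost(n_memories, extraction_overhead=4):
    # Graph path: each episode also pays an entity/edge extraction
    # pass, modeled here as 4x the episode's own tokens.
    return n_memories * TOKENS_PER_MEMORY * extraction_overhead * EXTRACT_PER_TOKEN

print(f"1M memories, vector: ${vector_write_cost(1_000_000):.2f}")  # $4.00
print(f"1M memories, graph:  ${graph_write_cost(1_000_000):.2f}")   # $400.00
```

The two-orders-of-magnitude gap on writes is why the usual advice is graph extraction only for the subset of episodes where relationships matter.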
Comparison
| | Semantic recall | Multi-hop relations | Temporal queries | Operational cost |
|---|---|---|---|---|
| Vector memory | Excellent | Poor | Poor (workarounds exist) | Low |
| Graph memory | Medium (depends on node embeddings) | Excellent | Excellent (with temporal edges) | Medium-High |
| Hybrid (vector + graph) | Excellent | Excellent | Excellent | Medium-High |
Real-world use cases
01. Consumer chatbot (pick vector)
mem0 or Zep gives you 90% of the perceived intelligence at minimal operational cost. Don’t add a graph until a user complains about a specific relationship-shaped question.
02. Healthcare / compliance (pick graph or hybrid)
Regulators ask "what was the patient on in January?" Bitemporal graph edges (Graphiti) are the standard way to answer. Add a vector layer for free-text note retrieval.
03. Engineering code intelligence (hybrid)
Vector over code chunks for "find similar code"; graph over symbols/imports for "what breaks if I rename this?" Neither alone is sufficient for production dev tools.
Pricing & licensing
Vector-only stacks: Qdrant/Chroma self-hosted is free; managed vector DBs start around $70/month for production. LLM embedding costs scale with memory write volume — typically fractions of a cent per memory.
Graph stacks: Neo4j AuraDB free tier covers prototypes; production starts around $65/month. LLM extraction costs are higher because every episode triggers an entity/edge extraction pass.
Hybrid stacks: pay both infra lines. The payoff is measured in agent accuracy on the subset of queries that benefit from graph traversal. Measure before committing.
Related TokRepo assets
Claude Memory Compiler — Evolving Knowledge Base
Auto-capture Claude Code sessions into a structured knowledge base. Hooks extract decisions and lessons, compiler organizes into cross-referenced articles. No vector DB needed. 365+ stars.
Chroma — Open-Source Embedding Database for AI
Lightweight open-source vector database that runs anywhere. Chroma provides in-memory, local file, and client-server modes for embeddings with zero-config LangChain integration.
Self-Hosted AI Starter Kit — Local AI with n8n
Docker Compose template by n8n that bootstraps a complete local AI environment with n8n workflow automation, Ollama LLMs, Qdrant vector database, and PostgreSQL. 14,500+ stars.
Weaviate — Open-Source Vector Database at Scale
Weaviate is an open-source vector database for semantic search at scale. 15.9K+ GitHub stars. Hybrid search (vector + BM25), built-in RAG, reranking, multi-tenancy, and horizontal scaling. BSD 3-Clause license.
FAQ
Do I really need a graph database for AI memory?
Usually not. Vector memory covers most chatbot and personalization use cases. Add a graph when you can point at specific questions your agent fails on because it can’t traverse relationships — not because graph databases are intellectually interesting.
Can I use a graph DB as my only memory store?
Yes, but you’ll want vector-search features on it. Neo4j has native vector indexes; FalkorDB is built on Redis with vector support. Pure-graph-no-vector works for structured domains but hurts on free-form chat recall.
How do vector + graph hybrids route queries?
Common pattern: classify the query intent with a tiny LLM call, then route to vector for "find content" questions and to graph for "find relationships" questions. Merge results in a reranker. Zep does this internally; Graphiti + vector DB side-by-side lets you do it explicitly.
What about GraphRAG — is that vector or graph?
Microsoft GraphRAG is both. It builds a graph from documents (entities + edges + community detection), then queries hierarchical summaries at multiple levels. It’s a retrieval architecture more than a memory system — best for large static knowledge bases, not conversational memory.
Which is more "future-proof"?
Neither — they’re complementary, not competing. The research trend is toward learned structured memory that combines both (think "neural graph stores"). For 2026, build hybrid if you need the power; build vector-only if you don’t. Either way, keep your memory layer abstracted enough to swap implementations.