# Embedding Drift Monitoring — Retrieval Regression Runbook

> Embedding drift monitoring runbook for RAG and agent search. Uses golden queries, recall@K, rank delta, and rollback gates.

## Install

Copy the content below into your project:

---
title: Embedding Drift Monitoring — Retrieval Regression Runbook
asset_kind: knowledge
target_tools: [codex, claude_code, cursor, gemini_cli]
install_mode: single
entrypoint: README.md
---

# Embedding Drift Monitoring — Retrieval Regression Runbook

Use this runbook when search quality drops after a model, chunking, corpus, or ranking change.

Embedding drift is not one metric. Treat it as a retrieval regression problem: keep a fixed query set, fixed expected documents, and compare old versus new retrieval behavior before shipping.

## Quick Use

Build a small golden set first:

```json
[
  {
    "query": "oauth device code flow cli login",
    "must_include": ["oauth-device-flow-runbook"],
    "should_include": ["cli-auth-security"]
  },
  {
    "query": "mcp tools/list p95 latency",
    "must_include": ["mcp-latency-probe"]
  }
]
```

Then compare old and new retrieval:

```text
for each query:
  run old index top_k=10
  run new index top_k=10
  compute recall@10 for must_include
  compute overlap@10
  record rank movement for critical docs
```

## Metrics That Matter

| Metric | Use |
|---|---|
| Recall@K on golden queries | Catches lost must-return documents. |
| Rank delta for critical docs | Shows whether important docs fell below the fold. |
| Top-K overlap | Detects broad distribution shifts. |
| Empty-result rate | Finds tokenizer, filter, or metadata regressions. |
| Click or install follow-through | Confirms search quality after launch. |

Vector distance alone is not enough. A lower average distance can still be worse if the wrong assets now rank above the exact answer.

## Change Types To Test

- Embedding model upgrade or provider switch.
- Chunk size, overlap, or markdown parsing change.
- Metadata filter changes such as `visibility`, `asset_kind`, or language.
- Hybrid ranking weight changes between BM25 and vector score.
- Corpus refresh that adds many near-duplicate documents.

## Ship Gate

Ship only when:

1. Must-include recall does not regress.
2. Empty-result rate does not increase for high-intent queries.
3. Top critical docs remain in top 3 or top 5 where expected.
4. Any intentional ranking shift is documented with examples.
5. Rollback is available: old index, old embedding model, or old ranker config.

## Source & Thanks

This is an original TokRepo runbook by William Wang. It uses standard IR evaluation ideas such as recall@K and rank movement, and applies them to vector search systems commonly used with RAG and agent registries.

# Embedding Drift Monitoring: Retrieval Regression Runbook

Use this runbook when search quality degrades after a model, chunking, corpus, or ranking change. Embedding drift is not a single distance metric but a retrieval regression problem: fix the query set, fix the expected documents, and compare old and new retrieval results.

## Quick Use

Build a golden set first, with every query listing at least its `must_include` documents. Then run the old index and the new index side by side and compare recall@10, top-K overlap, rank delta for critical documents, and empty-result rate.

## Ship Gate

1. Must-include recall does not drop.
2. Empty-result rate for high-intent queries does not rise.
3. Critical documents remain in the expected top 3 or top 5.
4. Every intentional ranking change is explained with examples.
5. Rollback to the old index, old embedding model, or old ranker config is possible.

---

Source: https://tokrepo.com/en/workflows/embedding-drift-monitoring-retrieval-regression-runbook-ea696ee5
Author: henuwangkai
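
## Example: comparison loop sketch

The comparison loop from Quick Use and the first two Ship Gate checks could be sketched in Python roughly as follows. This is a minimal sketch, not the runbook author's implementation. The `Retriever` callable, the function names, and the report fields are hypothetical. Overlap is computed here as Jaccard similarity of the two top-K sets, which is one assumed definition among several, and every golden query is treated as high-intent.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical retriever interface: (query, top_k) -> ranked list of document IDs.
Retriever = Callable[[str, int], List[str]]


def recall_at_k(expected: Sequence[str], results: Sequence[str], k: int) -> float:
    """Fraction of expected documents that appear in the top-k results."""
    if not expected:
        return 1.0
    top = set(results[:k])
    return sum(1 for doc in expected if doc in top) / len(expected)


def overlap_at_k(old: Sequence[str], new: Sequence[str], k: int) -> float:
    """Jaccard similarity of the old and new top-k sets (assumed definition)."""
    old_top, new_top = set(old[:k]), set(new[:k])
    union = old_top | new_top
    if not union:
        return 1.0
    return len(old_top & new_top) / len(union)


def rank_of(doc: str, results: Sequence[str]) -> Optional[int]:
    """1-based rank of doc in results, or None if it was not returned."""
    return results.index(doc) + 1 if doc in results else None


def compare_query(item: dict, old: Retriever, new: Retriever, k: int = 10) -> dict:
    """Run one golden query against both indexes and collect the metrics."""
    old_hits = old(item["query"], k)
    new_hits = new(item["query"], k)
    must = item["must_include"]
    return {
        "query": item["query"],
        "old_recall": recall_at_k(must, old_hits, k),
        "new_recall": recall_at_k(must, new_hits, k),
        "overlap": overlap_at_k(old_hits, new_hits, k),
        # (old rank, new rank) per critical document; None means not returned.
        "ranks": {d: (rank_of(d, old_hits), rank_of(d, new_hits)) for d in must},
        "old_empty": not old_hits,
        "new_empty": not new_hits,
    }


def passes_gate(reports: List[dict]) -> bool:
    """Check gate items 1 and 2 only: recall and empty-result count."""
    if any(r["new_recall"] < r["old_recall"] for r in reports):
        return False
    old_empty = sum(r["old_empty"] for r in reports)
    new_empty = sum(r["new_empty"] for r in reports)
    return new_empty <= old_empty


if __name__ == "__main__":
    # Toy in-memory "indexes" standing in for real search backends.
    golden = [{"query": "mcp tools/list p95 latency",
               "must_include": ["mcp-latency-probe"]}]
    old_index = lambda q, k: ["mcp-latency-probe", "cli-auth-security"][:k]
    new_index = lambda q, k: ["cli-auth-security", "mcp-latency-probe"][:k]
    reports = [compare_query(item, old_index, new_index) for item in golden]
    print(reports[0]["ranks"], passes_gate(reports))
```

Gate items 3 to 5 (rank position thresholds, documented ranking shifts, rollback availability) are left out on purpose: they depend on per-document expectations and operational state that the golden-set format shown in Quick Use does not carry.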