# Embedding Drift Monitoring — Retrieval Regression Runbook

> Embedding drift monitoring runbook for RAG and agent search. Uses golden queries, recall@K, rank delta, and rollback gates.

## Install

Copy the content below into your project:

---
title: Embedding Drift Monitoring — Retrieval Regression Runbook
asset_kind: knowledge
target_tools: [codex, claude_code, cursor, gemini_cli]
install_mode: single
entrypoint: README.md
---

# Embedding Drift Monitoring — Retrieval Regression Runbook

Use this runbook when search quality drops after a model, chunking, corpus, or ranking change. Embedding drift cannot be captured by a single distance metric. Treat it as a retrieval regression problem: keep a fixed query set with fixed expected documents, and compare old and new retrieval behavior before shipping.

## Quick Use

Build a small golden set first:

```json
[
  {
    "query": "oauth device code flow cli login",
    "must_include": ["oauth-device-flow-runbook"],
    "should_include": ["cli-auth-security"]
  },
  {
    "query": "mcp tools/list p95 latency",
    "must_include": ["mcp-latency-probe"]
  }
]
```
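A golden set only helps if every entry is well-formed. The following is a minimal sketch of a loader that checks each query has at least one `must_include` doc id and treats `should_include` as optional. The helper name `load_golden_set` and the inline `GOLDEN_JSON` string are assumptions for illustration; in practice the JSON would likely live in a file in your repo.

```python
import json

# Inline copy of the golden set above, so the sketch is self-contained.
GOLDEN_JSON = """
[
  {"query": "oauth device code flow cli login",
   "must_include": ["oauth-device-flow-runbook"],
   "should_include": ["cli-auth-security"]},
  {"query": "mcp tools/list p95 latency",
   "must_include": ["mcp-latency-probe"]}
]
"""


def load_golden_set(raw: str) -> list[dict]:
    """Parse golden queries and reject entries that cannot be evaluated."""
    entries = json.loads(raw)
    seen = set()
    for i, entry in enumerate(entries):
        query = entry.get("query", "").strip()
        if not query:
            raise ValueError(f"entry {i}: empty query")
        if query in seen:
            raise ValueError(f"entry {i}: duplicate query {query!r}")
        seen.add(query)
        if not entry.get("must_include"):
            raise ValueError(f"entry {i}: must_include needs at least one doc id")
        # should_include is optional; normalize it to an empty list.
        entry.setdefault("should_include", [])
    return entries


golden = load_golden_set(GOLDEN_JSON)
```

Failing fast on an empty `must_include` keeps a malformed entry from silently passing every recall check later.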

Then compare old and new retrieval:

```text
for each query:
  run old index top_k=10
  run new index top_k=10
  compute recall@10 for must_include
  compute overlap@10
  record rank movement for critical docs
```
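The loop above could be sketched in Python as follows. This assumes each index returns a ranked list of doc ids; how you query the old and new indexes is left out because it depends on your vector store. The function names are hypothetical, and overlap is computed here as Jaccard similarity of the two top-K sets, which is one of several reasonable definitions. Doc ids other than `oauth-device-flow-runbook` and `cli-auth-security` are made up for the example.

```python
def recall_at_k(results, expected, k=10):
    """Fraction of expected doc ids that appear in the top-k results."""
    if not expected:
        return 1.0
    top = set(results[:k])
    return sum(1 for doc in expected if doc in top) / len(expected)


def overlap_at_k(old, new, k=10):
    """Jaccard similarity between the old and new top-k sets."""
    old_top, new_top = set(old[:k]), set(new[:k])
    union = old_top | new_top
    if not union:
        return 1.0  # both result lists are empty
    return len(old_top & new_top) / len(union)


def rank_of(results, doc_id):
    """1-based rank of doc_id, or None if it is not in the list."""
    try:
        return results.index(doc_id) + 1
    except ValueError:
        return None


def compare_query(entry, old_results, new_results, k=10):
    """Collect the per-query numbers the runbook asks for."""
    must = entry["must_include"]
    return {
        "query": entry["query"],
        "recall_old": recall_at_k(old_results, must, k),
        "recall_new": recall_at_k(new_results, must, k),
        "overlap": overlap_at_k(old_results, new_results, k),
        # (old rank, new rank) per critical doc; None means outside top-k.
        "rank_moves": {
            doc: (rank_of(old_results[:k], doc), rank_of(new_results[:k], doc))
            for doc in must
        },
    }


entry = {
    "query": "oauth device code flow cli login",
    "must_include": ["oauth-device-flow-runbook"],
}
old = ["oauth-device-flow-runbook", "cli-auth-security", "token-refresh-notes"]
new = ["cli-auth-security", "token-refresh-notes", "sso-setup-guide",
       "oauth-device-flow-runbook"]
report = compare_query(entry, old, new)
```

In this example recall@10 is unchanged, yet the critical doc moved from rank 1 to rank 4, which is exactly the kind of movement recall alone would hide.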

## Metrics That Matter

| Metric | Use |
|---|---|
| Recall@K on golden queries | Catches lost must-return documents. |
| Rank delta for critical docs | Shows whether important docs fell below the fold. |
| Top-K overlap | Detects broad distribution shifts. |
| Empty-result rate | Finds tokenizer, filter, or metadata regressions. |
| Click or install follow-through | Confirms search quality after launch. |

Vector distance alone is not enough. A lower average distance can still be worse if the wrong assets now rank above the exact answer.
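To make that concrete, here is a small illustration with invented numbers. The distances and the doc ids `doc-b` and `doc-c` are hypothetical, not measurements from any real index. The new index has a lower mean distance, but the exact answer has slipped from rank 1 to rank 3.

```python
from statistics import mean

# (doc id, distance) pairs in ranked order; smaller distance means closer.
old_hits = [("mcp-latency-probe", 0.20), ("doc-b", 0.35), ("doc-c", 0.40)]
new_hits = [("doc-b", 0.15), ("doc-c", 0.18), ("mcp-latency-probe", 0.22)]


def summarize(hits, target):
    """Return mean distance and the 1-based rank of the target doc."""
    ids = [doc_id for doc_id, _ in hits]
    return {
        "mean_distance": mean(dist for _, dist in hits),
        "target_rank": ids.index(target) + 1 if target in ids else None,
    }


old_summary = summarize(old_hits, "mcp-latency-probe")
new_summary = summarize(new_hits, "mcp-latency-probe")

# Distance improved while ranking got worse for the doc that matters.
distance_improved = new_summary["mean_distance"] < old_summary["mean_distance"]
rank_worsened = new_summary["target_rank"] > old_summary["target_rank"]
```

A dashboard that tracks only mean distance would report this change as an improvement.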

## Change Types To Test

- Embedding model upgrade or provider switch.
- Chunk size, overlap, or markdown parsing change.
- Metadata filter changes such as `visibility`, `asset_kind`, or language.
- Hybrid ranking weight changes between BM25 and vector score.
- Corpus refresh that adds many near-duplicate documents.

## Ship Gate

Ship only when:

1. Must-include recall does not regress.
2. Empty-result rate does not increase for high-intent queries.
3. Critical docs stay within their expected rank cutoff (top 3 or top 5).
4. Any intentional ranking shift is documented with examples.
5. Rollback is available: old index, old embedding model, or old ranker config.
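
The gate can be automated in part. Below is a minimal sketch, assuming you already have ranked doc ids from both indexes for each golden query. `QueryRun`, `ship_gate`, and their fields are hypothetical names. Gates 4 and 5 are passed in as booleans because they rely on human confirmation. The `doc-a` through `doc-e` ids are placeholders.

```python
from dataclasses import dataclass


@dataclass
class QueryRun:
    query: str
    must_include: list[str]
    old_top: list[str]        # ranked doc ids from the old index
    new_top: list[str]        # ranked doc ids from the new index
    rank_cutoff: int = 5      # expected position for critical docs (3 or 5)
    high_intent: bool = True


def _recall(top, expected, k=10):
    if not expected:
        return 1.0
    hits = set(top[:k])
    return sum(doc in hits for doc in expected) / len(expected)


def ship_gate(runs, shifts_documented, rollback_available, k=10):
    """Return a list of gate failures; an empty list means OK to ship."""
    failures = []
    for run in runs:
        # Gate 1: must-include recall must not regress.
        if _recall(run.new_top, run.must_include, k) < _recall(
            run.old_top, run.must_include, k
        ):
            failures.append(f"recall regressed: {run.query}")
        # Gate 3: docs that met the rank cutoff before must still meet it.
        for doc in run.must_include:
            was_in = doc in run.old_top[: run.rank_cutoff]
            still_in = doc in run.new_top[: run.rank_cutoff]
            if was_in and not still_in:
                failures.append(
                    f"{doc} fell out of top {run.rank_cutoff}: {run.query}"
                )
    # Gate 2: same query set on both sides, so comparing counts of
    # empty results is equivalent to comparing rates.
    high = [r for r in runs if r.high_intent]
    if sum(not r.new_top for r in high) > sum(not r.old_top for r in high):
        failures.append("empty-result rate increased for high-intent queries")
    # Gates 4 and 5: human-confirmed inputs in this sketch.
    if not shifts_documented:
        failures.append("intentional ranking shifts are not documented")
    if not rollback_available:
        failures.append("no rollback path available")
    return failures


runs = [
    QueryRun(
        query="mcp tools/list p95 latency",
        must_include=["mcp-latency-probe"],
        old_top=["mcp-latency-probe", "doc-a"],
        new_top=["doc-a", "doc-b", "doc-c", "doc-d", "doc-e",
                 "mcp-latency-probe"],
    )
]
failures = ship_gate(runs, shifts_documented=True, rollback_available=True)
```

Here recall@10 holds, but the gate still fails because the critical doc dropped from rank 1 to rank 6, outside its top-5 cutoff. Returning every failure instead of stopping at the first makes the report more useful in a CI log.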

## Source & Thanks

This is an original TokRepo runbook by William Wang. It uses standard IR evaluation ideas such as recall@K and rank movement, and applies them to vector search systems commonly used with RAG and agent registries.

<!-- ZH -->

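# Embedding Drift Monitoring: Retrieval Regression Runbook

Use this runbook when search quality degrades after a model, chunking, corpus, or ranking change. Embedding drift is not a single distance metric but a retrieval regression problem: fix the query set and the expected documents, then compare old and new retrieval results.

## Quick Use

Build a golden set first, with at least the must_include documents for every query. Then run the old and new indexes side by side and compare recall@10, top-K overlap, rank delta for critical docs, and empty-result rate.

## Ship Gate

1. Must-include recall does not drop.
2. Empty-result rate does not rise for high-intent queries.
3. Critical docs remain in their expected top 3 or top 5.
4. Any intentional ranking change is documented with examples.
5. Rollback is possible to the old index, old embedding model, or old ranker config.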


---
Source: https://tokrepo.com/en/workflows/embedding-drift-monitoring-retrieval-regression-runbook-ea696ee5
Author: henuwangkai