Metrics That Matter
| Metric | Use |
|---|---|
| Recall@K on golden queries | Catches lost must-return documents. |
| Rank delta for critical docs | Shows whether important docs fell below the fold. |
| Top-K overlap | Detects broad distribution shifts. |
| Empty-result rate | Finds tokenizer, filter, or metadata regressions. |
| Click or install follow-through | Confirms search quality after launch. |
Vector distance alone is not enough. A lower average distance can still be worse if the wrong assets now rank above the exact answer.
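The offline metrics in the table can be computed from ranked lists of document IDs. The following is a minimal sketch, assuming each query's results are available as an ordered list of string IDs from both the old and the new index. The function names and the `doc-*` IDs are hypothetical and chosen for illustration only. Top-K overlap is computed here as Jaccard similarity, which is one reasonable choice among several.

```python
from typing import Dict, List, Optional, Set


def recall_at_k(results: List[str], must_return: Set[str], k: int) -> float:
    """Fraction of must-return doc IDs that appear in the top k results."""
    if not must_return:
        return 1.0  # nothing required, so nothing can be lost
    return len(must_return & set(results[:k])) / len(must_return)


def rank_of(results: List[str], doc_id: str) -> Optional[int]:
    """1-based rank of doc_id, or None if it is not in the list."""
    try:
        return results.index(doc_id) + 1
    except ValueError:
        return None


def rank_delta(old: List[str], new: List[str], doc_id: str) -> Optional[int]:
    """New rank minus old rank; positive means the doc moved down.

    Returns None when the doc is missing from either list, which the
    caller should treat as its own kind of regression.
    """
    old_rank, new_rank = rank_of(old, doc_id), rank_of(new, doc_id)
    if old_rank is None or new_rank is None:
        return None
    return new_rank - old_rank


def top_k_overlap(old: List[str], new: List[str], k: int) -> float:
    """Jaccard similarity of the two top-k sets (assumed definition)."""
    a, b = set(old[:k]), set(new[:k])
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def empty_result_rate(results_by_query: Dict[str, List[str]]) -> float:
    """Share of queries that returned no results at all."""
    if not results_by_query:
        return 0.0
    empty = sum(1 for hits in results_by_query.values() if not hits)
    return empty / len(results_by_query)


# Hypothetical rankings for one golden query, before and after a change.
old_results = ["doc-a", "doc-b", "doc-c", "doc-d"]
new_results = ["doc-x", "doc-y", "doc-a", "doc-b"]
```

In this example `doc-a` still counts toward Recall@3, but its rank delta is +2. That is the kind of drop that recall or average distance alone would not surface.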
Change Types To Test
- Embedding model upgrade or provider switch.
- Chunk size, overlap, or markdown parsing change.
- Metadata filter changes such as visibility, asset_kind, or language.
- Hybrid ranking weight changes between BM25 and vector score.
- Corpus refresh that adds many near-duplicate documents.
Ship Gate
Ship only when:
- Must-include recall does not regress.
- Empty-result rate does not increase for high-intent queries.
- Critical docs remain in the top 3 or top 5 for the queries where that placement is expected.
- Any intentional ranking shift is documented with examples.
- Rollback is available: old index, old embedding model, or old ranker config.
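The checklist above could be encoded as an automated gate along the following lines. This is a sketch under stated assumptions, not a definitive implementation. `EvalSummary` and its fields are hypothetical names for values that an evaluation run would need to produce, and the documentation check is reduced to a count of ranking shifts that still lack written examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalSummary:
    """Hypothetical summary of one evaluation run over the golden queries."""
    must_include_recall: float          # Recall@K for must-return docs
    empty_rate_high_intent: float       # empty-result rate, high-intent queries
    critical_docs_outside_top_k: int    # critical docs below their expected cutoff
    undocumented_ranking_shifts: int    # intentional shifts without examples
    rollback_available: bool            # old index, model, or ranker config kept


def ship_gate_failures(baseline: EvalSummary, candidate: EvalSummary) -> List[str]:
    """Return the reasons the candidate should not ship; empty means pass."""
    failures: List[str] = []
    if candidate.must_include_recall < baseline.must_include_recall:
        failures.append("must-include recall regressed")
    if candidate.empty_rate_high_intent > baseline.empty_rate_high_intent:
        failures.append("empty-result rate increased for high-intent queries")
    if candidate.critical_docs_outside_top_k > 0:
        failures.append("critical docs fell outside their expected top-K")
    if candidate.undocumented_ranking_shifts > 0:
        failures.append("ranking shifts are not documented with examples")
    if not candidate.rollback_available:
        failures.append("no rollback path is available")
    return failures


# Illustrative values only; they are not measurements from any real system.
baseline = EvalSummary(0.97, 0.02, 0, 0, True)
candidate = EvalSummary(0.95, 0.02, 1, 0, True)
failures = ship_gate_failures(baseline, candidate)
```

Returning a list of reasons rather than a single boolean keeps the gate's output usable in a review: each failure maps back to one line of the checklist.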