Knowledge · May 19, 2026 · 2 min read

LLM Prompt Caching — Cache-Key Design Runbook

LLM prompt caching techniques for agents and apps. Covers stable prefixes, cache keys, TTLs, metrics, and cached-output validation.

Universal CLI command
npx tokrepo install bf4d41a0-4a12-4f95-83e3-417a6ddae333

What To Cache

Good prompt-cache candidates:

  • Long system prompts that change only when the app ships.
  • Policy packs, output schemas, rubrics, and static examples.
  • Tool descriptions when the tool catalog is stable.
  • Retrieval snippets that are shared by many users and have a clear version.

Bad prompt-cache candidates:

  • User-specific files, emails, tickets, or private context.
  • Prompts containing API tokens or session cookies.
  • Live market, legal, medical, or breaking-news facts.
  • Anything where a stale answer can cause a destructive action.

The most useful LLM prompt caching techniques are boring: stable-prefix extraction, schema-versioned keys, TTL or deploy-version invalidation, and cached-vs-uncached evaluation. Avoid clever semantic cache reuse until those basics are measured.
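As a minimal sketch of stable-prefix extraction with a schema-versioned key, the following assumes an application-level cache rather than any particular provider's API. The names `PromptParts` and `prefix_cache_key`, and the choice of SHA-256 over a JSON payload, are illustrative assumptions and not part of the runbook.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    """Hypothetical split of a prompt into a stable prefix and a volatile suffix."""
    system_prompt: str  # changes only when the app ships
    policy_pack: str    # static rules, schemas, rubrics, examples
    user_message: str   # volatile; deliberately excluded from the key


def prefix_cache_key(parts: PromptParts, model: str, schema_version: str) -> str:
    """Hash only the stable prefix, plus model name and prompt schema version."""
    payload = json.dumps(
        {
            "model": model,
            "schema_version": schema_version,
            "system_prompt": parts.system_prompt,
            "policy_pack": parts.policy_pack,
        },
        sort_keys=True,       # deterministic field order, so equal inputs hash equally
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two requests that differ only in `user_message` map to the same key, while bumping `schema_version` or switching `model` produces a new key and so cannot reuse an old entry.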

Validation Checklist

Before enabling prompt caching, verify these gates:

  1. The cache key includes model name and prompt schema version.
  2. Volatile user input is excluded from the reusable prefix key.
  3. Cache entries have a TTL or deploy-version invalidation rule.
  4. Evaluation compares cached and uncached outputs on at least 20 real tasks.
  5. Logs report hit rate, saved input tokens, first-token latency, and stale-cache rejects.
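Gates 3 and 5 can be sketched together as a small in-memory cache that rejects entries past their TTL or written under a different deploy version, and counts what happened. This is an illustration under stated assumptions: `PrefixCache`, `CacheEntry`, and `CacheMetrics` are hypothetical names, the caller supplies the prefix token count, and first-token latency is left out because it has to be measured around the actual model call.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    value: str
    created_at: float
    deploy_version: str


@dataclass
class CacheMetrics:
    hits: int = 0
    misses: int = 0
    stale_rejects: int = 0
    saved_input_tokens: int = 0

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


class PrefixCache:
    """Hypothetical in-memory cache with TTL and deploy-version invalidation."""

    def __init__(self, ttl_seconds: float, deploy_version: str) -> None:
        self.ttl_seconds = ttl_seconds
        self.deploy_version = deploy_version
        self.metrics = CacheMetrics()
        self._entries: Dict[str, CacheEntry] = {}

    def put(self, key: str, value: str, now: Optional[float] = None) -> None:
        created = time.time() if now is None else now
        self._entries[key] = CacheEntry(value, created, self.deploy_version)

    def get(self, key: str, prefix_tokens: int, now: Optional[float] = None) -> Optional[str]:
        current = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            self.metrics.misses += 1
            return None
        expired = current - entry.created_at > self.ttl_seconds
        wrong_deploy = entry.deploy_version != self.deploy_version
        if expired or wrong_deploy:
            # A stale entry is dropped and counted as both a reject and a miss.
            del self._entries[key]
            self.metrics.stale_rejects += 1
            self.metrics.misses += 1
            return None
        self.metrics.hits += 1
        self.metrics.saved_input_tokens += prefix_tokens
        return entry.value
```

Counting stale rejects separately from plain misses makes it visible when invalidation is doing real work, rather than folding it into a lower hit rate.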

Common Failure Modes

  • Over-broad key: two different policy versions share a cache entry.
  • Over-narrow key: every user message creates a unique key, so the hit rate stays near zero.
  • Hidden volatility: the system prompt embeds today's date or account-specific state.
  • Silent stale behavior: the cache works mechanically, but no metric surfaces incorrect reuse.
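Hidden volatility is the failure mode most amenable to a pre-flight check. The sketch below scans a candidate prefix for patterns that often signal per-request or per-account content. The function name `find_volatile_spans` and the three patterns are assumptions chosen for illustration; they are heuristics, not an exhaustive detector, and can produce false positives on static text that happens to contain a date.

```python
import re
from typing import List, Tuple

# Illustrative heuristics only: each pattern flags text that tends to change
# between requests or accounts and therefore should not sit in a cached prefix.
VOLATILE_PATTERNS = {
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "clock_time": re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b"),
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
}


def find_volatile_spans(prefix: str) -> List[Tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in the candidate prefix."""
    findings: List[Tuple[str, str]] = []
    for name, pattern in VOLATILE_PATTERNS.items():
        for match in pattern.finditer(prefix):
            findings.append((name, match.group(0)))
    return findings


if __name__ == "__main__":
    stable = "You are a support agent. Follow the refund policy."
    leaky = "You are a support agent. Today is 2026-05-19."
    print(find_volatile_spans(stable))
    print(find_volatile_spans(leaky))
```

A check like this could run in CI against the rendered system prompt, failing the build when the prefix picks up a date or identifier that belongs in the volatile suffix instead.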

Source and Acknowledgements

This is an original TokRepo runbook by William Wang. It aligns with the general prompt caching concepts described in the Anthropic prompt caching docs and the OpenAI prompt caching guide.
