Knowledge · May 19, 2026 · 2 min read

LLM Prompt Caching — Cache-Key Design Runbook

LLM prompt caching techniques for agents and apps. Covers stable prefixes, cache keys, TTLs, metrics, and cached-output validation.


Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Knowledge
Installation: Single
Trust: Verified publisher
Entry point: README.md
Universal CLI command:
npx tokrepo install bf4d41a0-4a12-4f95-83e3-417a6ddae333

What To Cache

Good prompt-cache candidates:

  • Long system prompts that change only when the app ships.
  • Policy packs, output schemas, rubrics, and static examples.
  • Tool descriptions when the tool catalog is stable.
  • Retrieval snippets that are shared by many users and have a clear version.

Bad prompt-cache candidates:

  • User-specific files, emails, tickets, or private context.
  • Prompts containing API tokens or session cookies.
  • Live market, legal, medical, or breaking-news facts.
  • Anything where a stale answer can cause a destructive action.

The most useful LLM prompt caching techniques are boring: stable-prefix extraction, schema-versioned keys, TTL or deploy-version invalidation, and cached-vs-uncached evaluation. Hold off on semantic cache reuse (matching prompts by similarity rather than by exact prefix) until those basics are in place and measured.
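As a minimal sketch of stable-prefix extraction and schema-versioned keys, the example below splits a prompt into a reusable prefix and a per-request suffix, then derives the key from the model name, schema version, and prefix only. The names `PromptParts`, `prefix_cache_key`, `example-model`, and the `v3`/`v4` version labels are hypothetical, and this is an application-level illustration, not any provider's caching API.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    """A prompt split into a reusable prefix and a per-request suffix."""
    stable_prefix: str    # system prompt, policy pack, schema, static examples
    volatile_suffix: str  # user message and other per-request context


def prefix_cache_key(model: str, schema_version: str, stable_prefix: str) -> str:
    """Build a key from model, schema version, and prefix text only.

    The volatile suffix is deliberately not an input, so different user
    messages that share a prefix map to the same key.
    """
    payload = json.dumps(
        {"model": model, "schema": schema_version, "prefix": stable_prefix},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{model}:{schema_version}:{digest[:16]}"


# Two requests with the same prefix but different user messages.
parts_a = PromptParts("You are a support agent. Follow policy v3.", "Where is my order?")
parts_b = PromptParts("You are a support agent. Follow policy v3.", "Cancel my plan.")

key_a = prefix_cache_key("example-model", "v3", parts_a.stable_prefix)
key_b = prefix_cache_key("example-model", "v3", parts_b.stable_prefix)
# Bumping the schema version yields a different key for the same text.
key_c = prefix_cache_key("example-model", "v4", parts_a.stable_prefix)
```

Here `key_a` and `key_b` are equal because the user message never enters the key, while `key_c` differs because the schema version changed. Keeping the model and version readable at the front of the key is a design choice that makes log lines easier to scan; the hash alone would work equally well.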

Validation Checklist

Before enabling prompt caching, verify these gates:

  1. The cache key includes model name and prompt schema version.
  2. Volatile user input is excluded from the reusable prefix key.
  3. Cache entries have a TTL or deploy-version invalidation rule.
  4. Evaluation compares cached and uncached outputs on at least 20 real tasks.
  5. Logs report hit rate, saved input tokens, first-token latency, and stale-cache rejects.
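Gates 3 and 5 can be sketched together as a small in-memory cache that rejects entries on TTL expiry or deploy-version mismatch and counts the outcomes. Everything here is an assumption for illustration: the `PrefixCache` and `CacheEntry` classes, the field names, and the `deploy-41` label are hypothetical, and first-token latency is left out because it would have to be measured around the actual model call.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    value: str
    created_at: float
    deploy_version: str
    prefix_tokens: int


@dataclass
class PrefixCache:
    ttl_seconds: float
    deploy_version: str
    entries: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0
    stale_rejects: int = 0
    saved_input_tokens: int = 0

    def put(self, key, value, prefix_tokens, now=None):
        now = time.time() if now is None else now
        self.entries[key] = CacheEntry(value, now, self.deploy_version, prefix_tokens)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(key)
        if entry is None:
            self.misses += 1
            return None
        expired = now - entry.created_at > self.ttl_seconds
        wrong_deploy = entry.deploy_version != self.deploy_version
        if expired or wrong_deploy:
            # Stale entries are dropped and counted as both a reject and a miss.
            del self.entries[key]
            self.stale_rejects += 1
            self.misses += 1
            return None
        self.hits += 1
        self.saved_input_tokens += entry.prefix_tokens
        return entry.value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = PrefixCache(ttl_seconds=300, deploy_version="deploy-41")
cache.put("k1", "cached-prefix-handle", prefix_tokens=1200, now=0.0)
first = cache.get("k1", now=10.0)    # within TTL: hit
second = cache.get("k1", now=400.0)  # older than TTL: stale reject
```

Passing `now` explicitly keeps the expiry logic testable without sleeping. In a real deployment the counters would more likely be exported to whatever metrics system the app already uses than held on the cache object.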

Common Failure Modes

  • Over-broad key: two different policy versions share a cache entry.
  • Over-narrow key: every user message creates a unique key, so hit rate stays near zero.
  • Hidden volatility: the system prompt embeds today's date or account-specific state.
  • Silent stale behavior: the cache works mechanically, but no metric exposes incorrect reuse of stale entries.
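Hidden volatility is often easy to catch with a lint step before a prefix is accepted as cacheable. The sketch below scans a supposedly stable prefix for a few volatile patterns. The pattern set and the function name `find_hidden_volatility` are hypothetical and far from exhaustive; a real check would need patterns tuned to the app's own prompts.

```python
import re

# Illustrative patterns only; extend for the app's own date and ID formats.
VOLATILE_PATTERNS = {
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._\-]{8,}", re.IGNORECASE),
    "account_id": re.compile(r"\baccount[_ ]id\s*[:=]\s*\w+", re.IGNORECASE),
}


def find_hidden_volatility(stable_prefix: str) -> list[str]:
    """Return the names of volatile patterns found in a supposedly stable prefix."""
    return [
        name
        for name, pattern in VOLATILE_PATTERNS.items()
        if pattern.search(stable_prefix)
    ]


clean = "You are a support agent. Answer using the refund policy below."
leaky = "Today is 2026-05-19. account_id: 8841. You are a support agent."

clean_findings = find_hidden_volatility(clean)  # no patterns match
leaky_findings = find_hidden_volatility(leaky)  # date and account ID match
```

A non-empty result is a signal to move the offending text into the per-request part of the prompt, or to refuse to cache the prefix at all when it contains credentials.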

Source and Acknowledgments

This is an original TokRepo runbook by William Wang. It aligns with the general prompt caching concepts described in the Anthropic prompt caching docs and the OpenAI prompt caching guide.
