# LLM Prompt Caching — Cache-Key Design Runbook

> LLM prompt caching techniques for agents and apps. Covers stable prefixes, cache keys, TTLs, metrics, and cached-output validation.

## Install

Copy the content below into your project:

---
title: LLM Prompt Caching — Cache-Key Design Runbook
asset_kind: knowledge
target_tools: [codex, claude_code, cursor, gemini_cli]
install_mode: single
entrypoint: README.md
---

# LLM Prompt Caching — Cache-Key Design Runbook

Use this runbook when an agent or backend service is spending too much time and money recomputing the same LLM context. It collects practical LLM prompt caching techniques for cache-key design, TTLs, prefix boundaries, and regression checks.

The goal is not to cache every answer. The goal is to isolate the stable prompt prefix, build a deterministic cache key, and reuse only the parts that are safe to reuse.

## Quick Use

Start by splitting the request into stable and volatile sections:

```text
stable_prefix:
  system policy
  product rules
  reusable examples
  schema contract

volatile_tail:
  user message
  live page content
  current timestamp
  file diff
  secrets or account-specific data
```

Then compute a cache key only from the stable prefix and the model/runtime settings:

```js
import { createHash } from "node:crypto";

// Hash only the stable prefix and model settings; never pass volatile input here.
function promptCacheKey({ model, system, examples, schemaVersion }) {
  // The fixed property order of this literal keeps the payload deterministic.
  const payload = JSON.stringify({
    model,
    schemaVersion,
    system: system.trim(),
    examples,
  });
  return createHash("sha256").update(payload).digest("hex");
}
```

## What To Cache

Good prompt-cache candidates:

- Long system prompts that change only when the app ships.
- Policy packs, output schemas, rubrics, and static examples.
- Tool descriptions when the tool catalog is stable.
- Retrieval snippets that are shared by many users and have a clear version.

Bad prompt-cache candidates:

- User-specific files, emails, tickets, or private context.
- Prompts containing API tokens or session cookies.
- Live market, legal, medical, or breaking-news facts.
- Anything where a stale answer can cause a destructive action.

The most useful LLM prompt caching techniques are boring: stable-prefix extraction, schema-versioned keys, TTL or deploy-version invalidation, and cached-vs-uncached evaluation. Avoid clever semantic cache reuse until those basics are measured.

## Validation Checklist

Before enabling prompt caching, verify these gates:

1. The cache key includes model name and prompt schema version.
2. Volatile user input is excluded from the reusable prefix key.
3. Cache entries have a TTL or deploy-version invalidation rule.
4. Evaluation compares cached and uncached outputs on at least 20 real tasks.
5. Logs report hit rate, saved input tokens, first-token latency, and stale-cache rejects.

## Common Failure Modes

- **Over-broad key**: two different policy versions share a cache entry.
- **Over-narrow key**: every user message creates a unique key, so hit rate stays near zero.
- **Hidden volatility**: the system prompt embeds today's date or account-specific state, so the supposedly stable prefix changes between requests or accounts.
- **Silent stale behavior**: the cache works mechanically, but no metric reveals when a stale entry is reused.

## Source & Thanks

This is an original TokRepo runbook by William Wang. It aligns with the general prompt caching concepts described in the [Anthropic prompt caching docs](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) and the [OpenAI prompt caching guide](https://platform.openai.com/docs/guides/prompt-caching).

# LLM Prompt Caching: Cache-Key Design Runbook

Use this runbook when an agent or backend service keeps paying for the same LLM context. The goal is not to cache every answer, but to split out the stable prompt prefix, build a deterministic cache key, and reuse only the parts that are safe.

## Quick Use

First split the request into two parts:

- Stable prefix: system policy, product rules, output schema, fixed examples.
- Volatile tail: user message, current web page, timestamp, file diff, account-private information.

The cache key should include the model name, schema version, stable system prompt, and examples. It should not include private user input, and content containing tokens or cookies should not be cached.

## Validation Checklist

1. The cache key includes the model and prompt schema version.
2. User input does not enter the reusable prefix key.
3. The cache has a TTL or is invalidated by deploy version.
4. Cached and uncached outputs are compared on at least 20 real tasks.
5. Logs record hit rate, saved input tokens, first-token latency, and stale rejects.

---

Source: https://tokrepo.com/en/workflows/llm-prompt-caching-cache-key-design-runbook-bf4d41a0
Author: henuwangkai
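
The key function in Quick Use relies on `JSON.stringify`, which follows property insertion order for ordinary string keys. Two `examples` entries with the same content but a different property order would therefore hash to different keys. A minimal sketch of a canonical serializer that could feed the hash instead, assuming the key inputs are plain JSON data; `canonicalJson` is a hypothetical helper name, not part of the runbook:

```js
// Hypothetical helper: serialize plain JSON data with object keys sorted,
// so logically equal inputs always produce the same string.
// Assumes no undefined values, functions, Dates, or cyclic references.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved.
    return `[${value.map(canonicalJson).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const body = Object.keys(value)
      .sort() // default sort compares UTF-16 code units, which is deterministic
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(value[key])}`)
      .join(",");
    return `{${body}}`;
  }
  // Strings, numbers, booleans, and null use the standard encoding.
  return JSON.stringify(value);
}
```

Under this assumption, `promptCacheKey` could hash `canonicalJson({ model, schemaVersion, system, examples })` rather than the raw `JSON.stringify` output.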
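
Checklist item 3 asks for a TTL or a deploy-version invalidation rule. The sketch below shows one way both rules might be combined in an in-memory store. `PrefixCache`, its option names, and the injectable clock are all illustrative assumptions, and the stored value stands for whatever the app keeps under the key:

```js
// Hypothetical cache wrapper: an entry is served only while it is younger
// than ttlMs and was written by the same deploy version.
class PrefixCache {
  constructor({ ttlMs, deployVersion, store = new Map(), now = () => Date.now() }) {
    this.ttlMs = ttlMs;
    this.deployVersion = deployVersion;
    this.store = store; // may be shared between instances
    this.now = now; // injectable clock so expiry can be tested
    this.staleRejects = 0;
  }

  set(key, value) {
    this.store.set(key, {
      value,
      storedAt: this.now(),
      deployVersion: this.deployVersion,
    });
  }

  // Returns the cached value, or undefined on a miss or a stale entry.
  get(key) {
    const entry = this.store.get(key);
    if (entry === undefined) return undefined;
    const expired = this.now() - entry.storedAt >= this.ttlMs;
    const wrongDeploy = entry.deployVersion !== this.deployVersion;
    if (expired || wrongDeploy) {
      this.store.delete(key);
      this.staleRejects += 1; // feeds the stale-cache reject metric
      return undefined;
    }
    return entry.value;
  }
}
```

Counting rejects inside `get` keeps the stale-cache metric next to the rule that produces it, which makes silent stale behavior easier to notice.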
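
Checklist item 5 lists four numbers to log: hit rate, saved input tokens, first-token latency, and stale-cache rejects. A rough sketch of an accumulator for them follows. The function name, the shape of the `record` argument, and the choice to average latency across all requests are assumptions for illustration, not something the runbook prescribes:

```js
// Hypothetical metrics accumulator for the four numbers in the checklist.
function createCacheMetrics() {
  const totals = {
    requests: 0,
    hits: 0,
    savedInputTokens: 0,
    staleRejects: 0,
    firstTokenMsSum: 0,
  };
  return {
    // Call once per LLM request with what the caller observed.
    record({ hit, cachedInputTokens = 0, firstTokenMs, staleReject = false }) {
      totals.requests += 1;
      totals.firstTokenMsSum += firstTokenMs;
      if (hit) {
        totals.hits += 1;
        totals.savedInputTokens += cachedInputTokens;
      }
      if (staleReject) totals.staleRejects += 1;
    },
    summary() {
      const n = totals.requests;
      return {
        hitRate: n === 0 ? 0 : totals.hits / n,
        savedInputTokens: totals.savedInputTokens,
        avgFirstTokenMs: n === 0 ? 0 : totals.firstTokenMsSum / n,
        staleRejects: totals.staleRejects,
      };
    },
  };
}
```

A hit rate that stays near zero points at an over-narrow key. A stale-reject count that never moves after a deploy suggests the invalidation rule is not being exercised.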