判定清单
- cache key 包含 model 和 prompt schema version。
- 用户输入不进入可复用前缀 key。
- cache 有 TTL 或随部署版本失效。
- 至少用 20 个真实任务对比 cached / uncached 输出。
- 日志记录 hit rate、省下的 input tokens、首 token 延迟和 stale reject。
LLM prompt caching techniques for agents and apps. Covers stable prefixes, cache keys, TTLs, metrics, and cached-output validation.
这个资产可安装;Agent 先选择当前运行时、检查安装计划,再运行匹配命令。
npx -y tokrepo@latest install bf4d41a0-4a12-4f95-83e3-417a6ddae333 --target codex先 dry-run 确认安装计划,再运行此命令。
Helicone Cache short-circuits identical LLM requests at the proxy. Set Helicone-Cache-Enabled header, exact-match responses come back in ms at zero cost.
One-click prompt to upgrade your AI agent memory system to Karpathy LLM Wiki pattern. Send to Claude Code / Cursor / Windsurf — auto audits, compiles fragments, resolves contradictions, builds structured wiki.
MCP tool calling latency runbook for agents. Measures tools/list p95, separates server latency from network delay, and defines pause rules.
Embedding drift monitoring runbook for RAG and agent search. Uses golden queries, recall@K, rank delta, and rollback gates.