What is Prompt Caching?
Anthropic's prompt caching stores repeated content (system prompts, tool definitions, long documents) server-side for reuse across requests; cache reads are billed at one-tenth the base input-token price, while cache writes carry a 25% premium.
TL;DR: Anthropic prompt caching. Cache system prompts / tools / documents. Cache reads cost 1/10 of the base input price. 5-minute TTL, refreshed on every hit. A must-use for production Claude apps. Up to 90% savings on input tokens.
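The savings claim follows directly from the pricing multipliers. A minimal sketch of the arithmetic, with an assumed base price of $3 per million input tokens (the multipliers, 1.25x for writes and 0.1x for reads, match the published pricing model):

```python
# Compare input cost for N requests sharing one long cached prefix
# versus sending the full prefix uncached every time.
BASE = 3.00          # assumed base input price, $/MTok (illustrative)
WRITE_MULT = 1.25    # cache-write premium
READ_MULT = 0.10     # cache-read discount

def input_cost(prefix_tokens: int, requests: int, cached: bool) -> float:
    """Total input cost in dollars for `requests` calls sharing the same prefix."""
    if not cached:
        return requests * prefix_tokens * BASE / 1_000_000
    # First request writes the cache; the remaining requests read it.
    write = prefix_tokens * BASE * WRITE_MULT / 1_000_000
    reads = (requests - 1) * prefix_tokens * BASE * READ_MULT / 1_000_000
    return write + reads

uncached = input_cost(50_000, 100, cached=False)
cached = input_cost(50_000, 100, cached=True)
print(f"uncached ${uncached:.2f}  cached ${cached:.2f}  savings {1 - cached / uncached:.0%}")
# → uncached $15.00  cached $1.67  savings 89%
```

With a 50k-token prefix reused across 100 requests, the effective discount lands just under the 90% ceiling; the more reads per write, the closer it gets.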
Cacheable Content
- System prompts — most common case
- Tool definitions — big wins with many tools
- RAG documents — multi-turn Q&A over the same doc
- Multi-turn conversation prefix — cache early context
Best Practices
- Cache the longest, most stable content first
- Cached content must be a prefix — the breakpoint marks the end of it, and everything before it must be byte-identical across requests
- Monitor cache_read_input_tokens (and cache_creation_input_tokens) in the response usage to confirm hits
- Minimum cacheable prefix: 1024 tokens (2048 on Haiku models); shorter prefixes are processed without caching
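The monitoring point above can be sketched as a small helper. The field names `cache_read_input_tokens` and `cache_creation_input_tokens` are the ones the API reports; the `usage` dict here is a stand-in for `response.usage` from the SDK:

```python
# Classify each response by what its usage block says about the cache.
def cache_status(usage: dict) -> str:
    """Return 'hit', 'write', or 'uncached' for one API response's usage."""
    if usage.get("cache_read_input_tokens", 0) > 0:
        return "hit"        # prefix was served from cache at 0.1x price
    if usage.get("cache_creation_input_tokens", 0) > 0:
        return "write"      # prefix was cached on this request at 1.25x price
    return "uncached"       # no caching occurred (e.g. prefix below the minimum)

print(cache_status({"input_tokens": 12, "cache_read_input_tokens": 50_000}))  # → hit
```

A sudden run of "write" results in production usually means the prefix changed between requests and is no longer byte-identical.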
FAQ
Q: Does it affect quality? A: No — the model sees identical input either way.
Q: Does Claude Code use it? A: Yes — CLAUDE.md and tool definitions are cached automatically.