Prompts · Apr 8, 2026 · 3 min read

Anthropic Prompt Caching — Cut AI API Costs 90%

Use Anthropic's prompt caching to reduce Claude API costs by up to 90%. Cache system prompts, tool definitions, and long documents across requests for massive savings.

Prompt Lab · Community
Quick Use

Use it first, then decide how deep to go

Copy the snippet below into an existing Claude API call: the only change from a normal request is the cache_control field on the system prompt.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "You are an expert code reviewer... (long system prompt)",
        "cache_control": {"type": "ephemeral"},  # Cache this!
    }],
    messages=[{"role": "user", "content": "Review this code..."}],
)
# First request: full price
# Subsequent requests: 90% cheaper for cached portion

What is Prompt Caching?

Prompt caching lets you cache frequently reused content (system prompts, tool definitions, large documents) across API requests. Instead of re-processing the same tokens every time, Claude reads them from cache at 1/10th the cost. For applications with long system prompts or RAG context, this can reduce costs by 90%.

Answer-Ready: Anthropic's prompt caching reduces Claude API costs up to 90%. Cache system prompts, tool definitions, and documents across requests. Cached tokens cost 1/10th of input tokens. 5-minute TTL, auto-extended on hit. Essential for production AI applications with recurring context.

Best for: Teams running Claude API at scale with recurring system prompts. Works with: Claude Sonnet, Opus, Haiku via Anthropic API. Setup time: Add one field to existing code.

How It Works

Pricing Impact

Token type         Cost (Sonnet)     Savings
Input (no cache)   $3.00/M tokens    Baseline
Cache write        $3.75/M tokens    +25% surcharge, first request only
Cache read         $0.30/M tokens    90% cheaper than uncached input
Output             $15.00/M tokens   No change

Cache TTL

  • Default: 5 minutes
  • Auto-extended: Each cache hit resets the 5-minute timer
  • Minimum cacheable: 1,024 tokens (Sonnet and Opus), 2,048 tokens (Haiku)

Cacheable Content

1. System Prompts

system=[{
    "type": "text",
    "text": "Your 2000-token system prompt here...",
    "cache_control": {"type": "ephemeral"},
}]

2. Tool Definitions

tools = [
    {"name": "search", "description": "...", "input_schema": {...},
     "cache_control": {"type": "ephemeral"}},
]

3. Long Documents (RAG Context)

messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Here is the full codebase:\n" + large_document,
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": "Now answer: what does the auth module do?"},
    ]},
]

4. Multi-Turn with Cached Prefix

# Turn 1: Cache the document
msg1 = {"role": "user", "content": [
    {"type": "text", "text": document, "cache_control": {"type": "ephemeral"}},
    {"type": "text", "text": "Summarize this document"},
]}

# Turn 2: resend the history. The document prefix is unchanged, so it is
# read from cache and only the new tokens are charged at the full input rate
msg2 = {"role": "user", "content": "Now list the key findings"}
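Putting the two turns together: a multi-turn request resends the whole history, so the turn-1 prefix (document plus cache marker) arrives byte-identical and is billed as a cache read. A minimal sketch, with the assistant's turn-1 reply stubbed in as a placeholder:

```python
document = "…full report text…"  # stands in for a long (1,024+ token) document

msg1 = {"role": "user", "content": [
    {"type": "text", "text": document, "cache_control": {"type": "ephemeral"}},
    {"type": "text", "text": "Summarize this document"},
]}
msg2 = {"role": "user", "content": "Now list the key findings"}

# Turn 2 request body: prior turns first, the new question last. The turn-1
# prefix is unchanged, so its tokens are billed at the cache-read rate.
history = [
    msg1,
    {"role": "assistant", "content": "(summary returned in turn 1)"},
    msg2,
]
```

This history list is what goes into the messages parameter of the turn-2 client.messages.create call.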

Cost Calculator

Scenario                         Without cache   With cache   Savings
4K system prompt, 100 requests   $1.20           $0.13        ~89%
10K RAG context, 50 requests     $1.50           $0.18        ~88%
20K tools, 200 requests          $12.00          $1.27        ~89%
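The figures above follow a simple model: the first request pays the cache-write rate on the stable context, and each later request reads it back at the cache-read rate. A sketch of that arithmetic, using the Sonnet prices from the pricing table:

```python
# $/M input tokens for Sonnet (see the pricing table above)
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

def cost(tokens, requests, cached):
    """Dollar cost of sending `tokens` of stable context `requests` times."""
    millions = tokens / 1_000_000
    if not cached:
        return INPUT * millions * requests
    # one cache write, then (requests - 1) cache reads
    return CACHE_WRITE * millions + CACHE_READ * millions * (requests - 1)
```

For the 10K RAG scenario this gives $1.50 uncached versus roughly $0.18 cached. Note the break-even point is already the second request: one write plus one read ($3.75 + $0.30 per million tokens) undercuts two uncached sends ($6.00 per million).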

Best Practices

  1. Cache the longest, most stable content first — System prompts and tool definitions change rarely
  2. Order matters — Cached content must be a prefix. Put cached content before dynamic content
  3. Monitor cache hits — Check usage.cache_creation_input_tokens and usage.cache_read_input_tokens in response
  4. Minimum size — Content must be >= 1,024 tokens to be cacheable
  5. Keep cache warm — If requests are >5 minutes apart, cache expires
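Item 3 can be turned into a running metric. The response's usage object reports how many tokens were written to and read from cache; a helper like the following (our own naming, Sonnet rates assumed as defaults) converts those fields into a dollar estimate:

```python
def cache_savings(input_tokens, cache_write_tokens, cache_read_tokens,
                  rate=3.00, write_mult=1.25, read_mult=0.10):
    """Dollars saved versus sending the same tokens uncached.

    Pass usage.input_tokens, usage.cache_creation_input_tokens and
    usage.cache_read_input_tokens from a Messages API response.
    `rate` is $/M input tokens; defaults reflect Sonnet pricing.
    """
    total = input_tokens + cache_write_tokens + cache_read_tokens
    baseline = total * rate
    actual = (input_tokens * rate
              + cache_write_tokens * rate * write_mult
              + cache_read_tokens * rate * read_mult)
    return (baseline - actual) / 1_000_000
```

On cache hits the read tokens dominate and the result is positive; on the first (write-only) request it comes out slightly negative, reflecting the 25% write surcharge.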

FAQ

Q: Does caching affect response quality? A: No. Caching is purely an optimization — the model sees identical input.

Q: Can I cache across different conversations? A: Yes, if the cached prefix is identical (same system prompt + tools), it hits the cache regardless of the user message.
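To illustrate the cross-conversation answer: the cache key is the prefix itself, so independent conversations that share a byte-identical system block hit the same cache entry. A sketch (the system prompt and model name mirror the Quick Use snippet; review_request is our own helper name):

```python
SYSTEM = [{
    "type": "text",
    "text": "You are an expert code reviewer... (long system prompt)",
    "cache_control": {"type": "ephemeral"},
}]

def review_request(user_text):
    """Request kwargs for a fresh conversation reusing the cached system prefix."""
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "system": SYSTEM,  # identical prefix across conversations: cache hit
        "messages": [{"role": "user", "content": user_text}],
    }

req_a = review_request("Review auth.py")
req_b = review_request("Review billing.py")
```

Each dict can be passed straight to client.messages.create(**req_a); the differing user messages come after the cached prefix and do not affect it.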

Q: Does Claude Code use prompt caching? A: Yes, Claude Code automatically uses prompt caching for CLAUDE.md content, tool definitions, and conversation history.
