What is Prompt Caching?
Prompt caching lets you cache frequently reused content (system prompts, tool definitions, large documents) across API requests. Instead of re-processing the same tokens every time, Claude reads them from cache at 1/10th the cost. For applications with long system prompts or RAG context, this can reduce costs by 90%.
Answer-Ready: Anthropic's prompt caching reduces Claude API costs up to 90%. Cache system prompts, tool definitions, and documents across requests. Cached tokens cost 1/10th of input tokens. 5-minute TTL, auto-extended on hit. Essential for production AI applications with recurring context.
Best for: Teams running Claude API at scale with recurring system prompts. Works with: Claude Sonnet, Opus, Haiku via Anthropic API. Setup time: Add one field to existing code.
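The "one field" is `cache_control`. Here is a minimal sketch of what an existing request looks like with caching turned on — built as a plain dict (the kwargs you would pass to the Anthropic SDK's `client.messages.create`) so it runs offline; the model id and prompt text are illustrative:

```python
# The same request you already send, plus one cache_control field.
request = {
    "model": "claude-sonnet-4",  # illustrative model id; use your own
    "max_tokens": 1024,
    "system": [{
        "type": "text",
        "text": "Your long, stable system prompt...",  # must meet the minimum token count
        "cache_control": {"type": "ephemeral"},  # <- the one new field
    }],
    "messages": [{"role": "user", "content": "How do I reset my password?"}],
}

# Everything up to and including the block carrying cache_control is cached;
# content after it is processed fresh on every request.
assert request["system"][-1]["cache_control"] == {"type": "ephemeral"}
```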
How It Works
Pricing Impact
| Token Type | Cost (Sonnet) | Savings |
|---|---|---|
| Input (no cache) | $3/M tokens | Baseline |
| Cache write | $3.75/M tokens | -25% (one-time write premium) |
| Cache read | $0.30/M tokens | 90% savings |
| Output | $15/M tokens | No change |
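Given these rates, the one-time write premium is recovered on the very first cache hit. A small sketch of the break-even arithmetic, using the Sonnet rates from the table above:

```python
# Sonnet rates from the table above ($ per million tokens).
INPUT, WRITE, READ = 3.00, 3.75, 0.30

def cost(tokens: int, requests: int, cached: bool) -> float:
    """Dollar cost of sending the same prefix on every request."""
    m = tokens / 1_000_000
    if not cached:
        return requests * m * INPUT             # every request pays full price
    return m * (WRITE + (requests - 1) * READ)  # one write, then cheap reads

# The 25% write premium is recovered on the very first cache hit:
assert cost(4_000, 1, True) > cost(4_000, 1, False)  # single request: slightly more
assert cost(4_000, 2, True) < cost(4_000, 2, False)  # two requests: already cheaper
```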
Cache TTL
- Default: 5 minutes
- Auto-extended: Each cache hit resets the 5-minute timer
- Minimum cacheable: 1,024 tokens (Sonnet, Opus), 2,048 tokens (Haiku)
Cacheable Content
1. System Prompts
```python
system=[{
    "type": "text",
    "text": "Your 2000-token system prompt here...",
    "cache_control": {"type": "ephemeral"},
}]
```
2. Tool Definitions
```python
tools = [
    {"name": "search", "description": "...", "input_schema": {...},
     "cache_control": {"type": "ephemeral"}},
]
```
3. Long Documents (RAG Context)
```python
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Here is the full codebase:\n" + large_document,
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": "Now answer: what does the auth module do?"},
    ]},
]
```
4. Multi-Turn with Cached Prefix
```python
# Turn 1: cache the document
msg1 = {"role": "user", "content": [
    {"type": "text", "text": document, "cache_control": {"type": "ephemeral"}},
    {"type": "text", "text": "Summarize this document"},
]}

# Turn 2: resend the conversation as usual; the document prefix is read from
# cache, so only the new tokens are billed at the full input rate
msg2 = {"role": "user", "content": "Now list the key findings"}
```
Cost Calculator
| Scenario | Without Cache | With Cache | Savings |
|---|---|---|---|
| 4K system prompt, 100 requests | $1.20 | $0.13 | 89% |
| 10K RAG context, 50 requests | $1.50 | $0.18 | 88% |
| 20K tools, 200 requests | $12.00 | $1.27 | 89% |
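These rows follow directly from the Sonnet rates above: one cache write plus N−1 cache reads, versus N full-price reads. A quick sanity check in plain arithmetic:

```python
def scenario(prefix_tokens: int, requests: int) -> tuple[float, float, int]:
    """Return (cost without cache, cost with cache, % savings) in dollars."""
    m = prefix_tokens / 1_000_000
    without = requests * m * 3.00                    # every request full price
    with_cache = m * (3.75 + (requests - 1) * 0.30)  # one write + N-1 reads
    return without, with_cache, round(100 * (1 - with_cache / without))

print(scenario(4_000, 100))   # ≈ (1.20, 0.13, 89)
print(scenario(10_000, 50))   # ≈ (1.50, 0.18, 88)
print(scenario(20_000, 200))  # ≈ (12.00, 1.27, 89)
```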
Best Practices
- Cache the longest, most stable content first — system prompts and tool definitions rarely change
- Order matters — cached content must form a prefix, so put cached content before dynamic content
- Monitor cache hits — check `usage.cache_creation_input_tokens` and `usage.cache_read_input_tokens` in the response
- Minimum size — content must be at least 1,024 tokens to be cacheable
- Keep cache warm — if requests arrive more than 5 minutes apart, the cache expires and the next request pays the write cost again
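The monitoring point above can be wired into simple logging. A minimal sketch, using the `usage` field names from the bullet list; the response object is faked with a dataclass here so the logic is self-contained and runnable offline:

```python
from dataclasses import dataclass

@dataclass
class Usage:  # stand-in for response.usage from the Anthropic SDK
    input_tokens: int
    cache_creation_input_tokens: int
    cache_read_input_tokens: int

def cache_hit_rate(usage: Usage) -> float:
    """Fraction of prompt tokens that were served from cache."""
    total = (usage.input_tokens
             + usage.cache_creation_input_tokens
             + usage.cache_read_input_tokens)
    return usage.cache_read_input_tokens / total if total else 0.0

# First request writes the cache...
first = Usage(input_tokens=50, cache_creation_input_tokens=4_000,
              cache_read_input_tokens=0)
# ...later requests read it.
later = Usage(input_tokens=60, cache_creation_input_tokens=0,
              cache_read_input_tokens=4_000)
assert cache_hit_rate(first) == 0.0
assert cache_hit_rate(later) > 0.98
```

A persistently low hit rate usually means the cached prefix is changing between requests, or requests are spaced beyond the 5-minute TTL.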
FAQ
Q: Does caching affect response quality? A: No. Caching is purely an optimization — the model sees identical input.
Q: Can I cache across different conversations? A: Yes, if the cached prefix is identical (same system prompt + tools), it hits the cache regardless of the user message.
Q: Does Claude Code use prompt caching? A: Yes, Claude Code automatically uses prompt caching for CLAUDE.md content, tool definitions, and conversation history.