Is Anthropic Prompt Caching — Cut AI API Costs 90% free to use?

Yes. Anthropic Prompt Caching — Cut AI API Costs 90% is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Anthropic Prompt Caching — Cut AI API Costs 90%?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

PromptsApr 8, 2026·3 min read

Anthropic Prompt Caching — Cut AI API Costs 90%

Name: Anthropic Prompt Caching — Cut AI API Costs 90%
Author: Prompt Lab

Use Anthropic's prompt caching to reduce Claude API costs by up to 90%. Cache system prompts, tool definitions, and long documents across requests for massive savings.

Prompt Lab · Community

TL;DR

Anthropic prompt caching lets you cache system prompts and long contexts to cut Claude API costs by up to 90%.

§01

What it is

Anthropic prompt caching is an API feature that lets you cache frequently reused content (system prompts, tool definitions, long documents) across multiple Claude API requests. Cached tokens are read at a fraction of the cost of uncached input tokens, reducing total API spending significantly for applications that reuse the same context.

This feature targets developers building applications that send the same system prompt, tool definitions, or reference documents with every request. Chatbots, code assistants, and RAG pipelines benefit the most because they repeat large context blocks across conversations.

§02

How it saves time or tokens

Without caching, every API request processes the full system prompt and context from scratch. With caching, the first request pays the full price plus a small cache write fee, but all subsequent requests read cached tokens at a 90% discount. For a 10,000-token system prompt sent across 100 requests, you pay for 10,000 tokens once instead of 1,000,000 tokens total. Cached content also reduces latency because the model does not need to reprocess it.

§03

How to use

Add cache_control to content blocks you want cached:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model='claude-sonnet-4-20250514',
    max_tokens=1024,
    system=[{
        'type': 'text',
        'text': 'You are an expert code reviewer... (long system prompt)',
        'cache_control': {'type': 'ephemeral'}
    }],
    messages=[{'role': 'user', 'content': 'Review this function.'}]
)

Cache tool definitions:

response = client.messages.create(
    model='claude-sonnet-4-20250514',
    max_tokens=1024,
    tools=[{
        'name': 'search_codebase',
        'description': 'Search the codebase for patterns',
        'input_schema': {'type': 'object', 'properties': {'query': {'type': 'string'}}},
        'cache_control': {'type': 'ephemeral'}
    }],
    messages=[{'role': 'user', 'content': 'Find all TODO comments.'}]
)

Check cache usage in the response:

print(response.usage.cache_creation_input_tokens)  # Tokens cached on first call
print(response.usage.cache_read_input_tokens)       # Tokens read from cache

§04

Example

# Caching a long document for RAG
import anthropic

client = anthropic.Anthropic()

long_document = open('docs/api-reference.md').read()

response = client.messages.create(
    model='claude-sonnet-4-20250514',
    max_tokens=2048,
    system=[{
        'type': 'text',
        'text': f'Reference document:\n\n{long_document}',
        'cache_control': {'type': 'ephemeral'}
    }],
    messages=[{'role': 'user', 'content': 'What authentication methods does the API support?'}]
)

§05

Related on TokRepo

AI Tools for API — Tools for working with AI APIs efficiently
Prompt Library — Reusable prompts and templates

§06

Common pitfalls

Cache has a minimum token threshold (currently 1024 tokens for Claude Sonnet); content blocks smaller than this threshold will not be cached.
Cached content expires after a TTL (time-to-live) period; for ephemeral caching, the cache lasts approximately 5 minutes of inactivity. Plan your request frequency accordingly.
Cache write tokens cost 25% more than regular input tokens; caching only saves money when the same content is reused across multiple requests.

Frequently Asked Questions

How much does prompt caching save?+

Cache read tokens cost approximately 90% less than regular input tokens. For applications that reuse the same system prompt or context across many requests, this translates to significant cost reduction. The exact savings depend on your cache hit rate and the size of cached content.

What content can I cache?+

You can cache system prompts, tool definitions, and content within message blocks. Add a cache_control field with type 'ephemeral' to any content block you want cached. The content must meet the minimum token threshold.

How long does the cache last?+

Ephemeral caches last approximately 5 minutes of inactivity. Each cache hit refreshes the TTL. If no requests use the cached content within the TTL window, it expires and the next request incurs a cache write fee.

Does caching affect response quality?+

No. Caching only affects how input tokens are processed and billed. The model produces identical responses whether content is cached or not. Caching is a performance and cost optimization, not a quality tradeoff.

Which Claude models support prompt caching?+

Prompt caching is available on Claude Sonnet, Claude Opus, and Claude Haiku via the Anthropic API. Check the Anthropic documentation for the latest model support and minimum token thresholds.

Citations (3)

Anthropic Prompt Caching Documentation— Anthropic prompt caching reduces input token costs by up to 90%
Anthropic API Reference— Cache control is set via the cache_control field in content blocks
Anthropic Pricing— Claude API models and pricing

Related on TokRepo

AI API Tools Prompt Library Featured Workflows

🙏

Source & Thanks

Anthropic Prompt Caching Docs

Pricing page

Discussion

No comments yet. Be the first to share your thoughts.