How to Cut Claude Code Token Costs by 50% with Agent Skills
Three proven strategies to reduce Claude Code token consumption by 50% — reusable skills, context compression, and fewer iteration loops.
William Wang — Founder of TokRepo & GEOScore AI. Building tools for AI developer productivity and search visibility.
The Token Cost Problem
Claude Code is powered by large language models, and every interaction consumes tokens. With Anthropic's current pricing, the numbers add up quickly:
- Claude Opus input: $15 per million tokens
- Claude Opus output: $75 per million tokens
- Claude Sonnet input: $3 per million tokens
- Claude Sonnet output: $15 per million tokens
A typical development session might involve 50,000–150,000 input tokens and 10,000–30,000 output tokens. At Opus pricing, that means a single focused session can cost $1.50–$4.50. Do that 20 times a month and you're looking at $30–$90 per month in pure token costs — and that's conservative. Complex projects with large codebases easily hit $150–$300 per month.
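The per-session math above is easy to reproduce. A minimal sketch using the Opus rates listed, with the session sizes from the paragraph as inputs:

```python
# Rough session-cost estimator at Opus pricing (rates from the list above).
OPUS_INPUT_PER_M = 15.00   # $ per million input tokens
OPUS_OUTPUT_PER_M = 75.00  # $ per million output tokens

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session: input and output billed at different rates."""
    return (input_tokens * OPUS_INPUT_PER_M
            + output_tokens * OPUS_OUTPUT_PER_M) / 1_000_000

# A light session: 50K input, 10K output
print(session_cost(50_000, 10_000))    # → 1.5
# A heavy session: 150K input, 30K output
print(session_cost(150_000, 30_000))   # → 4.5
```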
The root causes of high token consumption are:
- Repetitive prompts — typing the same long instructions every time you want Claude to do a specific task
- Context bloat — conversation history growing until Claude re-reads thousands of lines of irrelevant context
- Iteration loops — Claude guessing at your intent, getting it wrong, then you correcting it across 5–10 back-and-forth messages
Each of these problems has a concrete solution. Let's walk through them.
Strategy 1: Replace Repetitive Prompts with Skills
Every time you type a multi-paragraph prompt explaining how you want Claude Code to review your code, run your tests, or format your commits, you're spending tokens on instructions that could be stored once and reused forever.
The Problem
Consider a typical code review prompt that developers type repeatedly:
```
Review this code for bugs, security issues, and performance problems.
Check for SQL injection, XSS vulnerabilities, and auth bypass risks.
Verify error handling covers all edge cases. Look for N+1 query
patterns. Check that all database transactions are properly committed
or rolled back. Format your findings as a markdown table with severity,
location, description, and suggested fix.
```
That's roughly 80 tokens every single time. If you run code reviews 5 times a day, that's 400 tokens per day just on the instruction — not counting the code itself. Over a month, that's 12,000 tokens on the same repeated instruction.
The Solution
Install an agent skill that encodes the instructions once. The skill sits in `.claude/commands/` and activates with a short trigger:
```
tokrepo install e108cf5c-c34e-4d27-a694-66a693301e87
```
Now instead of typing 80+ tokens of instructions, you type:
```
/gsd-code-review
```
That's 3 tokens. The skill file is loaded once into context and reused across the entire session. Even better, the skill contains more thorough instructions than you'd bother typing manually — covering edge cases, output formatting, and verification steps.
Real Savings Calculation
| Scenario | Without Skills | With Skills | Savings |
|---|---|---|---|
| Code review prompt | 80 tokens x 5/day | 3 tokens x 5/day + 500 token skill load | 92% after first use |
| Debug workflow prompt | 120 tokens x 3/day | 3 tokens x 3/day + 800 token skill load | 88% after first use |
| Planning prompt | 200 tokens x 2/day | 3 tokens x 2/day + 1,200 token skill load | 85% after first use |
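The "after first use" caveat in the table comes from the one-time skill load. A short sketch of the break-even point, using the article's figures for the code-review case (80-token prompt, 500-token skill, 3-token trigger):

```python
# Break-even for a skill: a one-time load cost, then ~3 tokens per trigger
# instead of the full prompt. Figures match the code-review row above.
def tokens_without_skill(uses: int, prompt_tokens: int = 80) -> int:
    return uses * prompt_tokens

def tokens_with_skill(uses: int, skill_load: int = 500, trigger: int = 3) -> int:
    return skill_load + uses * trigger

# Find the first use count at which the skill is cheaper overall
uses = 1
while tokens_with_skill(uses) >= tokens_without_skill(uses):
    uses += 1
print(uses)  # → 7: the skill pays for itself by the 7th review
```

At 5 reviews a day, that break-even lands partway through day two; every use after that is nearly pure savings.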
Recommended Skills for Cost Savings
Two skills that deliver the biggest return on token investment:
- GSD (Get Shit Done) — replaces long project-planning prompts with structured `/gsd-plan-phase`, `/gsd-execute-phase`, and `/gsd-next` commands. A single planning prompt that would cost 200+ tokens now costs 3 tokens per invocation.
- Planning Skill — encodes your planning methodology so you don't re-explain it each session. Especially valuable for teams where multiple developers need consistent planning output.
Browse the full skills collection for more options tailored to your workflow.
Strategy 2: Compress Context with Summarization
Even with skills installed, your conversation context grows with every message. Claude Code reads the entire conversation history on each turn — meaning a conversation with 50,000 tokens of history costs 50,000 input tokens per message, even if you're asking a simple question.
The Problem
Here's a real-world scenario:
- Turn 1: You ask Claude to read 3 files (8,000 tokens of file content added to context)
- Turn 5: You've exchanged 20,000 tokens of conversation
- Turn 10: Context is at 45,000 tokens
- Turn 15: Context is at 80,000 tokens — every new message now costs 80,000 input tokens just to process the history
At Opus pricing ($15/million input tokens), Turn 15 costs $1.20 in input tokens alone — for a single message. The cumulative cost of a 20-turn conversation can easily exceed $10.
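How those costs accumulate is worth seeing in numbers. A sketch assuming, purely for illustration, that context grows by a fixed ~5,333 tokens per turn so it reaches 80K by turn 15 (the real growth curve is lumpier):

```python
# Cumulative input cost of a conversation whose full history is re-read
# on every turn. Linear growth is an illustrative simplification.
OPUS_INPUT_PER_M = 15.00

def conversation_input_cost(turns: int, tokens_per_turn: int) -> float:
    """Turn n re-reads n * tokens_per_turn of context; sum over all turns."""
    total_tokens = sum(n * tokens_per_turn for n in range(1, turns + 1))
    return total_tokens * OPUS_INPUT_PER_M / 1_000_000

# 15 turns, context growing ~5,333 tokens/turn (hits 80K by turn 15)
print(round(conversation_input_cost(15, 5_333), 2))  # → 9.6
```

Nearly all of that spend is re-reading history, which is exactly what `/compact` attacks.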
The Solution
Use the /compact command strategically. This command tells Claude Code to summarize the conversation history, compressing it into a fraction of the original size while retaining the key decisions and context.
When to compact:
- After completing a major task within a session
- When you notice the context growing beyond 40,000 tokens
- Before switching to a different topic in the same session
- After reading large files that you no longer need in full
Token count before and after compaction:
| Scenario | Before /compact | After /compact | Reduction |
|---|---|---|---|
| 15-turn code review session | 82,000 tokens | 12,000 tokens | 85% |
| Large file exploration | 65,000 tokens | 8,000 tokens | 88% |
| Multi-phase project planning | 120,000 tokens | 18,000 tokens | 85% |
Advanced Context Management
Beyond /compact, these practices reduce context bloat:
- Start new conversations for new tasks — don't reuse a bloated session for unrelated work
- Be specific about which files to read — "read lines 50–120 of `server.ts`" instead of "read `server.ts`" avoids loading thousands of irrelevant lines
- Use `.claudeignore` — exclude `node_modules/`, build artifacts, and large data files from Claude's file search to prevent accidental context inflation
- Front-load context — provide all relevant files and constraints in your first message rather than drip-feeding them across 5 messages, which forces Claude to re-read growing context each time
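As a sketch of the `.claudeignore` point — assuming it accepts gitignore-style patterns, and with directory names that are only examples — a minimal file might look like:

```
# .claudeignore — gitignore-style patterns (illustrative)
node_modules/
dist/
build/
coverage/
*.log
data/*.csv
```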
Strategy 3: Reduce Iterations with Specialized Skills
The most expensive token waste isn't prompt repetition or context bloat — it's iteration loops. When Claude misunderstands your intent, you spend 3–10 correction messages, each one re-processing the entire conversation context. A single misunderstanding in a large context can cost $5–$15 in wasted tokens.
The Problem
Debugging without structure is a classic token-burner:
Turn 1: "Fix this bug" (Claude tries approach A — fails)
Turn 2: "That didn't work, try X instead" (Claude tries approach X — partially works)
Turn 3: "Close, but you broke Y" (Claude fixes Y but reintroduces the original bug)
Turn 4: "No, you need to keep the fix from Turn 2 but also fix Y"
Turn 5: "Let me explain the full context again..."
Each turn costs more than the last because the context keeps growing. By Turn 5, you've spent 5x the tokens that a correct first attempt would have cost.
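The compounding is easy to quantify. A minimal sketch, assuming an illustrative 20K starting context that grows 5K per correction turn:

```python
# Cost of a multi-turn correction loop vs. one correct first attempt.
# Starting context and per-turn growth are illustrative assumptions.
def loop_input_tokens(turns: int, start: int = 20_000, growth: int = 5_000) -> int:
    """Total input tokens across a loop where each turn re-reads a growing context."""
    return sum(start + growth * n for n in range(turns))

one_shot = loop_input_tokens(1)    # 20,000 tokens: a correct first attempt
five_turns = loop_input_tokens(5)  # 150,000 tokens across the 5-turn loop
print(five_turns / one_shot)       # → 7.5
```

With a flat context the loop costs exactly 5x the one-shot attempt; once the per-turn growth is counted, the multiple climbs to 7.5x in this sketch.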
The Solution
Specialized skills encode methodology that gets it right the first time — or at least within 1–2 iterations instead of 5–10.
Systematic Debugging Skill — instead of ad-hoc "fix this" prompts, this skill walks Claude through a structured process:
```
tokrepo install 78ed006e-d10d-4efe-804b-2e19a76cf2bb
```
Then use /gsd-debug to trigger a systematic debugging workflow:
- Reproduce — verify the bug exists and is consistent
- Hypothesize — generate 3–5 possible root causes ranked by likelihood
- Test — check each hypothesis with minimal code changes
- Fix — apply the fix for the confirmed root cause
- Verify — run tests to confirm the fix doesn't break anything
This structured approach typically resolves bugs in 2–3 turns instead of 5–10 — saving 60–70% of tokens on debugging tasks.
Skill Creator — when you find yourself writing the same complex prompt more than twice, this meta-skill helps you turn it into a reusable skill in minutes:
```
tokrepo install 0b7c0a41-97e1-4187-9cc5-4dc32d91a9cd
```
Use /skill-creator to generate a new skill from a description. The skill creator encodes best practices for skill writing — frontmatter, trigger conditions, instructions, and examples — so your custom skills work correctly on the first try instead of requiring 3–4 revision cycles.
Before vs After: Real Token Usage Comparison
We tracked token usage across five common development tasks, comparing workflows without skills to workflows using the strategies described above.
| Task | Without Skills (tokens) | With Skills (tokens) | Savings % |
|---|---|---|---|
| Code review (3 files) | 45,000 | 18,000 | 60% |
| Debug a backend API bug | 92,000 | 35,000 | 62% |
| Plan a new feature (5 phases) | 68,000 | 28,000 | 59% |
| Create a new agent skill | 34,000 | 12,000 | 65% |
| Full-day development session | 320,000 | 145,000 | 55% |
Dollar impact at Opus pricing ($15/$75 per million tokens):
| Metric | Without Skills | With Skills |
|---|---|---|
| Average daily token usage | 320,000 | 145,000 |
| Daily cost (input @ $15/M) | $4.80 | $2.18 |
| Daily cost (output @ $75/M, est. 30K / 15K output) | $2.25 | $1.13 |
| Monthly cost (20 work days) | $141.00 | $66.20 |
| Annual savings | — | $897.60 |
That's nearly $900 per year in savings for a single developer — and the savings scale linearly with team size.
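The table's arithmetic can be reproduced directly (the daily output estimates of 30K and 15K tokens are read off the output row):

```python
# Reproduce the dollar table from its raw token counts at Opus pricing.
def daily_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * 15 + output_tokens * 75) / 1_000_000

monthly_without = daily_cost(320_000, 30_000) * 20  # 20 work days
monthly_with = daily_cost(145_000, 15_000) * 20
annual_savings = (monthly_without - monthly_with) * 12
print(round(monthly_without, 2), round(monthly_with, 2), round(annual_savings, 2))
# → 141.0 66.0 900.0
```

The unrounded results are $141.00, $66.00, and $900.00 per year; the table's $66.20 and $897.60 come from rounding the daily rows to cents before multiplying — the same ballpark either way.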
FAQ
Do agent skills themselves consume tokens?
Yes, but only once per session. When a skill is triggered, its Markdown content is loaded into the conversation context. A typical skill is 500–1,500 tokens. After that initial load, subsequent triggers in the same session cost only the 2–3 tokens of the slash command. The net savings far exceed the one-time cost — usually by Turn 2 or Turn 3 of using the skill.
What's the single most impactful thing I can do to reduce token costs?
Run /compact regularly. Context compression delivers the largest absolute savings because it reduces the cost of every subsequent message in the session. If your context is at 80,000 tokens and you compact to 12,000, you save 68,000 tokens on every following turn. After just 3 more turns, that's 204,000 tokens saved — worth $3.06 at Opus input pricing.
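As a quick check, the arithmetic in that answer (all figures from the FAQ itself):

```python
# Sanity-check the /compact savings math from the FAQ above.
before, after = 80_000, 12_000
saved_per_turn = before - after           # tokens no longer re-read each turn
saved_3_turns = saved_per_turn * 3
dollars = saved_3_turns * 15 / 1_000_000  # Opus input at $15/M
print(saved_3_turns, dollars)             # → 204000 3.06
```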
Can I combine all three strategies?
Absolutely — and they're designed to work together. Install skills (Strategy 1) to reduce repetitive prompts, use /compact (Strategy 2) to keep context lean, and rely on specialized skills (Strategy 3) to minimize iteration loops. Teams that adopt all three consistently report 45–55% reduction in monthly token costs compared to unoptimized workflows.
Next Steps
Ready to start saving tokens? Here are your next actions:
- Install your first skill — start with GSD or the Planning Skill for immediate impact
- Browse the full catalog — explore the skills collection to find skills for your specific workflow
- Learn to build your own — read How to Create Your First Agent Skill and turn your most-repeated prompts into reusable skills
- Compare your options — check Skills vs MCP vs Rules to understand when skills are the right choice
- See what's popular — our 15 Best Claude Code Skills ranking highlights the highest-impact skills tested on real projects