How to Reduce Token Usage in Claude: 10 Tactics to Cut API Costs in 2026
When I first started prompting on Claude’s API, I wasn’t concerned about tokens at all. I was focused on what the API could do. It performed well, it wasn’t lagging, and everything seemed to be fine.
Then the invoice came.
It wasn’t catastrophic. But it made me ask myself a question I should’ve asked much sooner: Where were all these tokens actually going?
Well, quite a few weren’t doing anything useful at all. Large amounts of information that wasn’t relevant any more due to the way the prompt was designed. Repetitive history dumps at every prompt input, regardless of the user’s actual query. Loading 18 different tool schemas into every prompt input, no matter how specific the request.
It made me see Claude in a new light, one where token efficiency isn’t an afterthought but is instead an integral part of the architecture from the get-go.
This guide contains the 10 most impactful tips for cutting your Claude token count down. This document has been written in a way that I wish others would’ve done for me, by including the reasoning behind it, too.

10 Proven Strategies to reduce Token Usage in Claude:
1) Convert PDFs to plain .md before uploading.
Claude reads markdown natively and skips all the formatting overhead a PDF carries.
2) Crop screenshots before sending them to Claude.
A full-screen grab costs around 1,300 tokens — a tight crop of the relevant area drops that below 100.
3) Use Chat for planning and thinking out loud.
The actual file building and heavy output work belongs in Cowork, where the context is managed differently.
4) Open every session with "Read my folder. Ask me questions first."
It forces Claude to orient itself before generating anything, which kills a surprising number of wasted turns.
5) When something goes wrong, fix only the broken piece.
Asking Claude to redo the entire output from scratch is one of the most expensive habits you can have.
6) Combine three related tasks into a single message instead of sending them one by one.
You burn one context reload instead of three, and the coherence is usually better anyway.
7) Use the Edit button on your previous message rather than typing a follow-up prompt.
A new message appends to the history; an edit replaces it, keeping the thread clean.
8) The moment you shift topics, open a fresh chat.
Carrying unrelated context into a new subject is the fastest way to hit limits you should never have reached.
9) Every 15 to 20 messages, paste a short summary of where things stand and start a new session.
It sounds tedious until you realise how much dead weight accumulates in a long thread.
10) Sonnet handles the routine stuff just fine.
Save Opus for the tasks where the difference in reasoning quality actually shows up in the output.