Every Word Costs Money. Here's How AI Actually Charges You.
Published: January 27, 2026 - 12 min read
This is Part 2 of the Tokenomics for Humans series. If you haven't read Part 1, I recommend starting there for context on why this series exists and who it's for.
In Part 1, I gave you homework.
I asked you to think about the last time you used Claude or ChatGPT. How long was your conversation? Did you upload any files? Did you ask multiple questions? Did the AI ever seem to "forget" something you told it?
If you did that exercise, you're about to have several "aha" moments.
If you didn't... well, that's okay. You're about to understand anyway.
Today, we're covering the single most important concept in this entire series: tokens.
If you understand tokens, you understand 50% of everything in that Deloitte report. You understand why your AI subscription costs what it costs. You understand why sometimes the AI seems "dumber" at the end of a long conversation. You understand why some tasks burn through your usage limits while others barely make a dent.
Let's demystify this.
The Problem: Computers Don't Read Like You Do
When you type "Hello, how are you?" you see seven words. Three short ones, four slightly longer ones. Your brain processes this instantly. You don't think about it.
But here's the thing: computers can't read words the way you do.
Think about it. A computer processes information as numbers. Everything, from the photo on your phone to the music you're streaming, is just numbers at the hardware level. Text is no exception.
So when you send a message to Claude or ChatGPT, the AI can't just "read" your words. It needs to convert your text into something it can actually process.
That's where tokens come in.
What is a Token?
A token is a chunk of text that the AI processes as a single unit.
Sometimes a token is a whole word. Sometimes it's part of a word. Sometimes it's just a punctuation mark or a space.
Let me show you what I mean.
When you type:
"Hello, how are you today?"
You see a simple question. Five words, easy.
But the AI sees this:
TOKEN BREAKDOWN:
"Hello" → Token #1
"," → Token #2
" how" → Token #3
" are" → Token #4
" you" → Token #5
" today" → Token #6
"?" → Token #7
Total: 7 tokens
Notice something interesting? The spaces are attached to the words that follow them. "how" is actually " how" (with a leading space). The comma is its own token. The question mark is its own token.
A simple five-word question = 7 tokens.
But here's where it gets more interesting. Let's try a longer word:
"Unbelievable"
TOKEN BREAKDOWN:
"Un" → Token #1
"believ" → Token #2
"able" → Token #3
Total: 3 tokens
One word. Three tokens. The AI broke it into chunks it recognizes from its training data.
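If you're curious (and a little technical), you can watch this splitting happen yourself. Here's a minimal Python sketch using OpenAI's open-source tiktoken library. One big caveat: every model family has its own tokenizer (Claude's is different from OpenAI's), so the exact splits and counts you see may differ from the illustrations above.

```python
# pip install tiktoken
import tiktoken

# cl100k_base is one of OpenAI's tokenizers; other models split differently
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello, how are you today?", "Unbelievable"]:
    token_ids = enc.encode(text)                       # text -> integer IDs
    pieces = [enc.decode([tid]) for tid in token_ids]  # each ID back to its chunk
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```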
The Quick Math
As a rough estimate: 1 token is approximately 4 characters or about 0.75 words.
So:
- 100 words is roughly 130-150 tokens
- A typical email (300 words) is roughly 400 tokens
- A 10-page document could be 3,000-4,000 tokens
This isn't exact. Some words tokenize efficiently (common words like "the" or "is" are single tokens). Others tokenize poorly (technical jargon, unusual names, non-English words often break into many small tokens).
But the rough math gives you a feel for scale.
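If you want a back-of-the-envelope estimator, here's the chars-divided-by-four heuristic as a tiny Python sketch. It's a rough guide only; real counts depend on the model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token (heuristic, not exact)."""
    return max(1, round(len(text) / 4))

email = "word " * 300          # stand-in for a ~300-word email
print(estimate_tokens(email))  # ~375 -- in the ballpark of the ~400 above
```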
From Words to Numbers: What the AI Actually Sees
Here's the part that really clicked for me.
Each token gets converted to a unique number. The AI doesn't see "Hello" as a word. It sees it as a specific number in its vocabulary.
WHAT YOU TYPE: "Hello, how are you today?"
WHAT YOU SEE: A friendly question
WHAT THE AI SEES: [15339, 11, 703, 527, 345, 2067, 30]
That sequence of numbers? That's your message in AI-speak.
The AI processes these numbers through its neural network, does a massive amount of mathematical computation, and generates a response. That response comes out as another sequence of numbers, which then gets converted back into words you can read.
THE FULL CYCLE:
Your words → Tokens → Numbers → AI Processing → Numbers → Tokens → AI's words
Every step in this process requires computational resources. And computational resources cost money.
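You can see the first and last steps of that cycle, the words-to-numbers round trip, with the same tiktoken sketch from earlier (again, the IDs are tokenizer-specific; Claude's would differ):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Hello, how are you today?")
print(ids)              # a list of integer IDs -- your message in AI-speak
print(enc.decode(ids))  # and back again: "Hello, how are you today?"
```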
Why Tokens Matter for Cost: The Billing Meter
Here's the critical insight that changes everything:
AI companies charge based on tokens, not words.
Every token you send (input) costs money. Every token the AI generates (output) costs money.
THE TOKEN BILLING METER
============================================================
YOUR MESSAGE:
"Explain what tokens are in simple terms"
= 7 tokens IN (you pay for these)

AI'S RESPONSE:
"Tokens are the fundamental units that AI systems use to
process text. Think of them as chunks..."
= 150 tokens OUT (you pay for these too)

TOTAL COST = Input Tokens + Output Tokens
           = 7 + 150
           = 157 tokens billed
============================================================
This is why a short question with a long answer costs more than a long question with a short answer.
If you ask: "What is 2+2?" (4 tokens) And the AI responds: "4" (1 token) Total: 5 tokens
If you ask: "What is 2+2?" (4 tokens) And the AI responds: "The answer to 2+2 is 4. This is a basic arithmetic operation..." (50 tokens) Total: 54 tokens
Same question. Very different token costs.
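The billing arithmetic really is as simple as it looks. A one-line sketch:

```python
def tokens_billed(input_tokens: int, output_tokens: int) -> int:
    # You pay in both directions: what you send plus what comes back.
    return input_tokens + output_tokens

print(tokens_billed(4, 1))   # terse answer:   5 tokens
print(tokens_billed(4, 50))  # verbose answer: 54 tokens
```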
Real Pricing: What Tokens Actually Cost
Let me give you some real numbers so this isn't abstract.
As of early 2026, here are approximate API prices for popular models:
| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3 Opus | $15.00 | $75.00 |
Note: These are API prices, not subscription prices. More on that difference in Part 5.
Let's do some quick math.
If you send 100 messages per day, averaging 500 tokens each (input + output combined), that's:
- 50,000 tokens per day
- 1.5 million tokens per month
At Claude 3.5 Sonnet rates, that would cost roughly $4.50-$22.50 per month depending on your input/output ratio.
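Here's that estimate as a quick Python sketch, using the Sonnet rates from the table. All-input and all-output give the lower and upper bounds; real usage lands somewhere in between.

```python
INPUT_RATE = 3.00    # Claude 3.5 Sonnet, $ per 1M input tokens (table above)
OUTPUT_RATE = 15.00  # Claude 3.5 Sonnet, $ per 1M output tokens

monthly_tokens = 100 * 500 * 30  # 100 msgs/day x 500 tokens x 30 days = 1.5M

low = monthly_tokens / 1_000_000 * INPUT_RATE    # $4.50 if it were all input
high = monthly_tokens / 1_000_000 * OUTPUT_RATE  # $22.50 if it were all output
print(f"${low:.2f} to ${high:.2f} per month")
```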
Now you understand why your $20/month Claude Pro subscription is actually a pretty good deal. You're paying a flat rate instead of per-token pricing.
But here's the catch: those subscription plans still have limits. And those limits? They're measured in... you guessed it... tokens.
What is Inference?
Now let's tackle another term you'll see constantly: inference.
Inference = the AI generating a response to your input.
That's it. Every time you send a message and the AI replies, that's one "inference."
The word comes from the idea that the AI is "inferring" what response would make sense based on:
- Your input
- Everything it learned during training
- The patterns it recognizes
When people talk about "inference costs," they're talking about the cost of the AI actually doing its job: reading your tokens and generating response tokens.
Simple vs. Complex Inference
Not all inferences are created equal.
Simple inference:
You: "What's 2 + 2?"
AI: "4"
Tokens used: ~5
Processing: Minimal
Cost: Tiny
Complex inference:
You: "Analyze why this business strategy might fail
and suggest three alternatives with pros and cons"
AI: [Considers multiple angles]
[Weighs various factors]
[Structures a detailed response]
[Writes 500+ words of analysis]
Tokens used: ~700
Processing: Extensive
Cost: Significantly higher
The complexity of the task affects how many tokens the AI uses to respond. More complex tasks = more output tokens = higher cost.
The Hidden Cost: Thinking Tokens
Some AI models have special "thinking" or "reasoning" modes. Claude has Extended Thinking.
Here's what most people don't realize: the thinking process consumes tokens even if you don't see it.
When Claude uses Extended Thinking, it might generate 5,000-10,000 tokens of internal reasoning before giving you a 500-token response. You only see the 500 tokens. But you're paying for all 10,500.
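In code terms, the math looks like this (the 10,000-token figure is illustrative; actual thinking budgets vary by model and settings):

```python
def billed_output(visible_tokens: int, thinking_tokens: int) -> int:
    # Hidden reasoning is billed as output even though you never see it.
    return visible_tokens + thinking_tokens

total = billed_output(visible_tokens=500, thinking_tokens=10_000)
print(total)        # 10,500 tokens billed for a 500-token visible answer
print(total / 500)  # 21x the length of what you actually read
```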
I wrote about this in detail in The 6 Hidden Token Drains Destroying Your Claude Quota. Extended Thinking can cost 5-10x more than you expect if you're not aware of it.
The Token Economy in Action
Let me paint a picture of how this all connects.
Imagine you're a freelance writer using Claude to help with client work. You have a Pro subscription ($20/month) with a monthly token limit.
Monday morning:
- You upload a 15-page client brief (approximately 4,500 tokens)
- You ask Claude to summarize it
  - Input billed: 4,500 (document) + 10 (your request) = 4,510 tokens
  - Output billed: 300 tokens
  - Subtotal: 4,810 tokens
- You ask three follow-up questions in the same conversation
  - Each question re-reads the entire document + conversation history
  - Question 1: 4,810 (previous) + 100 (question) + 200 (output) = 5,110 tokens
  - Question 2: 5,110 (previous) + 100 (question) + 200 (output) = 5,410 tokens
  - Question 3: 5,410 (previous) + 100 (question) + 200 (output) = 5,710 tokens
  - Subtotal: ~16,230 tokens

Monday morning total: ~21,040 tokens billed
Notice how each follow-up question costs more than the last? That's because the AI re-reads everything each time. (This will make complete sense after Part 3.)
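If you like seeing the pattern in code, here's a small Python sketch of that Monday-morning tally. The turn sizes are the assumed figures from above.

```python
# (new_input, output) per turn: summarize the brief, then three follow-ups
turns = [(4_510, 300), (100, 200), (100, 200), (100, 200)]

history = 0       # tokens already in the conversation
total_billed = 0

for new_input, output in turns:
    billed = history + new_input + output  # the whole history is re-sent
    total_billed += billed
    history += new_input + output          # and the conversation keeps growing
    print(f"turn billed: {billed:,} tokens")

print(f"total: {total_billed:,} tokens")   # 21,040 -- matching the tally above
```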
By end of Monday:
- You've used roughly 20,000+ tokens on one client project
- That's about 1.3% of a typical monthly limit
- And you have 20 more workdays in the month
Suddenly, hitting usage limits doesn't seem so mysterious, does it?
What This Means For You
If You're a Curious User
You now understand why you hit usage limits. It's not random. It's not the AI company being stingy. Every message you send and every response you receive is being measured in tokens.
Practical takeaway: Long conversations get more expensive with every message, because each new message re-sends the entire history. If you find yourself running out of usage, try starting fresh conversations more often instead of continuing one massive thread. (More on this in Part 3.)
If You're a Freelancer or Solopreneur
You now have a framework for thinking about AI as a business expense with measurable units.
Practical takeaways:
- Complex tasks (strategy, analysis, long-form writing) consume significantly more tokens than simple tasks (quick questions, formatting, proofreading)
- If you're using AI heavily for client work, track which types of tasks burn through your limits fastest
- Consider whether your subscription tier matches your actual usage patterns
The Tokenization Trap: Why Some Things Cost More Than You'd Expect
Before we wrap up, let me share something that catches a lot of people off guard.
Not all text tokenizes equally.
Common English words are efficient. They're usually single tokens because they appeared frequently in the AI's training data.
But:
- Technical jargon often breaks into multiple tokens
- Names (especially unusual ones) can be 2-4 tokens each
- Non-English languages sometimes tokenize less efficiently
- Code can be unpredictable (common syntax = efficient, custom variable names = expensive)
If you're working in a specialized field with lots of technical terminology, you might be using more tokens than someone writing casual prose, even for the same word count.
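You can test this unevenness yourself with the same tokenizer sketch from earlier. As before, counts are tokenizer-specific, so treat the results as illustrative rather than universal:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["the cat sat on the mat",         # common English words
             "electroencephalography",         # technical jargon
             "Schmetterlingssammlung",         # non-English compound word
             "def calc_usr_qta_pct(x): ..."]:  # code with terse custom names
    print(f"{len(enc.encode(text)):>2} tokens: {text}")
```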
Coming Up Next
Part 3: Context Windows
Remember how I mentioned that conversation costs snowball as a thread gets longer? That's because of something called the context window.
In Part 3, I'm going to explain:
- Why AI doesn't actually remember your conversation (it re-reads everything every time)
- Why your 10th message costs way more than your 1st message
- Why uploading a PDF and asking 10 questions means you paid for that PDF 10 times
- The "bucket that fills up" analogy that makes all of this click
This is where the token concept becomes really powerful. Once you understand context windows, you'll never think about AI the same way again.
Quick Reference: Token Cheat Sheet
Before you go, here's a quick reference you can come back to:
| Concept | What It Means |
|---|---|
| Token | A chunk of text the AI processes as one unit |
| Tokenization | The process of breaking text into tokens |
| Input tokens | Tokens in your message (you pay for these) |
| Output tokens | Tokens in AI's response (you pay for these too) |
| Inference | The AI generating a response (one "AI thinking session") |
| ~4 characters | Rough size of one token |
| ~0.75 words | Another way to estimate token count |
Your Homework for Part 3
Next time you use Claude or ChatGPT, pay attention to the length of your conversation.
Notice how the responses might get slightly slower as the conversation gets longer. Notice if the AI starts to seem less "aware" of things you mentioned earlier.
That's the context window at work. And understanding it is the key to using AI efficiently.
See you in Part 3.
As always, thanks for reading!