Deloitte Ran the Numbers for 3 Years. Here's When Self-Hosted AI Becomes 2x Cheaper.
Published: January 28, 2026 - 13 min read
This is Part 8 of the Tokenomics for Humans series. If you haven't read Part 7 on Jevons' Paradox, I recommend starting there.
In Part 5, I gave you three options for accessing AI: SaaS, API, and self-hosted.
In Part 6, I showed you that the invoice only tells part of the story.
In Part 7, I warned you that cheaper AI leads to more spending, not less.
Now it's time to put real numbers behind all of this.
Deloitte didn't just theorize about AI costs. They tracked actual spending across different approaches over three years. They watched organizations grow from pilots to production at scale.
The results are definitive. And if you're making infrastructure decisions, you need to see them.
The Study Setup
Deloitte analyzed three different approaches to AI infrastructure:
| Approach | What It Means | Example |
|---|---|---|
| API Access | Pay per token to AI providers | OpenAI API, Anthropic API |
| Neocloud | Rent GPU infrastructure, run open-source models | Lambda Labs, CoreWeave |
| Self-Hosted | Buy your own GPUs, run models on-premise | NVIDIA GPUs in your data center |
They tested with:
- Hardware: High-end NVIDIA GPUs (the same chips that cost $30,000-$40,000 each, as we discussed in Part 4)
- Models: Llama (open-source) and GPT-4o
- Time frame: Three years of growing AI adoption
The question: At what point does self-hosted become cheaper than paying per token?
Year 1: The Pilot Stage
Token volume: ~10 billion tokens
Use cases: Simple chatbots, FAQ assistants, basic automation
This is where most organizations start. You're experimenting. You're proving value. You're not yet committed to massive AI infrastructure.
YEAR 1 COSTS: PILOT STAGE (~10 BILLION TOKENS)
================================================================
APPROACH       ANNUAL COST   NOTES
--------       -----------   -----
API Access     $240,000      Pay only for what you use
Neocloud       $170,000      Cheaper per token than API
Self-Hosted    $40,000       But requires upfront investment
================================================================
Wait, self-hosted is cheapest in Year 1?
Yes, but there's a catch. That $40,000 is the operational cost. It doesn't include the upfront capital investment of $500,000+ to build the infrastructure in the first place.
When you factor in the initial investment, self-hosted is the most expensive option in Year 1 for most organizations.
Year 1 Recommendation
For most organizations: API access.
Why? You're still learning. You don't know if AI will work for your use cases. You don't know how much you'll actually use. Paying per token gives you flexibility.
The $240,000 API bill sounds high, but it's much lower risk than a $500,000+ infrastructure investment that might not pay off.
YEAR 1: WHY API WINS
================================================================
                     API              Self-Hosted
                     ---              -----------
Upfront cost         $0               $500K+
Year 1 operating     $240K            $40K
                     -----            -----
Year 1 TOTAL         $240K            $540K+
Risk if AI fails     Low              Very High
                     (just stop)      (stuck with hardware)
================================================================
API costs less than half as much in Year 1 once upfront costs
are counted.
================================================================
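The Year 1 verdict comes down to adding upfront capital to operating cost. Here is a minimal Python sketch of that sum, assuming the "$500K+" upfront figure is exactly $500,000 (the study only gives a lower bound, so treat it as a floor):

```python
# Year 1 figures from the comparison above, in dollars.
# Assumption: the study says "$500K+" upfront, so 500_000 is a floor, not an exact figure.
UPFRONT_SELF_HOSTED = 500_000

year1 = {
    "api": {"upfront": 0, "operating": 240_000},
    "self_hosted": {"upfront": UPFRONT_SELF_HOSTED, "operating": 40_000},
}


def year1_total(option: dict) -> int:
    """Total Year 1 outlay: capital spent up front plus running costs."""
    return option["upfront"] + option["operating"]


api_total = year1_total(year1["api"])
self_total = year1_total(year1["self_hosted"])
ratio = self_total / api_total

print(f"API: ${api_total:,}  Self-hosted: ${self_total:,}  ({ratio:.2f}x)")
```

It prints a 2.25x ratio. Because $500K is a floor, the real gap is at least that wide.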
Year 2: Growing Adoption
Token volume: ~300 billion tokens (30x Year 1!)
Use cases: Document search, summarization, content generation, workflow automation
This is where things get interesting. Your pilots worked. Teams are adopting AI. Usage is exploding.
Remember Jevons' Paradox? Here it is in action. Usage doesn't grow 2x or 3x. It grows 30x.
YEAR 2 COSTS: GROWING ADOPTION (~300 BILLION TOKENS)
================================================================
APPROACH       ANNUAL COST   CHANGE FROM YEAR 1
--------       -----------   ------------------
API Access     $970,000      +304% ($730K more)
Neocloud       $490,000      +188% ($320K more)
Self-Hosted    $1,060,000    +2,550% (but context matters)
================================================================
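The "Change from Year 1" column is ordinary percentage change, (new - old) / old × 100. A short sketch that reproduces it from the two tables (costs in $K, copied from above):

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new; 240 -> 970 gives about 304."""
    return (new - old) / old * 100


# Annual costs in thousands of dollars, from the Year 1 and Year 2 tables.
year1 = {"API": 240, "Neocloud": 170, "Self-Hosted": 40}
year2 = {"API": 970, "Neocloud": 490, "Self-Hosted": 1_060}

for name, old in year1.items():
    new = year2[name]
    print(f"{name:<12} +{pct_change(old, new):,.0f}% (+${new - old:,}K)")
```

The +2,550% for self-hosted is mostly a small-denominator effect: its Year 1 base of $40K leaves out the upfront investment, so any growth looks enormous in percentage terms.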
Hold on. Self-hosted is the most expensive in Year 2?
Yes. This is the awkward middle period.
- You're using too much AI for API to be cheap
- But you haven't reached the scale where self-hosted infrastructure pays off
- Meanwhile, self-hosted costs spike because you're expanding capacity
Year 2: The Crossover Begins
Look at the per-token economics:
| Approach | Year 2 Cost | Cost per Billion Tokens |
|---|---|---|
| API | $970,000 | $3,233 |
| Neocloud | $490,000 | $1,633 |
| Self-Hosted | $1,060,000 | $3,533 |
Neocloud emerges as the middle-ground winner. You get better per-token economics than API without the massive self-hosted commitment.
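Cost per billion tokens is annual cost divided by annual volume. A sketch using the Year 2 figures (300 billion tokens is the study's approximate volume, so the results are approximate too):

```python
# Year 2 annual costs in dollars, from the per-token table above.
YEAR2_TOKENS_BILLIONS = 300  # approximate volume reported for Year 2
year2_costs = {"API": 970_000, "Neocloud": 490_000, "Self-Hosted": 1_060_000}


def cost_per_billion(annual_cost: float, tokens_billions: float) -> float:
    """Dollars spent per billion tokens processed over the year."""
    return annual_cost / tokens_billions


unit_costs = {
    name: cost_per_billion(cost, YEAR2_TOKENS_BILLIONS)
    for name, cost in year2_costs.items()
}
# Cheapest first.
for name, unit in sorted(unit_costs.items(), key=lambda kv: kv[1]):
    print(f"{name:<12} ${unit:,.0f} per billion tokens")
```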
But watch what happens next.
Year 3: Scale
Token volume: ~1 trillion tokens (100x Year 1!)
Use cases: Decision support, analytics, AI-powered products, autonomous agents
This is where the math changes completely.
YEAR 3 COSTS: SCALE (~1 TRILLION TOKENS)
================================================================
APPROACH       ANNUAL COST   CHANGE FROM YEAR 1
--------       -----------   ------------------
API Access     $3,500,000    +1,358%
Neocloud       $2,720,000    +1,500%
Self-Hosted    $1,450,000    +3,525% (but cheapest!)
================================================================
Self-hosted is now 2x cheaper than API.
At 1 trillion tokens per year:
- API costs $3,500 per billion tokens ($3.50 per million)
- Self-hosted costs $1,450 per billion tokens ($1.45 per million)
That's $2.05 million in annual savings versus API.
Even compared to neocloud, self-hosted saves $1.27 million per year.
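The same division works at Year 3 scale, and it is easy to slip a factor of 1,000 between "per billion" and "per million" here, since 1 trillion tokens is 1,000 billion. A sketch that prints both units plus the annual savings, using the Year 3 table:

```python
# Year 3 annual costs in dollars at ~1 trillion tokens, from the table above.
YEAR3_TOKENS = 1_000_000_000_000  # 1 trillion = 1,000 billion = 1,000,000 million
year3_costs = {"API": 3_500_000, "Neocloud": 2_720_000, "Self-Hosted": 1_450_000}

PER_BILLION = 1_000_000_000
PER_MILLION = 1_000_000


def unit_price(annual_cost: int, tokens: int, unit: int) -> float:
    """Cost per `unit` tokens; multiply first so integer inputs stay exact."""
    return annual_cost * unit / tokens


for name, cost in year3_costs.items():
    print(
        f"{name:<12} ${unit_price(cost, YEAR3_TOKENS, PER_BILLION):,.0f} per billion, "
        f"${unit_price(cost, YEAR3_TOKENS, PER_MILLION):.2f} per million"
    )

savings_vs_api = year3_costs["API"] - year3_costs["Self-Hosted"]
savings_vs_neocloud = year3_costs["Neocloud"] - year3_costs["Self-Hosted"]
print(f"Self-hosted saves ${savings_vs_api:,}/yr vs API, ${savings_vs_neocloud:,}/yr vs neocloud")
```

Per-million is the unit most API price lists quote, so it is the easier number to hold up against a provider's rate card.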
The 3-Year Picture
Let me show you all three years together:
3-YEAR COST COMPARISON
================================================================
YEAR      API       NEOCLOUD   SELF-HOSTED   WINNER
----      ---       --------   -----------   ------
Year 1    $240K     $170K      $40K*         (API**)
Year 2    $970K     $490K      $1,060K       Neocloud
Year 3    $3,500K   $2,720K    $1,450K       Self-Hosted
          ------    ------     ------
TOTAL     $4,710K   $3,380K    $2,550K*      Self-Hosted
================================================================
* Excludes the $500K+ upfront investment
** Including upfront costs, self-hosted is the most expensive
   Year 1 option. Neocloud is cheapest on paper, but API is the
   recommended pilot choice for its flexibility.
3-YEAR SAVINGS (operating costs only):
- Self-hosted vs API: $2,160,000 saved (46% cheaper)
- Self-hosted vs Neocloud: $830,000 saved (25% cheaper)
================================================================
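Over three years, self-hosted saves over $2 million in operating costs compared to API. Add back the $500,000+ upfront investment and the gap narrows to about $1.66 million at most, but self-hosted still comes out ahead.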
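The 3-year totals and percentages follow directly from the annual figures. This sketch reproduces them and also shows what happens when the self-hosted upfront investment, which the table leaves out, is added back. The $500K value is an assumed floor, taken from the "$500K+" figure:

```python
# Annual operating costs in thousands of dollars (Years 1-3), from the table above.
COSTS_K = {
    "API": [240, 970, 3_500],
    "Neocloud": [170, 490, 2_720],
    "Self-Hosted": [40, 1_060, 1_450],
}


def three_year_totals(upfront_k: int = 0) -> dict:
    """3-year total per approach, optionally adding self-hosted upfront capital ($K)."""
    totals = {name: sum(years) for name, years in COSTS_K.items()}
    totals["Self-Hosted"] += upfront_k
    return totals


def savings(totals: dict, baseline: str) -> tuple:
    """Self-hosted savings vs a baseline: ($K saved, percent of the baseline's total)."""
    saved = totals[baseline] - totals["Self-Hosted"]
    return saved, saved / totals[baseline] * 100


# 0 reproduces the table; 500 adds the "$500K+" upfront floor (an assumption).
for upfront in (0, 500):
    totals = three_year_totals(upfront)
    for baseline in ("API", "Neocloud"):
        saved, pct = savings(totals, baseline)
        print(f"upfront ${upfront}K, vs {baseline}: ${saved:,}K saved ({pct:.0f}%)")
```

With the $500K floor added, self-hosted still leads, by roughly $1.66M (35%) over API but only $330K (10%) over neocloud. The lead over neocloud disappears once upfront spending passes $830K.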
The Crossover Point
Here's the key insight from Deloitte's research:
The crossover happens around Year 3 for organizations with aggressive AI adoption.
THE CROSSOVER VISUALIZATION
================================================================
[Chart: annual cost (vertical axis) over time, Year 1 to Year 3
(horizontal axis). API climbs steeply to $3.5M and neocloud to
$2.7M, while self-hosted flattens out at $1.45M. The self-hosted
line drops below both between Year 2 and Year 3: the crossover
point.]
================================================================
Before crossover: API/Neocloud cheaper (flexibility matters)
After crossover: Self-hosted cheaper (scale matters)
================================================================
But here's the nuance: the crossover point depends on how fast you're growing.
- Slow adopters: Crossover might be Year 4 or Year 5
- Fast adopters: Crossover might be Year 2
- Massive scale from Day 1: Self-hosted might win immediately
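To see why growth speed moves the crossover, here is a toy model. Everything in it is hypothetical and not from Deloitte's study: API cost is assumed to scale linearly at the Year 3 rate of $3,500 per billion tokens, and self-hosted is assumed to be a fixed $1M a year plus $450 per billion, a split invented so that 1 trillion tokens reproduces the $1.45M Year 3 figure. Upfront capital is ignored.

```python
# Hypothetical cost model, NOT from the study: API cost scales linearly with volume,
# while self-hosted is a fixed annual cost plus a small marginal cost. The split
# ($1.0M fixed + $450 per billion) is invented so 1 trillion tokens gives $1.45M.
API_PER_BILLION = 3_500
SELF_FIXED = 1_000_000
SELF_PER_BILLION = 450


def api_cost(billions: float) -> float:
    return API_PER_BILLION * billions


def self_hosted_cost(billions: float) -> float:
    return SELF_FIXED + SELF_PER_BILLION * billions


def crossover_year(start_billions: float, growth: float, max_years: int = 10):
    """First year in which self-hosted's annual cost drops below API's, or None."""
    volume = start_billions
    for year in range(1, max_years + 1):
        if self_hosted_cost(volume) < api_cost(volume):
            return year
        volume *= growth  # next year's volume
    return None


# Hypothetical adoption profiles: (starting billions of tokens, yearly growth multiple).
profiles = {
    "slow adopter": (10, 3),
    "steady adopter": (10, 10),
    "fast adopter": (50, 10),
    "massive from day 1": (500, 1.5),
}
for label, (start, growth) in profiles.items():
    print(f"{label:<20} crossover in year {crossover_year(start, growth)}")
```

In this model the break-even volume is $1,000,000 / ($3,500 - $450), about 328 billion tokens a year, so the crossover year is simply the first year your volume passes that line. The four made-up profiles land in Years 5, 3, 2, and 1.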
The Strategic Decision Framework
Deloitte identified three things every organization must understand before making infrastructure decisions:
1. Know Your Workloads
Different AI tasks have vastly different token requirements.
| Workload Type | Token Intensity | Example |
|---|---|---|
| Simple Q&A | Low | FAQ chatbot |
| Document summarization | Medium | Meeting notes, report summaries |
| Complex analysis | High | Research synthesis, decision support |
| Autonomous agents | Very High | 24/7 automated workflows |
If most of your usage is simple Q&A, you might never hit the scale where self-hosted makes sense.
If you're deploying autonomous agents (like we discussed in Part 7), you'll hit scale much faster.
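A quick way to place a workload on this scale is requests per day × tokens per request × 365. All workload names and numbers below are hypothetical, chosen only to show how far apart the categories can be:

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload; tokens_per_request averages input plus output."""
    name: str
    requests_per_day: int
    tokens_per_request: int

    def annual_tokens(self) -> int:
        # Assumes the workload runs every day of the year.
        return self.requests_per_day * self.tokens_per_request * 365


# Made-up illustrations, not figures from the study.
workloads = [
    Workload("FAQ chatbot", requests_per_day=5_000, tokens_per_request=500),
    Workload("Report summaries", requests_per_day=2_000, tokens_per_request=8_000),
    Workload("Autonomous agent fleet", requests_per_day=200_000, tokens_per_request=15_000),
]

for w in workloads:
    print(f"{w.name:<24} {w.annual_tokens() / 1e9:,.1f}B tokens/year")
```

With these made-up inputs the chatbot uses under 1 billion tokens a year, while the agent fleet alone passes 1 trillion, the volume at which the study found self-hosted cheapest.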
2. Understand Your Growth Trajectory
Plot where you are and where you're going:
GROWTH TRAJECTORY PLANNING
================================================================
CURRENT STATE                  LIKELY TRAJECTORY
-------------                  -----------------
"We're experimenting"     -->  Stay with API
                               Re-evaluate in 12 months
"Pilots are working,      -->  Consider neocloud
 teams want more"              Plan for potential self-hosted
"AI is core to our        -->  Self-hosted or hybrid
 business"                     Build infrastructure team
"We're an AI company"     -->  Definitely self-hosted
                               Optimize relentlessly
================================================================
3. Don't Get Locked In
This is perhaps the most important strategic advice:
Avoid situations where you can't switch approaches.
- If you go API-only, ensure you can move to self-hosted later
- If you self-host, don't build in ways that lock you to specific hardware
- If you use neocloud, maintain the option to bring in-house
The AI landscape is changing rapidly. Today's best approach might not be tomorrow's best approach.
What This Means For You
If You're a CFO or Finance Lead
Use these numbers for multi-year planning.
Practical recommendations:
- Model your 3-year AI trajectory. Don't just budget for Year 1. Where will you be in Year 3?
- Identify your crossover point. Based on your growth rate, when does self-hosted become cheaper? That's when you need infrastructure ready.
- Consider hybrid approaches. Many organizations use API for variable workloads and self-hosted for predictable, high-volume use cases.
- Factor in the full TCO. Remember Part 6: hidden costs (staff, training, maintenance) are 3x the obvious costs. Self-hosted has higher hidden costs than API.
- Build a transition plan. If you're heading toward self-hosted, start planning 18-24 months before you need it. Infrastructure takes time to build.
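Identifying the crossover point is more honest with cumulative spend, because upfront capital lands before any savings do. A sketch using the study's annual figures, with the upfront investment assumed to be the $500K floor. It still omits the hidden costs from Part 6:

```python
from itertools import accumulate

# Annual costs in $K from the study's tables.
ANNUAL_K = {
    "API": [240, 970, 3_500],
    "Neocloud": [170, 490, 2_720],
    "Self-Hosted": [40, 1_060, 1_450],
}
# Assumption: the article gives "$500K+" upfront, so 500 is a floor.
UPFRONT_K = 500


def cumulative(costs, upfront=0):
    """Total spend by the end of each year, counting upfront capital as day-one spend."""
    return list(accumulate(costs, initial=upfront))[1:]


def first_cheaper_year(candidate, baseline):
    """First year (1-based) where the candidate's cumulative spend is below the baseline's."""
    for year, (c, b) in enumerate(zip(candidate, baseline), start=1):
        if c < b:
            return year
    return None


self_hosted = cumulative(ANNUAL_K["Self-Hosted"], UPFRONT_K)
for name in ("API", "Neocloud"):
    other = cumulative(ANNUAL_K[name])
    print(f"{name}: {other} vs self-hosted {self_hosted} -> "
          f"self-hosted cheaper from year {first_cheaper_year(self_hosted, other)}")
```

With $500K upfront, self-hosted's cumulative spend drops below both alternatives in Year 3. Raise the upfront figure to $900K and it no longer catches neocloud within three years, which is why the "+" in "$500K+" deserves a real number in your own model.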
If You're a Tech-Forward Manager
Use these numbers to make the case for infrastructure investment.
Practical recommendations:
- Track your token consumption religiously. You can't predict the crossover without knowing your growth rate.
- Identify your highest-volume use cases. Which workloads would benefit most from self-hosted infrastructure?
- Start building expertise now. Self-hosted requires ML engineering skills. If you'll need them in Year 3, start hiring in Year 1.
- Pilot neocloud as a stepping stone. It's a good way to learn infrastructure management without the full capital commitment.
- Present the 3-year picture. When asking for infrastructure investment, show the Year 3 savings, not just the Year 1 costs.
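Turning a consumption log into a growth rate takes only a few lines. This sketch assumes you already record monthly token totals (the log values here are hypothetical). It computes a compound monthly growth rate, then projects the next 12 months under the strong assumption that the rate holds:

```python
from typing import Sequence


def monthly_growth_rate(monthly_tokens: Sequence[float]) -> float:
    """Compound monthly growth rate between the first and last month logged."""
    if len(monthly_tokens) < 2 or monthly_tokens[0] <= 0:
        raise ValueError("need at least two months and a positive starting volume")
    periods = len(monthly_tokens) - 1
    return (monthly_tokens[-1] / monthly_tokens[0]) ** (1 / periods) - 1


def project_tokens(latest_month: float, rate: float, months: int = 12) -> float:
    """Total tokens over the next `months`, if the rate holds (a big if)."""
    return sum(latest_month * (1 + rate) ** m for m in range(1, months + 1))


# Hypothetical usage log: billions of tokens per month, oldest first.
log = [0.5, 0.6, 0.75, 0.9, 1.1, 1.3]
rate = monthly_growth_rate(log)
print(f"monthly growth {rate:.1%}; next 12 months ~{project_tokens(log[-1], rate):,.0f}B tokens")
```

A rate fitted on a few months can swing a lot, so re-fit it as new data arrives and treat long projections as a range, not a forecast.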
The Hybrid Reality
Most large organizations don't choose just one approach. They use all three:
HYBRID INFRASTRUCTURE STRATEGY
================================================================
WORKLOAD TYPE              BEST APPROACH        WHY
-------------              -------------        ---
Experimentation            API                  Flexibility
Variable/unpredictable     API or Neocloud      Pay for what you use
High-volume, predictable   Self-hosted          Best unit economics
Sensitive data             Self-hosted          Data doesn't leave
Burst capacity             Neocloud             Scale up temporarily
================================================================
One organization might use all three simultaneously.
The key is matching the approach to the workload.
================================================================
This is why visibility and governance matter so much. You need to know which workloads consume the most tokens so you can route them to the most cost-effective infrastructure.
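The hybrid table can be read as a routing rule. A sketch of that rule; the `Workload` flags, the `route` function, and especially the priority order (sensitive data first, then experimentation, then volume) are assumptions, since the table does not say how to resolve a workload that fits two rows:

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload profile; the flags mirror the rows of the table above."""
    name: str
    experimental: bool = False
    variable: bool = False
    high_volume: bool = False
    predictable: bool = False
    sensitive_data: bool = False
    burst: bool = False


def route(w: Workload) -> str:
    """Pick an approach using an assumed priority order over the table's rows."""
    if w.sensitive_data:
        return "self-hosted"      # data doesn't leave
    if w.experimental:
        return "api"              # flexibility
    if w.high_volume and w.predictable:
        return "self-hosted"      # best unit economics
    if w.burst:
        return "neocloud"         # scale up temporarily
    return "api-or-neocloud"      # variable/unpredictable: pay for what you use


examples = [
    Workload("new prototype", experimental=True),
    Workload("HR document search", sensitive_data=True),
    Workload("nightly ticket triage", high_volume=True, predictable=True),
    Workload("holiday support spike", burst=True),
    Workload("ad-hoc analyst queries", variable=True),
]
for w in examples:
    print(f"{w.name:<24} -> {route(w)}")
```

Checking sensitive data first means an experimental project on confidential data still stays in-house, which is usually the safer default.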
The Bottom Line
Deloitte's 3-year study confirms what intuition suggests:
- Small scale: API wins (flexibility over efficiency)
- Medium scale: Neocloud wins (balance of cost and flexibility)
- Large scale: Self-hosted wins (efficiency over flexibility)
The crossover point is real. At ~1 trillion tokens per year, self-hosted is 2x cheaper than API.
But getting to that point requires:
- The capital to invest upfront
- The expertise to manage infrastructure
- The scale to justify the investment
- The patience to wait for payback
For most organizations, the answer isn't "which one" but "which combination."
Coming Up Next
Part 9: Latency, Throughput, and Performance Optimization
We've spent a lot of time on costs. But cost isn't everything.
In Part 9, we'll cover:
- What latency and throughput actually mean
- Why some AI requests are instant and others crawl
- How to optimize for speed when milliseconds matter
Your Homework for Part 9
Think about your AI usage:
- Which AI features need to be fast? (Customer-facing, real-time)
- Which can afford to be slow? (Background processing, overnight jobs)
- Have you ever abandoned an AI tool because it was too slow?
Speed has a cost. Understanding that trade-off is the next piece of the puzzle.
See you in Part 9.
As always, thanks for reading!