Deloitte Ran the Numbers for 3 Years. Here's When Self-Hosted AI Becomes 2x Cheaper.

Published: January 28, 2026 - 13 min read

This is Part 8 of the Tokenomics for Humans series. If you haven't read Part 7 on Jevons' Paradox, I recommend starting there.

In Part 5, I gave you three options for accessing AI: SaaS, API, and self-hosted.

In Part 6, I showed you that the invoice only tells part of the story.

In Part 7, I warned you that cheaper AI leads to more spending, not less.

Now it's time to put real numbers behind all of this.

Deloitte didn't just theorize about AI costs. They tracked actual spending across different approaches over three years. They watched organizations grow from pilots to production at scale.

The results are definitive. And if you're making infrastructure decisions, you need to see them.

The Study Setup

Deloitte analyzed three different approaches to AI infrastructure:

Approach       What It Means                                      Example
--------       -------------                                      -------
API Access     Pay per token to AI providers                      OpenAI API, Anthropic API
Neocloud       Rent GPU infrastructure, run open-source models    Lambda Labs, CoreWeave
Self-Hosted    Buy your own GPUs, run models on-premise           NVIDIA GPUs in your data center

They tested with:

Hardware: High-end NVIDIA GPUs (the same chips that cost $30,000-$40,000 each, as we discussed in Part 4)
Models: Llama (open-source) and GPT-4o
Time frame: Three years of growing AI adoption

The question: At what point does self-hosted become cheaper than paying per token?

Year 1: The Pilot Stage

Token volume: ~10 billion tokens

Use cases: Simple chatbots, FAQ assistants, basic automation

This is where most organizations start. You're experimenting. You're proving value. You're not yet committed to massive AI infrastructure.

YEAR 1 COSTS: PILOT STAGE (~10 BILLION TOKENS)
================================================================

APPROACH             ANNUAL COST         NOTES
--------             -----------         -----
API Access           $240,000            Pay only for what you use
Neocloud             $170,000            Cheaper per token than API
Self-Hosted          $40,000             But requires upfront investment

================================================================

Wait, self-hosted is cheapest in Year 1?

Yes, but there's a catch. That $40,000 is the operational cost. It doesn't include the upfront capital investment of $500,000+ to build the infrastructure in the first place.

When you factor in the initial investment, self-hosted is the most expensive option in Year 1 for most organizations.

Year 1 Recommendation

For most organizations: API access.

Why? You're still learning. You don't know if AI will work for your use cases. You don't know how much you'll actually use. Paying per token gives you flexibility.

The $240,000 API bill sounds high, but it's much lower risk than a $500,000+ infrastructure investment that might not pay off.

YEAR 1: WHY API WINS
================================================================

                    API             Self-Hosted
                    ---             -----------
Upfront cost        $0              $500K+
Year 1 operating    $240K           $40K
                    -----           -----
Year 1 TOTAL        $240K           $540K+

Risk if AI fails    Low             Very High
                    (just stop)     (stuck with hardware)

================================================================
   API is 2x cheaper in Year 1 when you count upfront costs.
================================================================
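
The Year 1 comparison is nothing more than upfront capital plus operating cost. Here is a minimal Python sketch of that arithmetic using the figures from the table above. The function name is my own, and $500,000 is only the low end of the "$500K+" estimate, so treat the self-hosted total as a floor.

```python
def first_year_total(upfront: float, operating: float) -> float:
    """Total Year 1 spend: one-time capital plus the year's operating cost."""
    return upfront + operating


# Figures from the Year 1 table. The self-hosted upfront value is a lower
# bound ("$500K+"), so the real total may be higher.
api_year1 = first_year_total(upfront=0, operating=240_000)
self_hosted_year1 = first_year_total(upfront=500_000, operating=40_000)

print(f"API Year 1 total:         ${api_year1:,.0f}")
print(f"Self-hosted Year 1 total: ${self_hosted_year1:,.0f} or more")
print(f"Self-hosted / API:        {self_hosted_year1 / api_year1:.2f}x")
```

Swap in your own quotes for the upfront and operating figures to see how far apart the two options start for you.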

Year 2: Growing Adoption

Token volume: ~300 billion tokens (30x Year 1!)

Use cases: Document search, summarization, content generation, workflow automation

This is where things get interesting. Your pilots worked. Teams are adopting AI. Usage is exploding.

Remember Jevons' Paradox? Here it is in action. Usage doesn't grow 2x or 3x. It grows 30x.

YEAR 2 COSTS: GROWING ADOPTION (~300 BILLION TOKENS)
================================================================

APPROACH             ANNUAL COST         CHANGE FROM YEAR 1
--------             -----------         ------------------
API Access           $970,000            +304% ($730K more)
Neocloud             $490,000            +188% ($320K more)
Self-Hosted          $1,060,000          +2,550% (but context matters)

================================================================

Hold on. Self-hosted is the most expensive in Year 2?

Yes. This is the awkward middle period.

You're using too much AI for API to be cheap
But you haven't reached the scale where self-hosted infrastructure pays off
Meanwhile, self-hosted costs spike because you're expanding capacity

Year 2: The Crossover Begins

Look at the per-token economics:

Approach       Year 2 Cost     Cost per Billion Tokens
--------       -----------     -----------------------
API            $970,000        $3,233
Neocloud       $490,000        $1,633
Self-Hosted    $1,060,000      $3,533

Neocloud emerges as the middle-ground winner. You get better per-token economics than API without the massive self-hosted commitment.
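
If you want to reproduce the per-token figures, the calculation is annual cost divided by token volume. A small Python sketch, using the Year 2 costs and the ~300 billion token volume quoted above (the function and variable names are illustrative, not from the study):

```python
def cost_per_billion(annual_cost: float, tokens_billions: float) -> float:
    """Unit cost in dollars per billion tokens."""
    return annual_cost / tokens_billions


# Year 2 annual costs and volume, as quoted in this article.
year2_costs = {"API": 970_000, "Neocloud": 490_000, "Self-Hosted": 1_060_000}
year2_tokens_billions = 300

unit_costs = {
    name: cost_per_billion(cost, year2_tokens_billions)
    for name, cost in year2_costs.items()
}

# Print cheapest first.
for name, unit in sorted(unit_costs.items(), key=lambda item: item[1]):
    print(f"{name:<12} ${unit:,.0f} per billion tokens")
```

The same function works for any year: plug in your own annual bill and token count to see where you sit.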

But watch what happens next.

Year 3: Scale

Token volume: ~1 trillion tokens (100x Year 1!)

Use cases: Decision support, analytics, AI-powered products, autonomous agents

This is where the math changes completely.

YEAR 3 COSTS: SCALE (~1 TRILLION TOKENS)
================================================================

APPROACH             ANNUAL COST         CHANGE FROM YEAR 1
--------             -----------         ------------------
API Access           $3,500,000          +1,358%
Neocloud             $2,720,000          +1,500%
Self-Hosted          $1,450,000          +3,525% (but cheapest!)

================================================================

Self-hosted is now more than 2x cheaper than API.

At 1 trillion tokens per year:

API costs $3,500 per billion tokens ($3.50 per million)
Self-hosted costs $1,450 per billion tokens ($1.45 per million)

That's $2.05 million in annual savings versus API.

Even compared to neocloud, self-hosted saves $1.27 million per year.
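
To check the Year 3 arithmetic yourself, here is a short Python sketch built on the annual costs from the table above. One trillion tokens is one million million tokens, so dividing the annual cost by 1,000,000 gives the price per million tokens. The helper names are my own.

```python
# Year 3 annual costs from the table above, at ~1 trillion tokens.
year3_costs = {"API": 3_500_000, "Neocloud": 2_720_000, "Self-Hosted": 1_450_000}
TOKENS_MILLIONS = 1_000_000  # 1 trillion tokens = 1,000,000 million tokens


def per_million(annual_cost: float) -> float:
    """Dollars per million tokens at the Year 3 volume."""
    return annual_cost / TOKENS_MILLIONS


def annual_savings(baseline: str, candidate: str = "Self-Hosted") -> int:
    """Dollars saved per year by choosing `candidate` over `baseline`."""
    return year3_costs[baseline] - year3_costs[candidate]


api_multiple = year3_costs["API"] / year3_costs["Self-Hosted"]

for name, cost in year3_costs.items():
    print(f"{name:<12} ${per_million(cost):.2f} per million tokens")
print(f"Savings vs API:      ${annual_savings('API'):,}")
print(f"Savings vs Neocloud: ${annual_savings('Neocloud'):,}")
print(f"API costs {api_multiple:.1f}x what self-hosted costs")
```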

The 3-Year Picture

Let me show you all three years together:

3-YEAR COST COMPARISON
================================================================

YEAR        API          NEOCLOUD      SELF-HOSTED    WINNER
----        ---          --------      -----------    ------
Year 1      $240K        $170K         $40K*          (API**)
Year 2      $970K        $490K         $1,060K        Neocloud
Year 3      $3,500K      $2,720K       $1,450K        Self-Hosted
            ------       ------        ------
TOTAL       $4,710K      $3,380K       $2,550K        Self-Hosted

================================================================
* Self-hosted Year 1 excludes upfront investment
** API wins Year 1 when upfront costs are included

3-YEAR SAVINGS:
- Self-hosted vs API: $2,160,000 saved (46% cheaper)
- Self-hosted vs Neocloud: $830,000 saved (25% cheaper)
================================================================

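Over three years, self-hosted saves over $2 million in operating costs compared to API. Those totals exclude the $500,000+ upfront investment, which narrows the gap but does not close it.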
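
The totals in the table leave out the upfront investment for self-hosted. Here is a rough Python sketch that adds it back and tracks cumulative spend year by year. It assumes the low end of the "$500K+" figure for self-hosted and, as my own simplification, zero upfront cost for API and neocloud.

```python
from itertools import accumulate

# Annual operating costs for Years 1-3, from the tables in this article.
annual_costs = {
    "API": [240_000, 970_000, 3_500_000],
    "Neocloud": [170_000, 490_000, 2_720_000],
    "Self-Hosted": [40_000, 1_060_000, 1_450_000],
}

# Assumption: only self-hosted carries upfront capital, at the $500K lower bound.
upfront = {"API": 0, "Neocloud": 0, "Self-Hosted": 500_000}


def cumulative_spend(name: str) -> list[int]:
    """Running total at the end of each year, including upfront capital."""
    running = accumulate(annual_costs[name], initial=upfront[name])
    return list(running)[1:]  # drop the "Year 0" entry


for name in annual_costs:
    totals = ", ".join(f"${total / 1000:,.0f}K" for total in cumulative_spend(name))
    print(f"{name:<12} {totals}")
```

Under those assumptions self-hosted is still the most expensive option on a cumulative basis at the end of Year 2, and only moves ahead of both alternatives during Year 3.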

The Crossover Point

Here's the key insight from Deloitte's research:

The crossover happens around Year 3 for organizations with aggressive AI adoption.

THE CROSSOVER VISUALIZATION
================================================================

COST
 ^
 |
 |                                          / API ($3.5M)
 |                                        /
 |                                      /
 |                              /-----/ Neocloud ($2.7M)
 |                            /
 |            /----------------------- Self-Hosted ($1.45M)
 |          /     CROSSOVER
 |        /        POINT
 |      /           |
 |----/-------------|--------------------------------------------
 |                  |
 +--------------------------------------------------------> TIME
        Year 1     Year 2     Year 3

================================================================
   Before crossover: API/Neocloud cheaper (flexibility matters)
   After crossover: Self-hosted cheaper (scale matters)
================================================================

But here's the nuance: the crossover point depends on how fast you're growing.

Slow adopters: Crossover might be Year 4 or Year 5
Fast adopters: Crossover might be Year 2
Massive scale from Day 1: Self-hosted might win immediately
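
To see how growth rate moves the crossover, here is a deliberately simple Python sketch. This is not Deloitte's model. It assumes a flat API price per billion tokens and a self-hosted cost made of upfront capital, a fixed annual cost, and a per-billion variable cost. Every number in the example calls is a hypothetical placeholder, so replace them with your own.

```python
from typing import Optional


def crossover_year(
    start_tokens_b: float,   # Year 1 volume, in billions of tokens
    growth: float,           # year-over-year multiplier, e.g. 3 means 3x
    api_per_b: float,        # API price per billion tokens (assumed flat)
    hosted_fixed: float,     # self-hosted fixed cost per year
    hosted_per_b: float,     # self-hosted variable cost per billion tokens
    upfront: float,          # one-time self-hosted capital investment
    max_years: int = 10,
) -> Optional[int]:
    """First year in which cumulative self-hosted spend drops below
    cumulative API spend, or None if that never happens within max_years."""
    api_total = 0.0
    hosted_total = float(upfront)
    tokens = start_tokens_b
    for year in range(1, max_years + 1):
        api_total += tokens * api_per_b
        hosted_total += hosted_fixed + tokens * hosted_per_b
        if hosted_total < api_total:
            return year
        tokens *= growth
    return None


# Hypothetical cost parameters, shared by all three scenarios.
costs = dict(api_per_b=3_500, hosted_fixed=400_000, hosted_per_b=1_000, upfront=500_000)

slow = crossover_year(start_tokens_b=10, growth=3, **costs)
fast = crossover_year(start_tokens_b=10, growth=10, **costs)
day_one = crossover_year(start_tokens_b=1_000, growth=1, **costs)

print(f"Slow adopter crossover:  Year {slow}")
print(f"Fast adopter crossover:  Year {fast}")
print(f"Massive scale from Day 1: Year {day_one}")
```

The exact years depend entirely on the placeholder inputs. What the sketch is meant to show is the shape of the relationship: faster growth or a larger starting volume pulls the crossover earlier.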

The Strategic Decision Framework

Deloitte identified three things every organization must understand before making infrastructure decisions:

1. Know Your Workloads

Different AI tasks have vastly different token requirements.

Workload Type             Token Intensity    Example
-------------             ---------------    -------
Simple Q&A                Low                FAQ chatbot
Document summarization    Medium             Meeting notes, report summaries
Complex analysis          High               Research synthesis, decision support
Autonomous agents         Very High          24/7 automated workflows

If most of your usage is simple Q&A, you might never hit the scale where self-hosted makes sense.

If you're deploying autonomous agents (like we discussed in Part 7), you'll hit scale much faster.
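
A back-of-the-envelope way to size your own workloads is requests per day times tokens per request times 365. The Python sketch below does exactly that. The request counts and token sizes are made-up placeholders to show the mechanics, not measurements from Deloitte or anyone else.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    requests_per_day: int
    tokens_per_request: int  # prompt plus response, averaged

    def annual_tokens(self) -> int:
        """Rough yearly token volume, assuming steady daily usage."""
        return self.requests_per_day * self.tokens_per_request * 365


# Hypothetical workloads: replace these with figures from your own logs.
workloads = [
    Workload("FAQ chatbot", requests_per_day=5_000, tokens_per_request=500),
    Workload("Meeting summaries", requests_per_day=2_000, tokens_per_request=8_000),
    Workload("Autonomous agent fleet", requests_per_day=50_000, tokens_per_request=40_000),
]

total_tokens = sum(w.annual_tokens() for w in workloads)

for w in workloads:
    share = w.annual_tokens() / total_tokens
    print(f"{w.name:<24} {w.annual_tokens() / 1e9:>8.1f}B tokens/year ({share:.1%})")
print(f"{'Total':<24} {total_tokens / 1e9:>8.1f}B tokens/year")
```

With these placeholder inputs the agent fleet accounts for almost all of the volume, which is the pattern to look for in your own numbers: one or two heavy workloads usually decide whether you ever reach self-hosted scale.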

2. Understand Your Growth Trajectory

Plot where you are and where you're going:

GROWTH TRAJECTORY PLANNING
================================================================

CURRENT STATE                 LIKELY TRAJECTORY
-------------                 -----------------

"We're experimenting"    -->  Stay with API
                              Re-evaluate in 12 months

"Pilots are working,     -->  Consider neocloud
 teams want more"             Plan for potential self-hosted

"AI is core to our       -->  Self-hosted or hybrid
 business"                    Build infrastructure team

"We're an AI company"    -->  Definitely self-hosted
                              Optimize relentlessly

================================================================

3. Don't Get Locked In

This is perhaps the most important strategic advice:

Avoid situations where you can't switch approaches.

If you go API-only, ensure you can move to self-hosted later
If you self-host, don't build in ways that lock you to specific hardware
If you use neocloud, maintain the option to bring workloads in-house

The AI landscape is changing rapidly. Today's best approach might not be tomorrow's best approach.

What This Means For You

If You're a CFO or Finance Lead

Use these numbers for multi-year planning.

Practical recommendations:

Model your 3-year AI trajectory. Don't just budget for Year 1. Where will you be in Year 3?
Identify your crossover point. Based on your growth rate, when does self-hosted become cheaper? That's when you need infrastructure ready.
Consider hybrid approaches. Many organizations use API for variable workloads and self-hosted for predictable, high-volume use cases.
Factor in the full TCO. Remember Part 6: hidden costs (staff, training, maintenance) are 3x the obvious costs. Self-hosted has higher hidden costs than API.
Build a transition plan. If you're heading toward self-hosted, start planning 18-24 months before you need it. Infrastructure takes time to build.

If You're a Tech-Forward Manager

Use these numbers to make the case for infrastructure investment.

Practical recommendations:

Track your token consumption religiously. You can't predict the crossover without knowing your growth rate.
Identify your highest-volume use cases. Which workloads would benefit most from self-hosted infrastructure?
Start building expertise now. Self-hosted requires ML engineering skills. If you'll need them in Year 3, start hiring in Year 1.
Pilot neocloud as a stepping stone. It's a good way to learn infrastructure management without the full capital commitment.
Present the 3-year picture. When asking for infrastructure investment, show the Year 3 savings, not just the Year 1 costs.
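
If you already log monthly token usage, estimating your growth rate takes only a few lines. Here is a minimal Python sketch that computes a compound monthly growth rate from a usage history and projects it forward. The usage series is hypothetical, and a constant compound rate is a simplifying assumption; real adoption tends to be lumpier.

```python
def monthly_growth_rate(monthly_tokens: list[float]) -> float:
    """Compound monthly growth rate between the first and last observation."""
    if len(monthly_tokens) < 2 or monthly_tokens[0] <= 0:
        raise ValueError("need at least two months and a positive starting value")
    periods = len(monthly_tokens) - 1
    return (monthly_tokens[-1] / monthly_tokens[0]) ** (1 / periods) - 1


def project_tokens(current: float, rate: float, months: int) -> float:
    """Projected monthly volume after `months`, assuming the rate holds."""
    return current * (1 + rate) ** months


# Hypothetical usage history, in billions of tokens per month.
usage_billions = [1.0, 1.3, 1.69, 2.197]

rate = monthly_growth_rate(usage_billions)
in_a_year = project_tokens(usage_billions[-1], rate, months=12)

print(f"Monthly growth rate: {rate:.1%}")
print(f"Projected monthly volume in 12 months: {in_a_year:.1f}B tokens")
```

Rerun it every month as new usage data comes in. A rate that keeps climbing is the early warning that your crossover is arriving sooner than planned.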

The Hybrid Reality

Most large organizations don't choose just one approach. They use all three:

HYBRID INFRASTRUCTURE STRATEGY
================================================================

WORKLOAD TYPE              BEST APPROACH           WHY
-------------              -------------           ---
Experimentation            API                     Flexibility
Variable/unpredictable     API or Neocloud         Pay for what you use
High-volume, predictable   Self-hosted             Best unit economics
Sensitive data             Self-hosted             Data doesn't leave
Burst capacity             Neocloud                Scale up temporarily

================================================================
   One organization might use all three simultaneously.
   The key is matching the approach to the workload.
================================================================

This is why visibility and governance matter so much. You need to know which workloads consume the most tokens so you can route them to the most cost-effective infrastructure.
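
The routing table above can be written down as a small decision function. Here is one possible Python sketch. The flag names and the order in which the rules are checked (sensitive data first, then experimentation, burst, and volume) are my own assumptions; the table itself does not specify a priority.

```python
def recommend_approach(
    *,
    experimental: bool = False,
    sensitive_data: bool = False,
    burst: bool = False,
    high_volume: bool = False,
    predictable: bool = False,
) -> str:
    """Map workload traits to an infrastructure approach, following the
    hybrid strategy table. Rule order is an assumption, not from the study."""
    if sensitive_data:
        return "Self-hosted"      # data doesn't leave your environment
    if experimental:
        return "API"              # flexibility while you are still learning
    if burst:
        return "Neocloud"         # scale up temporarily
    if high_volume and predictable:
        return "Self-hosted"      # best unit economics
    return "API or Neocloud"      # variable or unpredictable: pay for what you use


print(recommend_approach(experimental=True))
print(recommend_approach(high_volume=True, predictable=True))
print(recommend_approach(burst=True))
print(recommend_approach(sensitive_data=True, experimental=True))
print(recommend_approach())
```

Putting sensitive data first means a confidential experiment still lands on self-hosted. If your compliance rules are looser, reorder the checks to match your own priorities.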

The Bottom Line

Deloitte's 3-year study proves what intuition suggests:

Small scale: API wins (flexibility over efficiency)
Medium scale: Neocloud wins (balance of cost and flexibility)
Large scale: Self-hosted wins (efficiency over flexibility)

The crossover point is real. At ~1 trillion tokens per year, self-hosted is 2x cheaper than API.

But getting to that point requires:

The capital to invest upfront
The expertise to manage infrastructure
The scale to justify the investment
The patience to wait for payback

For most organizations, the answer isn't "which one" but "which combination."

Coming Up Next

Part 9: Latency, Throughput, and Performance Optimization

We've spent a lot of time on costs. But cost isn't everything.

In Part 9, we'll cover:

What latency and throughput actually mean
Why some AI requests are instant and others crawl
How to optimize for speed when milliseconds matter

Your Homework for Part 9

Think about your AI usage:

Which AI features need to be fast? (Customer-facing, real-time)
Which can afford to be slow? (Background processing, overnight jobs)
Have you ever abandoned an AI tool because it was too slow?

Speed has a cost. Understanding that trade-off is the next piece of the puzzle.

See you in Part 9.

As always, thanks for reading!
