Deloitte Ran the Numbers for 3 Years. Here's When Self-Hosted AI Becomes 2x Cheaper.

10 billion tokens in Year 1. 1 trillion in Year 3. Deloitte tracked the real costs across API, neocloud, and self-hosted. The results will change how you think about AI infrastructure.

Tags: Tokenomics, AI Economics, Battle Against Chauffeur Knowledge, Decision Making, Infrastructure

Published: January 28, 2026 - 13 min read

This is Part 8 of the Tokenomics for Humans series. If you haven't read Part 7 on Jevons' Paradox, I recommend starting there.


In Part 5, I gave you three options for accessing AI: SaaS, API, and self-hosted.

In Part 6, I showed you that the invoice only tells part of the story.

In Part 7, I warned you that cheaper AI leads to more spending, not less.

Now it's time to put real numbers behind all of this.

Deloitte didn't just theorize about AI costs. They tracked actual spending across different approaches over three years. They watched organizations grow from pilots to production at scale.

The results are definitive. And if you're making infrastructure decisions, you need to see them.


The Study Setup

Deloitte analyzed three different approaches to AI infrastructure:

Approach         What It Means                                      Example
--------         -------------                                      -------
API Access       Pay per token to AI providers                      OpenAI API, Anthropic API
Neocloud         Rent GPU infrastructure, run open-source models    Lambda Labs, CoreWeave
Self-Hosted      Buy your own GPUs, run models on-premise           NVIDIA GPUs in your data center

They tested with:

  • Hardware: High-end NVIDIA GPUs (the same chips that cost $30,000-$40,000 each, as we discussed in Part 4)
  • Models: Llama (open-source) and GPT-4o
  • Time frame: Three years of growing AI adoption

The question: At what point does self-hosted become cheaper than paying per token?


Year 1: The Pilot Stage

Token volume: ~10 billion tokens

Use cases: Simple chatbots, FAQ assistants, basic automation

This is where most organizations start. You're experimenting. You're proving value. You're not yet committed to massive AI infrastructure.

YEAR 1 COSTS: PILOT STAGE (~10 BILLION TOKENS)
================================================================

APPROACH             ANNUAL COST         NOTES
--------             -----------         -----
API Access           $240,000            Pay only for what you use
Neocloud             $170,000            Cheaper per token than API
Self-Hosted          $40,000             But requires upfront investment

================================================================

Wait, self-hosted is cheapest in Year 1?

Yes, but there's a catch. That $40,000 is the operational cost. It doesn't include the upfront capital investment of $500,000+ to build the infrastructure in the first place.

When you factor in the initial investment, self-hosted is the most expensive option in Year 1 for most organizations.

Year 1 Recommendation

For most organizations: API access.

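Why? You're still learning. You don't know if AI will work for your use cases. You don't know how much you'll actually use. Paying per token gives you flexibility. Neocloud looks cheaper on paper at $170,000, but it means running open-source models on rented GPUs before you know what you actually need.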

The $240,000 API bill sounds high, but it's much lower risk than a $500,000+ infrastructure investment that might not pay off.

YEAR 1: WHY API WINS
================================================================

                    API             Self-Hosted
                    ---             -----------
Upfront cost        $0              $500K+
Year 1 operating    $240K           $40K
                    -----           -----
Year 1 TOTAL        $240K           $540K+

Risk if AI fails    Low             Very High
                    (just stop)     (stuck with hardware)

================================================================
   API is 2x cheaper in Year 1 when you count upfront costs.
================================================================
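The Year 1 comparison above is simple addition, but it helps to make the arithmetic explicit. A minimal Python sketch using the figures from the table; the $500,000 upfront figure is the lower bound of "$500K+", so the self-hosted total is a floor rather than an exact number:

```python
# Year 1 figures from the table above (USD).
# UPFRONT_SELF_HOSTED is the lower bound of "$500K+", so the total is a floor.
UPFRONT_SELF_HOSTED = 500_000

operating = {"api": 240_000, "self_hosted": 40_000}
upfront = {"api": 0, "self_hosted": UPFRONT_SELF_HOSTED}

# All-in Year 1 cost = upfront investment + first-year operating cost.
year1_total = {name: operating[name] + upfront[name] for name in operating}
ratio = year1_total["self_hosted"] / year1_total["api"]

for name, total in year1_total.items():
    print(f"{name:12s} ${total:,}")
print(f"self-hosted costs {ratio:.2f}x the API total in Year 1")
```

Any upfront investment above $500,000 only widens the gap in favor of API.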

Year 2: Growing Adoption

Token volume: ~300 billion tokens (30x Year 1!)

Use cases: Document search, summarization, content generation, workflow automation

This is where things get interesting. Your pilots worked. Teams are adopting AI. Usage is exploding.

Remember Jevons' Paradox? Here it is in action. Usage doesn't grow 2x or 3x. It grows 30x.

YEAR 2 COSTS: GROWING ADOPTION (~300 BILLION TOKENS)
================================================================

APPROACH             ANNUAL COST         CHANGE FROM YEAR 1
--------             -----------         ------------------
API Access           $970,000            +304% ($730K more)
Neocloud             $490,000            +188% ($320K more)
Self-Hosted          $1,060,000          +2,550% (but context matters)

================================================================

Hold on. Self-hosted is the most expensive in Year 2?

Yes. This is the awkward middle period.

  • You're using too much AI for API to be cheap
  • But you haven't reached the scale where self-hosted infrastructure pays off
  • Meanwhile, self-hosted costs spike because you're expanding capacity

Year 2: The Crossover Begins

Look at the per-token economics:

Approach         Year 2 Cost      Cost per Billion Tokens
--------         -----------      -----------------------
API              $970,000         $3,233
Neocloud         $490,000         $1,633
Self-Hosted      $1,060,000       $3,533

Neocloud emerges as the middle-ground winner. You get better per-token economics than API without the massive self-hosted commitment.
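The per-token figures are just annual cost divided by annual volume. A short sketch that reproduces them from the Year 2 numbers (the function and variable names are mine, chosen for illustration):

```python
def cost_per_billion(annual_cost_usd: float, tokens_billions: float) -> float:
    """Annual cost divided by annual volume, in USD per billion tokens."""
    if tokens_billions <= 0:
        raise ValueError("tokens_billions must be positive")
    return annual_cost_usd / tokens_billions


YEAR2_TOKENS_BILLIONS = 300  # ~300 billion tokens, per the Year 2 section
year2_costs = {"API": 970_000, "Neocloud": 490_000, "Self-Hosted": 1_060_000}

# Round to whole dollars, matching the precision used in the table.
year2_unit_costs = {
    name: round(cost_per_billion(cost, YEAR2_TOKENS_BILLIONS))
    for name, cost in year2_costs.items()
}
cheapest = min(year2_unit_costs, key=year2_unit_costs.get)

for name, unit in year2_unit_costs.items():
    print(f"{name:12s} ${unit:,} per billion tokens")
print(f"cheapest per token in Year 2: {cheapest}")
```

Running the same function against your own invoices and token logs gives you a unit cost you can compare across approaches.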

But watch what happens next.


Year 3: Scale

Token volume: ~1 trillion tokens (100x Year 1!)

Use cases: Decision support, analytics, AI-powered products, autonomous agents

This is where the math changes completely.

YEAR 3 COSTS: SCALE (~1 TRILLION TOKENS)
================================================================

APPROACH             ANNUAL COST         CHANGE FROM YEAR 1
--------             -----------         ------------------
API Access           $3,500,000          +1,358%
Neocloud             $2,720,000          +1,500%
Self-Hosted          $1,450,000          +3,525% (but cheapest!)

================================================================

Self-hosted is now more than 2x cheaper than API: $1.45 million versus $3.5 million, a ratio of about 2.4x.

At 1 trillion tokens per year:

  • API costs $3,500 per billion tokens ($3.50 per million)
  • Self-hosted costs $1,450 per billion tokens ($1.45 per million)

That's $2.05 million in annual savings versus API.

Even compared to neocloud, self-hosted saves $1.27 million per year.
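Unit costs are easy to misstate by a factor of 1,000, so it is worth checking them against the totals. A small sketch that derives the Year 3 cost per billion and per million tokens, plus the annual savings, from the table figures (names are illustrative):

```python
YEAR3_TOKENS = 1_000_000_000_000  # ~1 trillion tokens
year3_costs = {"api": 3_500_000, "neocloud": 2_720_000, "self_hosted": 1_450_000}


def unit_cost(annual_cost_usd: float, tokens: int, per: int) -> float:
    """Cost in USD for every `per` tokens processed."""
    return annual_cost_usd / (tokens / per)


per_billion = {k: unit_cost(v, YEAR3_TOKENS, 1_000_000_000) for k, v in year3_costs.items()}
per_million = {k: unit_cost(v, YEAR3_TOKENS, 1_000_000) for k, v in year3_costs.items()}

# Annual savings of self-hosted against the other two approaches.
savings_vs_api = year3_costs["api"] - year3_costs["self_hosted"]
savings_vs_neocloud = year3_costs["neocloud"] - year3_costs["self_hosted"]

for name in year3_costs:
    print(f"{name:12s} ${per_billion[name]:,.0f}/B tokens  ${per_million[name]:.2f}/M tokens")
print(f"savings vs API: ${savings_vs_api:,}, vs neocloud: ${savings_vs_neocloud:,}")
```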

The 3-Year Picture

Let me show you all three years together:

3-YEAR COST COMPARISON
================================================================

YEAR        API          NEOCLOUD      SELF-HOSTED    WINNER
----        ---          --------      -----------    ------
Year 1      $240K        $170K         $40K*          (API**)
Year 2      $970K        $490K         $1,060K        Neocloud
Year 3      $3,500K      $2,720K       $1,450K        Self-Hosted
            ------       ------        ------
TOTAL       $4,710K      $3,380K       $2,550K        Self-Hosted

================================================================
* Self-hosted Year 1 excludes upfront investment
** API wins Year 1 when upfront costs are included

3-YEAR SAVINGS (operating costs only, excluding upfront investment):
- Self-hosted vs API: $2,160,000 saved (46% cheaper)
- Self-hosted vs Neocloud: $830,000 saved (25% cheaper)
================================================================

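Over three years, self-hosted saves over $2 million compared to API on operating costs alone. The $500,000+ upfront investment narrows that gap, but self-hosted still comes out ahead.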
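The table totals cover operating costs only. A sketch that adds the upfront investment back in, assuming the $500,000 lower bound for self-hosted and, as my own simplifying assumption, no upfront cost for API or neocloud:

```python
# Annual operating costs from the 3-year table (USD), Years 1 through 3.
annual_costs = {
    "API": [240_000, 970_000, 3_500_000],
    "Neocloud": [170_000, 490_000, 2_720_000],
    "Self-Hosted": [40_000, 1_060_000, 1_450_000],
}
# Assumption: $500K is the lower bound quoted for self-hosted;
# zero upfront for API and neocloud is a simplification, not a source figure.
UPFRONT = {"API": 0, "Neocloud": 0, "Self-Hosted": 500_000}

operating_total = {name: sum(costs) for name, costs in annual_costs.items()}
all_in_total = {name: operating_total[name] + UPFRONT[name] for name in annual_costs}


def savings(baseline: int, candidate: int) -> tuple[int, float]:
    """Absolute and fractional savings of candidate relative to baseline."""
    saved = baseline - candidate
    return saved, saved / baseline


saved_api, pct_api = savings(all_in_total["API"], all_in_total["Self-Hosted"])
saved_neo, pct_neo = savings(all_in_total["Neocloud"], all_in_total["Self-Hosted"])

print(f"all-in totals: {all_in_total}")
print(f"self-hosted vs API:      ${saved_api:,} saved ({pct_api:.0%})")
print(f"self-hosted vs neocloud: ${saved_neo:,} saved ({pct_neo:.0%})")
```

Under these assumptions self-hosted still wins over three years, but the margin over neocloud becomes much thinner once the capital outlay is counted.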


The Crossover Point

Here's the key insight from Deloitte's research:

The crossover happens around Year 3 for organizations with aggressive AI adoption.

THE CROSSOVER VISUALIZATION
================================================================

COST
 ^
 |
 |                                          / API ($3.5M)
 |                                        /
 |                                      /
 |                              /-----/ Neocloud ($2.7M)
 |                            /
 |            /----------------------- Self-Hosted ($1.45M)
 |          /     CROSSOVER
 |        /        POINT
 |      /           |
 |----/-------------|--------------------------------------------
 |                  |
 +--------------------------------------------------------> TIME
        Year 1     Year 2     Year 3

================================================================
   Before crossover: API/Neocloud cheaper (flexibility matters)
   After crossover: Self-hosted cheaper (scale matters)
================================================================

But here's the nuance: the crossover point depends on how fast you're growing.

  • Slow adopters: Crossover might be Year 4 or Year 5
  • Fast adopters: Crossover might be Year 2
  • Massive scale from Day 1: Self-hosted might win immediately
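To see why growth rate moves the crossover, here is a deliberately simple break-even model. Every rate in it is an assumption of mine, not a Deloitte figure: a flat API price of $3,500 per billion tokens (the Year 3 unit cost), and a self-hosted cost split into $1,000,000 of fixed annual cost plus $450 per billion tokens, chosen only so the model reproduces the $1.45M Year 3 total at 1 trillion tokens.

```python
def break_even_volume(api_rate: float, sh_fixed: float, sh_rate: float) -> float:
    """Annual volume (billions of tokens) where API and self-hosted cost the same.

    Solves api_rate * V = sh_fixed + sh_rate * V for V.
    """
    if api_rate <= sh_rate:
        raise ValueError("self-hosted never breaks even if its marginal rate is not lower")
    return sh_fixed / (api_rate - sh_rate)


def crossover_year(start_volume: float, growth: float, threshold: float, max_years: int = 20):
    """First year (1-indexed) whose volume reaches the threshold, or None."""
    volume = start_volume
    for year in range(1, max_years + 1):
        if volume >= threshold:
            return year
        volume *= growth  # volume multiplies by `growth` each year
    return None


# Hypothetical rates, see the assumptions stated above.
threshold = break_even_volume(api_rate=3_500, sh_fixed=1_000_000, sh_rate=450)
print(f"break-even volume: {threshold:.0f} billion tokens per year")

for label, start, growth in [
    ("slow adopter (3x/year)", 10, 3),
    ("article pace (10x/year)", 10, 10),
    ("fast adopter (40x/year)", 10, 40),
    ("massive from Day 1", 500, 2),
]:
    print(f"{label:26s} crossover in Year {crossover_year(start, growth, threshold)}")
```

The model ignores upfront capital and assumes flat prices, so treat the output as a way to reason about direction, not as a forecast.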

The Strategic Decision Framework

Deloitte identified three things every organization must understand before making infrastructure decisions:

1. Know Your Workloads

Different AI tasks have vastly different token requirements.

Workload Type             Token Intensity    Example
-------------             ---------------    -------
Simple Q&A                Low                FAQ chatbot
Document summarization    Medium             Meeting notes, report summaries
Complex analysis          High               Research synthesis, decision support
Autonomous agents         Very High          24/7 automated workflows

If most of your usage is simple Q&A, you might never hit the scale where self-hosted makes sense.

If you're deploying autonomous agents (like we discussed in Part 7), you'll hit scale much faster.
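A rough way to size a workload is requests per day times tokens per request. The sketch below uses made-up request counts and token sizes purely to show how far apart workloads can land; substitute your own measurements.

```python
def annual_tokens_billions(requests_per_day: float, tokens_per_request: float,
                           days_per_year: int = 365) -> float:
    """Estimated annual token volume, in billions of tokens."""
    return requests_per_day * tokens_per_request * days_per_year / 1e9


# Hypothetical workloads: (requests per day, tokens per request).
# These numbers are illustrative assumptions, not figures from the study.
workloads = {
    "FAQ chatbot": (2_000, 500),
    "Document summarization": (500, 8_000),
    "Autonomous agent fleet": (50_000, 20_000),
}

estimates = {
    name: annual_tokens_billions(reqs, toks) for name, (reqs, toks) in workloads.items()
}
for name, billions in estimates.items():
    print(f"{name:24s} ~{billions:,.2f} billion tokens/year")
```

With these assumed inputs the agent fleet consumes about a thousand times more tokens than the chatbot, which is the kind of gap that determines whether self-hosted is ever worth considering.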

2. Understand Your Growth Trajectory

Plot where you are and where you're going:

GROWTH TRAJECTORY PLANNING
================================================================

CURRENT STATE                 LIKELY TRAJECTORY
-------------                 -----------------

"We're experimenting"    -->  Stay with API
                              Re-evaluate in 12 months

"Pilots are working,     -->  Consider neocloud
 teams want more"             Plan for potential self-hosted

"AI is core to our       -->  Self-hosted or hybrid
 business"                    Build infrastructure team

"We're an AI company"    -->  Definitely self-hosted
                              Optimize relentlessly

================================================================

3. Don't Get Locked In

This is perhaps the most important strategic advice:

Avoid situations where you can't switch approaches.

  • If you go API-only, ensure you can move to self-hosted later
  • If you self-host, don't build in ways that lock you to specific hardware
  • If you use neocloud, maintain the option to bring in-house

The AI landscape is changing rapidly. Today's best approach might not be tomorrow's best approach.


What This Means For You

If You're a CFO or Finance Lead

Use these numbers for multi-year planning.

Practical recommendations:

  1. Model your 3-year AI trajectory. Don't just budget for Year 1. Where will you be in Year 3?

  2. Identify your crossover point. Based on your growth rate, when does self-hosted become cheaper? That's when you need infrastructure ready.

  3. Consider hybrid approaches. Many organizations use API for variable workloads and self-hosted for predictable, high-volume use cases.

  4. Factor in the full TCO. Remember Part 6: hidden costs (staff, training, maintenance) are 3x the obvious costs. Self-hosted has higher hidden costs than API.

  5. Build a transition plan. If you're heading toward self-hosted, start planning 18-24 months before you need it. Infrastructure takes time to build.

If You're a Tech-Forward Manager

Use these numbers to make the case for infrastructure investment.

Practical recommendations:

  1. Track your token consumption religiously. You can't predict the crossover without knowing your growth rate.

  2. Identify your highest-volume use cases. Which workloads would benefit most from self-hosted infrastructure?

  3. Start building expertise now. Self-hosted requires ML engineering skills. If you'll need them in Year 3, start hiring in Year 1.

  4. Pilot neocloud as a stepping stone. It's a good way to learn infrastructure management without the full capital commitment.

  5. Present the 3-year picture. When asking for infrastructure investment, show the Year 3 savings, not just the Year 1 costs.
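Tracking consumption only pays off if you turn it into a growth rate. A minimal sketch, assuming you have monthly token totals from your provider dashboards or logs (the sample numbers are invented):

```python
def monthly_growth_rate(monthly_tokens: list[float]) -> float:
    """Compound month-over-month growth rate across the series."""
    if len(monthly_tokens) < 2:
        raise ValueError("need at least two months of data")
    first, last = monthly_tokens[0], monthly_tokens[-1]
    if first <= 0 or last <= 0:
        raise ValueError("token counts must be positive")
    periods = len(monthly_tokens) - 1
    return (last / first) ** (1 / periods) - 1


def annual_multiple(monthly_rate: float) -> float:
    """How many times larger usage is after 12 months at this monthly rate."""
    return (1 + monthly_rate) ** 12


# Invented sample: billions of tokens consumed per month over six months.
sample = [0.5, 0.7, 0.9, 1.2, 1.6, 2.1]
rate = monthly_growth_rate(sample)
print(f"monthly growth: {rate:.1%}, implied annual multiple: {annual_multiple(rate):.1f}x")
```

For reference, the 30x jump from Year 1 to Year 2 in the study corresponds to roughly 33% growth month over month, sustained for a year.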


The Hybrid Reality

Most large organizations don't choose just one approach. They use all three:

HYBRID INFRASTRUCTURE STRATEGY
================================================================

WORKLOAD TYPE              BEST APPROACH           WHY
-------------              -------------           ---
Experimentation            API                     Flexibility
Variable/unpredictable     API or Neocloud         Pay for what you use
High-volume, predictable   Self-hosted             Best unit economics
Sensitive data             Self-hosted             Data doesn't leave
Burst capacity             Neocloud                Scale up temporarily

================================================================
   One organization might use all three simultaneously.
   The key is matching the approach to the workload.
================================================================

This is why visibility and governance matter so much. You need to know which workloads consume the most tokens so you can route them to the most cost-effective infrastructure.
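The table above can be expressed as a small routing rule. This is a sketch, not a policy engine: the `Workload` fields are hypothetical, and the precedence order (sensitive data first, then experimentation, then burst capacity) is my own assumption, since the table does not say what to do when rows conflict.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    sensitive_data: bool = False
    experimental: bool = False
    bursty: bool = False
    high_volume: bool = False
    predictable: bool = False


def recommend_approach(w: Workload) -> str:
    """Map a workload to an approach, following the hybrid strategy table."""
    if w.sensitive_data:
        return "Self-hosted"  # data doesn't leave
    if w.experimental:
        return "API"  # flexibility
    if w.bursty:
        return "Neocloud"  # scale up temporarily
    if w.high_volume and w.predictable:
        return "Self-hosted"  # best unit economics
    return "API or Neocloud"  # variable or unpredictable: pay for what you use


portfolio = [
    Workload("prototype assistant", experimental=True),
    Workload("nightly document indexing", high_volume=True, predictable=True),
    Workload("HR records search", sensitive_data=True),
    Workload("seasonal campaign generation", bursty=True),
    Workload("ad hoc analyst queries"),
]
for w in portfolio:
    print(f"{w.name:30s} -> {recommend_approach(w)}")
```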


The Bottom Line

Deloitte's 3-year study proves what intuition suggests:

  1. Small scale: API wins (flexibility over efficiency)
  2. Medium scale: Neocloud wins (balance of cost and flexibility)
  3. Large scale: Self-hosted wins (efficiency over flexibility)

The crossover point is real. At ~1 trillion tokens per year, self-hosted is 2x cheaper than API.

But getting to that point requires:

  • The capital to invest upfront
  • The expertise to manage infrastructure
  • The scale to justify the investment
  • The patience to wait for payback

For most organizations, the answer isn't "which one" but "which combination."


Coming Up Next

Part 9: Latency, Throughput, and Performance Optimization

We've spent a lot of time on costs. But cost isn't everything.

In Part 9, we'll cover:

  • What latency and throughput actually mean
  • Why some AI requests are instant and others crawl
  • How to optimize for speed when milliseconds matter

Your Homework for Part 9

Think about your AI usage:

  1. Which AI features need to be fast? (Customer-facing, real-time)
  2. Which can afford to be slow? (Background processing, overnight jobs)
  3. Have you ever abandoned an AI tool because it was too slow?

Speed has a cost. Understanding that trade-off is the next piece of the puzzle.

See you in Part 9.


As always, thanks for reading!
