04/10 2026

In the spring of 2026, the earnings season for North American cloud computing giants turned into a collective 'day of reckoning.'
Microsoft's Intelligent Cloud division surpassed $50 billion in quarterly revenue, with Azure growing 39% year-over-year—yet its stock plummeted nearly 10% post-earnings. Amazon AWS achieved its fastest revenue growth in 13 quarters, only to see shares tank 11% the next day. Google Cloud's revenue soared 48%, yet its stock still reversed from gains to losses after hours.
The reason boils down to one word: money. Or more precisely, spiraling AI bills.
A closer look at the earnings reports reveals Microsoft's quarterly capital expenditures hit $37.5 billion, up 66% year-over-year. While no full-year guidance was given, analysts warn annual spending could exceed $100 billion at current trends. Amazon announced $200 billion in 2026 spending; Google planned $175-185 billion—nearly double its 2025 outlay.
Combined, the three giants' expenditures top $500 billion, equivalent to Norway's 2024 GDP.
What's fueling capital market anxiety? Poor cloud growth? Hardly. The paradox is that heavier usage by major clients makes cloud bills more prone to 'blowouts.' A stealthy war over 'how to charge' is quietly unfolding in Silicon Valley. The outcome will reshape value distribution across the AI supply chain.
I. Is the Token Model Punishing Heavy Users?
An undeniable truth: The token-based pricing model was one of AI's greatest enablers.
In early 2024, China's daily token usage stood at 100 billion; by late 2025, it surged to 100 trillion; by March 2026, it exceeded 140 trillion—a 1,000x+ increase in two years.
Meanwhile, as AI evolved from a 'toy' to a 'production tool' with the 'Little Lobster' craze, the token model's flaws became apparent.
Take AI agents: A traditional chatbot consumes hundreds to thousands of tokens per query. But an autonomous AI agent requires multi-round reasoning, repeated tool calls, and extensive context reading. Industry insiders estimate token consumption balloons by dozens of times for agents, and 100-1,000x for complex tasks compared to casual conversations.
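The orders of magnitude above can be made concrete with a back-of-envelope calculation. All figures below (per-query token counts, multipliers, the blended per-token price) are illustrative assumptions, not measured values:

```python
# Hypothetical comparison of token consumption and cost per task.
# All constants are illustrative assumptions, not vendor figures.

CHATBOT_TOKENS = 1_500            # assumed tokens for one chat query (prompt + response)
AGENT_MULTIPLIER = 50             # "dozens of times" for a typical agent workflow
COMPLEX_MULTIPLIER = 500          # mid-range of the 100-1,000x estimate

PRICE_PER_1K_TOKENS = 0.002       # assumed blended price in $/1K tokens

def task_cost(tokens: int) -> float:
    """Cost of one task at the assumed per-token price."""
    return tokens / 1_000 * PRICE_PER_1K_TOKENS

for label, mult in [("chatbot query", 1),
                    ("agent workflow", AGENT_MULTIPLIER),
                    ("complex agent task", COMPLEX_MULTIPLIER)]:
    tokens = CHATBOT_TOKENS * mult
    print(f"{label}: {tokens:,} tokens -> ${task_cost(tokens):.4f}")
```

The point of the sketch: cost per task is linear in tokens, so a 500x multiplier turns a fraction-of-a-cent query into a dollar-plus task, and a fleet of agents turns that into an unpredictable monthly bill.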
OpenAI shut down its video generation tool Sora in March 2026 partly due to financial losses. The Beijing News cited SemiAnalysis estimates that Sora's daily operating costs neared $15 million, with annual costs hitting $5.4 billion.
An OpenAI project lead admitted: 'The current economic model is entirely unsustainable.' Video generation consumes far more compute than text/image generation—one video's GPU resources could power dozens of ChatGPT queries, severely straining core business resources.
Under the token model, heavier AI usage leads to runaway bills. NVIDIA CEO Jensen Huang joked that even NVIDIA engineers would soon have 'annual token budgets'—the implication being that a highly paid engineer who *didn't* burn through a substantial token allowance would start to look out of place.
When an industry reaches the point where 'more usage equals more fear,' it signals a fundamental flaw in the pricing model.
'The more you use, the more I earn' sounds ideal. But CFOs dare not approve scaled budgets when AI bills fluctuate wildly. The token model is penalizing its most valuable clients—heavy users with deep scenarios—contradicting cloud providers' long-term interests in expanding the market.
Against this backdrop, North American cloud providers unveiled their new weapon: PTU (Provisioned Throughput Units).
Simply put, customers pre-purchase a fixed amount of compute capacity, paying monthly/quarterly/annual fees unrelated to actual token consumption. While the token model is 'pay-as-you-go,' PTU operates like a 'monthly buffet.' Customers gain cost certainty; cloud providers lock in client relationships.
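The trade-off between the two models reduces to a simple break-even calculation. The sketch below uses hypothetical numbers (the per-million-token price, the flat fee, and the reserved capacity are all assumptions) to show where the 'monthly buffet' starts to beat pay-as-you-go:

```python
# Sketch of the token vs. PTU trade-off. All prices and capacities
# are hypothetical assumptions for illustration.

PRICE_PER_1M_TOKENS = 2.00        # assumed pay-as-you-go price, $/1M tokens
PTU_MONTHLY_FEE = 10_000.0        # assumed flat monthly reservation fee
PTU_MONTHLY_CAPACITY = 8_000      # assumed reserved capacity, in millions of tokens

def monthly_bill_tokens(usage_m_tokens: float) -> float:
    """Pay-as-you-go: the bill scales linearly (and unpredictably) with usage."""
    return usage_m_tokens * PRICE_PER_1M_TOKENS

def monthly_bill_ptu(usage_m_tokens: float) -> float:
    """PTU: flat fee up to reserved capacity; overflow billed at list price."""
    overflow = max(0.0, usage_m_tokens - PTU_MONTHLY_CAPACITY)
    return PTU_MONTHLY_FEE + overflow * PRICE_PER_1M_TOKENS

# Break-even usage: below it pay-as-you-go is cheaper, above it PTU wins.
break_even = PTU_MONTHLY_FEE / PRICE_PER_1M_TOKENS
print(f"break-even at {break_even:,.0f}M tokens/month")
```

Note the asymmetry the article describes: under PTU the customer's bill is flat regardless of demand swings within capacity, so the volatility risk moves to the provider's side of the ledger.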
The underlying strategic logic has flipped.
Under tokens, it's a zero-sum game: Customer savings equal cloud provider losses, and vice versa. But fear of cost overruns causes clients to curb usage, stifling cloud revenue growth.
Under PTU, it becomes a positive-sum game: Clients, armed with budget certainty, expand AI adoption; cloud providers achieve more sustainable revenue growth. Essentially, risk shifts from clients to cloud providers in exchange for deeper customer binding.
Guosen Securities draws parallels to China's mobile internet pricing evolution.
In the 2G era, data was billed per KB (0.01 yuan/KB), making usage painful. The 3G era shifted to MB-based plans (150 yuan for 3GB), encouraging broader adoption. The 4G era's 'speed upgrades and rate cuts' triggered unlimited data plans, like Tencent's 2016 'King Card' (19 yuan/month for Tencent app exemptions), shifting users from 'buying data' to 'buying services.' The 5G era further evolved to 'speed-tiered pricing,' de-emphasizing data volume.
Each pricing shift redistributed industry influence. Operators abandoned 'per-KB' profits but saw average user data usage explode from 30MB to over 10GB, expanding the market hundreds of times over.
Today's cloud providers face the same choice. They're sacrificing short-term margins for long-term contract certainty. Guosen Securities notes PTU transitions will shift cloud gross margin structures from 'highly volatile' to 'resilient'—short-term pressure for long-term health and stability.
II. The Triopoly's Divergent Strategies
While Microsoft, AWS, and Google all promote PTU, their core strategies differ sharply.
Microsoft leverages ecosystem bundling. Its arsenal includes Windows, Office 365, and GitHub. It launched the 'Azure AI Commitment Plan,' encouraging 1-3 year contract commitments. This quarter, Microsoft's commercial remaining performance obligations surged to $625 billion, doubling year-over-year, with 45% from OpenAI's new $250 billion deal.
Microsoft's calculus is clear: the ultimate pricing power comes from making AI costs invisible. When AI becomes a button in Word, its cost dissolves into the software subscription budget. But over-reliance on a single client (OpenAI) worries the market—any distress at OpenAI would hit Microsoft directly.
AWS relies on cost leadership. Its edge comes from custom Trainium/Inferentia chips and the world's largest cloud infrastructure. It aggressively promotes 'AI/ML Savings Plans,' offering significant discounts versus on-demand pricing.
Amazon CEO Andy Jassy declared at earnings: 'Achieving 24% YoY growth on $142 billion annualized revenue is vastly different from competitors posting higher percentage gains on much smaller bases.'
AWS's supply chain efficiency creates an impenetrable moat. It welcomes price wars, having the industry's lowest unit compute costs. Executives repeatedly emphasized 'rapid monetization of new capacity' at earnings—betting scale effects will eventually overwhelm competitors.
Google bets on performance premiums. With the deepest AI tech stack (7th-gen TPU, Gemini model with 750M MAU), Google Cloud grew 48% YoY in Q4, outpacing rivals.
It extended 'commitment discounts' to AI platforms, targeting performance-sensitive clients. Google pursues a luxury tech strategy—not chasing volume but ensuring high-margin clients remain dependent. The Apple partnership is pivotal: Google became Apple's preferred cloud provider, co-developing foundation models to reach global users via Apple devices.
These strategies reflect three distinct economic moats: Microsoft relies on switching costs, AWS on scale, and Google on tech leadership. To assess who benefits most from PTU, evaluate whether their moat remains effective during long-term contract lock-ins.
However, PTU's impact won't stop at cloud providers and major clients—it will ripple upstream to chipmakers and downstream to app developers.
Chipmakers benefit first. Under tokens, cloud providers' compute purchases were 'pulsed'—rushing orders during traffic spikes, leaving resources idle during lulls. PTU's long-term contracts enable smoother, more predictable upstream orders.
Microsoft plans to boost AI compute by 80%+ in 2026 and double data centers in two years. NVIDIA and peers can now plan capacity more calmly, boosting supply chain efficiency.
Downstream, AI app developers face consolidation. With major clients locking up resources, smaller players' resource pools may shrink. The barrier for AI startups rises—projects that once launched via pay-as-you-go tokens now face higher initial costs. Meanwhile, tooling companies that optimize PTU utilization (AI workload scheduling, cost management SaaS) will see structural opportunities.
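Why utilization tooling matters is easy to show with a toy model: reserved capacity that sits idle inflates the effective price per token, which is exactly what scheduling and cost-management tools exist to fight. The fee and capacity figures below are assumptions:

```python
# Toy model: idle PTU capacity raises the effective cost per token.
# The fee and capacity below are hypothetical assumptions.

PTU_MONTHLY_FEE = 10_000.0        # assumed flat monthly fee
PTU_CAPACITY_M_TOKENS = 8_000     # assumed reserved capacity, millions of tokens

def effective_cost_per_m_tokens(utilization: float) -> float:
    """Effective $/1M tokens actually consumed, at a given utilization (0-1]."""
    used_m_tokens = PTU_CAPACITY_M_TOKENS * utilization
    return PTU_MONTHLY_FEE / used_m_tokens

for u in (0.25, 0.50, 0.90):
    print(f"utilization {u:.0%}: ${effective_cost_per_m_tokens(u):.2f} per 1M tokens")
```

A buyer running at 25% utilization pays four times the effective rate of one running near full capacity, which is the structural opening for the workload-scheduling and cost-management vendors mentioned above.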
Guosen Securities predicts that after a one-year digestion period for this pricing shift, long-term contract trends will significantly enhance revenue/profit growth certainty.
III. Conclusion
The shift from tokens to PTU marks AI's transition from 'wild west' experimentation to 'precision farming' commercial maturity.
Recall how mobile internet evolved from per-KB charges to unlimited data—it was pricing maturity that spawned trillion-dollar markets like short video, live streaming, and cloud gaming. Today's AI billing growing pains are paving the way for the next 'TikTok-scale' AI-native applications.
Of course, PTU won't be the endpoint. With Model-as-a-Service (MaaS) rising, AI billing may evolve toward 'pay-for-business-outcomes.' The battle for pricing power remains the core thread to watch in AI's evolution.
In this process, the true winners will be those who transform 'customer locking' into 'customer serving.' When pricing power shifts from zero-sum to positive-sum, AI commercialization will finally come of age.