04/03 2026

Author: Lin Yi | Editor: Key Point Jun
The hottest term in the AI industry this past March was Token. Several key events unfolded almost simultaneously:
In China, Liu Liehong, Director of the National Data Bureau, announced at the China Development Forum that the daily average Token invocation volume in China has surpassed 140 trillion, a more than thousandfold increase from roughly 100 billion two years ago.
Overseas, NVIDIA founder Jensen Huang stated at the GTC conference that Tokens will become the most core and valuable commodity in the future digital world, with Token throughput becoming a key operational metric tracked by CEOs of global enterprises.
Also in March, Alibaba Cloud set an aggressive target during its earnings call: to achieve annual revenue of over $100 billion from cloud and AI commercialization within five years, implying a compound annual growth rate of approximately 47%. Additionally, ByteDance's cloud computing arm, Volcano Engine, reported that its Doubao large model handles over 100 trillion Token invocations daily, ranking among the top three globally.
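The "approximately 47%" figure can be sanity-checked from the numbers in the paragraph. A minimal Python sketch, assuming the $100 billion target and a five-year horizon; the implied base revenue is our back-calculation, not a figure reported by Alibaba:

```python
def cagr(begin, end, years):
    """Compound annual growth rate needed to grow from `begin` to `end`."""
    return (end / begin) ** (1 / years) - 1

target = 100e9          # five-year revenue target from the earnings call
rate = 0.47             # CAGR implied by the article
base = target / (1 + rate) ** 5   # back-calculated starting revenue (illustrative)

print(f"implied current revenue: ${base / 1e9:.1f}B")   # ~ $14.6B
print(f"check CAGR: {cagr(base, target, 5):.0%}")       # 47%
```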
As the underlying infrastructure of the AI era, cloud computing is growing increasingly important, and AI cloud is emerging as a truly lucrative business.
What defines a lucrative business? In the tech industry, the criteria can be summarized by three key indicators: declining marginal costs due to economies of scale, high switching costs resulting from customer ecosystem lock-in, and high gross margins with recurring revenue built on standardized products.
Amazon AWS, Microsoft Azure, and Google Cloud meet all three criteria. They have constructed high-barrier, high-profit business models by offering standardized IaaS, PaaS, and SaaS services: larger resource pools reduce costs, customers find it difficult to migrate once onboarded, and software subscriptions generate consistent high-margin cash flow. In fiscal year 2025, these three cloud providers reported profits of $45.6 billion, $54 billion, and $13.91 billion, respectively.
However, China's cloud computing industry has taken a vastly different path. Over the past decade, despite market expansion, domestic cloud providers have been trapped in a cycle of heavy asset investment, low margins, and intense competition, struggling to achieve profitability. This stems from unique IT consumption habits, a weak SaaS ecosystem, and large government and enterprise customers' preference for highly customized solutions. During the traditional IaaS phase, as computing, storage, and network resources offered by cloud providers became highly commoditized, market competition often devolved into price wars. To secure major government and enterprise clients in non-internet sectors, cloud providers undertook extensive low-margin, labor-intensive customized development and on-premises deployment work. This transformed cloud computing—a lightweight service that should have benefited from significant economies of scale—into a traditional IT project-based business reliant on manpower and hardware stack.
That only began to change with the current AI wave, which gave domestic cloud providers an opportunity to restructure their business models: packaging large models into callable, billable, standardized cloud services sold to enterprises and developers, and turning them into a new growth engine.
From Price Wars to Price Hikes
AI has first driven structural growth in the cloud computing industry. In the first quarter of 2025, China's cloud infrastructure services spending reached $11.6 billion, up 16% year-on-year, with AI-related demand becoming the primary driver for enterprises migrating to the cloud. According to an Omdia report, China's AI cloud market is expected to reach $51.8 billion in 2025, up 148% year-on-year, and surpass $193 billion by 2030. (Note: Definitions of AI cloud vary among providers.)
But this growth was preceded by fierce price wars. In May 2024, ByteDance's Volcano Engine initiated a wave of large model price reductions with its Doubao model, followed by Alibaba Cloud and Baidu Intelligent Cloud. Token pricing for large models plummeted by over 90% in less than a year, with inference computing margins for some cloud providers turning negative. Their strategy was "growth through losses," as establishing API invocation habits among developers and enterprise clients first would secure future advantages.
The tide began to turn in early 2026. Overseas, Amazon AWS and Google Cloud announced price hikes, followed by domestic providers Alibaba Cloud, Baidu Intelligent Cloud, and Tencent Cloud. On March 18, Alibaba Cloud and Baidu Intelligent Cloud simultaneously announced price increases:
Alibaba Cloud raised prices by up to 34%: Adjustments were made to AI computing power and storage products. Computing cards like the self-developed T-Head Zhenwu 810E saw price increases of 5%-34%, while high-performance computing file storage product CPFS rose by 30%. New prices took effect on April 18, 2026.
Baidu Intelligent Cloud raised prices by up to 30%: AI computing power-related product services increased by approximately 5%-30%, while parallel file storage rose by about 30%. These changes also took effect on April 18, 2026.
The most direct trigger for these price hikes was the surge in Token demand. While simple large model conversations consume limited Tokens, the 2026 explosion of AI agents and maturation of multimodal models significantly expanded the AI cloud market. The popularity of agent products like Claude Code and OpenClaw made tech companies realize that a single agent task often involves multiple rounds of internal reasoning, tool invocation, and task execution, resulting in significantly higher Token consumption than ordinary AI conversations. Computing power demand shifted from "cloud-based training" to a dual-wheel drive of "training + inference," causing severe shortages in existing AI computing resources.
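The token blow-up from agent loops is easy to see in a toy model: each round of reasoning or tool use resends the conversation context accumulated so far, so input tokens compound across rounds. A rough sketch, where all token counts are made-up illustrative numbers:

```python
def chat_tokens(prompt_tokens, completion_tokens):
    """One-shot chat: pay for the prompt and the reply exactly once."""
    return prompt_tokens + completion_tokens

def agent_tokens(rounds, base_prompt, step_output):
    """Agent loop: each round re-reads the full context built up so far,
    then appends its own reasoning/tool output to that context."""
    total, context = 0, base_prompt
    for _ in range(rounds):
        total += context + step_output  # input context + this round's output
        context += step_output          # output becomes input for the next round
    return total

print(chat_tokens(500, 500))       # 1000: a single Q&A
print(agent_tokens(10, 500, 500))  # 32500: ~32x more for a 10-step agent task
```

Because context is resent every round, total consumption grows roughly quadratically with the number of steps, which is why agent workloads strain inference capacity far more than chat.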
This shift in computing power supply and demand directly drove changes in commercial billing models.
From IaaS Computing Leasing to MaaS Token Economics
During the traditional IaaS phase, cloud providers' core business model was acting as "sub-landlords," renting out underlying computing resources, storage, and network bandwidth—a highly commoditized market.
The emergence of Tokens disrupted this landscape. A Token is the smallest unit into which AI models break down language, images, audio, and video for processing; every user interaction with a large model is ultimately decomposed into Tokens for computation. By billing per Token, cloud providers transitioned from "selling hardware usage rights" to "selling intelligent services."
This model offers distinct advantages: First, it eliminates hardware commoditization concerns. Users no longer care about the underlying GPU type but whether equivalent Tokens can complete tasks. Second, it naturally amplifies economies of scale. Larger computing pools improve concurrent scheduling efficiency, reducing marginal costs per Token. Finally, standardized API interfaces create ecosystem lock-in, as migration costs become prohibitively high once invocation habits are established. Cloud services truly become as accessible as utilities—turn on and use.
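In practice, "selling intelligent services" reduces to metering tokens at the API boundary. A minimal billing sketch, using made-up per-million-token prices (not any provider's actual rates) and the common split between input and output tokens:

```python
# Illustrative token-metered billing; prices are hypothetical placeholders.
PRICE_PER_M = {"input": 0.50, "output": 1.50}  # USD per million tokens

def bill(input_tokens, output_tokens):
    """Charge for a request based only on tokens consumed, not GPU hours."""
    return (input_tokens * PRICE_PER_M["input"]
            + output_tokens * PRICE_PER_M["output"]) / 1_000_000

print(f"${bill(1_200_000, 300_000):.2f}")  # $1.05
```

Output tokens are typically priced higher than input tokens because generation is sequential and costs more compute per token than prompt processing; the customer never sees which GPU served the request.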
Cloud providers are also redirecting scarce AI computing resources toward high-value Token businesses. For example, Tencent Cloud rapidly consolidated resources over the past month to launch the "Longxia" product matrix covering cloud, consumer, and enterprise editions, directly upgrading its original MaaS large model service platform to TokenHub and introducing a unified Token Plan service.
The proliferation of AI agents has transformed previously ad-hoc capabilities into high-frequency, automated services, dramatically increasing cloud providers' Token throughput and positioning MaaS businesses to potentially account for 30% or more of cloud providers' total revenue in the future.
According to Caijing Magazine, in late December 2025, Liu Weiguang, Senior Vice President of Alibaba Cloud Intelligence Group and President of its Public Cloud Business Unit, stated in a small-scale briefing that MaaS revenue could potentially reach 30% or more of cloud providers' total revenue. Additionally, Amazon AWS management disclosed during its Q3 2025 earnings call that it aims to make Bedrock the world's largest inference platform, with revenue contributions comparable to its core computing product EC2, expected to exceed 30% of total revenue.
This represents the "recurring, high-margin, replicable" revenue structure required for top-tier cloud businesses.
The Decisive Factor in AI Cloud: Full-Stack Cost Competition
Promising as the AI cloud business is, competition in the sector is intensifying.
Overseas, transitioning to AI cloud has become a common goal for Amazon AWS, Microsoft Azure, Google Cloud, and Oracle OCI. Domestically, tech cloud providers like Alibaba Cloud, Baidu Intelligent Cloud, Tencent Cloud, Volcano Engine, and Huawei Cloud are also strengthening their AI capabilities. Capital expenditures among these providers continue to reach new highs.
In our view, AI cloud competition is not purely about computing power but full-stack cost efficiency. The deciding factor lies not in who has more GPUs but who can achieve the lowest "cost per Token."
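"Cost per Token" can be decomposed, very roughly, into hardware amortization plus power and operations, divided by realized throughput. A back-of-the-envelope sketch; every input number below is a hypothetical placeholder, not a real card's specification:

```python
def cost_per_million_tokens(capex_usd, life_years, power_kw, usd_per_kwh,
                            tokens_per_sec, utilization):
    """All-in serving cost per 1M tokens for one accelerator (illustrative)."""
    hours_per_year = 365 * 24
    yearly_cost = capex_usd / life_years + power_kw * usd_per_kwh * hours_per_year
    yearly_tokens = tokens_per_sec * utilization * hours_per_year * 3600
    return yearly_cost / yearly_tokens * 1_000_000

# Hypothetical card: $30k, 4-year life, 1 kW, $0.10/kWh, 2000 tok/s, 50% utilization
print(f"${cost_per_million_tokens(30_000, 4, 1.0, 0.10, 2000, 0.5):.2f} per 1M tokens")
```

Doubling tokens-per-second per card at the same capex halves this number, which is why model and serving-stack efficiency, not raw GPU count, feeds directly into gross margin.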
The competition among the four major U.S. cloud providers validates this logic. Google is the most vertically integrated player, with its Gemini series models trained and deployed on self-developed TPUs, integrating chips, models, and cloud services while controlling both cost and pricing. Amazon has delivered over 1.4 million self-developed Trainium 2 chips, offering 30%-40% better cost-performance than comparable NVIDIA GPUs. In contrast, Oracle serves as a cautionary tale: lacking self-developed chips, it relies entirely on NVIDIA for computing power, with capital expenditures exceeding operating cash flow while heavily dependent on OpenAI as a single customer, leaving it in the most vulnerable position.
Chinese cloud providers face similar competitive dynamics, compounded by geopolitical pressures, making the landscape even more complex.
Alibaba Cloud holds dual advantages of scale and vertical integration, with the deepest moat. Its Bailian MaaS platform aggregates dozens of mainstream models like Tongyi Qianwen and DeepSeek; it has deployed over 470,000 AI chips for actual business use, with more than 60% serving external commercial clients. Over the next three years, Alibaba plans to invest over RMB 380 billion in cloud and AI infrastructure.
Baidu Intelligent Cloud prioritizes deep penetration into core processes of vertical industries like energy, finance, and automotive over simply chasing Token traffic volume. Leveraging its self-developed Kunlunxin chips, Wenxin large models, and Qianfan platform's "chip-cloud-model-agent" full-stack self-developed system, it has ranked first in both project count and contract value in China's large model bidding market for two consecutive years.
Volcano Engine pursues an aggressive MaaS-first strategy. ByteDance's massive internal application ecosystem—including Douyin, video creation tools, and the Seedance video generation model—amortizes fixed infrastructure costs, enabling Volcano Engine to maintain aggressive Token pricing. According to LatePost, Volcano Engine previously set a 2026 target of over RMB 10 billion in MaaS revenue, which has now been raised following model releases like Seed 2.0 and Seedance 2.0 and the continued popularity of OpenClaw.
Tencent Cloud has undergone a difficult transformation over the past few years. Around 2022, it proactively cut low-margin turnkey projects to focus on high-margin self-developed PaaS/SaaS products, establishing "being integrated" rather than "total integration" as its core strategy. While this temporarily pressured market share, it improved the revenue structure: by 2025, IaaS accounted for 40%, PaaS 40%, and SaaS 20%, with PaaS and SaaS maintaining gross margins of 50%-70%, far higher than IaaS's 10%-15%. After 12 years, it achieved large-scale profitability for the first time, with Pony Ma citing this as a core accomplishment in the earnings report.
The Cost and Efficiency of Token Generation Determine Everything
AI has shifted cloud computing's billing unit from commoditized computing resources to differentiated intelligent services. The explosive growth of Tokens has, for the foreseeable future, lifted the ceiling on MaaS-layer revenue. The economies of scale and ecosystem lock-in enabled by standardized APIs are, to some extent, granting leading cloud providers pricing power.
AI has improved the cloud computing business model, but opportunities will likely belong to only a few players: those with sufficient cash flow to sustain hundred-billion-dollar computing power investments; those capable of self-developing chips or deeply integrating domestic computing power to build cost control capabilities outside NVIDIA's ecosystem; and those with self-developed models and MaaS engineering capabilities, as model strength directly determines Token throughput per card, unit Token costs, and ultimately gross margins.
As Jensen Huang said: The cost and efficiency of generating Tokens determine tech companies' revenue and survival.