09/05/2025
When news broke in AI circles that OpenAI was quietly testing Google's TPUs, the entire industry sensed something different. Today, cloud giants like Google, Amazon AWS, and Meta are collectively betting on self-developed ASICs, with Nomura Securities even predicting that ASIC shipments will surpass Nvidia GPUs for the first time in 2026. Is this smokeless war the twilight of general-purpose computing power, or the dawn of specialized chips?
Author | Fang Wensan
The Delicate Imbalance from [Value Monopoly] to [Quantity Catch-Up]
In the current AI server market, Nvidia still firmly holds the [throne of value].
Data shows that its AI GPUs account for over 80% of market value, while ASICs account for only 8%-11%.
However, if we shift our focus to shipments, a key metric, the balance is quietly tilting.
By 2025, Google's self-developed TPU shipments are expected to reach 1.5 million to 2 million units, while Amazon AWS's Trainium 2 ASIC is projected at around 1.4 million to 1.5 million units. Their combined volume amounts to roughly 40%-60% of Nvidia's AI GPU shipments for the same period.
More disruptively, with Meta planning to mass-produce 1 million to 1.5 million MTIA chips in 2026 and Microsoft initiating large-scale ASIC deployment in 2027, Nomura Securities expects total ASIC shipments to surpass Nvidia GPU shipments at some point in 2026.
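Taken at face value, these forecast ranges can be cross-checked with a few lines of arithmetic. A minimal sketch, using only the figures quoted in this article (none of this is confirmed shipment data):

```python
# Cross-checking the shipment forecasts quoted above (all figures are
# the article's projected ranges, in millions of units).
tpu = (1.5, 2.0)         # Google TPU, 2025 forecast
trainium = (1.4, 1.5)    # AWS Trainium 2, 2025 forecast

combined = (tpu[0] + trainium[0], tpu[1] + trainium[1])
print(f"combined ASIC shipments: {combined[0]:.1f}M-{combined[1]:.1f}M")
# -> 2.9M-3.5M

# If that range equals 40%-60% of Nvidia's AI GPU shipments, the
# implied Nvidia volume for the same period is:
implied = (combined[0] / 0.60, combined[1] / 0.40)
print(f"implied Nvidia shipments: {implied[0]:.1f}M-{implied[1]:.1f}M")
# -> roughly 4.8M-8.8M units
```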
Behind this shift lies cloud service providers' urgent need for [cost reduction and efficiency improvement].
Take Google, which processes massive amounts of data daily, as an example. Its latest-generation TPU is deeply optimized for the Transformer architecture, with computing efficiency improved by over 30% compared to the previous generation.
Amazon AWS's Trainium 2 focuses on distributed training scenarios, supporting parallel computing for models with hundreds of billions of parameters.
In specific scenarios, these ASICs approach, and in some cases surpass, the performance of Nvidia's A100 GPU.
For cloud service providers, the significance of self-developed ASICs goes far beyond [replacement].
Google processes billions of search requests daily, AWS supports the cloud computing needs of millions of enterprises worldwide, and Meta's social platform generates massive amounts of interactive data every second. These scenarios have relatively fixed AI tasks, precisely matching ASICs' core advantage of [customization].
According to industry estimates, an ASIC's power consumption can be held to roughly 30% of a GPU's at the same computing power. For cloud service providers deploying tens of thousands of cards, the annual electricity savings are equivalent to the annual output of a small power plant.
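A back-of-envelope calculation shows why the [small power plant] comparison is plausible. A minimal sketch, assuming the 700 W GPU draw this article cites later, ASIC power at 30% of that, and an illustrative fleet of 50,000 cards running around the clock:

```python
# Back-of-envelope estimate of fleet-level electricity savings.
# Assumptions (illustrative): 700 W per GPU, ASIC at 30% of that,
# 50,000 cards, 24/7 operation.
gpu_w = 700
asic_w = gpu_w * 0.30              # 210 W, per the "within 30%" claim
cards = 50_000
hours_per_year = 8760

saved_mw = (gpu_w - asic_w) * cards / 1e6      # 24.5 MW of avoided load
saved_gwh = saved_mw * hours_per_year / 1000   # ~215 GWh per year
print(f"{saved_mw:.1f} MW avoided -> {saved_gwh:.0f} GWh/year")
# A small plant running at ~25 MW around the clock generates about the
# same ~215 GWh annually, so the comparison holds.
```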
ASIC Customization Rewrites the Economic Rules of Computing Power
If the chip world is compared to a toolbox, ASICs are [professional craftsmen] tailored for specific tasks, while GPUs are [all-rounders] capable of handling multiple scenarios.
This difference in positioning is magnified enormously in today's large-scale commercialization of AI large models.
ASICs' core advantage lies in their [extreme adaptation] to specific algorithms. Take large model inference as an example. Once a model is deployed, its algorithm logic (such as the attention mechanism in Transformer) and computing process (input/output format, precision requirements) remain fixed for a long time.
ASICs can [solidify] these logics directly into the hardware architecture, stripping out the redundant modules GPUs carry for general-purpose computing and devoting hardware resources entirely to the target task.
Google's TPU v5e has three times the energy efficiency of Nvidia's H100, while AWS's Trainium 2 offers 30%-40% higher cost-effectiveness than the H100 in inference tasks, directly reflecting this optimization.
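To make [solidification] concrete, the sketch below shows the kind of computation an inference ASIC can freeze into a fixed datapath: the attention formula, tensor shapes, and precision never change after deployment. This is a NumPy illustration only; the sequence length, head size, and dtype are invented examples, not any vendor's actual configuration:

```python
import numpy as np

# Everything below is fixed at deployment time: same formula, same
# shapes, same precision. An ASIC can commit all of it to hardware.
SEQ_LEN, D_HEAD = 128, 64            # frozen at chip-design time
DTYPE = np.float16                   # fixed precision requirement

def fixed_attention(q, k, v):
    """softmax(Q K^T / sqrt(d)) V with every parameter hard-wired."""
    assert q.shape == k.shape == v.shape == (SEQ_LEN, D_HEAD)
    scores = (q @ k.T).astype(np.float32) / np.sqrt(D_HEAD)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v.astype(np.float32)).astype(DTYPE)

q = k = v = np.ones((SEQ_LEN, D_HEAD), dtype=DTYPE)
print(fixed_attention(q, k, v).shape)   # (128, 64)
```

Because none of these parameters can change, the hardware needs little of the instruction-fetch and scheduling machinery a general-purpose processor carries; that is where the efficiency headroom comes from.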
A more intuitive comparison is operating cost. An Nvidia GPU draws about 700 watts; at 0.8 yuan/kWh, running large models costs roughly 0.56 yuan per hour in electricity. An ASIC with the same computing power can keep consumption within 200 watts, for an electricity cost of only 0.16 yuan per hour on the same task.
For applications like ChatGPT, which require hundreds of thousands of inference chips, this gap means billions of yuan in annual cost savings.
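The arithmetic behind that claim can be reproduced directly. A sketch: the per-card figures are the article's own, while the 500,000-chip fleet size is an assumption standing in for [hundreds of thousands]:

```python
# Reproducing the per-card electricity math above and scaling it up.
rate = 0.8                          # yuan per kWh (article's figure)
gpu_cost_h = 0.700 * rate           # 0.56 yuan/hour at 700 W
asic_cost_h = 0.200 * rate          # 0.16 yuan/hour at 200 W

chips = 500_000                     # assumed fleet size
hours_per_year = 8760               # running year-round
annual_saving = (gpu_cost_h - asic_cost_h) * chips * hours_per_year
print(f"~{annual_saving / 1e9:.2f} billion yuan saved per year")
# -> ~1.75 billion yuan, the billion-yuan scale claimed above
```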
The rise of ASICs also coincides with the [stage dividend] of the AI industry. Currently, large models are shifting from the [wild growth] training phase to the [large-scale deployment] inference phase.
Barclays predicts that by 2026, inference computing demand will account for over 70% of total general artificial intelligence computing demand, 4.5 times that of training demand.
The [algorithm solidification] characteristic of inference scenarios aligns perfectly with ASICs' [specialization]; this is the core logic behind the accelerated push by giants like Google and Meta.
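A quick sanity check on how the two Barclays figures relate, assuming for illustration that inference and training are measured against the same demand pool (if the pool includes other demand, inference's share shrinks, which is why [over 70%] is the conservative reading):

```python
# If inference demand is 4.5x training demand and the two exhaust the
# measured pool, inference's share of the total is:
ratio = 4.5
inference_share = ratio / (ratio + 1.0)
print(f"inference share: {inference_share:.0%}")   # ~82%, i.e. "over 70%"
```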
From a historical perspective, this [general-purpose to specialized] iteration is not unfamiliar.
Bitcoin mining initially used CPUs, later shifted to GPUs, but truly achieved industrialization with Bitmain's ASIC miners, whose mining efficiency per unit of energy consumption was a thousand times that of GPUs.
As AI model architectures move from rapid iteration to relative stability, ASICs are repeating a similar [efficiency revolution].
The Practical Dilemmas and Hidden Concerns of ASIC Scaling
Despite ASICs' significant advantages, large-scale deployment is no smooth road. Behind this computing power race lie hurdles of capacity, technology, and risk.
Capacity bottlenecks are the first hurdle. Take Meta's MTIA chip, planned for mass production in 2026, as an example. It relies on TSMC's CoWoS technology, but current CoWoS wafer capacity can only support 300,000 to 400,000 units, far below its shipment target of 1 million to 1.5 million units.
More critically, if Google, AWS, Microsoft, and other vendors expand production simultaneously, high-end packaging capacity will become a [choke point] restricting ASIC scaling.
Although TSMC plans to increase CoWoS capacity by 50% in 2025, it takes 12-18 months to bring new capacity from construction into production, making it difficult to relieve the supply-demand imbalance in the short term.
Technical thresholds are equally high. Large-size CoWoS packaging imposes extremely high requirements on chip design and material consistency, with system debugging cycles lasting 6-9 months.
Even technically mature Google needs to invest significant resources to solve heat dissipation and signal interference issues.
Meta's MTIA T-V1 chip adopts 36-layer high-specification PCBs and a hybrid liquid-air cooling system, with complexity comparable to aerospace-grade equipment. Any design flaws could lead to production delays.
A more hidden risk lies in ASICs' [specialization trap]. AI model architectures are not static. If the architecture shifts from Transformer to a new type in the future, previously invested ASICs may face the risk of [instant obsolescence].
Jensen Huang once bluntly stated, 'A perfect ASIC performs exceptionally well on certain tasks but terribly on others. Once AI's workload changes, it becomes useless.'
This is why Google's Gemini model is still deployed on Nvidia GPUs—using a [general-purpose + specialized] hybrid architecture to hedge against technological iteration risks.
The [butterfly effect] in the supply chain also cannot be ignored. If cloud service providers like Meta and AWS ramp up orders simultaneously, key materials such as high-end ABF substrates, HBM3E memory chips, and liquid cooling components are prone to shortages, further driving up costs and slowing mass production.
In the second half of 2024, HBM memory prices rose by 30% due to surging demand, a scenario that may repeat during the ASIC expansion boom.
Nvidia's Technology, Ecosystem, and Counterattack
Facing the ASIC challenge, Nvidia has not stood idle; it has built [triple barriers] through technological iteration and ecosystem reinforcement.
At COMPUTEX 2025, Nvidia introduced NVLink Fusion technology, opening its interconnect protocol to allow third-party CPUs or xPUs to collaborate seamlessly with its GPUs.
This strategy, which may look like [a compromise], actually expands ecosystem coverage through open interfaces while maintaining dominance over the computing core.
In hardware terms, Nvidia's H100 offers about 20% higher computing density than contemporaneous ASICs, with NVLink interconnect bandwidth 1.5 times that of self-developed ASICs, keeping it irreplaceable in complex tasks like training large models with hundreds of billions of parameters.
The ecosystem barrier is Nvidia's [trump card]. Over 90% of global enterprise AI solutions are developed based on CUDA, creating deep path dependence for developers from model training to deployment.
Even if ASIC computing power approaches that of GPUs, enterprises would need to invest billions or even tens of billions of yuan to reconstruct their software ecosystems. These [conversion costs] form the most solid moat.
As Morgan Stanley analysts put it, 'The CUDA ecosystem is like a highway network with all cars running on it. To switch roads requires rebuilding the entire network.'
Supply chain control is equally critical. Nvidia is the largest buyer of HBM memory, accounting for over 70% of SK Hynix's capacity. Through deep cooperation with TSMC, it has secured the largest allocation of CoWoS packaging capacity.
This [resource-grabbing] ability allows Nvidia to dominate the pace of the computing power race.
While ASIC vendors are anxious about capacity, Nvidia has reduced marginal costs through large-scale procurement, maintaining high gross profit margins.
Jensen Huang's [ecosystem warfare] strategy is also paying off. By opening NVLink Fusion, Nvidia has incorporated vendors like MediaTek and Marvell into its [circle of friends], forming a heterogeneous computing ecosystem of [GPUs + third-party xPUs].
This model of [openness and cooperation on its own terms] not only addresses the ASIC challenge but also consolidates its core position in the industry chain.
Coexistence of General-Purpose and Specialized Solutions Likely the Future Landscape
The rise of ASICs does not mean the decline of GPUs, but rather the beginning of the AI computing power market's shift from [unipolar dominance] to [diverse coexistence].
The ultimate outcome of this transformation is more likely a dual-track parallelism of [general-purpose GPUs + customized ASICs].
In the short term, ASICs serve as [incremental supplements] rather than [stock replacements].
Nvidia still holds absolute dominance in the high-end training market (such as large models with hundreds of billions of parameters), with its technological accumulation and ecosystem advantages difficult to shake in the short term.
ASICs, meanwhile, are rapidly penetrating specific scenarios, becoming important choices for cloud service providers to reduce costs and improve efficiency.
2025-2026 will be a transitional period of dual-track parallelism, with the market featuring [GPUs dominating value and ASICs growing in quantity].
In the long term, the market will exhibit [hierarchical competition].
Nvidia will continue to lead the general-purpose AI computing market, supporting frontier model exploration and complex task processing.
ASICs will dominate vertical scenarios, maximizing efficiency through customization.
For sovereign AI systems, ASICs may become an important path to break through supply restrictions but require overcoming multiple barriers related to technological accumulation, talent reserves, and ecosystem construction.
From an application perspective, the division of labor between the two will become clearer: GPUs handle [0 to 1] innovation exploration, while ASICs handle [1 to N] large-scale deployment.
Just as supercomputers are used for frontier scientific research while dedicated servers support daily data processing, the AI computing power market will also form a balance between [innovation and efficiency].
Industry data also supports this trend. Morgan Stanley predicts that the AI ASIC market will grow from $12 billion in 2024 to $30 billion in 2027, with a compound annual growth rate of 34%, while the GPU market will still maintain over 20% growth during the same period.
This means that ASICs' rise is expanding the [pie] of the AI computing power market rather than simply seizing GPU market share.
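The quoted growth rate is easy to verify against its endpoints; the small gap to the 34% figure presumably comes down to rounding or the exact endpoint years used in the original report:

```python
# Checking the quoted CAGR against the $12B (2024) -> $30B (2027) path.
start, end, years = 12.0, 30.0, 3
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")   # ~35.7%, close to the quoted 34%
```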
The Essence of the Computing Power Revolution: Balancing Efficiency and Innovation
The contest between ASICs and GPUs is essentially a microcosm of the AI industry's evolution from [general-purpose computing power] to [specialized efficiency].
As the training cost of large models has surged from the ten-million-dollar level of the GPT-3 era to the billion-dollar level for Grok 3, efficiency has become an unavoidable core proposition, providing fertile ground for ASICs' rise.
However, the uncertainty of technological innovation makes the flexibility of general-purpose GPUs indispensable.
The future AI computing power landscape will not be a zero-sum game of [either-or] but rather a symbiotic ecosystem where [each excels in its own way].
Nvidia will continue to dominate the general-purpose computing power market with its technological, ecological, and supply chain advantages.
Giants like Google, AWS, and Meta will build barriers in vertical scenarios through ASICs.
Vendors like Broadcom and Marvell will also carve out a niche in the customized chip field.
The deeper significance of this transformation lies in redefining the cost structure and technological route of the computing power economy.
Partial Source References:
The Light of Computing Power: 'ASIC Chips Surge: Is Nvidia Panicking?'
Securities Star: 'Is the Era of ASICs Approaching?'
Hard AI: 'After Google, Meta's Demand Explodes: Will ASICs Surpass Nvidia GPUs Next Year?'
Elecfans: 'OpenAI Unleashes a "King Bomb": One App Outperforms an Entire Office as ASICs Begin to Overtake GPUs?'
China Electronics News: 'ASIC Shipments to Surpass GPUs by 2026? The Era of ASICs Accelerates'
EE World: 'In the DeepSeek Era, ASIC Chips Are Crowned Kings'
Semiconductor Industry Vertical and Horizontal: 'Is the GPU Throne Shaking? ASICs Rewrite the Rules'