"New Big Five" Intensify Competition, Reshaping the Landscape of Basic Large Model Players

05/14 2025

Large model companies are locked in a fresh round of intense ranking battles.

Even as AI applications flourish, the battle for supremacy in basic large models rages on. Rather than slowing as expected, it has intensified:

According to Guangzhui Intelligence's incomplete statistics, since the start of 2025, major Chinese large model companies, including Baidu, ByteDance, Alibaba, DeepSeek, and the "Six Little Tigers," have unveiled over 45 basic large models (excluding industry-specific models), averaging one new model every 3.3 days.

In the first half of the year, China's top-tier large model firms are closing the gap with their overseas counterparts.

Mirroring the overseas "Big Five" of OpenAI, Google, Anthropic, X.AI, and Meta (a mix of three tech giants and two startups), China's large model landscape is differentiating into a "3+2" constellation of Alibaba, ByteDance, DeepSeek, Stepwise Stars (StepFun), and Wise Spectrum (Zhipu AI), which are emerging as the country's "New Infrastructure Big Five."

Gods in Battle: "New Infrastructure Big Five" Unveil Over 30 Large Models in Half a Year

The first half of the year witnessed a fierce "gods in battle" scenario in the realm of basic large models.

By release volume, China's "Big Five" collectively launched 32 large models in the first half of the year. Stepwise Stars and Alibaba were the most prolific, unveiling 11 and 9 models respectively; together they account for 20 of the 32, well over half of the five companies' total.
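
The "over half" figure can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in this article; the function and variable names are illustrative assumptions, and the article does not break down the other three companies individually.

```python
# Release counts for the first half of 2025, as quoted in the article.
TOTAL_BIG_FIVE = 32
releases = {"Stepwise Stars": 11, "Alibaba": 9}


def combined_share(counts: dict, total: int) -> float:
    """Return the fraction of `total` contributed by the listed companies."""
    if total <= 0:
        raise ValueError("total must be positive")
    return sum(counts.values()) / total


share = combined_share(releases, TOTAL_BIG_FIVE)
# Models left for ByteDance, DeepSeek, and Wise Spectrum combined.
remaining = TOTAL_BIG_FIVE - sum(releases.values())
print(f"{share:.1%} of releases; {remaining} models from the other three")
```

On these numbers, the two companies contribute 20 of 32 releases (62.5%), leaving 12 for the other three combined.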

In line with industry trends, open source, reasoning, and multi-modality have become the pivotal keywords:

Historically, Alibaba, Wise Spectrum, and DeepSeek have long been committed to the open source path.

Notably, Alibaba stands as the unparalleled "king of open source competition," having released multiple models on open source communities since 2023. In terms of both the number and variety of open source models, Alibaba leads the pack.

This position has made Alibaba a leading provider of open source SOTA models across multiple fields. The hybrid reasoning model Qwen3, launched on April 29, emerged as the world's strongest open source model, at just 35% of the cost of DeepSeek-R1.

Before Qwen3, DeepSeek sparked enthusiasm for reasoning models in China, serving as the "catfish" that stirred the waters.

Unlike the other four companies, DeepSeek is the sole "specialist" that is not building a large model matrix. For instance, the reasoning model DeepSeek-R1, released during the Spring Festival, not only delivered top performance but also achieved exceptionally low costs, with training expenses amounting to just 1/10 of GPT-4's and input costs as low as 2%, setting a benchmark for subsequent reasoning models.

Multi-modal large models represent a key R&D direction for large model companies this year.

For example, Stepwise Stars, known as the "king of multi-modal competition," has so far released 22 basic models, 16 of which are multi-modal large models.

Among these, Stepwise Stars jointly launched the open source text-to-video large model Step-Video-T2V with Geely, which at the time became the world's largest and most performant open source video generation large model. Meanwhile, Step-Audio is the industry's first product-level open source speech interaction model.

ByteDance, with its comprehensive layout, has also begun to ascend to the first tier in the multi-modal domain this year. Take the text-to-image model Seedream 3.0 as an example; apart from enhancing image quality and generation efficiency, it also bolsters the application capabilities of AI-generated images in commercial realms. Consequently, the text-to-image capability of its corresponding product Jimeng AI has gained widespread popularity.

However, compared with large language models, multi-modal large models remain far less developed. As Stepwise Stars CEO Jiang Daxin notes, "In the realm of multi-modal models, there has yet to be a GPT-4 moment."

In Jiang Daxin's view, the crux lies in the industry's lack of an integrated architecture for understanding and generation in the multi-modal domain. While large language models have achieved this, multi-modal large models still rely on different models for understanding and generation. For computer vision, this is a persistent problem that has eluded a solution for decades.

Large language models have had ChatGPT and reasoning models have had DeepSeek-R1; the multi-modal domain still holds the potential for the next blockbuster model.

Towards AGI: What is the Only Path Forward?

With AI applications in full swing and the marginal benefits of Scaling Law diminishing, is it still worthwhile to continue investing in basic large models?

100 days ago, DeepSeek's release undoubtedly provided the industry with a definitive answer.

"Everything is changing so rapidly, and every morning, the release of new models and products could potentially overturn established perceptions," exclaimed a startup founder to Guangzhui Intelligence.

As mentioned earlier, over the past six months, the pace of model releases has not waned but accelerated. Concurrently, the technological dividends stemming from improvements in model capabilities have propelled firms like DeepSeek to the forefront, while companies lacking technological advantages have lost their chance to remain in the top tier as investment enthusiasm wanes.

For large corporations and startups alike, staying in the race means that the scramble for funds and talent remains the central theme of 2025.

Today, computing power, talent, and capital are still the three crucial metrics for gauging the standing of large model companies. For large firms, funds are naturally not an issue, but startups must secure sufficient investment to fund the company's early-stage R&D.

Among startups, however, only those favored by state-owned capital, such as "Beijing Team" Wise Spectrum and "Shanghai Team" Stepwise Stars, can continue to attract investment during the funding winter for large model companies.

Take Wise Spectrum as an example; it successively secured investments from three state-owned enterprises in Hangzhou, Zhuhai, and Chengdu in March, totaling 1.8 billion RMB. Last December, Stepwise Stars secured hundreds of millions of dollars in funding, completing its Series B round.

Regarding talent, the current "Big Five" are exhibiting a siphon effect. Take ByteDance as an example; from 2023 to 2025, the company has poached numerous R&D leaders from home and abroad, including Wu Yonghui, once the Vice President of Research at Google DeepMind, who joined ByteDance this year as the head of basic research for the large model team Seed.

Building on substantial reserves of funds and talent, the Big Five have each established distinct advantages. "King of Open Source" Alibaba leverages its ecosystem to attract B-end users, while ByteDance is filling out its basic model lineup and relies on applications like Doubao and Kouzi (Coze) to feed back into model upgrades. DeepSeek has emerged as the king of cost-effectiveness with its performance and low price. Wise Spectrum's large models hold significant advantages in government and enterprise deployments, and Stepwise Stars has become the "king of multi-modal competition" by unveiling multiple SOTA models.

The goal of these enterprises is to keep raising the "intelligence ceiling" of large models and to harness the resulting spillover in model capabilities to drive breakthroughs in AI applications.

Take intelligent agents (Agents) as an example; their key capabilities lie in multi-modality, slow thinking, and memory.

With multi-modal understanding capabilities, agents with large models as their technical foundation can "read" and comprehend information on mobile phone and computer screens, enabling AI to replace humans in operating smart terminals. Meanwhile, reasoning capabilities empower AI to disassemble tasks based on user needs and progress according to each planned step, ultimately completing the task.
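
The loop described above (read the screen, break the request into steps, carry out each step in turn) can be sketched in a few lines. This is a toy illustration under stated assumptions, not any vendor's implementation: `Step`, `Agent`, and the `fake_*` functions are hypothetical names, and the perception and planning functions are hard-coded stand-ins for what would be a multi-modal model and a reasoning model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    action: str  # e.g. "tap" or "type"
    target: str  # the on-screen element or text the action applies to


@dataclass
class Agent:
    perceive: Callable[[], str]             # stand-in for multi-modal screen understanding
    plan: Callable[[str, str], List[Step]]  # stand-in for a reasoning model
    act: Callable[[Step], bool]             # performs one step on the device
    log: List[str] = field(default_factory=list)

    def run(self, request: str) -> bool:
        """Read the screen, break the request into steps, then execute them in order."""
        screen = self.perceive()
        steps = self.plan(request, screen)
        for number, step in enumerate(steps, start=1):
            succeeded = self.act(step)
            self.log.append(f"{number}. {step.action} {step.target}")
            if not succeeded:
                return False  # stop at the first step that fails
        return True


# Toy stand-ins so the sketch runs without any model or device.
def fake_perceive() -> str:
    return "home screen: [Search box] [Order button]"


def fake_plan(request: str, screen: str) -> List[Step]:
    return [Step("tap", "Search box"), Step("type", request), Step("tap", "Order button")]


def fake_act(step: Step) -> bool:
    return bool(step.target)  # pretend every step with a target succeeds


agent = Agent(perceive=fake_perceive, plan=fake_plan, act=fake_act)
finished = agent.run("flat white")
print(finished, agent.log)
```

Keeping perception, planning, and action as separate callables mirrors the division of labor in the paragraph above: each could be swapped for a real model or device driver without changing the loop itself.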

Google DeepMind CEO Demis Hassabis believes that the path to AGI is becoming clearer, but to truly achieve this goal, multiple technical bottlenecks still need to be overcome, and multiple key capabilities need to be integrated.

Within a limited timeframe, those with more comprehensive hard indicators and stronger base model capabilities will have the opportunity to genuinely obtain an AGI entry ticket.

Trends in Large Model Commercialization: Open Source and Vertical Scenario Deployment

Commercialization is an unavoidable proposition for basic large model companies, and their commercialization strategies often build upon their technological strategies.

In 2025, open source and vertical scenario applications have emerged as two key directions for model commercialization.

First, open source. Chinese open source large models now hold a large share of the global field. On the open source community HuggingFace, 12 of the top 30 popular models currently come from Chinese companies, including Stepwise Stars' latest music model ACE-Step, DeepSeek's R1 and Prover-v2, Alibaba's Qwen3 series, ByteDance Seed's small parameter code model, and Tencent's AI video model.

After open sourcing, the commercialization avenues available to large model companies become more diverse. Domestic companies represented by DeepSeek and Alibaba adopt more permissive licenses, and such models are generally monetized in three ways. The most direct is paid API calls. Cloud vendors can also charge "utility fees" by providing GPU services for running the models. Finally, customization and technical services built around open source models constitute a third viable route.

However, the number of enterprises and individuals who can directly utilize open source models is small, and most require an "out-of-the-box" complete product. Therefore, the application of AI in vertical scenarios is gaining traction.

The most popular are undoubtedly intelligent agents (Agents), which are now ubiquitous across industries from government and enterprise to finance and healthcare. The current focus, however, is on integrating intelligent agents with smart terminals.

Why have intelligent agents + smart terminals become a key implementation direction?

"Cars not only possess high-value software and hardware systems but also have close ties with users, making them ideal AI carriers," said Wu Huixiao, CTO of Great Wall Motors. Similarly, this rule applies to products like mobile phones and embodied intelligence.

For large model companies, developing Agents is, like multi-modality and reinforcement learning, one of the cornerstones on the road to AGI. In the five stages of AGI outlined by OpenAI, Agents correspond to the L3 stage, where AI possesses autonomous operation capabilities. Building upon L3, AI can further pursue autonomous learning capabilities.

Therefore, for large model companies, the commercialization strategy of developing Agents is a step extending from their technological foundation.

Stepwise Stars and Wise Spectrum, the large model heavyweights of the south and the north respectively, have both targeted the smart terminal track.

Last year, Wise Spectrum launched AutoGLM, an intelligent agent that runs on mobile phones and can take over the operation of various apps to fulfill user needs.

This year, Stepwise Stars further expanded the scope of Agent deployment in smart terminals. During the open house in February, the company unveiled Agent applications in four domains: automobiles, mobile phones, embodied intelligence, and IoT.

Currently, various large model companies are vying for orders from smart terminal customers. Take Stepwise Stars as an example; this year, it has secured collaborations with manufacturers such as OPPO, Qianli Technology, Geely Auto Group, and Zhiyuan Robotics.

The integration of Agent capabilities is also becoming a selling point for smart terminal products. For instance, OPPO's Find N5 and Find X8 smartphones, equipped with "One-click All-round Search" and "One-click Screen Inquiry," have achieved impressive sales. It is reported that Find X8 has become the highest-selling product in the Find series for the same period in history.

Compared with other lines of business, pairing intelligent agents with smart terminals has also generated considerable revenue. According to "Intelligent Emergence" reports, with the signing of large orders from Samsung and others, Wise Spectrum's revenue surpassed 100 million RMB in less than a month after the holiday.

In commercializing, this generation of AI large model companies all aim to avoid the repeated bespoke customization typical of traditional ToB business. They hope to use their technological dividends to standardize products as much as possible and thereby achieve higher gross profit margins.

Whether it's a "self-service hotpot restaurant" providing open source tools or a "private kitchen restaurant" for vertical intelligent agents, the prospects for large model commercialization are becoming increasingly optimistic.
