06/06 2025
Article | AI Relativity
As with the launch of Sora, must we still look to foreign vendors for groundbreaking innovations in AI video generation?
At the 2025 Google I/O Developer Conference, Google unveiled Veo 3, another heavyweight among large video generation models. It arrived roughly six months after Veo 2, and its updates are truly impressive. Veo 3 not only generates video but also seamlessly integrates audio, including music, background sound effects, and even character dialogue, all of which are naturally generated and synchronized with lip movements on screen.
Large video generation models have fully entered the "era of sound." With Veo 3's enhanced understanding and simulation of physical laws, the realism and immersion of current AI video generation have reached new heights.
Given such advancements, can domestic large video generation models still surpass their foreign counterparts? Let's set aside the outcome for now and look at the industry's journey in the six months since Veo 2's release. On authoritative global evaluation rankings such as the VBench Leaderboard and Artificial Analysis, the competitive landscape has kept shifting. Domestic models such as Kuaishou's Keling 1.6pro and Keling 2.0, Alibaba's Tongyi Wanxiang, and Shengshu Technology's Vidu Q1 have successively topped the charts.
As the primary medium for content consumption today, videos enjoy immense traffic and popularity across various domains. Even within the realm of large AI models, competition in the video generation track appears fiercer than in other subfields, with vendors engaging in particularly thrilling "mutual beatdowns."
Domestic models "beating up" overseas models? The "spiral" of mutual competition among large video generation models
Recently, the authoritative global evaluation ranking Artificial Analysis released its latest ranking of large video generation models. Following Keling 1.6pro's earlier first-place finish, Kuaishou's Keling 2.0 claimed the top spot in the Image to Video track with an Arena ELO score of 1124, surpassing numerous mainstream domestic and foreign video generation models.
Keling 2.0 is the latest product released by Keling AI in April. Since Keling AI's launch last year, it has undergone over 20 iterations. With such frequent updates, Keling AI has swiftly risen to the forefront of global large video generation models, showcasing impressive performance.
According to Keling AI's internal win-loss evaluations, Keling 2.0's win-loss ratio is as high as 205% against Google's Veo 2 and 367% against OpenAI's Sora, indicating a significant gap. Simply put, before Google released Veo 3, domestic models maintained a substantial lead.
In fact, Kuaishou's Keling has repeatedly topped authoritative evaluation rankings for video generation models, ranking first in overall strength. Nor is the rise of domestic players an isolated case: other domestic models also appear at the top of the sub-rankings.
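To make the ratios quoted above easier to read, the sketch below converts a win-loss ratio into the share of decided comparisons won. It rests on an assumption the article does not state: that the ratio means wins divided by losses, with ties left out. If Keling AI's evaluation counts ties differently, the percentages would change. The function name `win_share` is illustrative only.

```python
def win_share(win_loss_ratio: float) -> float:
    """Convert a wins/losses ratio into the share of decided comparisons won.

    Assumption (not stated in the article): the ratio is wins divided by
    losses, and tied comparisons are excluded.
    """
    if win_loss_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return win_loss_ratio / (1.0 + win_loss_ratio)


# Ratios reported in the article, written as plain multiples (205% -> 2.05).
reported = {"Veo 2": 2.05, "Sora": 3.67}

for rival, ratio in reported.items():
    print(f"vs {rival}: {win_share(ratio):.1%} of decided comparisons won")
```

Under that assumption, a 205% ratio corresponds to winning roughly two out of three decided comparisons, and 367% to winning close to four out of five.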
For instance, Shengshu Technology's Vidu Q1 has also topped the sub-rankings for text-to-video and image-to-video on authoritative domestic and foreign evaluation rankings like VBench Leaderboard and SuperCLUE, outperforming foreign models such as Runway and Sora, demonstrating the powerful and stable performance of domestic models.
In January this year, Alibaba's Tongyi Wanxiang 2.1 held the top spot on the VBench Leaderboard. Compared with its contemporaries, Tongyi Wanxiang 2.1 achieved impressive results in complex motion processing, realistic physics, and text semantic understanding, gradually shedding the "AI feel" and moving toward authenticity.
In summary, since OpenAI released Sora, igniting the field of video generation, competition in this subfield has been particularly intense. The rise of domestic models has seen them comprehensively wrestling with foreign models across various rankings. Today, one tops the list, and tomorrow, another knocks it down, forming a "spiral" of mutual competition.
It is precisely this competition that has driven significant progress in AI video generation. From the early "meme" images to today's tightly coordinated and consistent audio, video, characters, actions, and backgrounds, AI videos are becoming increasingly realistic and immersive.
The crucial battle for large video generation models: traffic is king, who will reign supreme?
Beyond "benchmarking" on authoritative evaluation rankings and repeatedly surpassing peers to gain a voice in the industry, large video generation models face an even more critical battle: harnessing online users' enthusiasm for video content to break out quickly on social media platforms, which in turn supports product promotion, user education, and commercial exploration.
Unlike Sora, which was not open to the public at its initial release, Veo 3 was opened to the market immediately: Google launched the Flow platform on the day of release. This reflects not only greater technical maturity but, more importantly, vendors' awareness that large video generation models need large numbers of users creating and meme-making with them in order to generate buzz and traffic, raise product awareness, and capture the market faster.
In short, large video generation models must both "benchmark" and "go viral." A growing body of market data indicates that consumption of AI-dominated video content is accelerating, which may ease the commercial dilemma these models currently face.
Currently, topics related to Douyin's AI special effects have garnered over 3.6 billion views. Kuaishou's AIGC advertising revenue has increased 12-fold, with peak daily consumption exceeding 20 million yuan. The first paid AI short film, "Mysteries of Xing'anling," has sparked enthusiasm in the market. Videos built on themes such as AI+cute kids and AI+pets have attracted significant user attention and platform traffic, rapidly expanding their capacity for advertising placement and merchandising. According to insiders, the advertising quote for a single piece of content in this field has reached 2,000-8,000 yuan, with earning potential still on the rise.
This is not only an exploration of how to commercialize large video generation models but also a transformation and upgrade of the video content creation industry chain. According to Kuaishou's third-quarter report for 2024, Keling AI's monthly turnover exceeded 10 million yuan, and Keling AI has established in-depth cooperation with leading brands such as Yili, vivo, and Lenovo.
This commercial competition for user and market attention is destined to be a game where "traffic is king." Jimeng AI, under Douyin, is currently replicating Doubao's path: it has firmly held a top-10 spot on the Apple App Store and even topped the list at its peak. According to QuestMobile data, from the end of December 2024 to mid-February 2025, Jimeng's weekly active users grew from approximately 760,000 to nearly 2 million, nearly tripling. Compared with the "benchmarking" of other large models, Jimeng AI's "go viral" strategy has already yielded initial results in the market.
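The growth figure above can be checked with simple arithmetic. The sketch below uses the two weekly active user counts cited from QuestMobile; since the article says "nearly 2 million," rounding the end value up to exactly 2,000,000 is an assumption that slightly overstates the result. The function names are illustrative.

```python
def growth_multiple(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    if start <= 0:
        raise ValueError("start must be positive")
    return end / start


def percent_increase(start: float, end: float) -> float:
    """Return the increase from `start` to `end` as a percentage of `start`."""
    if start <= 0:
        raise ValueError("start must be positive")
    return (end - start) / start * 100.0


start_wau = 760_000    # approximate figure from the article
end_wau = 2_000_000    # "nearly 2 million" in the article, rounded up here

print(f"multiple: {growth_multiple(start_wau, end_wau):.2f}x")
print(f"increase: {percent_increase(start_wau, end_wau):.0f}%")
```

With these rounded inputs the end value is about 2.6 times the starting value, which is the sense in which the user base "nearly tripled."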
Judging solely by market feedback, the commercialization path of domestic large video generation models appears broader and faster than that of foreign vendors. Why is this?
Google's Veo 3 requires users to subscribe to the Ultra membership for access, priced at $125, or about RMB 902.52. Even then, usage is not unlimited: generating videos consumes AI points, so a paid membership allows only about 85 videos per month.
Those familiar with AI video generation know that it is generally difficult for current large models to produce a "usable" video in one go. The pricing strategies and membership systems of foreign vendors fundamentally limit users' ability to mass-produce videos.
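A rough calculation shows why this quota matters. The sketch below divides the membership price quoted above by the roughly 85 videos it allows, then adjusts for the fact that not every generation is usable. The 25% usable rate is a purely hypothetical value chosen for illustration; the article gives no such figure.

```python
def cost_per_video(price: float, videos: int) -> float:
    """Average price paid per generated video under a fixed quota."""
    if videos <= 0:
        raise ValueError("videos must be positive")
    return price / videos


def cost_per_usable_video(price: float, videos: int, usable_rate: float) -> float:
    """Average price per usable video when only a fraction of outputs is usable."""
    if not 0 < usable_rate <= 1:
        raise ValueError("usable_rate must be in (0, 1]")
    return price / (videos * usable_rate)


PRICE_USD = 125.0        # membership price quoted in the article
PRICE_RMB = 902.52       # RMB equivalent quoted in the article
VIDEOS_PER_MONTH = 85    # approximate quota quoted in the article
USABLE_RATE = 0.25       # hypothetical: one usable video in four attempts

print(f"per video: ${cost_per_video(PRICE_USD, VIDEOS_PER_MONTH):.2f}")
print(f"per video: RMB {cost_per_video(PRICE_RMB, VIDEOS_PER_MONTH):.2f}")
print(f"per usable video: ${cost_per_usable_video(PRICE_USD, VIDEOS_PER_MONTH, USABLE_RATE):.2f}")
```

At the quoted price each generated video costs about $1.47, and the effective cost per usable video rises in inverse proportion to the usable rate.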
In contrast, domestic vendors combine a free version with a membership system. They provide daily points to attract ordinary users to try the product, and their memberships are, on average, priced lower than those of foreign vendors. The generous output allowance encourages users to mass-produce videos, and the open content systems of Douyin and Kuaishou then enable efficient viral spread, driving content consumption and high-frequency interaction.
Although detail generation quality can still improve and industry competition remains fierce, domestic large video generation models have taken the lead in establishing a commercial path built on good performance, a low-threshold product experience, and a complete content consumption industry chain. As of March this year, Jimeng AI's monthly active users had reached 8.93 million, providing solid market data to support its commercialization efforts.
In closing
Today, foreign vendors and domestic players alike are striving to use large video generation models to open up new avenues for content creation and consumption. Google's Veo 3 has ushered AI video into the era of sound, while domestic models such as Keling, Jimeng, Tongyi Wanxiang, Hailuo, and Vidu are relying on extensive user creation to match market supply with demand and thereby achieve commercial success.
In the second half of 2025, it may not be long before Google's Veo 3 is surpassed by domestic vendors. Stronger models will continue to "benchmark," but they must also "go viral." Advancing the maturity of large video generation models along both tracks will become the norm.
*All images in this article are sourced from the internet