08/19 2025
By She Zongming
In the PC Internet era, the primary interface for technology was the Web; in the mobile Internet era, it was the App. What about the AI era?
Bill Gates offered an answer two years ago: AI Agents will be the biggest arena in AI. "Agents (intelligent agents) will not only revolutionize how people interact with computers but will also disrupt the software industry, sparking the most significant computing revolution since we went from typing commands to clicking icons."
If his assertion seemed ahead of its time back then, the consensus that "the era of AI Agents has arrived" has since formed as quickly as the flick of a switch.
Three months ago, Microsoft CEO Satya Nadella stated at Microsoft's 2025 Build Conference: "We have entered the era of AI Agents and are witnessing how AI systems are assisting us in solving problems in innovative ways."
Interestingly, Musk, who took part in that conference, is now preparing to set up a subsidiary named MacroHard under his AI company xAI. The name is a play on Microsoft, and the aim is to build an AI Agent ecosystem.
As the "coolest Agent trend" sweeps through Silicon Valley, Chinese tech enterprises across the ocean are also positioning themselves in a race that will shape the AI application ecosystem for years to come. They are relying on forward-looking technology roadmaps and paths of their own as they strive to turn from followers into leaders.
01
Modern technology tends to evolve along the path of "technological breakthrough - industrial focus - scenario implementation," and AI is no exception.
ChatGPT's launch at the end of 2022 set off an arms race in large AI models. More than two years later, under the banner of the "Year of the Agent," competition among global tech giants is shifting from large model parameter counts to Agents.
The reason is straightforward: as the marginal benefits of expanding large model parameters diminish, transforming AI from a "passive response tool" into an "active planner and executor" has become a new industry proposition. Agents are the key carriers for AI to advance from "perceptual intelligence" to "cognitive intelligence" and the core bridge connecting large model technology with real-world scenarios.
As an AI application form that can autonomously understand tasks, plan steps, and invoke tools, Agents transform AI from an isolated technological module into a "productivity unit" deeply embedded in enterprise operation systems, solving the fragmentation and low input-output ratio issues in traditional AI application scenarios and propelling the application and implementation of AI technology on the industrial front.
In Silicon Valley, following GPT-4, OpenAI swiftly launched GPT-4o Agent, attempting to break the "dialogue-only" limitation of large models by connecting tools like code interpreters and web browsers; Microsoft has integrated Copilot deeply into Windows and the Office suite and proposed an "Agent for Everyone" strategy; Google, for its part, is betting on "multi-agent collaboration" and has released the Gemini Agent Suite.
▲Tech giants worldwide are racing to develop AI Agents.
In China, tech enterprises are also stepping up their efforts. Baidu stands out among them: from releasing the Wenxin Intelligent Agent platform AgentBuilder in 2024, to launching the world's first content operating system "Cangzhou OS" in April this year, to introducing GenFlow 1.0, the industry's first full-scenario, end-to-end AI Agent, and unveiling the general-purpose super agent App Xinxiang, Baidu's sustained investment in the Agent race is evident.
At Baidu AIDAY on August 18, Baidu Wenku and Baidu Netdisk jointly released GenFlow 2.0, the world's first universally accessible AI Agent. With breakthroughs such as "universal access across all terminals," "parallel tasks," and "traceable memory," it gives Chinese AI fresh momentum in the global Agent race.
Behind this tacit shift lies the recalibration of the AI industry's understanding of AI's value: the value of AI lies not in spectacle but in application. In the AI era, enterprises need AI assistants that can autonomously generate financial reports and break down project plans, while individuals need AI assistance that can synchronize email processing and organize materials, all driving AI to evolve from "talkative" to "capable and diligent."
02
Although AI Agents are expected to experience explosive growth, it must be acknowledged that the industry still faces a gap between ideal and reality: most Agent products on the market remain stuck at the "single-round dialogue + plugins" stage, failing to transition from laboratory toys to productivity tools.
In April this year, Gartner released a report stating that the market is awash in so-called "agent washing": vendors repackage ordinary AI assistants or chatbots as "agents," even though these products lack true autonomous capabilities.
In reality, there is still a significant gap between many current Agents and users' real expectations, specifically reflected in several aspects:
1. Insufficient task decomposition capabilities. Many Agents exhibit logical breaks when handling complex tasks.
Asked to "generate a quarterly analysis report on the new energy vehicle market, including policy analysis, competitor data, and trend predictions," such an Agent may omit a key module like "policy analysis" or limit "competitor data" to a single brand, so the final output often needs significant manual revision.
This is because the task planning of many current Agents still relies on simple rule matching. They cannot deeply understand complex requirements or adjust to them dynamically, and they cannot break a complex goal into ordered subtasks such as "data collection - analysis modeling - content generation - format conversion" the way a human would.
2. Uncontrollable result quality. Many Agents generate content with frequent low-level errors.
I once used an educational Agent to generate lecture slides, and the PPT stated that "the medium is the message, as proposed by Neil Postman." (The phrase is Marshall McLuhan's.) Errors like this arise when an Agent has no specialized knowledge base and cannot safely access private resources, which makes its content one-sided, and instead relies solely on its large model's training data, which makes its information outdated.
3. Bottlenecks in efficiency and collaboration. Some Agents take a long time and are prone to getting stuck when handling cross-domain complex tasks, and some cannot seamlessly interface with users' existing tools (such as documents, cloud disks, and professional software), becoming information islands.
To put it bluntly, it's because these Agents rely too heavily on single model capabilities and serial working modes, making it difficult to handle demands such as parallel multitasking and dynamic adjustments.
▲Most Agents on the market have numerous capability deficiencies.
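To make the first gap concrete, here is a minimal sketch of what "decomposing a goal into ordered subtasks" can look like. It is my own illustration, not any vendor's planner: the subtask names mirror the stages mentioned above, the dependency map is an assumption, and `missing_modules` is a hypothetical helper for checking that nothing the user asked for was dropped.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each subtask maps to the subtasks it depends on.
PLAN = {
    "data_collection": set(),
    "analysis_modeling": {"data_collection"},
    "content_generation": {"analysis_modeling"},
    "format_conversion": {"content_generation"},
}


def ordered_subtasks(plan):
    """Return subtasks in an order that respects every dependency."""
    return list(TopologicalSorter(plan).static_order())


def missing_modules(required, produced):
    """List the requested report modules that the output failed to cover."""
    return [module for module in required if module not in produced]


order = ordered_subtasks(PLAN)
gaps = missing_modules(
    ["policy analysis", "competitor data", "trend predictions"],
    {"competitor data", "trend predictions"},
)
print(order)
print(gaps)
```

An explicit dependency map plus a coverage check is the simplest way to catch the "omitted policy analysis" failure before the report reaches the user.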
Theoretically, an Agent should be like an AI butler with superpowers: after receiving a task it draws a mind map (planning), checks whether the steps are correct (verification), and then executes them in order. It should have digital eyes to "recognize the path and avoid obstacles" (environmental perception), an AI brain to weigh the pros and cons (autonomous decision-making), a toolbox to "book tickets and call cars" (tool invocation), and the discipline to see a task through from start to finish (complete task loops).
But too many Agents are missing either the eyes or the brain, and their usability and stability are unsatisfactory. As a result, enterprises are reluctant to entrust their core business and important tasks to Agents.
Pain points like these are also windows of opportunity for the industry. As the Agent race intensifies, whoever is first to solve usability, stability, and resource integration will be able to build dual barriers of technology and ecosystem in the Agent era and gain the initiative in defining industry standards. And GenFlow 2.0's problem-solving orientation is very clear.
03
Unlike current Agents that can only run on webpages or clients and require invitation codes or beta access, Baidu Wenku GenFlow2.0 has two typical characteristics: universal access across all terminals and immediate availability.
It is currently live on both Baidu Wenku's webpage and the Baidu Wenku APP, so users can use it out of the box without waiting in line.
When I tried it on mobile, I found that it had a very user-friendly feature - changing the traditional "waterfall" task display to "side-by-side" progress visualization. After giving instructions, I could intuitively see the division of labor among each Agent (e.g., Agent A is responsible for data search, Agent B is responsible for PPT generation).
My inner monologue after the experience was: Baidu should give the product manager a raise. Compared to endlessly scrolling down, this operation is more in line with the daily habits of ordinary users, right?
It's worth noting that being the "world's first universally accessible Agent" is not the only advantage of GenFlow 2.0 - it also claims several industry firsts, including parallel mode, memory mode, and full-process intervention mode.
▲When I used GenFlow to analyze the evolution of the Agent market landscape from 2024 to 2025, I paused halfway to add new requirements.
First, let's talk about parallel mode. When I asked GenFlow2.0 to analyze the Agent market landscape in 2025, generate a comparison table, and create a competitive analysis PPT, it automatically scheduled multiple expert-level Agents such as "Market Analysis Agent," "Data Visualization Agent," "PPT Generation Agent," and "Netdisk Retrieval Agent" to work in parallel rather than sequentially.
GenFlow 2.0 reportedly relies on a Multi-Agent infrastructure developed in-house by Baidu Wenku and Netdisk, which lets "100+ expert Agents process in parallel." This turns AI task execution from waiting into immediate availability, achieves minute-level delivery (multiple complex tasks can be completed in 3 minutes), and raises the efficiency ceiling.
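For readers who want to see the difference between serial and parallel scheduling, here is a minimal sketch using Python's `asyncio`. It is an illustration only: `run_agent` is a hypothetical stand-in that sleeps instead of doing real work, the delays are made up, and it assumes the subtasks are independent of one another, which a real scheduler cannot always assume.

```python
import asyncio


async def run_agent(name, seconds):
    """Hypothetical stand-in for an expert Agent; sleeping simulates its work."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_parallel(jobs):
    """Start every Agent at once and return results in submission order."""
    return await asyncio.gather(*(run_agent(name, secs) for name, secs in jobs))


# Assumed workload: agent names come from the text, durations are invented.
JOBS = [
    ("Market Analysis Agent", 0.03),
    ("Data Visualization Agent", 0.02),
    ("PPT Generation Agent", 0.01),
]

results = asyncio.run(run_parallel(JOBS))
for line in results:
    print(line)
```

Run serially, these jobs would take roughly the sum of their durations; run concurrently, the total is close to the slowest single job, which is the intuition behind "minute-level delivery."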
Next, let's talk about memory mode. A few days ago, I asked GenFlow2.0 to help me generate an analysis of the development path of Agent technology, and today I asked it to "analyze the Agent market landscape in 2025." It automatically invoked historical data, eliminating the need to re-explain the problem background and avoiding redundant work.
Behind this is GenFlow2.0's pioneering "long-short-temporary" three-level memory hub, which can remember user dialogues, operation preferences, file interaction records, modification traces, etc., across multiple rounds of tasks, achieving "the more you use it, the more it understands you."
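Baidu has not published how the three levels are divided, so the following is only a toy sketch under my own assumptions: long-term memory holds durable user preferences, short-term memory holds summaries of the most recent tasks, and temporary memory is scratch space that is discarded when a task ends. The class and method names are hypothetical.

```python
from collections import deque


class MemoryHub:
    """Toy three-tier memory; the meaning of each tier is an assumption."""

    def __init__(self, short_term_size=3):
        self.long_term = {}                              # durable user preferences
        self.short_term = deque(maxlen=short_term_size)  # summaries of recent tasks
        self.temporary = {}                              # scratch data for the current task

    def finish_task(self, summary):
        """Keep a summary of the finished task and discard its scratch data."""
        self.short_term.append(summary)
        self.temporary.clear()

    def context_for(self, request):
        """Assemble the background a new request starts with."""
        return {
            "request": request,
            "preferences": dict(self.long_term),
            "recent_tasks": list(self.short_term),
        }


hub = MemoryHub()
hub.long_term["output_format"] = "PPT"
hub.temporary["draft"] = "outline v1"
hub.finish_task("Development path of Agent technology")
context = hub.context_for("Analyze the Agent market landscape in 2025")
print(context)
```

The point of the sketch is the behavior I observed: the second request arrives with the earlier task already in its context, so the user does not have to restate the background.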
Now, let's talk about the full-process intervention mode. When I was generating content with GenFlow2.0, I proposed halfway through to "include the Agent market landscape in 2024," and it immediately adjusted to incorporate what I said.
This is also where Wenku GenFlow 2.0 differs from other Agents. General Agents follow the process of "write Prompt - long wait - find error - rewrite Prompt - new round of waiting," whereas GenFlow 2.0 works as "say something - watch it work - make changes anytime - immediate availability." Its real-time intervention capabilities let users pause, backtrack, supplement instructions, or add files at any node in the task flow.
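The essential idea behind mid-task intervention can be sketched in a few lines: the runner checks for new user input before every step instead of only at the start. This is my own simplified illustration, not GenFlow's mechanism; `run_with_intervention` and the step names are hypothetical, and pausing or backtracking is left out.

```python
from collections import deque


def run_with_intervention(steps, interventions):
    """Run steps in order, letting the user inject instructions mid-flow.

    `interventions` maps "number of steps completed so far" to the extra
    instructions the user adds at that moment; they are handled next.
    """
    pending = deque(steps)
    log = []
    completed = 0
    while pending:
        # Check for new user input before each step, not only at the start.
        extras = interventions.get(completed, [])
        pending.extendleft(reversed(extras))  # keep the user's order, run them first
        log.append(pending.popleft())
        completed += 1
    return log


log = run_with_intervention(
    ["search 2025 market data", "build comparison table", "generate PPT"],
    {1: ["add 2024 market landscape"]},
)
print(log)
```

In this run the added requirement is picked up right after the first step, so the table and slides that follow already reflect it, with no need to restart from a rewritten prompt.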
These breakthroughs are obviously not just single-point technological upgrades.
04
AI progress is exponential rather than linear, but it is never built from scratch. GenFlow 2.0's ability to claim many "firsts" in the Agent field is inseparable from support along three dimensions: "specialized accumulation + ecosystem collaboration + full-stack layout."
From the perspective of specialized accumulation, Baidu's exploration of Agents has long formed progressive breakthroughs.
The success of Baidu Wenku's single-purpose Agents in vertical scenarios, such as PPTs, picture books, image generation, and research reports, paved the way for Cangzhou OS and GenFlow 1.0 to achieve basic multi-Agent scheduling for the first time, and gave GenFlow 2.0 a fulcrum for the leap from "usable" to "user-friendly."
It can be said that the release of GenFlow 2.0 is not "from 0 to 1" but "from 100 to 1" - assembling hundreds of mature, market-validated Agents into a single "aircraft carrier battle group." The path of "specialized breakthrough - system integration - experience upgrade" thus takes shape.
From the perspective of ecosystem collaboration, GenFlow 2.0 has built a dual cycle of "Baidu's own ecosystem + third-party partner ecosystem."
Within the Baidu ecosystem, it is fully integrated with Wenku and Netdisk's "three libraries (Wenku's public domain professional material library + user-authorized Netdisk private database + user memory library) and one platform (Baidu Academic Platform) and three tools (reader, editor, player)," and deeply connected with products such as Luobo Kuaipao (intelligent transportation), digital human live streaming (content creation), and Miaoda (no-code development).
If asked to generate an "outing plan for the National Day holiday," the system can automatically invoke Baidu Maps to generate an interactive itinerary H5 and synchronously book airport transfer services through Luobo Kuaipao.
▲GenFlow 2.0 is inseparable from the support of Baidu's full-stack AI layout and forms a linkage with other Baidu AI products.
In terms of external ecosystem compatibility, GenFlow 2.0 connects to third parties through MCP (Model Context Protocol) and has already been natively integrated into Honor MagicOS. Users can call up its full capabilities with a single tap on their phone's minus-one screen, enabling a seamless flow from "PPT creation on mobile phones, editing on tablets, to presentation on computers." Integrations with WPS, DingTalk, and Feishu are currently in grayscale (limited rollout) testing.
The openness of the MCP protocol allows GenFlow 2.0 to be embedded into any application, much like LEGO blocks. This means its multifaceted capabilities are no longer confined to a single app but instead permeate throughout users' daily lives and work scenarios.
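For a sense of why MCP makes this kind of embedding possible: MCP messages are JSON-RPC 2.0, and a host application asks a server to run a tool by sending a `tools/call` request. The sketch below only builds such a message. The tool name `generate_ppt` and its arguments are hypothetical, since the tools GenFlow 2.0 actually exposes are not public.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "generate_ppt", {"topic": "Agent market landscape 2025"})
print(json.dumps(request, indent=2))
```

Because any host that speaks this message format can invoke the same tool, the capability is not tied to one app's user interface.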
From a full-stack perspective, Baidu's comprehensive AI capabilities support GenFlow 2.0 from the ground up: the Kunlun Chip at the hardware level (supplying computational power), PaddlePaddle at the framework level (ensuring efficient multi-agent scheduling), and the Wenxin Large Model at the model level (featuring a mixture-of-experts architecture and multimodal understanding capabilities). Baidu is one of the few companies globally to have developed a full AI stack in-house, and this technological depth supports an "end-to-end optimization" loop.
05
History may not repeat itself, but it often rhymes. Decades ago, Windows moved personal computing from DOS command lines to icons; later, iOS and Android moved it from keyboards to touchscreens. Today, Agents introduce a novel human-computer interaction mode and task execution paradigm.
In the future, those who can better address user needs and capture their attention with Agent products that balance usability and stability will secure a spot for the next decade.
Currently, GenFlow 2.0 mirrors the early days of Android. Just as Android unified hardware, applications, and services atop the Linux kernel years ago, creating a formidable ecological barrier, GenFlow 2.0 now unifies computational power, models, data, Agents, and humans through the most basic interface of natural language.
For users, the practical value of AI is maximized when complex tasks can be accomplished with a simple voice command. This practicality stems from a profound understanding of user needs, ultimately leading to a groundbreaking innovation in user experience.
▲What matters most for AI Agents is practicality and ease of use.
Such innovations will not only reshape users' expectations for Agents but also reinvigorate China's competitiveness in this domain.
Global competition in the Agent space has intensified, with Silicon Valley giants aiming to replicate their dominance from the PC and mobile internet eras into the Agent era.
Against this backdrop, the significance of GenFlow 2.0's multiple breakthroughs transcends mere technological advancements. It serves as a frame of reference and lays the groundwork for domestic AI to catch up and lead in Agent standards.
Unlike OpenAI's Agent ecosystem, which focuses on general capabilities, and Microsoft's, which centers around the office system, Baidu's GenFlow 2.0 offers seamless multi-scenario switching, universal applicability, data security and controllability through public and private domain knowledge integration, and efficiency enhancements via parallel processing, traceable memory, and full-process intervention. This demonstrates that domestic Agents can establish their own innovative leadership and differentiated advantages, positioning themselves as a global productivity platform that rivals OpenAI, Microsoft, and Google.
It is foreseeable that in the near future, as highly usable Agents revolutionize the landscape of "AI applications," the tides of AI will wash over every familiar shore with a fresh, vibrant rhythm. Let us observe and anticipate this transformation.