05/22 2026


Do you remember the two mysterious Chinese AI models, A and B, that appeared in AI evaluations in early May?
These were test results released by developer Toyama Nao. Model A, which surpassed Gemini 3.1 Pro and Claude Opus 4.6 in extreme scoring, sparked widespread speculation.

Now, with the opening of the main forum at the 2026 Alibaba Cloud Summit, the true identity of Model A has been revealed: Alibaba Cloud officially released its new-generation flagship model, Qwen3.7-Max.
However, the biggest difference at this summit compared to previous ones is that the focus is no longer on showcasing parameter scale, context length, or chat experiences. Instead, there is a clear and aggressive direction: All in Agent.
It is better described as a technical discussion between Alibaba and all AI users than as a product launch.
CTO Li Feifei stated bluntly in the main forum speech: The value of the cloud is shifting from large-scale management and operation of computing power to large-scale management and operation of intelligence.
Zhou Jingren, who took over Qwen after Lin Junyang's departure in March, was even more direct: Large models have shifted from 'human value alignment' to 'task alignment.'
These two brief statements correspond to a long-term strategic vision: Alibaba's AI strategy is dismantling the boundaries between models, computing power, security, and applications, reconstructing them as standardized components within an Agent framework.
This article will dissect the hidden signals revealed at the summit from a developer's perspective.
01 Everything as an Agent Component
If the competition among large model companies over the past two years has focused on 'whose model is stronger,' then from this summit, Alibaba's answer is clear: Models are just the starting point, and Agents are merely the visible endpoint for now. This is not just a slogan but is reflected in two concrete ways.
On one hand, models are being redefined: In the Agent framework, models cannot just be the 'brain' but must also serve as the intelligence hub.
The positioning of Qwen3.7-Max is clear. It is a new-generation flagship model designed for the Agent era, with its core capabilities revolving entirely around Agents:
Long-term autonomous execution: The model demonstrated sustained reasoning without performance degradation over 35 hours and more than 1,000 tool invocations in kernel optimization experiments.
Cross-framework generalization: The model performs consistently whether deployed on Claude Code, OpenClaw, or Alibaba's own Qwen Code.
Native tool invocation: The model supports MCP integration and multi-Agent collaboration, enabling direct control of office software, cloud services, and even physical robots (embodied intelligence).
This represents the biggest shift in the new generation of models—no longer isolated 'brains' but designed as central processors for Agents. This aligns with the core requirements of Agent design: the ability to plan, invoke tools, reflect, correct errors, and adapt to various operating environments.
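The plan, invoke, reflect, and correct cycle described above can be illustrated with a minimal sketch. Everything in it is hypothetical: `run_agent`, the scripted `fake_model`, and the `TOOLS` registry are stand-ins invented for illustration, not the API of Qwen3.7-Max or of any Agent framework.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python callable.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

def fake_model(history: list[str]) -> str:
    """Scripted stand-in for a model; a real Agent would call an LLM here."""
    if not any(h.startswith("observation:") for h in history):
        return "call add 2+x"      # first attempt contains a mistake
    if history[-1].startswith("observation: ERROR"):
        return "call add 2+3"      # react to the error and retry
    return "final " + history[-1].removeprefix("observation: ")

def run_agent(task: str, model=fake_model, max_steps: int = 5) -> str:
    """Loop: ask the model for an action, run the tool, feed the result back."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action = model(history)
        history.append(f"action: {action}")
        if action.startswith("final "):
            return action.removeprefix("final ")
        _, name, arg = action.split(" ", 2)
        try:
            result = TOOLS[name](arg)
        except Exception as exc:   # tool failures become observations
            result = f"ERROR {type(exc).__name__}"
        history.append(f"observation: {result}")
    raise RuntimeError("step budget exhausted")
```

In this sketch a failed tool call is not fatal: the error is written back into the history, so the next model turn can correct it. The step budget is the only thing that bounds a long-running task.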

According to official benchmark data, Qwen3.7-Max's test results are close to those of Anthropic's previous flagship model, Claude Opus 4.6, and of top-tier domestic models. These figures are mostly self-reported by Alibaba, however, and a gap of a few percentage points with Claude Opus 4.6 remains in some programming benchmarks.

Third-party test results from Artificial Analysis add to the picture: Qwen3.7-Max ranks fifth globally in intelligence and seventh in programming capabilities, the highest among Chinese models in both. Its Agent capabilities trail those of Xiaomi's and Zhipu's models slightly, but the gap is negligible.
On the other hand, cloud infrastructure is being reconstructed: evolving from AI-native cloud to Agent-native cloud.
Li Feifei proposed two key concepts worthy of industry attention during the speech: AI Native Cloud and Agent Native Cloud. Despite the similar names, one does not simply contain the other:

AI Native Cloud focuses on producing tokens, making them cheap and efficient through full-chain optimization of pre-training, post-training, and inference (e.g., KV Cache hit rates exceeding 90%).
Agent Native Cloud focuses on transforming tokens into actions, providing support for Agents in six directions: runtime sandbox, orchestration, governance, security, memory, and data plane.
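To make the KV Cache hit-rate figure more concrete, here is a minimal sketch of how such a rate could be measured for block-level prefix caching. It is an illustrative assumption, not a description of Alibaba Cloud's inference stack: the function name `prefix_cache_hit_rate`, the block size, and the idea of keying the cache on whole token prefixes are all hypothetical simplifications.

```python
def prefix_cache_hit_rate(requests: list[list[int]], block_size: int = 4) -> float:
    """Fraction of prompt tokens whose KV entries an earlier request already produced.

    Prompts are split into fixed-size blocks. A block is a hit only if the
    entire prefix ending at that block was seen before.
    """
    cached_prefixes: set[tuple[int, ...]] = set()
    hit_tokens = 0
    total_tokens = 0
    for tokens in requests:
        total_tokens += len(tokens)
        for end in range(block_size, len(tokens) + 1, block_size):
            prefix = tuple(tokens[:end])
            if prefix in cached_prefixes:
                hit_tokens += block_size      # KV for this block can be reused
            else:
                cached_prefixes.add(prefix)   # computed now, reusable later
    return hit_tokens / total_tokens if total_tokens else 0.0

# Three requests sharing an 8-token system prompt, each with 4 unique tokens.
system_prompt = list(range(1, 9))
requests = [system_prompt + [100 * i + j for j in range(4)] for i in range(1, 4)]
rate = prefix_cache_hit_rate(requests)
```

The toy workload shows why Agent traffic suits this optimization: long system prompts and tool definitions repeat across calls, so the shared prefix dominates. A production system would typically key on chained block hashes rather than storing whole prefixes, which this sketch does only for clarity.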
The introduction of these two concepts indirectly refutes the view that intermediate variables like DAU, token consumption, and DAA should be the sole standards for measuring Agent output value. Meanwhile, this is also a very pragmatic idea. After all, rather than focusing too early on how to evaluate Agent value, it’s better to first clarify how Agents should deliver value.
More notably, Li Feifei announced on the spot that all Alibaba Cloud products will complete control plane transformations this year to achieve the 'Skillification,' 'MCPification,' and 'CLIification' required for Agent applications. Agents will replace humans as the primary users of cloud products. In the future, enterprises invoking products like OSS storage, PolarDB databases, and DataWorks data platforms will no longer need to click through consoles or write scripts manually: everything will be driven by natural-language instructions given to Agents.
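What exposing a control-plane operation to Agents might look like can be sketched as follows. The descriptor shape (a name, a description, and a JSON Schema under `inputSchema`) is loosely modeled on how MCP describes tools. Everything else is a hypothetical illustration: `list_objects`, the in-memory `_FAKE_BUCKETS`, and `call_tool` are invented here and are not the real OSS API or an MCP SDK.

```python
import json

# In-memory stand-in for an object store; not the real OSS API.
_FAKE_BUCKETS = {"logs": ["2026-05-21.txt", "2026-05-22.txt"]}

def list_objects(bucket: str, prefix: str = "") -> list[str]:
    """Hypothetical control-plane operation an Agent might need."""
    return [k for k in _FAKE_BUCKETS.get(bucket, []) if k.startswith(prefix)]

# Registry: tool name -> (machine-readable descriptor, handler function).
TOOLS = {
    "list_objects": (
        {
            "name": "list_objects",
            "description": "List object keys in a bucket, optionally filtered by prefix.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "bucket": {"type": "string"},
                    "prefix": {"type": "string"},
                },
                "required": ["bucket"],
            },
        },
        list_objects,
    ),
}

def call_tool(request_json: str) -> str:
    """Dispatch a JSON call of the form {"name": ..., "arguments": {...}}."""
    request = json.loads(request_json)
    descriptor, handler = TOOLS[request["name"]]
    args = request.get("arguments", {})
    missing = [k for k in descriptor["inputSchema"]["required"] if k not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    return json.dumps({"result": handler(**args)})
```

The point of the descriptor is that an Agent can discover what the operation does and which arguments it needs without a human reading console documentation. The handler itself stays an ordinary function.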
From these two changes, it is clear that Alibaba has shifted from a comprehensive coverage strategy to fully embracing Agents as the design origin. Models, hardware, security frameworks, and storage are no longer independent product lines but will become plugins within the Agent system.
From a programmer's perspective, this perfectly aligns with the philosophy of operating system design and development: Agents are the applications, and the underlying infrastructure provides standardized APIs and runtimes.
02 Enterprise Market Clearly Defined as the Main Battlefield
Although it was never stated explicitly, the announcements at the main forum make clear that Alibaba Cloud's service focus has shifted toward enterprise clients. Individual programmers might even feel a sense of alienation while listening.
This is not personal bias. Li Feifei spent significant time at the summit discussing the 'six major challenges' and 'six solutions,' covering topics like sandbox isolation, identity authentication (Token Vault), and task-level security control—all non-functional issues of utmost concern to enterprise IT departments.

Compared to domestic and foreign competitors, who often emphasize the importance of individual developers and small development teams at launches, Alibaba aims to quickly leverage its existing foundation to capture the enterprise market. There are three underlying reasons for this:
1. Payment willingness and scenario complexity.
Few consumers pay for AI assistants, as the 'Buy a Thousand Questions, Get a Milk Tea' event in February demonstrated. Only enterprises are willing to pay high subscription fees for 'saving a development team' or 'automating compliance processes,' even if the actual results may fall short of expectations.
In Zhou Jingren's speech, an easily overlooked detail was that Qwen3.7 deeply participated in a 35-hour autonomous chip kernel optimization process. If its capabilities can replace the overtime hours of senior engineers, its commercial value is self-evident.
2. Alibaba Cloud's existing ecological advantages.
As China's largest cloud service provider, Alibaba Cloud already has millions of enterprise clients. These clients have been using products like RDS, OSS, and MaxCompute for years, accumulating data and usage habits that have subtly transformed into extremely high migration costs.
From a technical standpoint, seamlessly embedding Agents into existing cloud products also makes it easier to close the commercial loop than building a B-end app from scratch.
3. Security and governance as sources of pricing power.
Setting aside a few top international models and focusing on the domestic market, the gradual homogenization of AI capabilities projected onto Agents is an undeniable fact. From OpenRouter's invocation volumes, the sole factor influencing individual developers' or small teams' choices is price—limited-time free models dominate the rankings for weeks or even longer.
However, for enterprises, the real question behind procurement decisions is: Do they dare to let Agents operate production databases automatically? Products like Alibaba Cloud's Agent Security Center, Agent ID Guard, and AI Safety Railing 2.0 directly address security concerns, essentially providing insurance for enterprise-level risk-taking. In other words, at this stage, setting the standards for security governance translates into greater pricing power.
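The kind of task-level control this implies can be sketched as a policy gate that sits between an Agent and a production database. This is a hypothetical illustration only: `AgentIdentity`, the scope strings, and `review` are invented names, and nothing here reflects how Agent Security Center or Agent ID Guard actually work.

```python
import re
from dataclasses import dataclass, field

READ_ONLY = {"SELECT", "SHOW", "EXPLAIN"}
WRITES = {"INSERT", "UPDATE"}
DESTRUCTIVE = {"DROP", "TRUNCATE", "DELETE", "ALTER"}

@dataclass
class AgentIdentity:
    agent_id: str
    scopes: set[str] = field(default_factory=set)   # e.g. {"db:read", "db:write"}

def review(agent: AgentIdentity, sql: str, approved_by_human: bool = False) -> str:
    """Return "allow", "deny", or "needs_approval" for one SQL statement."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    verb = match.group(1).upper() if match else ""
    if verb in READ_ONLY:
        return "allow" if "db:read" in agent.scopes else "deny"
    if "db:write" not in agent.scopes:
        return "deny"
    if verb in DESTRUCTIVE:
        return "allow" if approved_by_human else "needs_approval"
    if verb in WRITES:
        return "allow"
    return "deny"   # statements the policy does not recognize are rejected

reader = AgentIdentity("report-bot", {"db:read"})
writer = AgentIdentity("ops-bot", {"db:read", "db:write"})
```

Classifying by the first keyword is deliberately naive, and a real gate would parse the statement properly. The design choice worth noting is the default: anything the policy does not recognize is denied, and destructive statements need a human in the loop even for a fully scoped Agent.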
Alibaba's strategy is essentially to enhance efficiency through Agents and reduce risks through security systems. Li Feifei's 'six major challenges' are less technical issues and more must-haves on enterprise procurement checklists.
03 The Evolutionary Form of Vibe Coding: Vision Coding
Technically, Vibe Coding represents the earliest and most basic form of Agents. However, after months of development and experimentation, Vibe Coding has become the most successful, commercially viable, and mature form.
Yet this technology, positioned by major AI companies as 'benefiting all humanity,' has shown extreme polarization. Programmers and researchers have long embraced the convenience of Vibe Coding, but few people are willing to use various Agents to execute tasks. The cold reality is that most human-AI interactions still occur within web-based dialog boxes.
At the Alibaba Cloud Summit, one term stood out: Vision Coding.
The term has no academic definition yet, so a live demo illustrates the concept better: A user uploaded a video to the AI showing a whiteboard with boxes roughly drawn in marker. The user pointed to the top box and said, 'When I click here,' and to the bottom box, 'This should display a landscape image.' The AI then generated a webpage with a matching layout.

While this sounds remarkable, it is not a newly introduced feature at this launch. The AI protagonist in the demo was Qwen3.5 Omni, previously released by Alibaba in March. This may not be the first AI product capable of such functions, but the term 'Vision Coding' was formally introduced here.
Like Vibe Coding, Vision Coding offers programming capabilities to non-professionals, but the two differ fundamentally:
Vibe Coding relies heavily on users precisely describing their needs in natural language. Even the most powerful models, like Claude Opus 4.7 and GPT-5.5, cannot bypass this requirement. If a user says, 'Make me a cool 3D webpage,' the results are often unpredictable—and for projects far more complex than webpages, the outcomes can be disastrous. What is touted as 'zero-threshold' actually has a threshold in expressive ability.
Vision Coding allows users to interact with AI through sketches, gestures, and vague spoken instructions. In this process, users need not worry about precise expression. Instructions like 'make this bigger' or 'move that button here'—the kind that product managers give and programmers dread—are readily accepted by the AI. The model simultaneously understands visual layout, spatial relationships, and vague intentions, lowering the threshold to 'able to speak and draw simple sketches to develop.'
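The last step of such a pipeline, turning an understood sketch into a page, can be illustrated with a small example. It assumes the hard part is already done: the multimodal model has reduced the whiteboard video to a list of boxes, each paired with what the user said while pointing at it. The `Box` type and `to_html` function are hypothetical and are not how Qwen3.5 Omni represents or generates layouts.

```python
from dataclasses import dataclass
from html import escape

@dataclass
class Box:
    label: str   # what the user said while pointing at the box
    x: int       # left edge on the whiteboard, in pixels
    y: int       # top edge on the whiteboard, in pixels
    w: int       # width in pixels
    h: int       # height in pixels

def to_html(boxes: list[Box]) -> str:
    """Render boxes top-to-bottom as absolutely positioned <div> elements."""
    parts = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        style = (
            f"position:absolute;left:{box.x}px;top:{box.y}px;"
            f"width:{box.w}px;height:{box.h}px"
        )
        parts.append(f'<div style="{style}">{escape(box.label)}</div>')
    return "<body>\n" + "\n".join(parts) + "\n</body>"

page = to_html([
    Box("landscape image", 40, 200, 300, 150),
    Box("button", 40, 20, 120, 40),
])
```

The spatial relationships survive the translation without the user ever stating coordinates, which is the contrast with prompt-only Vibe Coding drawn above.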
From my perspective, Vision Coding represents a more advanced, inclusive, and practically valuable form of Vibe Coding. This evolution, which genuinely enhances development efficiency, is underpinned by a qualitative leap in multimodal fusion: Alibaba's visual Agent can not only 'understand interfaces' but also 'operate interfaces' and finally 'generate interfaces.' This closed visual-action loop leads the domestic field and is far more valuable than benchmark test scores.
Of course, blind technological optimism is unwise—Vision Coding will not be the ultimate form of 'zero-threshold development' or 'everyone a programmer.' However, the judgment that multimodality is a foundational function for Agents is undeniable. Information in the real world is inherently high-dimensional and multimodal: financial reports = text + tables, meetings = voice + PPTs, environments = vision + touch. Abandoning multimodality would confine Agents to the purely textual virtual world forever.
04 Conclusion
Finally, as a leader in the open-source industry, Alibaba continues to make significant investments in the open-source ecosystem. The Qwen3.6 model has been downloaded over 30 million times since its open-source release, with more than 1,200 derivative models developed.
In fact, the role of the Bailian platform is already evolving: originally, it served as a gateway for model APIs; now, it needs to become an all-in-one platform for the development, deployment, and operation of Agents.
This is similar to Apple's App Store, where the model represents iOS, Skills represent Apps, and Agents represent user scenarios. Alibaba Cloud provides the infrastructure and security reviews, while third-party developers can sell their own Agent services. If this ecosystem model proves successful, Alibaba can naturally transition from 'selling computing power' to 'selling Agent solutions.' However, whether the business model can be fully upgraded largely depends on the capabilities of the foundational model.
Looking back at the entire summit, Alibaba Cloud has sent a clear signal: it aims to move beyond being merely a 'cloud service provider + large model company' and strives to become an infrastructure builder in the era of intelligent agents.
This is not only Alibaba's forward-looking judgment but a consensus among all domestic AI companies. Whether Alibaba can move from domestic leadership to international leadership depends on whether the next-generation model can truly narrow the gap with Claude and GPT, and on whether the Agent ecosystem can attract enough third-party developers. On both counts, Alibaba may still have a long way to go.
However, in addressing the future direction of AI, Alibaba's answer is commendable: focusing solely on Agents, prioritizing the enterprise market, and not abandoning multimodality.
AI is no longer just an additional feature in the cloud; instead, the cloud itself is being rewritten by AI.
The Agent is the one holding the pen. This rewriting has only just begun.
