After more than two years of rapid development, AI is hurtling towards the Agent era.
As AI transitions from 'passive response' to 'active decision-making', AI Agents emerge as the central hub connecting the digital and physical worlds.
From enterprise Agents handling customer service tickets autonomously, to academic Agents coordinating multi-step scientific research experiments, to personal Agents managing smart home ecosystems, these intelligent agents—equipped with reasoning, planning, memory, and tool-using capabilities—are reshaping industrial landscapes.
Supporting their intelligence is a sophisticated infrastructure that encompasses not only algorithms and models but also the entire lifecycle support system, from research and development to deployment, and from collaboration to operation and maintenance.
In 2025, the infrastructure for AI Agents (Agent Infra) will reach a pivotal point of explosive growth. Breakthroughs in open-source large models like DeepSeek and Qwen provide Agents with potent cognitive 'brains', while the thriving Model Context Protocol (MCP) ecosystem endows them with flexible 'limbs'.
According to IDC's predictions, 80% of global enterprises will deploy Agents within the year. The co-evolution of 'brains' and 'limbs' necessitates a comprehensive upgrade of the supporting 'body', making Agent Infra the focal point of technological advancements.
Enterprise Agent Applications Face Five Major Challenges
Products leveraging AI capabilities to automate workflows have long existed. Prior to the advent of generative AI, RPA-type products were prevalent.
However, constrained by the weak AI capabilities of the time, RPA could only automate simple, single workflows, lacking true intelligence and unable to solve complex, composite problems.
It was only with the emergence of generative AI and various truly intelligent Agent applications that people began to experience significant efficiency gains from AI automation.
An Agent is essentially an AI that can invoke various tools. Manus, for example, uses prompts to control AI models and orchestrate complex workflows, enabling the models to use various tools to accomplish intricate tasks.
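To make the idea concrete, here is a minimal sketch of such a tool-invoking loop in Python. It is not Manus's actual implementation: the model is stubbed out with a fake_model function and web_search is a hypothetical tool, where a real Agent would send the conversation to an LLM API and parse its tool-call response.

```python
# A toy agent loop: the model either requests a tool call or returns an answer.
def web_search(query: str) -> str:
    # Hypothetical tool; a real Agent would call an actual search API here.
    return f"results for {query!r}"

TOOLS = {"web_search": web_search}

def fake_model(messages):
    # Stand-in for an LLM: it asks for one search, then produces an answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "web_search", "args": {"query": messages[-1]["content"]}}
    return {"answer": "Summary based on: " + messages[-1]["content"]}

def run_agent(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        decision = fake_model(messages)
        if "answer" in decision:                              # the model chose to finish
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])  # invoke the tool
        messages.append({"role": "tool", "content": result})  # feed the result back

print(run_agent("latest Agent Infra platforms"))
```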
However, both research-oriented Agent applications like DeepResearch and general-purpose Agent applications like Manus are typically provided to end-users through web pages or apps.
This delivery method is ill-suited for professional AI developers, AI entrepreneurs, and enterprise users. They require Agents to use proprietary data, seamlessly integrate into their businesses, and continuously deliver value to their operations.
When commercializing Agents, the first issue encountered is device-side performance: when a powerful Agent runs on the user's local device, a variety of problems arise.
Foremost among them is the limitation of AI inference computing power. An Agent comprises a powerful AI model plus a suite of toolchains for it to invoke.
Running a powerful AI model typically necessitates dedicated AI computing power, provided by GPUs or AI-specific chips. Few consumer-grade PCs or mobile phones can deploy high-precision large models. Thus, currently, a substantial number of Agent companies utilize cloud computing power, with both model training and inference done in the cloud.
The second issue is the computing power required for task execution. Agent tasks are characterized by high concurrency and heavy compute demands. Once an enterprise deploys Agents locally and the business they support begins to grow rapidly, more computing power is needed immediately, yet local deployment cannot scale up fast enough; conversely, when the business is idle, demand falls and the spare capacity becomes significant wasted resources for the enterprise.
For example, Manus initially used virtual machines on local servers to perform tasks, which led to insufficient performance and unstable services when a large number of users flooded in, affecting its initial reputation to some extent.
The third issue is the hassle of configuring AI tools. If an Agent cannot invoke tools, it will struggle to solve complex problems.
For instance, a sales Agent needs to invoke the CRM to obtain customer information, query the internal knowledge base to introduce products to customers automatically, and call various communication tools to reach customers directly.
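As a hedged illustration of what 'configuring AI tools' involves, the snippet below declares such a sales Agent's tools as OpenAI-style JSON function schemas that a model can choose between; every name and field here is invented for the example rather than taken from any real CRM or knowledge base.

```python
# Illustrative tool declarations for a hypothetical sales Agent.
SALES_AGENT_TOOLS = [
    {
        "name": "crm_lookup",
        "description": "Fetch a customer's profile and order history from the CRM.",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
    {
        "name": "kb_search",
        "description": "Search the internal knowledge base for product information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "send_message",
        "description": "Contact the customer over an approved channel.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "channel": {"type": "string", "enum": ["email", "sms"]},
                "body": {"type": "string"},
            },
            "required": ["customer_id", "channel", "body"],
        },
    },
]
```

Building and maintaining the real services behind each of these schemas is precisely the toolchain work described above.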
There are already numerous intelligent computing centers across China that can initially alleviate the issue of computing power limitations. However, these intelligent computing centers only provide computing power and do not offer the various toolchains needed to build Agents.
For enterprises to customize Agents that are tightly coupled with their businesses, they need to build their own toolchains. This is a complex undertaking: development costs are high, and the long build-out time before an Agent officially goes live can actually slow the enterprise's business growth.
After resolving computing power limitations and AI tool configuration, professional AI developers and enterprise users immediately run into a fourth issue: permission conflicts.
Enterprises develop and deploy Agents in order to embed them into their own businesses. Beyond invoking various tools, this requires close cooperation with the software already running in the business.
Taking a sales Agent as an example: when it invokes the CRM, the internal knowledge base, and external communication tools, it not only occupies local computing resources but, more troublingly, competes with human employees for access and operation permissions.
When Agents and humans contend for each other's resources instead of collaborating, the overall work efficiency of the team may actually fall.
For enterprise users, there is also the major issue of security. Enterprises use Agents to enhance their businesses or improve employee efficiency, which necessarily involves the company's internal data.
However, Agent task execution is a black box: the execution process is not transparent to users. An Agent may modify, delete, or otherwise operate on the local file system, leaving behind junk files that bloat the system in mild cases, or causing file loss and data leakage in severe ones.
Furthermore, there are inherent security risks when Agents invoke tools.
According to statistics, more than 43% of MCP service nodes have unauthenticated Shell invocation paths, and more than 83% of deployments have MCP configuration vulnerabilities; 88% of AI component deployments do not enable any form of protection mechanism.
As the use of Agents becomes more prevalent in the future, the importance of security and trust will be even more critical in the AI era than it was in the Internet era.
When actually putting locally deployed Agents to work, enterprises also face the issue of Agents lacking long-term memory.
Without semantic and contextual memory, an Agent can only complete one-off tasks, which severely limits its scope of use in enterprise business.
If enterprise users can endow the Agents they apply to their businesses with long-term memory, then beyond completing multiple tasks, they can also iterate on the Agents based on those memories, giving them a deeper understanding of the business and its users and making them more capable at specific tasks.
Agent Infra Arrives on the Scene
Currently, cloud vendors are racing to launch new-generation Agent Infra technology architectures.
For example, AWS has launched AgentCore (in preview), a fully managed runtime deeply customized and optimized on top of the Lambda FaaS infrastructure, addressing key limitations of standard Lambda for Bedrock Agents such as long-running execution, state recording, and session isolation.
Azure has introduced AI Foundry Agent Service, integrating the event-driven capabilities of Functions FaaS so that the Agent Service can leverage the event-driven nature, scalability, and flexibility of Serverless computing to build and deploy Agents more easily.
Google Cloud has launched Vertex AI Agent Builder, which, although not officially confirmed, is widely speculated to be highly dependent on and optimized for Cloud Run (Cloud Functions 2nd Gen is already built on Cloud Run) to support long-running, concurrent, and stateful requirements.
Alibaba Cloud has introduced Function Compute Function AI, officially optimized on the Serverless x AI runtime of FC FaaS and offering model services, tool services, and Agent services. With its modular design, developers can independently choose any combination of models, runtimes, and tools to build and deploy Agents.
PPIO has launched China's first Agentic AI infrastructure service platform, AI Agent, offered in general and enterprise editions.
The general edition is built on a distributed GPU cloud base; it ships China's first Agent sandbox compatible with the E2B interface, along with model services better suited to Agent construction.
The Agent sandbox is a cloud-based secure runtime environment designed specifically for Agents to execute tasks. It supports dynamic invocation of tools such as Browser Use, Computer Use, MCP, RAG, and Search, endowing Agents with safe, reliable, efficient, and agile 'hands and feet'. The sandbox has already integrated well-known open-source projects such as Camel AI, OpenManus, and Dify.
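Because the sandbox is advertised as E2B interface compatible, the standard E2B Python SDK should in principle work against it. The sketch below assumes the platform's endpoint and credentials are supplied through the SDK's usual E2B_DOMAIN and E2B_API_KEY environment variables; it is a minimal illustration, not PPIO's documented usage.

```python
# pip install e2b-code-interpreter
# Assumes E2B_DOMAIN / E2B_API_KEY point at an E2B-compatible endpoint.
from e2b_code_interpreter import Sandbox

sbx = Sandbox()                            # provisions an isolated cloud runtime
execution = sbx.run_code("print(2 + 2)")   # the code executes remotely, not locally
print(execution.logs.stdout)               # expected: ['4\n']
sbx.kill()                                 # tear the sandbox down when finished
```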
These technologies converge on the same goal: giving Agents a 'body' that is more elastic, lower in latency, stronger in security, and capable of longer sessions, supporting their transition from the lab into tens of millions of enterprise scenarios.
When the loop of cognition and action is completed, the technological generational gap in Agent Infra will determine the speed and quality of enterprise AI innovation and transformation.
The evolution of the Agent development paradigm places new demands on the underlying infrastructure.
The new-generation Agent Infra from major cloud vendors focuses on technological breakthroughs such as long-running execution, session affinity, session isolation, enterprise-grade IAM and VPC, and model/framework openness, essentially to meet the common needs of three core Agent forms.
The first is LLM Agents' strong demand for continuous toolchain invocation: to complete complex reasoning, an LLM Agent may keep invoking tools for several minutes or even hours.
The execution duration limit of traditional Serverless (e.g., AWS Lambda's 15-minute cap) would forcibly interrupt tasks, so the new-generation Agent Infra must break through this limit and support long-running execution.
At the same time, to maintain the consistency of context across multiple rounds of dialogue, session affinity is required to ensure that the same request is routed to the same computing instance to avoid state loss.
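A toy sketch of the session-affinity idea (not any vendor's actual router): requests carrying the same session ID are deterministically pinned to one instance, so multi-turn context is never split across workers.

```python
import hashlib

INSTANCES = ["worker-0", "worker-1", "worker-2"]

def route(session_id: str) -> str:
    # A stable hash of the session ID pins every turn of a session to one instance.
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return INSTANCES[int(digest, 16) % len(INSTANCES)]

assert route("sess-42") == route("sess-42")   # same session, same instance
print(route("sess-42"), route("sess-43"))     # different sessions may land elsewhere
```

Production routers typically use sticky sessions or consistent hashing rather than a plain modulo, so that scaling the instance pool does not remap sessions that are still live.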
Secondly, Workflow Agents rely on state management. Automated workflows (such as data processing pipelines) need to persistently record execution states.
The stateless nature of traditional Serverless cannot save intermediate results, while the new-generation Agent Infra ensures the atomicity and recoverability of workflows by providing stateful sessions. Session isolation ensures that tasks do not interfere with each other in multi-tenant or high-concurrency scenarios, meeting enterprise-grade security and compliance requirements.
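The checkpointing idea behind such stateful sessions can be sketched in a few lines. This is a simplified illustration, with a local JSON file standing in for whatever durable store a real Agent Infra provides:

```python
import json, os

CHECKPOINT = "workflow_state.json"   # stand-in for a durable state store

def load_state():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"completed": [], "results": {}}

def save_state(state):
    with open(CHECKPOINT, "w") as f:
        json.dump(state, f)

def run_pipeline(steps):
    state = load_state()
    for name, fn in steps:
        if name in state["completed"]:       # finished in an earlier run: skip
            continue
        state["results"][name] = fn(state["results"])
        state["completed"].append(name)
        save_state(state)                    # persist after every step

steps = [
    ("extract", lambda r: [1, 2, 3]),
    ("transform", lambda r: [x * 10 for x in r["extract"]]),
    ("load", lambda r: sum(r["transform"])),
]
run_pipeline(steps)   # if the process dies mid-pipeline, rerunning resumes it
print(load_state()["results"])
```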
Thirdly, Custom Agents require flexibility and ecosystem integration. Custom Agents need to integrate heterogeneous tools (APIs, domain models, databases, Code Interpreters, Browser Use, etc.), requiring the new-generation Agent Infra to support model/framework openness (such as AutoGen, LangChain, AgentScope).
A closed architecture would limit the scalability of Agent capabilities, while cloud vendors can provide plugin-based integration interfaces by decoupling the computing layer from the framework layer.
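One hedged way to picture that decoupling: the hosting layer depends only on a small adapter protocol, and each framework is wrapped behind it. The protocol and class names below are invented for illustration, not any vendor's API.

```python
from typing import Protocol

class AgentAdapter(Protocol):
    # The only contract the computing layer knows about.
    def handle(self, session_id: str, user_input: str) -> str: ...

class EchoAdapter:
    """Stand-in for a wrapper around a real framework such as LangChain or AutoGen."""
    def handle(self, session_id: str, user_input: str) -> str:
        return f"[{session_id}] echo: {user_input}"

def serve(adapter: AgentAdapter, session_id: str, user_input: str) -> str:
    # The hosting layer stays framework-agnostic: it only calls the protocol.
    return adapter.handle(session_id, user_input)

print(serve(EchoAdapter(), "sess-1", "hello"))
```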
While retaining the core advantages of Serverless (fully managed, maintenance-free, lightweight, elastic, and more economical), the new-generation Agent Infra addresses the core needs of LLM Agents for continuous reasoning, Workflow Agents for complex state transitions, and Custom Agents for flexible customization through key functions (long-running execution, session affinity/session isolation) and technological breakthroughs (state persistence, cold start optimization, open integration).
This signifies a shift in Agent development from 'manually piecing together traditional components' to 'utilizing native Infra for efficient, secure, and scalable development and deployment', a brand-new technical path.
As Agent applications further accelerate, Agent Infra has become an area actively explored by model companies, cloud vendors, and startups. In addition to cloud giants, startups also have considerable opportunities in this field.
Firstly, identify stages within existing Infra that have AI-native requirements. Such a requirement may be the higher performance Agent development demands of that stage, for example a sandbox needing faster cold starts and stronger isolation; or it may be better integration with AI workflows and more AI-native features, such as built-in RAG functionality or tighter integration with the languages and SDKs AI developers commonly use.
Secondly, seize new pain points in Agent development. Agent development aims to maximize the ROI of R&D and time investment, so there is strong demand for Infra products that lower development thresholds and reduce engineering effort; an Infra product that is highly usable and reasonably priced therefore has the potential for wide adoption. Moreover, the Agent ecosystem emphasizes co-construction, and continuous Infra innovation is vigorously driving that ecosystem's growth.
When creating an Agent becomes as straightforward as assembling LEGO bricks, and the network of Agent collaborations extends to every facet of society, the question of whether this is a trend or a bubble will become obsolete. Instead, we will embrace it as the dawn of a new era.