Will OpenAI Overshadow AI Agent Startups?

07/21/2025

"Listen—that's the sound of countless startups vanishing into thin air."

Author | Xue Xingxing

Editor | Zhang Wen

Cover | 2001: A Space Odyssey

As with its text-to-image update in March, OpenAI has once again tried to bring the AI Agent startup race to an early end.

On the morning of July 18, Beijing time, OpenAI released ChatGPT Agent. This AI can autonomously plan execution steps, invoke various tools, and complete multi-step tasks ranging from data scraping to table generation, itinerary planning to hotel bookings, all based on user instructions.

OpenAI Tweet Screenshot

This is also the direction most AI Agent startups are currently exploring. What Manus billed as the first general AI Agent in its promotional video four months ago, ChatGPT Agent now delivers.

OpenAI CEO Sam Altman said this was the first time he "truly felt AGI (Artificial General Intelligence)." OpenAI researchers stated that ChatGPT Agent is the most powerful AI Agent model to date.

Indeed, OpenAI refers to ChatGPT Agent as a model, not a product. Unlike systems such as Manus that rely on context management and tool chain orchestration, OpenAI has trained a dedicated model capable of task planning, cross-tool invocation, and document generation within a single system. The model is currently part of the o3 series but has not been given a separate name.

Entrepreneurs in the AI era face faster technological iteration than in any previous period, and a single update to an underlying model can often wipe out an innovative product in a vertical field.

Li Xiang, founder of Li Auto, previously said on social media that at the consumer level, companies like OpenAI that possess the strongest base models will not leave room for vertical application startups. "The essence of software is functionality, requiring scenarization and verticalization. The essence of AI is capability, and strong capability can consume everything, making it the most convenient for users."

Even Zhu Xiaohu, an advocate for AI application innovation, said on social media that large models will consume 90% of Agents. Users on the X platform also asked how other entrepreneurs could compete with OpenAI if it subsequently opens the API for the ChatGPT Agent model.

"Listen—that's the sound of countless startups vanishing into thin air."

That line was a highly upvoted comment under OpenAI's launch video.

Manus and Its Peers Choose to Confront Head-On

At least for now, Manus has not shown any signs of concession.

Immediately after the OpenAI launch, Manus retweeted on X, saying, "Welcome to the game." Flowith, another Chinese AI Agent startup, also retweeted, emphasizing that they had launched an AI Agent product a year ago.

Manus was the first startup in the past six months to publicly proclaim the general AI Agent slogan, and its reaction was much stronger than that of other companies. Just three hours after the launch event, Manus released ten comparative tests against ChatGPT Agent, declaring that it would compete head-on with OpenAI.

These comparisons came partly from the demonstration clips OpenAI showed that day and partly from real users' tasks shared on social platforms. Scenarios included data organization, route planning, online shopping, financial analysis, and restaurant reservations. In the results Manus released, its own product came out ahead in almost every case: it was faster to respond and put more emphasis on "task completion," with neater tables, richer illustrations, and more polished PPTs.

Comparison video released by Manus with ChatGPT Agent

For example, in OpenAI's demonstration of "planning a three-day tennis trip to Palm Springs," OpenAI provided a simple itinerary, while Manus generated a travel poster with a destination-themed design.

Manus-released test comparisons

Another example is analyzing San Francisco's financial reports for the past four years, where OpenAI output an Excel file, while Manus provided a complete presentation document with charts and key points summarized. "Manus completes the entire project, not just providing data," Manus commented.

Another Chinese company, Genspark, also responded prominently. Founder Eric Jing wrote on X, "I never thought there would come a day when, as a small company with only 24 people, we could be ahead...ahead of OpenAI." He stated that with the same prompts, Genspark's response time was shorter, its costs were lower, and the quality of its generated results was "several times higher."

On July 19, Genspark also shared nine comparison examples with ChatGPT Agent on social platforms, showing that their output documents had richer data dimensions and more aesthetically pleasing layouts. In addition to cases similar to those in the Manus comparison test, such as travel itinerary planning and financial data analysis, they also shared a comparison of video generation capabilities, pointing out that ChatGPT Agent failed to complete the task.

Video generation case shared by Genspark

User feedback on social media was not as intense as it was when OpenAI previously updated its text-to-image functionality. Some critics pointed out that ChatGPT Agent's task completion rate was low, and task generation speed was slow, with some complex tasks taking 20 minutes or longer to complete.

OpenAI also seems aware of the current speed issue with ChatGPT Agent, as several promotional videos they filmed showed employees closing their laptops after giving instructions and returning later to check the results.

"Even if it takes 15 minutes or half an hour, it's still a significant speed-up compared to doing it manually yourself," said OpenAI researcher Isa Fulford. She said this is a usage scenario where "you can initiate tasks in the background and come back later to check the results," while OpenAI's search team focuses more on low-latency scenarios.

OpenAI appears to place more weight on how long the model can sustain continuous reasoning. OpenAI researcher Zhang Xichen said that ChatGPT Agent reached a maximum continuous reasoning time of 2 hours in internal testing, adding, "We should have a leaderboard to record how long the model can continuously think."

In response to criticism that the generated documents or PPTs were not aesthetically pleasing, OpenAI researchers suggested on X that users first let ChatGPT Agent complete the research work and then have it output a PPT file. ChatGPT generates files in the standard pptx format, so users can also apply their preferred design templates across the whole deck in PowerPoint.

Although OpenAI emphasizes that they specifically trained a dedicated model for ChatGPT Agent, some critics accuse it of being more like a combination of the previously launched Operator (browser interaction capability) and Deep Research (in-depth research capability). Operator enables ChatGPT to interact directly with websites via the browser, read and understand web content, while Deep Research excels at analyzing and summarizing information.

In fact, the current ChatGPT Agent team is drawn from the former Operator and Deep Research teams and numbers approximately 20 to 35 people. OpenAI stated that ChatGPT Agent is a natural continuation of Operator and Deep Research: "We found that many queries users attempted through Operator were actually more suitable for Deep Research, so we combined the advantages of both."

OpenAI said that this launch only marks the first step in directly integrating agent functionality into ChatGPT, and they plan to regularly and gradually update more functionalities.

Two Technical Routes

Startups have spent the past six months on continuous engineering iteration and prompt optimization around output quality and delivery experience. By comparison, the ChatGPT Agent that OpenAI just released is rough in how it presents finished tasks.

Startups are attempting to present users with an Agent product that is more complete and easier to use. Taking Manus as an example, in the past two months, the company has added various capabilities such as PPT generation, video generation, and audio generation to its product. The official website also lists many ready-made templates and user case studies. Even though the realization of these capabilities relies on external models, startups have done a better job than OpenAI in terms of ease of use.

Templates shared on Manus's official website

Setting aside these application-level innovations, however, ChatGPT Agent has a clear advantage in underlying model capability thanks to its end-to-end trained unified model. OpenAI ran numerous academic benchmarks on ChatGPT Agent, with some results surpassing those of OpenAI o3 or GPT-4o and reaching the highest level in the industry.

For example, in the Humanity's Last Exam evaluation, ChatGPT Agent achieved a new high of 41.6% (pass@1), approximately twice that of OpenAI o3. In the DSBench test, ChatGPT Agent significantly outperformed GPT-4o, and its performance in data analysis tasks was significantly better than human-level performance.

Humanity's Last Exam test results

On the SpreadsheetBench platform, which specifically measures spreadsheet editing capabilities, ChatGPT Agent set a new industry high, with performance double that of GPT-4o. OpenAI claims that in their internal benchmarks, ChatGPT Agent's capabilities roughly equate to those of an investment banking analyst with 1 to 3 years of experience.

In summary, OpenAI emphasizes the improvement in underlying model capabilities brought by ChatGPT Agent, while startups, limited by technology and funding, tend to focus more on application innovation.

In the early hours of July 19, Manus co-founder Ji Yichao posted that Manus would continue to bet on in-context learning rather than end-to-end agents.

He said that early in the Manus project, the team debated whether to train an end-to-end agent on open-source models or to build an agent on the in-context learning capabilities of frontier models. The emergence of models like GPT-3 convinced them that context engineering was the right direction, because those models' capabilities far surpassed the models they had previously trained in-house.

"If model progress is the rising tide, we hope Manus will be the boat, not the pillar fixed to the seabed," said Ji Yichao. He said this approach lets the team deliver improvements in hours rather than weeks and keeps the product orthogonal to the underlying model.

He shared much of Manus's context engineering experience in a technical document, including the need to design around the KV cache and the use of the file system as context. These engineering choices have significantly improved Manus's response speed and cost advantage.

As an example, Ji Yichao said that KV caching can greatly reduce both the time to first token and the inference cost. With Claude Sonnet, for instance, cached input tokens cost one-tenth as much as uncached tokens.
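
To make the cost argument concrete, here is a minimal sketch of how the cache hit rate changes an agent's input bill. Only the 10x ratio comes from Ji Yichao's example; the per-million-token price and the 90% hit rate are hypothetical placeholders, not quoted prices or Manus figures.

```python
def input_cost_usd(total_tokens: int, cache_hit_rate: float,
                   uncached_per_mtok: float, cached_per_mtok: float) -> float:
    """Blended input cost when a share of the tokens is served from the KV cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached_tokens = total_tokens * cache_hit_rate
    uncached_tokens = total_tokens - cached_tokens
    total = cached_tokens * cached_per_mtok + uncached_tokens * uncached_per_mtok
    return total / 1_000_000


# Hypothetical price per million uncached input tokens (placeholder value).
UNCACHED_PRICE = 3.00
# The 10x ratio is the only figure taken from the article.
CACHED_PRICE = UNCACHED_PRICE / 10

cost_no_cache = input_cost_usd(1_000_000, 0.0, UNCACHED_PRICE, CACHED_PRICE)
cost_high_hit = input_cost_usd(1_000_000, 0.9, UNCACHED_PRICE, CACHED_PRICE)

print(f"no cache reuse: ${cost_no_cache:.2f}")
print(f"90% cache hits: ${cost_high_hit:.2f}")
```

Under these assumed numbers, most of the saving comes from keeping the prompt prefix stable so that it can be reused across steps, which is consistent with the emphasis on designing around the KV cache.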

Technical document shared by Ji Yichao

Innovations in context engineering can indeed improve the performance of agents. The non-profit AI research institution Epoch AI tested the performance of ChatGPT Agent on the FrontierMath math test set and reported that ChatGPT Agent only achieved a 27% accuracy rate on Tier 1-3 math problems, with lower scores on more difficult problems.

However, when ChatGPT Agent was allowed to attempt each problem 16 times, its score increased significantly from 27% to 49%. Epoch AI stated that this indicates that better prompt design or task structure support may significantly enhance the performance of current models.
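
The gap between a single attempt and 16 attempts can be illustrated with a simple "any of k attempts" score. The sketch below is an assumption about how such a number could be computed, since the article does not describe Epoch AI's exact scoring procedure, and the attempt data is invented for illustration.

```python
def pass_at_k(attempts_by_problem: list[list[bool]], k: int) -> float:
    """Share of problems with at least one correct answer in the first k attempts."""
    if k < 1:
        raise ValueError("k must be at least 1")
    solved = sum(any(attempts[:k]) for attempts in attempts_by_problem)
    return solved / len(attempts_by_problem)


# Hypothetical outcomes: 4 problems, 4 attempts each (True = correct answer).
results = [
    [True, True, False, True],
    [False, False, True, False],
    [False, False, False, False],
    [False, True, False, False],
]

single_attempt = pass_at_k(results, 1)
four_attempts = pass_at_k(results, 4)
print(single_attempt, four_attempts)
```

In this toy data, the single-attempt score is 0.25 and the four-attempt score is 0.75: problems the model can sometimes solve only count once it is given more tries.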

Epoch AI Test Results

"How you shape the context ultimately determines how your agent behaves: its speed, the effectiveness of its recovery, and the scope of its expansion," said Ji Yichao.

Coexisting with the Future of Agents

The official launch of ChatGPT Agent marks the entry of AI agents into an era of competition among giants. Its impact on society will be no smaller than that of the initial explosion of large models, and it makes AI taking over human jobs a reality.

This change is already happening quietly. Tech giants such as Microsoft and Amazon are carrying out intensive layoffs. Microsoft CEO Satya Nadella stated earlier this year that 20% to 30% of Microsoft's code is generated by AI. Fintech company Klarna announced as early as last year that its AI agent handled two-thirds of the company's customer service chats within just one month of deployment, equivalent to the workload of 700 full-time human customer service representatives.

Market research firm MarketsandMarkets predicts that the global AI agent market will grow from $5.1 billion in 2024 to $47.1 billion by 2030, with a compound annual growth rate (CAGR) of 44.8%. Deloitte forecasts that by 2025, 25% of companies using generative AI will begin piloting agents, and this will increase to 50% by 2027.
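
As a quick consistency check on the MarketsandMarkets figures, the compound annual growth rate can be recomputed from the two endpoints. This is a back-of-the-envelope sketch that assumes six compounding years between 2024 and 2030.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in billions of USD, as quoted in the forecast above.
rate = cagr(5.1, 47.1, 2030 - 2024)
print(f"{rate:.1%}")  # prints 44.8%
```

The result agrees with the 44.8% rate quoted in the forecast.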

The swift adoption of AI agents has sparked concerns among industry experts. Unlike previous large models that merely dispensed information, these agents now possess full-fledged capabilities for thinking and acting. For instance, the ChatGPT Agent can navigate websites to assist users in placing orders, autofill credit card details, and access private information such as calendars, emails, and cloud storage. For users of these AI agents, this translates to entrusting their sensitive data to a "black box," leaving them more susceptible to potential attacks.

During the launch event, OpenAI highlighted the risks associated with the ChatGPT Agent. They emphasized that the agent would always secure user consent before executing any critical actions, "granting users ultimate control." Furthermore, OpenAI has integrated safety measures, including active supervision (Watch Mode) and proactive risk mitigation strategies.

OpenAI's Statement

Following the launch, Sam Altman posted a lengthy tweet urging users to exercise caution when using ChatGPT Agent.

"Agents mark a new pinnacle in AI system capabilities, capable of executing remarkable and intricate tasks on their own. They merge the principles of Deep Research and Operator but surpass these literal descriptions significantly—they can ponder for extended periods, utilize various tools, continue thinking, and then act, repeating this cycle," said Sam Altman.

Altman noted that while the precise impacts are uncertain, there is a risk that malicious actors may attempt to "deceive" users' AI agents, leading them to disclose private information and perform unpredictable, inappropriate actions. "We advise users to grant agents only the minimum access necessary to complete tasks, thereby mitigating privacy and security risks," Altman emphasized, adding that he personally would not use ChatGPT Agent for high-risk uses or in scenarios involving substantial personal information.

However, for OpenAI, which has transformed into a commercially successful enterprise, privacy or security risks will not impede the rapid iteration of AI agents.

Prior to the ChatGPT Agent's launch, the Financial Times reported that OpenAI was planning to integrate a payment and checkout system within ChatGPT, requiring merchants to pay commissions to OpenAI for orders completed through the platform. The Financial Times stated that OpenAI had already demonstrated early versions of this system to partner e-commerce platforms like Shopify.
