Google Unleashes Agents on the Frontend, Pours Resources into TPUs on the Backend: A Closer Look at Google I/O Conference


05/20/2026

Flood of Updates: How is Google Positioning Itself in Chips, Models, and Agents?

Berkshire Hathaway, once known for its skepticism about AI, has started to increase its investments in AI.

Google became one of the AI companies that Berkshire Hathaway significantly invested in during Q1. In the first quarter, Berkshire Hathaway increased its holdings of Class A shares in Google's parent company Alphabet by 36.4 million shares, a surge of approximately 204% quarter-over-quarter, with the market value of its holdings rising to $15.6 billion.

As investment firms double down on Google, the company is using real data to show how steep its growth curve becomes once AI truly reaches the general public.

"Two years ago, we processed a total of 9.7 trillion Tokens per month. By last year's I/O Conference, this number had grown to approximately 480 trillion; now, it has skyrocketed sevenfold to reach a monthly level of 3,200 trillion," said Google CEO Sundar Pichai at the Google I/O Conference, which kicked off at 1 a.m. Beijing time on May 20.

He then presented a series of data points showing exponential growth in user numbers and Token consumption: Google's model API now processes about 19 billion Tokens per minute, a sixfold increase from the previous quarter; the Gemini App has over 900 million monthly active users, more than double last year's 400 million; and the total number of daily user requests has increased sevenfold.
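The growth multiples Pichai cited can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable and function names are illustrative and are not anything Google published.

```python
# Monthly token volumes quoted in the keynote, in trillions of tokens.
MONTHLY_TOKENS_TRILLIONS = {
    "two years ago": 9.7,
    "last year's I/O": 480.0,
    "this year's I/O": 3200.0,
}

# Gemini App monthly active users, in millions, as quoted in the article.
MAU_LAST_YEAR_MILLIONS = 400
MAU_THIS_YEAR_MILLIONS = 900


def growth_factor(earlier: float, later: float) -> float:
    """Return how many times larger `later` is than `earlier`."""
    return later / earlier


volumes = list(MONTHLY_TOKENS_TRILLIONS.values())
first_year = growth_factor(volumes[0], volumes[1])   # 9.7T -> 480T
second_year = growth_factor(volumes[1], volumes[2])  # 480T -> 3,200T
two_year_total = growth_factor(volumes[0], volumes[2])
mau_growth = growth_factor(MAU_LAST_YEAR_MILLIONS, MAU_THIS_YEAR_MILLIONS)

print(f"Token growth, first year:  {first_year:.1f}x")
print(f"Token growth, second year: {second_year:.1f}x")
print(f"Token growth, two years:   {two_year_total:.0f}x")
print(f"Gemini App MAU growth:     {mau_growth:.2f}x")
```

The second-year factor works out to about 6.7, which the keynote rounds to sevenfold, and the user base grew 2.25 times.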

Amid skyrocketing Token consumption, Google is leveraging a barrage of new product launches to capitalize on the most promising avenue for Token monetization: agents.

At this year's I/O Conference, Google's mantra was "everything can be an agent":

For personal assistants, there's the 24/7 cloud-based agent Gemini Spark, designed to rival [a competitor]; for Vibe Coding, there's Antigravity 2.0, which supports the simultaneous operation of multiple agents; for search experiences, there are multiple agents working in tandem to help users accurately find information; the Gemini App has also introduced various agents across different dimensions to meet user needs.

With models as its technological backbone and applications multiplying Token consumption, Google has also multiplied its investment in AI infrastructure.

Last year, Google Cloud invested over $30 billion in AI infrastructure. Pichai announced at the conference that this year's investment would be about six times that of last year, roughly between $180 billion and $190 billion.

The heavy infrastructure investment is meant to prepare for ever-increasing consumption, but Google's ambitions extend beyond Google Cloud. According to media reports, Google and Blackstone will jointly create a new AI cloud company built around Google's TPU chips, a move that could signal a challenge to chip companies like Nvidia.

As one of the few companies with a presence across chips, models, and products, Google unveiled a grand blueprint at this year's I/O Conference.

Dual Release of Video and Text Models: Cost-Effectiveness Under Scrutiny

Models and agents took center stage as the most captivating announcements at this year's I/O Conference.

True to form, Google, which likes to showcase its AI models first at the I/O Conference, unveiled two models this year: the multimodal model Gemini Omni Flash (hereinafter referred to as Omni) and the fast yet affordable Gemini 3.5 Flash.

As the first model in a new series, Omni is designed to create any kind of content from any kind of input. Users can currently feed it text, images, files, and videos; for now it outputs only video, with text, audio, and other forms planned for later.

By training jointly on data in different forms, Omni has improved its grasp of physical laws. In a test by Guangzhui Intelligence, Gemini Omni Flash was asked to generate a video of a "white billiard ball hitting a red ball into the pocket." Compared with the previous-generation Veo, Omni showed progress in understanding mechanics. The white ball slowly came to a stop after hitting the red ball, as required, correcting the issue in the Veo version where the ball continued to fly erratically after the collision.

Omni also drew inspiration from the popular image editing model Nano Banana, being trained as a video editing model that supports modifications based on textual descriptions. During the on-site demonstration, Google showcased cases such as "turning a sculpture into bubbles" and "making fish swim in a direction according to a drawn path on an image."

The Gemini 3.5 Flash model, which leaked a few days ago, also became a mainstay of the I/O Conference as expected.

In Google's view, the new model is "fast and affordable." While surpassing the Gemini 3.1 Pro in certain model performance aspects (such as Agentic capabilities and multimodality), it seems to lay the groundwork for various agents.

In terms of Token output speed per second, Gemini 3.5 Flash is four times faster than some overseas cutting-edge models. The difference was easy to see in testing: the code for a simple game came out almost as if it were being "sprayed" onto the screen. Pichai also mentioned on-site that, combined with Google's own programming product Antigravity, the overall speed could be 12 times faster.

Admittedly, from an absolute performance perspective, the Flash model cannot compete with the latest flagship models from other companies. However, according to Google, its cost is half that of cutting-edge models.

Take Claude Sonnet 4.6 as an example: it is priced at $3 per million Tokens for input and $15 for output. Gemini 3.5 Flash offers a significant price advantage at $1.5 per million for input (50% of Sonnet's price) and $9 for output (60%).

Pichai did the math on-site, stating that many companies have an annual Token budget exceeding 1 trillion. If they migrate their load to Gemini 3.5 Flash, they could save $1 billion annually.

Although Google sang high praises for its new model, for users who have been paying attention to the pricing of Google's Flash series, no matter how good Gemini 3.5 Flash is, its price positioning exceeds the "affordable" expectations of the Flash series.

Compared to the previous generation Gemini 3 Flash, which was priced at $0.5 per million Tokens for input and $3 for output, the new Flash model's pricing has tripled. Moreover, according to the current AA rankings, Gemini 3.5 Flash's overall score is lower than that of Gemini 3.1 Pro. Benchmark results also show that its performance on HLE (Humanity's Last Exam, which reflects a model's ability to handle complex tasks) is inferior to the latter's, leaving the new model in the awkward position of being "not as good as the flagship model but with tripled pricing."

To test whether Gemini 3.5 Flash or Gemini 3.1 Pro is better for programming, Guangzhui Intelligence asked both models to create a simple Magic Tower game. Both versions of the code had blocked map routes that made the game unplayable. When asked to fix them, however, Gemini 3.1 Pro directly repaired the map route planning, while the Flash version failed to fix the problem through dialogue. Google's claim of "comprehensively superior" performance may need to be questioned.
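To make the price gaps concrete, the following sketch applies the per-million-Token prices quoted in this article to a single workload. The workload size (800 million input Tokens and 200 million output Tokens) is a hypothetical assumption chosen for illustration, and the class and function names are likewise invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Price in USD per million tokens, as quoted in the article."""
    input_usd: float
    output_usd: float


PRICES = {
    "Claude Sonnet 4.6": Price(3.0, 15.0),
    "Gemini 3.5 Flash": Price(1.5, 9.0),
    "Gemini 3 Flash": Price(0.5, 3.0),
}

# Hypothetical workload (an assumption, not a figure from the article).
INPUT_TOKENS = 800_000_000
OUTPUT_TOKENS = 200_000_000


def cost_usd(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for the given token counts at the given price."""
    total = input_tokens * price.input_usd + output_tokens * price.output_usd
    return total / 1_000_000


costs = {
    name: cost_usd(price, INPUT_TOKENS, OUTPUT_TOKENS)
    for name, price in PRICES.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.0f}")

# Ratio of the new Flash model's cost to each comparison point.
vs_sonnet = costs["Gemini 3.5 Flash"] / costs["Claude Sonnet 4.6"]
vs_previous_flash = costs["Gemini 3.5 Flash"] / costs["Gemini 3 Flash"]
print(f"3.5 Flash vs. Sonnet 4.6: {vs_sonnet:.2f}x")
print(f"3.5 Flash vs. 3 Flash:    {vs_previous_flash:.2f}x")
```

Under this assumed mix, the workload costs $5,400 on Claude Sonnet 4.6, $3,000 on Gemini 3.5 Flash, and $1,000 on Gemini 3 Flash. The gap with Sonnet depends on the input-to-output ratio, because the output discount is smaller than the input discount, while the threefold increase over Gemini 3 Flash holds for any mix.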

In addition to optimizing model capabilities, Google also attempted to further enhance user experience through visualization. For example, when we tested Gemini 3.5 Flash, Gemini selectively presented some dialogues in a visual format using AI programming. However, during the test, we waited for over 5 minutes without seeing the final visual effect generated, and the prolonged waiting time actually reduced the overall experience.

As for how the new flagship model performs, the answer will be revealed next month with the release of Gemini 3.5 Pro.

Full Lineup of New Agents: Google's AI Subscription Business

"We have entered a new era of agents," began a Google product manager, introducing the absolute protagonist of this year's I/O Conference: agents, which dominated most of the presentation time.

Whereas the two model families shipped only Flash versions for users to try, agents were Google's main focus. Throughout the conference, agents appeared in almost every segment. The model underlying these products is the Gemini 3.5 Flash described above.

To be honest, with Claude already offering Cowork and Chinese companies also rolling out agents, Google is not the earliest entrant. However, with such a broad product portfolio, Google understandably needs more time to work out how to integrate agents into each product.

The agent most directly comparable to [a competitor] is Gemini Spark, a cloud-based agent running on Google Cloud.

Google is also taking a cautious approach with mass-market consumers, experimenting while prioritizing safety. For example, Spark's linkage with other Google products is turned off by default, and users must enable it manually.

From the current perspective, Spark's killer feature is its seamless integration with Google products such as Gmail, Calendar, Drive, Docs, Sheets, Slides, YouTube, and Maps. Through its vast application ecosystem, it can achieve high-privilege advantages similar to [a competitor's] model.

In Google's on-site demonstration, Spark could help draft emails, extract table information and organize it into documents, and set up schedule reminders based on the extracted content. All these functions are closely tied to Google's ecosystem. The demonstrations appeared relatively simple and basic, reflecting Google's ongoing exploration in the agent field. Google also said it would first integrate internal capabilities and then add MCP support in the coming weeks to connect third-party functions.

However, we cannot directly experience the Spark product yet. Google disclosed that starting this week, Gemini Spark will be available to a small number of users and will later be accessible to Google AI Ultra users and some enterprise users. It will soon appear in email and the Gemini App and may also be available in the browser this summer.

In Google's flagship AI search overhaul, the integration of agent capabilities has also become a highlight.

AI search can now launch multiple agents to help users find information, for example by creating an agent that collects stock information in a specific financial field or by having AI track rental listing updates in real time. This service will be available to AI Pro and Ultra subscribers this summer.

Agent programming capabilities have also been integrated into AI search.

Users can directly construct visual effects using search results. For example, when users search for complex terms, AI can directly create visual animations, essentially providing an animated demonstration, allowing users to understand not only through text but also through hands-on experience. This is a free service expected to be available to users this summer.

In terms of shopping, the agents introduced by Google mainly serve to optimize the experience and standardize the payment ecosystem.

Take Universal Cart as an example; it not only provides price comparisons and discount information but also conducts cross-platform price comparisons and offers shopping suggestions to users. An example that left a deep impression on the author is that when a user wants to assemble a computer, the shopping cart function can proactively identify hardware compatibility issues, such as a mismatched CPU and motherboard or insufficient power supply, thereby providing shopping guidance to users.

In optimizing Gemini products, certain agents with specific functions have become complimentary additions to enhance user experience.

For example, Daily Brief helps organize daily news and to-do lists, while the multimodal creation agent in Google Flow can run multiple creation tasks simultaneously, such as generating 16 videos with different camera effects from a single image.

By aggressively promoting agents, Google aims to enrich its paid consumer services with agent capabilities and encourage more users to pay for them.

At the conference, Google announced a new $100-per-month tier of its AI Ultra subscription, and the agent capabilities described above will be available at this lower-priced tier. Additionally, Google's AI programming product Antigravity has been upgraded to support multi-agent tasks and will be integrated into the AI Ultra subscription.

Following OpenAI's recent announcement that it would halve its highest-tier subscription price from $200 to $100 per month, Google has also made a move. At the conference, Google announced that it would cut its monthly subscription price by $50, from $250 to $200.

Google's agent blitz and pricing adjustments, from a consumer perspective, are all preparations for the increasingly competitive subscription business.

AI Infrastructure Investment Multiplies by 6: Google's AI Ecosystem Closes the Loop

The release of two models and a series of agent updates are like vibrant flowers. The soil they are rooted in, AI infrastructure, was only briefly mentioned, yet it is becoming a key business for Google to tell a coherent physical AI story and boost its market value.

Google's fervor for AI infrastructure investment is directly reflected in its financial commitment. This year's investment is about six times last year's, exceeding $180 billion. Even for a company with Google's financial resources, that is a significant outlay.

"We have been investing for the present and the future," said Pichai.

Two eighth-generation chips, 8t and 8i, previously announced at the Google Cloud conference, are designed for large model training and inference, respectively. The former triples computing performance per unit and, through the coordinated use of over 1 million globally distributed chips, lets Google's model training "shift from weeks to days." The latter speeds up model inference: during the live demonstration, the Gemini Flash model reached an output speed of nearly 1,500 Tokens per second.

While benefiting its own operations, Google's TPU is also becoming a sought-after commodity among major AI companies in Silicon Valley.

In February this year, Meta halted its in-house chip development efforts and began collaborating with AMD and Google. According to The Information, Meta and Google reached a multi-billion-dollar deal for leasing TPUs, with Meta also in talks to purchase TPU services next year. Anthropic has also signed a long-term agreement with Google and Broadcom, committing to procure approximately 5 gigawatts of TPU computing power for model training over the next five years.

Beyond building a vast chip sales business, Google is further expanding its AI cloud services through TPUs.

In May, Blackstone was reported to have partnered with Google on a joint venture, with Blackstone contributing $5 billion and Google providing hardware and software services, including TPU chips. The new company plans to reach 500 megawatts of computing power by 2027.

At this point, Google's ecosystem of chips, cloud services, models, and products has become increasingly complete. As AI drives exponential growth in Token consumption, improvements at each layer of the business enable Google to secure impressive revenue in the AI sector.
