07/28 2025
As global tech giants vie for a stake in embodied intelligence, a fast-growing branch of AI, SenseTime, a cornerstone of China's AI industry, has launched an all-out push into the field.
Once renowned for its pioneering computer vision technology and as the leader of the 'Four Little Dragons of AI,' the company has weathered the turbulence of the large model era and is now mounting a counterattack by combining 'large models + robots.'
Image source: pixabay
From intensive capital maneuvers to the recruitment of top talent, from the reworking of technical pathways to the forging of ecosystem alliances, SenseTime's embodied intelligence strategy is more than a business expansion; it is a battle for survival and transformation.
Why is SenseTime going all in on embodied intelligence? A strategic pivot at the cusp of a trend
The embodied intelligence race is intensifying, with Ant Group directly establishing 'Ant Lingbo Technology,' Meituan leading investments in Tashi Zhihang and Xinghai Map, and JD.com continually investing in Qianxun Intelligence and Zhongqing Robotics, among others.
The overseas battlefield is equally fierce, with Google's RT-2 model, Figure AI's Helix system, and NVIDIA's world models vying for dominance in physical world interaction.
SenseTime, alongside Megvii, CloudWalk, and Yitu, was once a benchmark in China's AI industry, known as the 'Four Little Dragons of AI.' With its leading computer vision technology, it achieved remarkable success in fields like security and smart cities. Upon listing on the Hong Kong stock market in 2021, its market value soared to over HK$150 billion on the first day.
However, entering the large model era, this cohort of AI enterprises strong in visual technology collectively encountered development bottlenecks. SenseTime's 2024 financial report revealed annual revenue of RMB 3.772 billion but a net loss of RMB 4.307 billion, with the loss exceeding total revenue.
Results were similarly dire at CloudWalk Technology, whose 2024 revenue declined by 36.7% year-on-year and whose net loss widened to RMB 696 million. Megvii and Yitu also saw their businesses contract, with the latter closing offices in multiple cities and nearly halting its medical business.
Under the large model wave in particular, companies such as OpenAI, Moonshot AI, and DeepMind surged to prominence on the strength of large language models, while the 'Four Little Dragons' remained anchored in computer vision. Their core revenues relied heavily on government projects in security and transportation, which accounted for over 70% of revenue on average.
Clearly, the backdrop of SenseTime's strategic transformation is a survival-driven fight to the finish.
From another perspective, SenseTime's foray into this field is also a long-planned 'genetic extension.' Its core team has already taken initial shape, drawing members from its original intelligent driving business along with computer vision experts and senior robotics practitioners.
This talent flow also underscores an industry commonality. Autonomous driving and embodied intelligence are deeply interconnected in underlying technologies such as environmental perception and real-time modeling. After all, 'a car is just a robot with four wheels,' and intelligent driving algorithms and simulation platforms can, to a certain extent, be directly applied to robot development.
Moreover, embodied intelligence (Embodied AI) is viewed as a pivotal breakthrough for the 'grounding' of AI technology, with its core lying in realizing the closed-loop interaction of 'perception-understanding-decision-making-execution' through physical entities like robots.
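The 'perception-understanding-decision-making-execution' loop can be illustrated with a toy sketch. The Python below is purely illustrative and is not drawn from SenseTime or any real robot stack: the one-dimensional 'robot,' the function names, the 0.05 tolerance, and the 0.2 step cap are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Observation:
    """Hypothetical sensor reading: signed distance from robot to target."""
    distance_to_target: float


def perceive(position: float, target: float) -> Observation:
    # Perception: turn the raw world state into an observation.
    return Observation(distance_to_target=target - position)


def understand(obs: Observation, tolerance: float = 0.05) -> str:
    # Understanding: interpret the observation as a symbolic state.
    return "at_target" if abs(obs.distance_to_target) < tolerance else "off_target"


def decide(state: str, obs: Observation, max_step: float = 0.2) -> float:
    # Decision-making: choose a motion command (proportional step, capped).
    if state == "at_target":
        return 0.0
    step = 0.5 * obs.distance_to_target
    return max(-max_step, min(max_step, step))


def execute(position: float, command: float) -> float:
    # Execution: apply the command to the (simulated) physical world.
    return position + command


def run_loop(start: float, target: float, max_steps: int = 100) -> Tuple[float, int]:
    """Run the closed loop until the target is reached or steps run out."""
    position = start
    for step in range(max_steps):
        obs = perceive(position, target)
        state = understand(obs)
        if state == "at_target":
            return position, step
        position = execute(position, decide(state, obs))
    return position, max_steps


if __name__ == "__main__":
    final_position, steps = run_loop(start=0.0, target=1.0)
    print(f"stopped at {final_position:.3f} after {steps} steps")
```

What makes the loop 'closed' is that each action changes the world, and the next observation reflects that change. A cloud-only model that emits an answer and never senses its effect would be the 'open-loop' counterpart.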
Embodied intelligence was first named as a future industry in the 2025 government work report, immediately sparking a capital boom. In the first half of 2025 alone, domestic financing in this field exceeded RMB 20 billion across 130 financing events, far surpassing the total for the whole of 2024.
Industry predictions generally align with Elon Musk's vision, foreseeing humanoid robots becoming the mainstay of industry in the future, with their numbers potentially surpassing humans, reaching 10 billion to 20 billion units, forming a 'new terminal market comparable to mobile phones.'
SenseTime chose this juncture to enter the market, aiming to leverage the combined path of 'large models + robots' to transform its expertise in visual recognition, multi-modal perception, and large model training into a new growth engine.
Having entered the race, SenseTime has its own 'embodied intelligence' equation.
From visual recognition that 'understands the world' to multi-modal large models that 'think about the world,' and soon to embodied intelligence systems that 'actively transform the world,' SenseTime's foray into embodied intelligence is no whim but a gradual transition grounded in its accumulated technology.
The team led by SenseTime co-founder Wang Xiaogang developed the SenseAuto 'Kaiwu' system (rendered literally as 'Absolute Shadow Enlightenment') for intelligent driving, which can already understand physical laws and learn traffic rules. Since both cars and robots are essentially embodied intelligent entities, this paves the way for technology transfer.
SenseTime has also adopted a pragmatic, phased technology roadmap. In August 2022, it launched the 'Yuanluobo' household chess-playing robot, its first consumer-grade AI product. By tightly integrating visual algorithms with a robotic arm, the product recognizes and grasps chess pieces precisely even when they are partly occluded, forming an initial closed-loop framework of 'vision-perception-decision-making.'
While this product serves a single function, it marks the beginning of SenseTime's attempt to break through the 'open-loop' limitations of traditional AI, moving from 'thinking' about the world in the cloud to truly interacting with the physical world.
In April 2025, SenseTime released the 'SenseNova V6' multi-modal large model, employing a Mixture of Experts (MoE) architecture with 600 billion parameters, achieving comprehensive enhancements in 'long thinking chains × mathematical abilities × reasoning abilities × global memory,' with a particular focus on strengthening multi-modal deep reasoning capabilities.
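For readers unfamiliar with the term, a Mixture of Experts layer routes each input to a small subset of 'expert' sub-networks instead of running all of them, which is how a model can hold a very large parameter count while activating only part of it per input. The Python sketch below shows generic top-k gating only. It is not SenseNova V6's actual design, which the article does not describe: the toy experts, the hand-supplied gate scores, and the choice of k = 2 are all assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax over a short list of scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    gate_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> Tuple[float, List[int]]:
    """Generic top-k MoE step: run only the k highest-scoring experts.

    In a real model the gate logits come from a learned router applied
    to x; here they are passed in directly (an assumption for brevity).
    """
    ranked = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    # Renormalise the gate scores over the chosen experts only.
    weights = softmax([gate_logits[i] for i in chosen])
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen


if __name__ == "__main__":
    # Four toy experts; only two are evaluated for any given input.
    toy_experts = [lambda v: v, lambda v: 2 * v, lambda v: 3 * v, lambda v: 4 * v]
    result, used = moe_forward(2.0, [0.0, 1.0, 1.0, -1.0], toy_experts, k=2)
    print(f"output={result}, experts used={used}")
```

The unselected experts are never called, so the compute per input scales with k rather than with the total number of experts.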
Moreover, this model has been integrated into the humanoid robot 'Feiyan,' enabling it to possess panoramic vision perception, emotional interaction, and mental health screening functions, while also allowing for more natural thinking and expression.
SenseTime's upcoming embodied intelligence 'brain' platform represents a further step in its technological integration. According to the information disclosed so far, the platform aims to combine advanced perception, visual navigation, and multi-modal interaction capabilities to power robots and various intelligent terminals.
Notably, SenseTime's transformation strategy has a distinct 'trinity' character. At the capital level, it raises funds in two ways, through new share placements and business spin-offs; at the technical level, it relies on its large-scale computing platform and the SenseNova large model to build foundational capabilities; and at the ecosystem level, it is quickly building industrial alliances through strategic cooperation, investments, and mergers and acquisitions.
This comprehensive strategy underscores SenseTime's determination to transform, and it also reflects the time pressure and competition the company faces. The embodied intelligence race has now entered its second stage of development, with giants piling in. SenseTime must seize the dividends of this robotics wave or risk missing the opportunity to turn the tide.
With giants converging on embodied intelligence, what are SenseTime's chances of success?
Currently, while the embodied intelligence race boasts broad prospects, it has become a fiercely competitive red ocean, with tech giants and startups vying on the same stage. SenseTime's entry faces challenges from multi-dimensional competitors at home and abroad, each with their strengths in technological routes, capital strength, and ecosystem construction.
Globally, OpenAI partnered with robotics company Figure AI on general-purpose robots, Google has launched the embodied intelligence model RT-2, and NVIDIA is focusing on world models and simulation technologies.
In the domestic market, Huawei released its CloudRobo embodied intelligence platform, which serves as a robot 'brain,' in June 2025; ByteDance's Seed team launched the general-purpose robot model GR-3 on July 22, 2025; and the Beijing Academy of Artificial Intelligence (BAAI) had earlier unveiled RoboOS, a cross-embodiment 'cerebrum-cerebellum' collaboration framework, and RoboBrain, an open-source embodied brain.
Unitree R1 by Unitree Technology (Image source: Caixin Global)
Meanwhile, internet giants are also ramping up investments: JD.com has led investments in three robotics enterprises, and Meituan has led a string of robot-related financing rounds.
In comparison, SenseTime's core advantages lie in its years of accumulation in the field of computer vision, its early layout in multi-modal large models, and its robust computing infrastructure. Visual information comprises over 80% of human perception, and SenseTime has consistently been at the forefront of machine vision technology, with deep technical reserves in image recognition, video analysis, and environmental understanding.
Additionally, SenseTime's 'SenseNova' large model series leads domestically in multi-modal fusion. The V6 version supports a chain of thought of up to 64K, understanding of videos up to 10 minutes long, and deep reasoning, providing a solid foundation for cognitive decision-making in embodied intelligence.
Moreover, with a computing power scale of 23,000 PetaFlops, SenseTime can support large-scale simulation training and complex model iterations, an infrastructure advantage that is difficult to surpass in the short term.
Its disadvantages include a lack of hardware experience, cash flow pressure, and persistent losses. Compared to enterprises like Tesla and Huawei with mature hardware supply chains, SenseTime is virtually starting from scratch in robot body design, motion control, and hardware integration.
While collaboration with companies like Fourier and Songying can partially offset this shortcoming, cultivating core hardware capabilities will still take years. Because embodied intelligence demands sustained spending, balancing R&D investment against profit expectations will be a significant test for SenseTime.
The uncertainty of technological routes is another pressure SenseTime must confront. There is currently no unified technical standard in embodied intelligence: vision-language-action (VLA) models, the 'cerebrum-cerebellum' architecture, and world models are developing in parallel, each with its merits and drawbacks.
Furthermore, the Scaling Law of embodied intelligence differs from that of language models. As parameters increase and data volumes expand, the marginal cost of system performance improvement may be higher. SenseTime needs to accurately grasp the direction of technological evolution to avoid resource misallocation.
Conclusion
SenseTime's push into embodied intelligence is essentially an attempt to carry its computer vision leadership from 'understanding the world' to 'transforming the world.'
Facing the collective dilemma of the Four Little Dragons of AI—technological disconnect in the era of large models and reliance on government projects—SenseTime has chosen to launch a life-or-death breakthrough with the fusion of 'large models + robots.' The outcome of this battle will not only impact the enterprise's survival but will also reshape China's position in the global embodied intelligence competition.
Source: Hong Kong Stock Research Institute