Ideal i8's Secret Weapon: Unveiling the Data, Engineering, and Organizational Masterplan Behind VLA's Premiere


07/30 2025

Over the past half-decade, Ideal Auto has persistently pursued advancements in intelligent driving. From initially relying on high-precision maps, it transitioned to a "light map" strategy and ultimately aimed for map-free technology. Despite several shifts in its technological roadmap, Ideal Auto has yet to ascend to the industry's elite tier.

On the evening of July 29, the Ideal i8 was officially unveiled, priced between 321,800 yuan and 369,800 yuan. This marks Ideal Auto's debut in the pure electric SUV segment. Unlike the extended-range segment in the company's early days, the pure electric field is already technologically mature, which makes it hard for Ideal Auto to find untouched territory. This time, it targets areas where users still experience pain points: energy replenishment efficiency and intelligent driving.

With the launch of the i8, Ideal Auto premiered VLA (Vision-Language-Action) ahead of its competitors. Prior to the i8's unveiling, we joined several colleagues for an in-depth conversation with Ideal Auto's autonomous driving R&D leaders.

Written by Cao Lin and Mao Shiyang

Original content by AutoPixel (ID: autopix)

Dialogue Guests:

Dr. Lang Xianpeng, Senior Vice President of Ideal Auto's Autonomous Driving Research and Development;

Zhan Kun, Senior Algorithm Expert of Ideal Auto's Autonomous Driving;

Zhan Yifei, Senior Algorithm Expert of Ideal Auto's Autonomous Driving.

01. Aligning VLA with the i8's Product Positioning

How do you anticipate VLA enhancing the appeal of the i8?

Lang Xianpeng: Firstly, the i8 is an exceptional vehicle. Secondly, we have invested considerable effort into the entire engineering deployment of the VLA model and the debugging of related software and hardware. It is safe to say that in terms of driving comfort, it surpasses any Ideal Auto car you've previously experienced.

We also hope that VLA will become a crucial factor in users' decisions to buy the i8. First, we aim to provide a remarkable experience upgrade for existing users who have already used Ideal Auto's intelligent driving features. Second, we want to create a positive first impression, and a sense of novelty, among users who have never used assisted or autonomous driving.

Do you genuinely believe that pure electric vehicle users will prioritize intelligent driving?

Lang Xianpeng: A year ago, it was fair to question whether new car buyers really cared about intelligent driving. Now I believe that, for new car buyers in particular, intelligent driving is undoubtedly among their top priorities. Last year's McKinsey survey ranked it as the first or second most important factor in car purchases. Our marketing team's research confirms it as a top-three priority.

Among efficiency, comfort, and safety, which indicator does Ideal Auto's VLA prioritize at this stage?

Lang Xianpeng: One key indicator is MPA (miles per accident), the average distance driven between accidents. Ideal Auto owners' human driving data indicates one accident every 600,000 kilometers, whereas with assisted driving features, accidents occur every 3.5 to 4 million kilometers. We aim to keep improving this figure. Our goal is to raise MPA to ten times that of human driving, meaning ten times the safety: one accident every 6 million kilometers under assisted driving. This can only be attained after enhancing the VLA model.
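The arithmetic behind these figures can be checked with a minimal sketch. It uses only the numbers quoted in the answer (which are given in kilometers, although the metric's name refers to miles); the function and constant names are illustrative, not Ideal Auto's.

```python
# Figures quoted in the interview, in kilometers driven between accidents.
HUMAN_KM_PER_ACCIDENT = 600_000
ASSISTED_KM_PER_ACCIDENT = (3_500_000, 4_000_000)  # quoted range for assisted driving
TARGET_MULTIPLE = 10  # stated goal: ten times human driving


def safety_multiple(km_per_accident: float,
                    baseline_km: float = HUMAN_KM_PER_ACCIDENT) -> float:
    """How many times farther a mode travels per accident than the baseline."""
    return km_per_accident / baseline_km


def accidents_per_million_km(km_per_accident: float) -> float:
    """Convert distance-per-accident into an accident rate per million km."""
    return 1_000_000 / km_per_accident


target_km = TARGET_MULTIPLE * HUMAN_KM_PER_ACCIDENT
low, high = (safety_multiple(km) for km in ASSISTED_KM_PER_ACCIDENT)

print(f"target: {target_km:,} km per accident")
print(f"assisted driving today: {low:.1f}x to {high:.1f}x human driving")
print(f"human rate: {accidents_per_million_km(HUMAN_KM_PER_ACCIDENT):.2f} accidents per million km")
```

On these numbers, assisted driving already covers roughly 5.8 to 6.7 times the distance per accident of human driving, and the tenfold goal corresponds to the 6 million kilometers mentioned above.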

However, the industry often cites MPI (miles per intervention), which emphasizes fewer manual takeovers and is a more explicit metric for evaluating intelligent driving's technical proficiency.

Lang Xianpeng: We have analyzed this as well. Some safety risks might necessitate takeovers, but so can poor comfort, such as sudden or harsh braking. It's not always about encountering safety risks; poor driving comfort also dissuades users from using assisted driving features.

Efficiency follows safety and comfort. For instance, if you take the wrong route, although it compromises efficiency, we won't immediately correct it through dangerous maneuvers. We must pursue efficiency while ensuring safety and comfort.

During an earlier test ride in the i8, we encountered a scenario on a narrow two-way road with a tricycle on our right. We instructed the test car to change lanes to the left, requiring it to cross into the opposite lane. However, VLA didn't execute this. I heard the commentator mention that previous versions could, but now they can't. Why is that?

Zhan Kun: VLA is designed to be a reliable family driver. Regardless of the situation, we align it with our values of safety, comfort, and peace of mind. In this scenario, therefore, crossing into the opposite lane is not recommended. Technically, fine-tuning a version that would do so is feasible, but we still prioritize providing a safer and more reassuring driving experience. If opportunities arise later, we will explore better styles or approaches.

▍Ideal i8

What is the fundamental difference between VLA and the end-to-end large models previously discussed?

Lang Xianpeng: We believe that the VLA model can achieve higher levels of autonomous driving, but it is currently in its nascent stage. In this technological cycle, the infant VLA model is roughly equivalent to the upper limit of end-to-end models and still has a long way to go.

However, I believe this process will not be slow. It took end-to-end models approximately a year to progress from 10 MPI to 100 MPI. I anticipate that VLA will also iterate swiftly. Perhaps by next year, when we reconvene, it will have reached 1,000 MPI. When I made a similar prediction to everyone last year, many doubted it was possible, but we achieved it.

Why did you wait until the i8's delivery to release VLA? Many competitors are also vying to be first.

Lang Xianpeng: We will undoubtedly be ahead of our competitors. We will be the first.

02. Turning the Tide

Can you explain in simpler terms the challenges automakers face in creating a VLA model?

Lang Xianpeng: Many have inquired whether automakers can skip previous rule-based algorithms and the end-to-end stage to develop a VLA model. I believe this isn't feasible. While VLA's data and algorithms may differ from the past, they still build upon previous foundations. Without a comprehensive data loop collected through real vehicles, there would be no data to train the world model.

Ideal Auto can implement the VLA model because we possess 1.2 billion pieces of data. Only by thoroughly understanding these data can we better generate new data. Without this data foundation, we couldn't train the world model in the first place, nor would we know what kind of data to generate.

When did you first realize the importance of data?

Lang Xianpeng: Five years ago, Ideal Auto entered the self-developed autonomous driving race as a latecomer, but our thinking about autonomous driving didn't begin in 2020. When I first joined Ideal Auto, Li Xiang interviewed me and asked what I considered most important for succeeding in autonomous driving, or for becoming the best at it.

I said that, for now, it is data. Other factors are crucial too, but data must be prepared in advance. We started building a data loop with the Ideal ONE. In 2020, our first full year of deliveries, we accumulated about 15 million pieces of valid returned data. We annotated a great deal of it, and our sample base grew from there.

Ideal Auto has traditionally performed modestly in the field of self-developed intelligent driving. Why were you able to turn the tide in just one year?

Lang Xianpeng: We are actually standing on the shoulders of giants. Looking back, the entire industry took about ten years to transition from rule-based algorithms to end-to-end models. From end-to-end models onward, however, iteration is rapid, because the supporting engineering and data infrastructure has matured. For VLA, I believe the pace will be similar. Currently, VLA might not seem to offer much more than a slightly enhanced experience over end-to-end models. But when you see a 1,000 MPI product next year, I believe everyone will realize that autonomous driving is truly arriving.

Many automakers are researching VLA. Although Ideal Auto secured the first release, are you concerned that other automakers will overtake you using their latecomer advantage, similar to how Ideal Auto did in the past year?

Lang Xianpeng: Since the advent of end-to-end models last year, the industry and our competitors have truly taken Ideal Auto's autonomous driving seriously, but it's too late for them. These capabilities cannot be built overnight, and a rushed effort will not yield the same results as ours. This year, when we initiated work on VLA, we were the first to propose it and will be the first to deliver it. Many are still just discussing it and using end-to-end methods to develop VLA.

If you continue to develop so-called VLA using the end-to-end approach, your speed will undoubtedly slow down, regardless of whether it's 10 million, 20 million, or even 100 million clips. Firstly, with such a vast amount of parameters, you need significant training computing power. Let's not even discuss the model's size. Additionally, your iteration speed will decrease.

Ideal Auto only conducted 20,000 kilometers of real-vehicle testing this year. What is the rationale for significantly reducing real-vehicle testing? Ideal Auto's car ownership is relatively high among new-energy vehicle companies. Why abandon this advantage?

Lang Xianpeng: Over 90% of the testing in the current super version and the VLA version of the Ideal i8 is simulation testing. We believe that real-vehicle testing has numerous issues. Cost is one aspect, but the primary issue is that, when we test and verify a fix, we cannot fully reproduce on the road the scenario in which the problem originally occurred. Real-vehicle testing is also too inefficient: you have to drive through a scenario once and then drive it again to retest. Our current simulation results fully match those of real-vehicle testing.

The usual industry practice is to maintain the scale of real-vehicle testing and significantly increase simulation testing for incremental improvements. Is Ideal Auto being too aggressive?

Lang Xianpeng: Simulation testing is effective and cost-efficient, so why not use it? We retain real-vehicle testing for the content that genuinely requires it. Any technological advancement must be accompanied by changes in the R&D process. With the industrial era, slash-and-burn farming gave way to mechanization. The same applies to autonomous driving. Once the end-to-end era arrived, we entered the era of using AI technology for autonomous driving. Now that we have entered the era of large VLA models, testing efficiency is the core factor in improving capabilities. If we want to iterate rapidly, we must eliminate the factors in the process that hinder quick iteration. If there is still significant real-vehicle and manual involvement, the speed will decrease. It's not that we are determined to replace real-vehicle testing; rather, this technology and this solution inherently require simulation testing. If we don't adopt this approach, we aren't practicing reinforcement learning or developing the VLA model.

Can simulation testing fully replicate the real physical world?

Zhan Yifei: In 2024, we still conducted over 1.5 million kilometers of real-vehicle testing. In fact, we already possessed the capability of world model simulation at that time. We used these over 1.5 million kilometers of real-vehicle testing to verify the reliability of the simulation environment.

Initially, there were issues with the reproducibility or authenticity of the world model simulation. However, through comparison with real-vehicle testing data, we conducted numerous engineering and algorithm optimizations to address the vulnerabilities or defects in simulation testing over the past year, achieving a very high degree of simulation consistency. Although it hasn't reached 100%, the accuracy rate can exceed 99.9%.
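One way to picture the comparison described above is as a match rate between paired runs of the same scenario. The sketch below is a hypothetical illustration only: the record layout, the outcome labels, and the metric definition are assumptions, and the interview does not describe how Ideal Auto actually computes its consistency figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioResult:
    scenario_id: str
    real_outcome: str  # outcome observed in the real-vehicle test (hypothetical label)
    sim_outcome: str   # outcome when the same scenario is replayed in simulation


def consistency_rate(results: list[ScenarioResult]) -> float:
    """Fraction of scenarios whose simulated outcome matches the real one."""
    if not results:
        raise ValueError("need at least one scenario")
    matches = sum(r.real_outcome == r.sim_outcome for r in results)
    return matches / len(results)


def mismatched_ids(results: list[ScenarioResult]) -> list[str]:
    """Scenarios to investigate when improving the simulator."""
    return [r.scenario_id for r in results if r.real_outcome != r.sim_outcome]


# Toy data, invented for illustration.
sample = [
    ScenarioResult("cut_in_01", "yielded", "yielded"),
    ScenarioResult("narrow_road_02", "stopped", "stopped"),
    ScenarioResult("left_turn_03", "proceeded", "stopped"),
    ScenarioResult("merge_04", "yielded", "yielded"),
]
print(consistency_rate(sample))
print(mismatched_ids(sample))
```

In this framing, the mismatched scenarios are the ones that point to simulator defects, which is the kind of gap the real-vehicle comparison is said to have been used to close.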

Recently, Ideal Auto released the OTA 7.5 version, and VLA will soon be released. What is the significance of this version?

Lang Xianpeng: The OTA 7.5 version introduced Super Alignment, which is significant for VLA because it accumulates numerous evaluation scenarios and data for VLA. Assuming other teams are developing VLA models, evaluation alone is a challenge requiring the accumulation of many scenarios. The reason we can iterate rapidly in the VLA model is that VLA evaluation resembles previous real-vehicle evaluations. During real-vehicle evaluations, everyone has their own methods and scenarios. Our VLA simulation evaluation has already laid a foundation in Super Alignment. Currently, there are over 400,000 scenario evaluations, and we will continue to supplement them.

What pitfalls did Ideal Auto encounter when developing VLA?

Lang Xianpeng: Our judgment has generally been sound. There were certainly minor pitfalls, such as how much computing power to reserve and how fast to deliver. Everyone runs into small engineering details and optimization issues. I believe it's acceptable to encounter minor pitfalls, but we shouldn't make significant misjudgments. I think we've been fortunate.

▍Lang Xianpeng

If competitors also launch VLA, even if later than Ideal Auto, will their catch-up speed also be rapid?

Lang Xianpeng: The VLA model will also witness rapid iterations, but this hinges on comprehensive foundational capabilities such as algorithms, computational power, and data, as well as the support of engineering capabilities. Notably, the training of VLA differs markedly from that of end-to-end models. It necessitates more mature simulation environments tailored for reinforcement learning, which stands in stark contrast to imitation learning that relies solely on real-vehicle data.

How significant is the technical hurdle involved?

Lang Xianpeng: There is indeed a technical hurdle. Ideal Auto's core technical barrier lies in the simulation of the world model, which is highly sophisticated and difficult for others to replicate in a short span. Without it, rapid iteration has to rely on real-vehicle testing, so it is challenging for competitors to surpass us.

How long can Ideal Auto sustain its lead in VLA?

Lang Xianpeng: Our organization is not a conventional functional structure but an IPD organization, akin to a large-scale project. Despite departmental divisions and assignments, we establish internal project teams to handle specific initiatives, be it end-to-end last year, map-free the year before, or VLA this year. The organizational challenge is minimal as everyone has been accustomed to project-based R&D for years, which has become one of our strengths. Last year, the end-to-end team comprised 180 individuals, while the VLA team now has slightly more than 200. I believe we don't need thousands; Tesla does an excellent job with just one or two hundred people.

03. Maximizing In-Vehicle Computing Power

Some competitors have launched vehicles with higher on-board computing power than the i8. Do you feel any pressure?

Lang Xianpeng: Computing power and quantization precision are intertwined. If higher numerical precision is used, the equivalent or effective computing power diminishes; with lower-precision quantization, it increases. Without knowing which quantization precision others use, it's difficult to judge.

We have a long-term plan for in-vehicle computing power but cannot disclose it now.

Their approach involves independently developing chips and algorithms, ensuring high compatibility.

Lang Xianpeng: The core rationale for in-house chip development is optimization tailored to our algorithms, offering high cost-effectiveness and efficiency. We continue to use the Thor chip because NVIDIA provides excellent support for new operators, and its computing power suffices. Changes during the VLA iteration process remain possible, so we're sticking with Thor. If the algorithm becomes stable, independently developing chips for better efficiency and cost might be considered.

NVIDIA's Thor is a general-purpose chip. Can computing power be maximized using it?

Zhan Kun: Last year we began deploying large models on the Orin chip, something NVIDIA initially considered impossible. Through detailed analysis and collaboration, our engineering and deployment teams accomplished a great deal, including modifying CUDA code and rewriting PTX instructions, to achieve the current performance.

Ideal Auto's autonomous driving team consistently demonstrates strong engineering deployment capabilities, with a keen eye for detail. Maximizing chip performance involves bottom-level analysis and addressing bottlenecks. VLA's efficiency has improved nearly tenfold, from initially requiring 500-600 milliseconds per frame to achieving 10 Hz. This involved adjusting algorithms and operators and reducing precision from FP16 to FP8, which significantly boosted performance. NVIDIA also recommends FP4 in its latest architecture, which we aim to leverage further.
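To relate the two units quoted here, a frame rate in hertz can be converted to a per-frame time budget. The sketch below uses only the figures from the answer; the helper names are illustrative.

```python
def frame_time_ms(rate_hz: float) -> float:
    """Time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / rate_hz


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the old per-frame latency to the new one."""
    return before_ms / after_ms


INITIAL_MS = (500.0, 600.0)  # quoted initial latency range per frame
TARGET_HZ = 10.0             # quoted current frame rate

after_ms = frame_time_ms(TARGET_HZ)
speedups = [speedup(ms, after_ms) for ms in INITIAL_MS]

print(f"{TARGET_HZ:.0f} Hz leaves {after_ms:.0f} ms per frame")
print(f"latency ratio versus the initial range: {speedups[0]:.0f}x to {speedups[1]:.0f}x")
```

Taken literally, the two quoted figures correspond to a 5x to 6x reduction in per-frame latency. The interview does not say how the "nearly tenfold" figure was measured, so it may refer to a different baseline or metric.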

Integrating the Thor chip into vehicles wasn't easy either.

Lang Xianpeng: We were the first to use the Thor chip in vehicles and previously the Orin chip in the L9 model. We've accumulated extensive cooperation experience with chip vendors. Addressing defects and iterating with partners is part of the normal process. Chip production requires significant inputs, and we provide many during new chip development. Issues on older chips are often resolved in newer versions.

How do you maintain model accuracy when reducing precision from FP16 to FP8?

Zhan Kun: This is a common industry challenge. Large models lower the requirements on numerical precision. As models grow, their fault tolerance increases, which leaves room for more sophisticated operations and greater data capacity. This shift towards lower precision and compute-intensive operators characterizes VLM and VLA. Additionally, we've done extensive data cleaning so that training is stable and converges.
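To give a feel for the rounding error involved, the sketch below shortens a value's significand to the width used by FP16 and by one FP8 variant. It is a simplified illustration, not Ideal Auto's quantization pipeline: it assumes the E4M3 flavor of FP8 (the interview does not say which is used), and it ignores exponent range, subnormals, and saturation.

```python
import math


def round_to_significand_bits(x: float, bits: int) -> float:
    """Round x to the nearest value with `bits` significant binary digits.

    Only the significand is shortened; exponent range, subnormals, and
    saturation of real FP16/FP8 formats are deliberately ignored.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    scale = 2 ** bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def relative_error(x: float, bits: int) -> float:
    """Relative rounding error for a nonzero value at the given significand width."""
    return abs(round_to_significand_bits(x, bits) - x) / abs(x)


FP16_BITS = 11     # 10 stored fraction bits plus the implicit leading 1
FP8_E4M3_BITS = 4  # 3 stored fraction bits plus the implicit leading 1 (assumed variant)

weights = [0.1234, -1.7, 3.14159, 0.007]  # toy values, invented for illustration
worst_fp16 = max(relative_error(w, FP16_BITS) for w in weights)
worst_fp8 = max(relative_error(w, FP8_E4M3_BITS) for w in weights)

print(f"worst relative error at FP16 width: {worst_fp16:.5f}")
print(f"worst relative error at FP8 width:  {worst_fp8:.5f}")
```

With this rounding scheme the relative error is bounded by 2^-bits, so about 0.05% at FP16 width versus 6.25% at the assumed FP8 width. That gap is the kind of noise a model has to tolerate when precision is reduced.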

Is FP4 achievable in the future, doubling Thor's effective computing power?

Zhan Kun: Achieving FP4 requires more effort in training, data iteration, and cleaning. We're actively exploring this and will soon further maximize Thor's computing power.

04. Halfway to the Goal

Is VLA a technological innovation in AI or engineering?

Zhan Kun: VLA is more than just an engineering innovation. It aligns with embodied intelligence, bringing large models to the physical world. Our VLA model introduces embodied intelligence concepts to autonomous driving, a pioneering initiative.

However, engineering innovation is crucial for autonomous driving. Deploying large models on edge computing hardware is challenging. Many teams find VLA deployment difficult, especially with limited edge chip computing power. So VLA is not solely an engineering innovation, but it does demand extensive engineering optimization.

Driving with the VLA model offers a superior experience to end-to-end models, though few are available. What's its significance?

Zhan Kun: The VLA model can think, a significant advantage over end-to-end models. Language (L) is crucial in VLA. For autonomous driving to advance to L4 or higher, L is essential. Large language models and others are also moving towards end-to-end L. Recognizing this, we're vigorously developing L with many VLA applications.

Will future VLA and Ideal Auto's intelligent agent converge into a unified architecture?

Zhan Kun: We believe VLA will form a larger, unified architecture. It's a promising technology for bringing AI into the physical world, and not only in autonomous driving; it may be the most reasonable direction for physical AI so far.

So, VLA isn't just the starting point for L4 intelligent driving but also for AI. Do you plan to use it on other hardware like robots?

Lang Xianpeng: Absolutely, and we've established various robot departments. VLA is a robust framework for embodied intelligence with potential extensions.

When can a higher-level intelligent agent like an AI Agent be realized?

Lang Xianpeng: We previously conceptualized a driver Agent but have iterated on it. VLA should focus on providing an excellent driver, a family driver, that excels at driving. Agent capabilities will later integrate with other applications. Current AI Agent experiences and products are still in early stages.

When deploying the VLA model in vehicles, is a lighter, smaller version necessary, perhaps through distillation?

Zhan Kun: We've balanced efficiency and distillation during deployment. Our base model is a self-developed 8x0.4B MoE, unique in the industry. It suits the NVIDIA chip well, offering fast inference and large model capacity. Additionally, we've distilled a 32B cloud-based model into a 3.2B MoE model, combined with Vision and Action using Diffusion, significantly optimizing performance.
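The sketch below illustrates two points about a mixture-of-experts (MoE) layout like the one mentioned. It assumes "8x0.4B" means eight experts of 0.4B parameters each, which would match the 3.2B total quoted for the distilled model. The top-2 routing, the router logits, and all names are hypothetical; the interview does not say how many experts are active per token or how routing is done, and shared parameters such as attention layers are ignored.

```python
import math

NUM_EXPERTS = 8
PARAMS_PER_EXPERT_M = 400  # 0.4B expressed in millions of parameters
ASSUMED_TOP_K = 2          # hypothetical: not stated in the interview


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over router scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


total_params_m = NUM_EXPERTS * PARAMS_PER_EXPERT_M
active_params_m = ASSUMED_TOP_K * PARAMS_PER_EXPERT_M

# Invented router scores for a single token, one per expert.
gates = route_top_k([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], ASSUMED_TOP_K)

print(f"total expert parameters: {total_params_m} million")
print(f"active per token under the top-{ASSUMED_TOP_K} assumption: {active_params_m} million")
print(f"selected experts and gate weights: {gates}")
```

The design point this illustrates is that an MoE model stores all experts but runs only a few per token, which is one reason such a layout can combine large capacity with fast inference.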

During the test drive, VLA accurately moved the i8 forward 5 meters when instructed. Was this due to specific training?

Zhan Kun: We don't train models on rigid data like moving forward 10m or 12m. However, general knowledge data includes understanding of physical space. Large models now incorporate spatial knowledge, as seen in ChatGPT and Qianwen. After learning these capabilities, we reflect them in actions. When fed vast amounts of data, models generalize and exhibit abilities, including behavior, at scale.

Our abilities and knowledge integrate various disciplines, and we closely monitor large model progress, which can easily transfer to autonomous driving.

(For readability, question parts have been re-edited, and answer parts slightly modified without altering their meaning.)

This article is original content from Autopix and should not be republished without authorization.

