July 28, 2025
The wave of large models has permeated public life for over two years, yet a noticeable gap persists between public anticipation and industrial applications in this domain.
At the perception level, the public sees model capabilities updated and iterated week after week, with models surpassing human performance on a growing range of tasks and benchmarks. The industrial landscape, however, tells a different story. In sectors such as industrial manufacturing, healthcare, and finance, many insiders acknowledge that large models are still at the stage of isolated, single-point applications, with large-scale deployment and blockbuster B2B products remaining elusive.
The crux of this gap lies in the stringent reliability requirements of professional scenarios, which far exceed the capabilities of current general models. It is akin to an exceptional undergraduate with a broad general education who cannot yet step directly into the role of a practicing clinician.
To achieve the precision that professional fields demand, the industry has tried various measures, including fine-tuning, retrieval augmentation, and knowledge bases. Nonetheless, there is broad consensus that the credible application of large models requires a paradigm shift.
At the WAIC forum "From General Intelligence to Professional Productivity: A New Paradigm of AI Application Led by High-Order Programs" on July 27, AntChainMind, a subsidiary of Ant Group, introduced a novel solution—a technical framework for credible large model application based on High-Order Programs (HOP). This framework leverages human intelligence to address scenarios demanding high reliability, employing expert experience, domain knowledge, and multiple verifications to guarantee execution accuracy at the engineering level.
During the forum, AntChainMind also announced the official open-sourcing of this technical framework, aiming to accelerate the credible application of large models across industries.
Wei Tao, Vice President of Ant Group and Chairman of AntChainMind, used new energy vehicles as a metaphor for today's large model industrial applications. Large models serve as the motor system of new energy vehicles, a general intelligence engine. However, the overall reliability hinges on the electronic control system.
'Previously, when reliability fell short, the blame went to the engine. We believe high-order programs are the advanced electronic control system that effectively governs the intelligence in industrial AI applications. The battery represents data. Going forward, the core of industrial AI will be data, intelligent models, and high-order programs, jointly supporting the industry's AI transformation.'
01
Industrial-Grade Large Model Applications Face the 'Last Mile' Challenge
Users of image generation applications have undoubtedly encountered AI blunders. While generated images may appear decent at first glance, closer inspection often reveals issues such as an extra finger on a character's left hand, unnaturally distorted right-hand joints, or unreadable text within the image.
Over the past two years, continuous model iterations have significantly improved these issues. For instance, current models can generate letters and text with minimal errors. However, occasional flaws still surface in AI-generated images.
Content generation suffers from similar problems. When prompted to write an article, AI sometimes produces thousands of words in which, on closer inspection, the references and key data turn out to be entirely fabricated.
While the tolerance for such issues in general public scenarios is relatively high, they become problematic in sectors like industrial manufacturing, healthcare, and finance, which involve life safety and financial transactions. The industry's heightened expectations for AI accuracy make the issue of insufficient model reliability a potential obstacle to industrial applications.
IDC reported, based on a survey of over 300 enterprises, that 87% believe the accuracy of existing models falls short of business requirements when measured against concrete business outcomes. This is particularly evident in tasks involving user information, production, and decision-making, where models are expected to demonstrate advanced logical reasoning and task-execution capabilities.
An industrial AI service provider told Digital Frontier that industrial production-control scenarios place exceptionally high demands on models for safety, accuracy, timeliness, and generalization. In the chemical industry, for instance, boilers and reaction tanks operate under high temperature and high pressure with flammable, explosive materials, and the reaction processes involve many complex steps. Inaccurate AI outputs can disrupt normal industrial operations and, in severe cases, cause safety accidents.
This AI service provider noted that these challenges have hindered the progress of AI adoption in the industry compared to other general fields.
The medical field faces similar issues. Due to the 'black box' reasoning process of large language models, medical large models often struggle with interpretability and reliability in practical applications.
The industry believes that the insufficient reliability encountered in large model applications may stem from two primary reasons.
Firstly, it is related to the inherent hallucinations of large models. When confronted with incomplete or contradictory information, large models generate seemingly plausible explanations through 'completion logic'. Current frontier research indicates that the hallucination problem has not been eradicated by model scaling and technological advances. In April 2025, OpenAI reported that when answering factual questions about people, the o3 and o4-mini models produced erroneous information 33% and 48% of the time, respectively, compared with a hallucination rate of only 16% for the earlier o1 model.
Secondly, there is uncertainty in how faithfully models follow user instructions, especially in complex tasks, multi-step reasoning, or business scenarios with stringent constraints, where instruction misunderstanding, overriding, and omission become more prominent. In June 2025, research published on Apple's machine learning research site showed that reasoning models collapse completely once task complexity exceeds a critical threshold.
At the WAIC forum "From General Intelligence to Professional Productivity: A New Paradigm of AI Application Led by High-Order Programs," a panel of experts and industry participants discussed solutions for the credible application of large models in industrial end-use, concluding that the solution may not solely reside in the model itself.
Chen Chun, a professor at Zhejiang University and director of the National Key Laboratory of Blockchain and Data Security, holds a somewhat unconventional view regarding hallucinations. He believes that hallucinations are not negative elements to be 'wiped out' but rather a product of artificial intelligence systems reaching a certain level of intelligence. Eliminating all hallucinations would reduce large models to mere mechanical retrieval tools.
Wei Tao echoed this sentiment, using the periodic table of elements and the discovery of the benzene ring structure as examples. He argued that the non-logical, jumping thinking mode in human intelligence, akin to hallucinations, has significantly advanced human civilization.
If hallucinations are not the thing to be eliminated, where does the breakthrough for large-scale application of large models in high-accuracy scenarios lie? Chen Chun believes the answer is not to strip away these 'intellectual characteristics' but to establish an engineering guarantee framework.
Wei Tao concurred, noting, 'There is a tendency at the moment to pit intelligence against engineering: if the problem-solving approach is not placed inside the model itself, it is dismissed as not very intelligent.' He believes we should draw on human intelligence and combine it with engineering to ensure the credible application of large models in high-reliability scenarios.
02
HOP: Ensuring Credible Large Model Applications Through Engineering
Guided by the concept that the credible application of large models requires the integration of intelligence and engineering, AntChainMind has embarked on a series of explorations in this domain.
At WAIC, AntChainMind announced and open-sourced its exploration direction—the HOP large model credible application technology framework, a novel approach to ensuring the successful deployment of large models in high-reliability scenarios.
Wei Tao explained that humans also make errors when handling complex tasks, yet many professional fields, such as civil aviation, healthcare, construction, and production lines, have stringent correctness requirements and extremely low tolerance for error. To address reliability, these scenarios typically adopt Standard Operating Procedures (SOPs), systematizing and standardizing operational flows, work methods, tool usage, timing, and other elements to form repeatable, quantifiable, and optimizable operating standards, with accuracy and reliability ensured through verification and testing.
This solution minimizes error risks and enhances error detection by standardizing actions. HOP borrows from this concept, employing decomposition, verification, and actual measurement in three steps to guarantee model execution reliability from an engineering perspective.
HOP, the High-Order Program language, fuses programming language with natural language, combining their strengths while offsetting their weaknesses. Natural language has a rich vocabulary and diverse grammatical structures, but its openness can introduce ambiguity and vagueness. Programming languages, by contrast, are formal and precise but come with a steep learning curve.
HOP utilizes programming language to express logical components and relies on natural language for fuzzy and dynamic matching involving knowledge and semantics. 'Essentially, HOP treats large models as CPUs executing programming languages. Unlike traditional programming languages, large models, due to their high intelligence, can also handle conceptual tasks,' Wei Tao told Digital Frontier.
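To make the idea concrete, below is a minimal, hypothetical Python sketch of "programming language for the logic, natural language for the semantics", with the large model playing the role of the CPU that evaluates the natural-language slots. The names (call_llm, semantic_match, review_claims) and the duplicate-billing example are illustrative assumptions, not AntChainMind APIs.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Stand-in for a real chat/inference call; returns a fixed answer so the sketch runs offline."""
    return "yes"

@dataclass
class Claim:
    claim_id: str
    text: str

def semantic_match(claim: Claim, concept: str) -> bool:
    # Natural-language slot: a fuzzy, knowledge-dependent judgment delegated to the model.
    answer = call_llm(
        f"Does the following statement describe '{concept}'? Answer yes or no.\n{claim.text}"
    )
    return answer.strip().lower().startswith("yes")

def review_claims(claims: list[Claim]) -> list[Claim]:
    # Programmatic slot: the loop, branching, and output format are fixed by the program
    # and cannot be skipped or rewritten by the model.
    return [c for c in claims if semantic_match(c, "duplicate billing of the same medical service")]

print(review_claims([Claim("c1", "Item 12 was charged twice on the same day")]))
```

The point of the split is that control flow and output structure are pinned down by ordinary code, while only the knowledge-dependent judgment is delegated to the model.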
Specifically, the HOP-based large model credible application framework relies on three core components to ensure reliability:
The first step involves programmatic expression of business logic, analogous to task disassembly when humans address complex requirements. This process, akin to SOPs, disassembles best practices in the field and constructs them programmatically. Programmatic language eliminates the ambiguity and fuzziness of natural language. Additionally, it breaks down complex business logic into verifiable granularities, supporting subsequent efficient verification. This programmatic language, similar to traditional programming languages, offers scalability, allowing for flexible adaptation to future application changes.
The second step entails constructing a scenario knowledge graph. The industry consensus is that for large models to achieve over 99% reliability in vertical fields, general and industry corpora are not enough; the professional knowledge of experts within the scenario must also be integrated. The domain knowledge graph serves as the carrier for these best practices in the relevant fields.
In this step, natural language is integrated with the domain knowledge graph to support the matching and derivation of fuzzy concepts required by the large model during HOP execution.
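As a rough illustration of this coupling, the hypothetical sketch below keeps a tiny, expert-curated concept table and only ever asks the model to map free-form text onto existing nodes. The graph contents, aliases, and the call_llm stub are assumptions made for the example, not part of the open-sourced framework.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns 'none' so the sketch runs offline."""
    return "none"

# A toy slice of a domain graph: canonical concepts with expert-curated aliases and relations.
KNOWLEDGE_GRAPH = {
    "reactor_overpressure": {
        "aliases": ["tank pressure exceeds limit", "overpressure in reaction tank"],
        "required_action": "emergency_pressure_relief",
    },
    "duplicate_billing": {
        "aliases": ["same service charged twice", "repeated medical charge"],
        "required_action": "flag_for_manual_audit",
    },
}

def resolve_concept(free_text: str) -> str | None:
    """Alias match first; fall back to asking the model to choose among existing nodes only."""
    lowered = free_text.lower()
    for concept, node in KNOWLEDGE_GRAPH.items():
        if concept.replace("_", " ") in lowered or any(a in lowered for a in node["aliases"]):
            return concept
    options = ", ".join(KNOWLEDGE_GRAPH)
    answer = call_llm(
        f"Which concept best matches '{free_text}'? Options: {options}. "
        "Answer with exactly one option, or 'none'."
    ).strip()
    return answer if answer in KNOWLEDGE_GRAPH else None

print(resolve_concept("Sensor shows overpressure in reaction tank 3"))  # -> reactor_overpressure
```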
The third step involves a controlled tool chain. Similar to humans' error prevention through repeated checking and verification, the HOP execution framework incorporates a verification process when large models perform industry scenario tasks.
Because task disassembly has already fixed the granularity, verification dimensions can be embedded at each point of execution and application, so the verification mechanism runs through the entire process. Once an HOP passes verification, the reliability of the large model in the professional scenario is assured.
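One way to picture this, purely as a sketch under the assumptions above, is a pipeline in which every decomposed step carries its own deterministic validator, so a failed check stops execution instead of letting an unverified result flow downstream. The Step and execute names are illustrative, not part of the open-sourced framework.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Step:
    name: str
    run: Callable[[Any], Any]      # the (possibly model-backed) action
    check: Callable[[Any], bool]   # deterministic validator for this step's output

def execute(steps: list[Step], payload: Any, max_retries: int = 2) -> Any:
    for step in steps:
        for _attempt in range(max_retries + 1):
            result = step.run(payload)
            if step.check(result):      # verification is embedded at every step
                payload = result
                break
        else:
            # All retries failed the check: stop and escalate instead of passing
            # an unverified result to the next step.
            raise RuntimeError(f"Step '{step.name}' failed verification; escalating to review.")
    return payload

# Tiny usage example with placeholder steps.
pipeline = [
    Step("normalize", lambda x: x.strip().lower(), lambda r: r == r.strip()),
    Step("classify",  lambda x: {"text": x, "label": "ok"}, lambda r: r.get("label") in {"ok", "risky"}),
]
print(execute(pipeline, "  Some Input  "))
```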
Wei Tao emphasized that a comprehensive formal verification framework is crucial for enhancing large model performance. For instance, large models excel in solving math problems primarily because mathematicians have established a robust formal verification framework. 'As long as the large model's proof passes verification, the result can be guaranteed correct. The model can then continue trying in different directions until it succeeds.'
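The pattern Wei Tao describes can be reduced to a propose-verify-retry loop: only answers certified by an independent checker are ever returned, and the proposer is free to keep exploring until one attempt passes. In the toy sketch below, a trivial checker (integer factorization) stands in for a real formal-verification backend, and the random proposer stands in for a model.

```python
import random

def propose_factors(n: int) -> tuple[int, int]:
    """Stand-in for a model proposal; a real system would query an LLM here."""
    a = random.randint(2, n - 1)
    return a, n // a

def verify(n: int, candidate: tuple[int, int]) -> bool:
    # Independent checker: cheap to run and impossible to fool.
    a, b = candidate
    return a * b == n and a > 1 and b > 1

def solve(n: int, max_attempts: int = 10_000) -> tuple[int, int] | None:
    for _ in range(max_attempts):
        candidate = propose_factors(n)
        if verify(n, candidate):    # only certified answers are ever returned
            return candidate
    return None                     # no verified answer; the caller must not trust a guess

print(solve(91))  # e.g. (7, 13): correctness is guaranteed by the checker, not the proposer
```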
The above three steps enable HOP to not only encapsulate key knowledge and practices in vertical fields but also ensure large model reliability for professional applications through mechanisms such as SOPs and checklists. Furthermore, it can adapt professional knowledge and scenario applications based on knowledge concept matching.
Wei Tao told Digital Frontier that high-order programs (HOP) and large models are strongly complementary. HOP, a distillation of industry standard operating procedures (SOPs), ensures the correctness and reliability of industrial applications and can be optimized, iterated, and verified before deployment. In turn, the advancement of large models benefits HOP by substantially reducing the cost of its iteration and optimization: tasks that previously required human intervention can now be completed at far lower cost thanks to the models' greater intelligence.
03
Driving Change in Large Model Industrial Applications
Over the past two years, focusing on the reliability and feasibility of large models, various industry players have embarked on a series of explorations, including but not limited to prompt engineering, fine-tuning, and knowledge-base-driven retrieval-augmented generation (RAG).
For instance, fine-tuning was previously deemed essential for industry adoption to tailor model capabilities to specific scenarios. 'Whenever a model encountered issues in industrial applications, fine-tuning was the immediate thought,' remarked an industry insider.
However, after one or two years of exploring implementation, the industry has observed several limitations of fine-tuning. It requires preparing a corpus and training on it, which can degrade capabilities outside the training data and impair reasoning. In addition, fine-tuning leaves two model versions to maintain, the base model and the fine-tuned one, which can substantially increase future deployment and application costs as well as management complexity.
The accumulation of domain knowledge and the crystallization of expert experience have also come to be recognized over the past two years as pivotal to the successful deployment of large models. Companies often highlight this point when sharing their experience with vertical-scenario applications.
Nonetheless, senior insiders have noted that these explorations are primarily isolated efforts by individual companies, each starting from scratch. From an industry-wide perspective, there is a lack of an effective mechanism to scale and replicate the accumulation and crystallization of expert knowledge and experience.
This year's WAIC drew unprecedented crowds.
Wei Tao mentioned that AntChainMind's introduction of the HOP framework represents a relatively systematic approach within the industry to address the reliability of large model applications from an engineering standpoint, elevating reliability to a new height. Characterized by low cost, flexible iteration, greater stability, and scalability, it facilitates the promotion of credible large model applications in the industry.
Taking cost as an illustration, Wei Tao explained that compared to traditional fine-tuning solutions, which previously necessitated substantial computational power for training, the HOP framework does not require such a high initial investment.
Regarding flexible iteration, whenever the accuracy and completion rate of large model execution fall short of requirements, the application party can optimize based on the HOP framework. This includes further dissecting the operation process and refining verification. Poor performance in industrial applications may also stem from incomplete scenario knowledge, misinterpretation of scenario terminology, or insufficient knowledge graphs. In such cases, providing better data and refining the disassembly and verification process can enhance the model's performance within the scenario.
Wei Tao believes that prior to the advent of high-order programs, engineering was cumbersome due to the absence of an effective workflow-level carrier, and delivery was challenging. With high-order programs, delivery becomes seamless. Additionally, given HOP's immense business value, it can safeguard the entire application process leveraging AntChainMind's computational power. Every verified HOP application can be invoked in a more credible and reliable manner.
It is understood that the high-order program technology framework has been initially applied in various industry scenarios, such as end-to-end financial risk control, network intrusion detection, and medical duplicate billing, with notable improvements in reliability and timeliness.
According to AntChainMind personnel, in the joint financial risk control scenario, the traditional process, from data exploration and processing through model construction and optimization, relies heavily on manual work. The result is lengthy processes, sluggish response times, and susceptibility to subjective factors, all of which hinder the efficiency and consistency of joint risk-control modeling.
After implementing the HOP technology framework, complex SOPs are transformed into executable processes and codes, enabling intelligent orchestration and automated execution of end-to-end risk control. Compared to traditional modeling personnel manually conducting data analysis and code development, large models integrated with HOP can shorten the modeling cycle while ensuring high accuracy, significantly reducing tedious tasks like repetitive data processing and process execution. This not only minimizes processing costs but also alleviates the shortage of professional talent.
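As a hypothetical illustration of what "SOPs transformed into executable processes" can mean, the sketch below encodes the four stages mentioned above as an ordered pipeline with an acceptance check after each stage. The stage names follow the article's description; the bodies and numbers are dummy placeholders so the example runs, not Ant Group code.

```python
# Each stage: (name, run, acceptance_check). Real stages would be model-assisted.
RISK_CONTROL_SOP = [
    ("data_exploration",   lambda ctx: {**ctx, "profile": "summary statistics"},
                           lambda ctx: "profile" in ctx),
    ("data_processing",    lambda ctx: {**ctx, "features": ["f1", "f2"]},
                           lambda ctx: bool(ctx.get("features"))),
    ("model_construction", lambda ctx: {**ctx, "model": "scorecard-v0"},
                           lambda ctx: "model" in ctx),
    ("model_optimization", lambda ctx: {**ctx, "auc": 0.82},
                           lambda ctx: ctx.get("auc", 0.0) > 0.75),
]

def run_sop(sop, ctx):
    for name, run, check in sop:
        ctx = run(ctx)
        # A stage that fails its acceptance check halts the pipeline for human review
        # instead of silently degrading downstream results.
        if not check(ctx):
            raise RuntimeError(f"Stage '{name}' failed its acceptance check")
    return ctx

print(run_sop(RISK_CONTROL_SOP, {}))
```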
However, Wei Tao also emphasized that relying solely on HOP may not be a one-size-fits-all solution, and a single application cannot resolve all industry challenges. Rather, it provides a technical framework through which each specific scenario can address its unique problems.
Moreover, the integration of intelligence with engineering and expert knowledge is highly industry-specific. To foster the credible implementation of large models, it is imperative to establish an ecosystem encompassing industry experts from diverse fields and sectors.
"The HOP framework aims to serve the entire ecosystem. We aspire to collaborate more closely with the industry through open-source initiatives to resolve the reliability dilemma of large models in professional applications and propel their widespread adoption in specialized domains," said Wei Tao.