10/17 2024
Source | Lingyi Think Tank
Expert from the National Computer System Quality Supervision and Test Center:
The first challenge of vertical large models: “cooking” data
Focusing on applications is widely considered a shortcut for China's large models to overtake their competitors. Applications must be deployed in specific industries and scenarios, and the models built for them are known as vertical large models. Building vertical models, however, brings many challenges of its own.
“Many industries lack authoritative and unified standards and guidelines, making it difficult to work without a clear foundation, especially in the financial industry. Although many departments and governments are attempting to establish standards, a unified data governance standard has not yet been formed.” Experts from the National Computer System Quality Supervision and Test Center (hereinafter referred to as “the National Computer Test Center”) believe that this is the first challenge to overcome in developing vertical large models; otherwise, it will be like trying to cook without ingredients.
On August 16, 2024, at the “Opportunities and Thresholds of Financial Large Models” conference jointly organized by Lingyi Think Tank and Suzhou High-Speed Rail New City Industrial Development Co., Ltd., experts from the National Computer Test Center spoke in depth about data governance.
In a post-conference interview, the experts built on their conference remarks to explain systematically how they understand data governance and data management.
01
Vertical models have a preliminary data foundation
Lingyi Finance: It is generally believed that there are three foundations for the development of large models: computing power, algorithms, and data. Currently, most discussions focus on various types of public and open data. However, as large models delve deeper into various industries, niche areas, and scenarios, industry-specific data, commercial data, user data, and other non-public data become core resources. Do we currently have a data foundation for developing vertical large models?
Expert from the National Computer Test Center:
As digitalization deepens across industries and big data technologies advance, many enterprises and institutions have built their own data centers and data warehouses and accumulated vast amounts of industry-specific, commercial, and user data. This data is rich in content and variety, and it provides a preliminary technical and data foundation for developing large models.
However, there are still some challenges at the implementation level. For instance, the validity and accuracy of data directly impact the training effectiveness of large models. Additionally, protecting user data privacy and preventing leaks during the training process are crucial concerns.
Making data available, easy to use, and a more accurate reflection of industry and user needs places ongoing demands on data quality and security, which in turn calls for data management to become a routine practice.
02
National standards for data management
Lingyi Finance: Data management appears to be crucial. However, it is a broad term encompassing various aspects of data operations. How can we establish reasonable data management standards?
Expert from the National Computer Test Center:
Although standards have become more important in data management and governance in recent years, there is still no unified definition of what data management covers or how it should be carried out. In practice, enterprises typically rely on building information and digital systems as their primary means of managing data.
However, there are significant differences in data management practices among different enterprises. Data governance is a complex project often confronted with numerous issues, necessitating systematic guidance.
To provide that guidance through a top-down design of the basic data management system, China introduced the DCMM standard, the Data Management Capability Maturity Assessment Model, at the framework level.
As China's first national standard in the field of data management, it represents a top-down approach to data governance. After years of vigorous promotion, its adoption is now growing rapidly.
The DCMM standard system classifies enterprise data management maturity into five levels, in ascending order: project-level, departmental-level, organizational-level, quantitative-level, and optimization-level. This makes clear which stage each enterprise's data management capability has reached.
Assessments of thousands of enterprises have demonstrated that the DCMM level classification is both scientifically sound and practical.
The financial industry can also benefit from the promotion, implementation, and application of the DCMM standard system. It helps enterprises and industry institutions assess their data management capabilities scientifically, identify problems and gaps in data management, and establish a data management framework tailored to their own characteristics. This lays a solid foundation for turning financial data into assets and for participating in data market circulation.
Lingyi Finance: Specifically, in which aspects and areas should improvements be made to achieve a higher level of data management?
Expert from the National Computer Test Center:
The DCMM system encompasses technical and management requirements, comprehensively analyzing issues from multiple dimensions such as organization, systems, processes, and tools to help enterprises identify and address problems. It covers common elements of data governance, including eight core capability domains: data strategy, data governance, data architecture, data standards, data application, data security, data quality, and data lifecycle.
Specifically, advanced technological tools and platforms should be utilized to support big data governance and application implementation. Simultaneously, emphasis should be placed on standardized management throughout the process, engaging both management and business departments in ensuring a closed-loop and routine execution of data management tasks. A top-down approach should be adopted to foster a culture and awareness of data management, clarifying objectives, pathways, and responsibilities to avoid governance for governance's sake. Furthermore, diverse data analysis and sharing methods should be actively explored to unlock and realize the value of internal and external data assets.
Only by adopting a multi-pronged approach can we comprehensively enhance data management capabilities.
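As an illustration of how the eight capability domains and the five maturity levels described above fit together, here is a minimal Python sketch. It is not the official DCMM scoring method: the class name `MaturityProfile`, the numbered `Level` enum, and the rule that the overall level equals the weakest domain are all assumptions made for this example only.

```python
from dataclasses import dataclass
from enum import IntEnum

# The eight core capability domains named in the interview.
DOMAINS = (
    "data strategy",
    "data governance",
    "data architecture",
    "data standards",
    "data application",
    "data security",
    "data quality",
    "data lifecycle",
)


class Level(IntEnum):
    """Maturity levels 1 (lowest) to 5 (highest)."""
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3
    LEVEL_4 = 4
    LEVEL_5 = 5


@dataclass(frozen=True)
class MaturityProfile:
    """Hypothetical per-domain assessment result for one enterprise."""
    levels: dict[str, Level]

    def __post_init__(self) -> None:
        # Require exactly the eight domains, no more and no fewer.
        if set(self.levels) != set(DOMAINS):
            raise ValueError("profile must cover exactly the eight domains")

    def weakest_domains(self) -> list[str]:
        """Domains sitting at the lowest level, in the order listed above."""
        lowest = min(self.levels.values())
        return [d for d in DOMAINS if self.levels[d] == lowest]

    def overall(self) -> Level:
        # Assumption for illustration only: the weakest domain caps the
        # overall level. The real assessment rules may differ.
        return min(self.levels.values())
```

A profile in which every domain is at Level 3 except data security at Level 2 would, under this assumed rule, be summarized as Level 2 overall, with data security flagged as the domain to improve first.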
03
Challenge: Most enterprises are still at Level 2
Lingyi Finance: After several years of promoting data management inspections and ratings, what do you consider the biggest challenge facing the field of data management?
Expert from the National Computer Test Center:
From an enterprise perspective, leadership recognition and determination are crucial, serving as the driving force behind digital and intelligent transformation. Given that data management involves numerous departments and requires significant time, effort, and financial resources, the current status varies widely across industries and regions.
Based on the national DCMM implementation data, most enterprises are at Level 2, indicating that most data demands are limited to the business level. Insufficient investment in the overall planning of data governance systems and platforms suggests that enterprises need to enhance their understanding of the significance of data governance.
From an industry perspective, data management efforts often encounter a lack of industry data standards during implementation. Given the significant differences in work characteristics across industries, detailed industry norms regarding data quality, standards, and security are essential.
Without authoritative and unified industry data governance standards, enterprises must independently plan and construct their systems, increasing costs and difficulties in data governance while hindering data openness, sharing, and circulation.
Regarding data management evaluation and certification, while various data standards are actively being promoted, there are still few national and authoritative data governance certification systems.
For instance, China is vigorously promoting data monetization and data factor trading and circulation, which require ensuring data quality and obtaining data quality reports from third-party institutions as a prerequisite. However, the implementation, evaluation, and certification of data quality standards vary across industries and regions, leading to inconsistencies in standards and requirements.
The difficulty in cross-industry and cross-regional recognition of data quality reports not only limits large-scale data transactions and applications but also increases the difficulty of national and industry regulation, making it challenging to accurately measure data quality and formulate regulatory measures, thereby affecting the long-term healthy development of the data market.
04
Characteristics and attempts of government data
Lingyi Finance: Various industries hold significant amounts of data, much of which is controlled by government departments. From a government data management perspective, what is the current “data maturity” level?
Expert from the National Computer Test Center:
In recent years, government and administrative data have been at the forefront of data governance awareness in China and have played an active role. Data authorities such as local government data bureaus lead the formulation of service standards for sharing public data, planning and standardizing the top-level design of data governance across regions.
Meanwhile, some more developed regions actively take the lead in constructing data exchange and sharing service platforms, or even establishing data trading and circulation markets, deeply participating in data governance efforts to promote the integration and sharing of data resources. This provides more convenient and efficient tools for government data management, contributing to enhanced standardization and normalization of data management practices across regions.
Highly digitized industries such as finance, energy, and healthcare are also subject to more mature regulation, with industry authorities actively promoting the standardized management of industry data.
We have spoken with some medical institutions and learned that last year Beijing Data Exchange selected six hospitals under Beijing's Health Management Bureau for a data sharing and trading pilot program. Data trading follows several models, including unified, scenario-specific, and classified trading. Demand-side users are authorized to access the shared data on the trading platform, but the data itself does not leave the platform.
Whether local governments or industry authorities, their roles and responsibilities in the data factor market differ significantly from those of enterprises and citizens, often acting more as regulators or supervisors.
Government data also has distinct requirements and concerns regarding data security and value, differing from other types of data.
Therefore, in the chain of data governance, trading, and circulation, different participants must undertake distinct data governance tasks based on their specific needs and characteristics to jointly construct a data governance ecosystem.
05
High data “maturity” in the financial industry
Lingyi Finance: Many financial institutions and fintech companies are advancing the development and innovation of financial large models. The financial industry is characterized by high data density and sensitivity. Regarding financial data management, what is the current overall situation, and is it prepared for the development of financial large models? Do you have any suggestions?
Expert from the National Computer Test Center:
The financial industry is currently showing a positive trend in data management. We recommend that it keep improving its data management while actively exploring the development of financial large models.
Based on national DCMM implementation statistics, although the financial industry has a relatively small number of enterprises, over half of financial enterprises that have obtained DCMM certificates are at Level 3 or above, and several banks have achieved the highest Level 5. The average data governance capability of the financial industry is leading nationally.
Financial enterprises score highly in data governance platform construction and data analysis application development. In the area of data security, where other industries generally perform poorly, the financial industry exhibits a high level of data management awareness and capabilities due to its unique characteristics.
Even the DCMM national standard itself drew on China's financial industry's data governance practices during its initial drafting phase.
Strong industry regulation, a solid digital foundation, and a strong demand for data governance, coupled with the organizational structure of group companies and subsidiaries represented by banks, provide continuous impetus and resource guarantees for implementing data management tasks across financial institutions and levels.
The financial industry can build on these strengths, with industry leaders serving as exemplary cases, to raise data management across the whole industry while continuing to improve governance outcomes in data standards, data quality, and open sharing.
It should also strengthen cooperation across the data factor ecosystem, innovate actively in practice, and explore financial large models suited to the industry's characteristics and development needs.