This year, the Nobel Prizes in Physics and Chemistry were both awarded for AI-related work. The concept of AI for Science (AI for scientific research), abbreviated as "AI4S", has also drawn the attention of scientists in China and abroad.
From November 4th to 6th, the 2024 Scientific Intelligence Summit was held at Peking University. Zhang Jin, Gong Xingao, Tang Chao and other academicians of the Chinese Academy of Sciences, together with many experts and scholars with hands-on experience in AI-driven research, shared and discussed the current state of AI in scientific research: its specific applications, its limitations and unsolved problems, and the impact AI for Science may have on the scientific research paradigm in the future.
AlphaFold’s success is only the first step in a long journey. Traditional AI frameworks still have limitations.
Hassabis, one of this year's Nobel laureates in Chemistry, won the award for developing the AlphaFold artificial intelligence model, which solved a 50-year-old problem, can predict the complex structures of approximately 200 million known proteins, and has been used by more than 2 million people around the world. In the view of Tang Chao, academician of the Chinese Academy of Sciences and director of the Peking University-Tsinghua University Joint Center for Life Sciences, AlphaFold's success does not equal success for the life sciences as a whole. It is only the "first step in a long march of thousands of miles."
Tang Chao, academician of the Chinese Academy of Sciences and director of the Peking University-Tsinghua University Joint Center for Life Sciences, is giving a speech. Photo by Luo Yidan, reporter of Beijing News Shell Finance
Tang Chao explained that most models in the life sciences are currently limited to a single modality, such as single-cell transcriptomics, RNA sequences, or protein structures. Life, however, is a complex and enormous system: in essence, it is a multi-level, multi-dimensional web of interactions extending from molecules and cells to organs and, ultimately, the whole organism.
"Life is a complex system with multiple scales and levels, from the macro to the micro. Each level has its own language and logic, and the levels influence one another," Tang Chao said. "Traditional AI frameworks perform well on structured, linear data, but data from living systems is dynamic and involves multi-dimensional interactions, so traditional AI frameworks show obvious limitations when handling high-dimensional, non-linear life science data."
In addition, even single-modality AI research requires a solid data foundation. Some research fields currently face insufficient experimental data and a lack of standardization in the data that does exist.
Tang Chao said that the construction of life science data systems started late and has suffered from insufficient investment, the lack of a complete full-chain ecosystem, and, in the early stage, the lack of systematic strategic planning and data-sharing mechanisms. As a result, it is difficult to build high-impact, publication-ready datasets, and data utilization lags behind that of Europe and the United States.
Zhang Jin, academician of the Chinese Academy of Sciences, member of the Standing Committee of the Party Committee and Vice President of Peking University, noted when introducing the use of AI in materials research that the current data collection process is not uniform: data obtained with different equipment, in different environments, and by different operators varies widely. In addition, different types of experiments generate data in different formats, including images, spectral data, and structural data.
AI modeling and training require the support of big data. Zhang Jin said, "Standardization is the basis for realizing data sharing, reproducibility and scientific knowledge iteration."
Zhang Jin, academician of the Chinese Academy of Sciences, member of the Standing Committee of the Party Committee and Vice President of Peking University, is giving a speech. Photo by Luo Yidan, reporter of Beijing News Shell Finance
In Tang Chao's view, the issues that urgently need to be solved in research on large-model frameworks for the life sciences include optimizing encoder designs for sequence, image, and matrix data according to the characteristics of life science data, and adjusting module architectures, dataset selection, and pre-training strategies to fuse data from different modalities. What could truly bring about "revolutionary change" is building new model architectures that capture the language logic, self-organization, hierarchical emergence, feedback mechanisms, and adaptability of life phenomena.
Tang Chao explained that research in the life sciences often follows a cycle: experimental observation, model fitting to explain phenomena, summarizing properties, predicting behavior, and then further experimental observation. He believes that model fitting may be carried out by AI in the future. "Our goal is to build a multi-modal, cross-level life science model, and ultimately we hope to discover new laws and principles in life science."
AI transforms the research paradigm: no longer fixated on clear "interpretability", but calibrated through extensive experiments
Although "AI4S" still faces many unsolved problems, AI has already produced results in many different research fields. Besides AlphaFold, mentioned above in connection with the Nobel Prize, specific applications include DeepMind's AI controlling the shape of the plasma in a nuclear fusion tokamak, and GraphCast predicting global weather up to ten days ahead and outperforming HRES, the conventional numerical forecasting system of the European Centre for Medium-Range Weather Forecasts, on 90% of metrics.
AI also speeds up experimental research. Zhang Jin said that it is basically impossible for a student to repeat three sets of the same experiment in one day, whereas an automated platform can run 150 sets of experiments in a day. This greatly improves the repeatability of experiments, and high-quality experimental data is the foundation for simulation and training.
Jiang Jun, chair professor at the University of Science and Technology of China, described his team's experience running experiments on the university's robotic chemist platform. In the video he presented, a Beijing News Shell Finance reporter saw that the device, a fully autonomous robot for carrying out experiments, has an omnidirectional mobile chassis and an intelligent robotic arm, and looks like a "moving table".
Jiang Jun, Chair Professor of the University of Science and Technology of China, introduces the machine experiment system. Photo by Luo Yidan, reporter of Beijing News Shell Finance
Jiang Jun described the University of Science and Technology of China's robotic chemist platform as "able to read, able to compute, and diligent in its work". "The machine reading system uses natural language processing to analyze papers, patents, textbooks, and electronic lab notebooks, and to collect data on site; the machine computing system performs physical modeling and intelligent prediction; and the machine experiment system carries out experiments to obtain real-world feedback for calibration."
He said that the development trend of "AI4S" in China and abroad is large models plus robots plus ecosystem alliances. For example, the British AI-Hub Alliance spent 3.2 billion yuan building an 11,000-square-meter intelligent innovation factory staffed by 200 scientists and 100 engineers. It serves Unilever and accounts for 60% of its annual R&D funding.
Many scientists at the scene said that AI has brought scientific research into a new stage.
Gong Xingao, academician of the Chinese Academy of Sciences and professor at Fudan University, said that the paradigm of physics research has passed through four stages: experimental physics, theoretical physics, computational physics, and digital physics. Physics has now reached the stage of digital physics, which uses data mining, artificial intelligence, and machine learning as its tools.
From Zhang Jin's perspective, the awarding of Nobel Prizes to AI-related work is a landmark: "Rigorous sciences such as physics and chemistry will become more open. We are no longer fixated on clear 'interpretability,' but allow black-box predictions to be accepted and continuously calibrated through experiments, ultimately leading to a more precise and comprehensive understanding."