The latest research by Professor Li Feifei's team shows that multi-modal large models have made breakthroughs in spatial intelligence, demonstrating the ability to remember and recall spaces and to build local world models. The study used the VSI-Bench benchmark to test multiple large models on real video scenarios. The results showed that some models have reached or approached human levels on certain spatial reasoning tasks, and that cognitive map assistance can significantly improve a model's spatial understanding ability. This research not only reveals the latest progress of AI in spatial perception, but also points to the wider application of AI in navigation, robot interaction and other fields in the future.
VSI-Bench, developed by the research team, contains more than 5,000 high-quality question-and-answer pairs covering a variety of scenarios and geographical areas, providing a reliable benchmark for evaluating visual-spatial intelligence. The research results are significant for the development of artificial general intelligence (AGI), and also provide a technical foundation for World Labs, a company founded by Professor Li Feifei that focuses on developing spatial intelligence AI models. The company's success also points to the potential and market prospects of spatial intelligence AI.
The research results show that although the overall performance of multi-modal models is still lower than that of humans, they have reached or approached human levels on some tasks. For example, Gemini-1.5 Pro performs strongly on tasks such as absolute distance and room size estimation, and some open-source models, such as the LLaVA series, have also achieved competitive results.
The study also pointed out that using cognitive maps to assist spatial reasoning can significantly improve the model's performance on spatial tasks, with the accuracy increasing by up to 10 percentage points. This shows that explicitly generating cognitive maps can help break through the model's bottleneck in spatial understanding.
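To make the idea of a cognitive map concrete, the following is a minimal Python sketch, not the method used in the study. It assumes a cognitive map can be represented as object names placed on a coarse grid, and that a relative-distance question ("which object is closest to the sofa?") can then be answered from the map alone. The object names, grid coordinates, and the functions `grid_distance` and `closest_object` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical cognitive map: object name -> (row, col) cell on a coarse grid.
# The positions below are illustrative assumptions, not data from the study.
COGNITIVE_MAP = {
    "sofa": (2, 3),
    "table": (5, 5),
    "lamp": (2, 4),
    "door": (9, 0),
}


def grid_distance(cmap, a, b):
    """Euclidean distance, in grid cells, between two objects on the map."""
    (r1, c1), (r2, c2) = cmap[a], cmap[b]
    return math.hypot(r1 - r2, c1 - c2)


def closest_object(cmap, target, candidates):
    """Return the candidate object whose grid cell is nearest to the target."""
    return min(candidates, key=lambda name: grid_distance(cmap, target, name))


if __name__ == "__main__":
    # Answer a relative-distance question directly from the explicit map.
    print(closest_object(COGNITIVE_MAP, "sofa", ["table", "lamp", "door"]))
```

In this sketch the spatial question is reduced to a lookup and a distance comparison once the layout has been written down explicitly, which is one plausible reason an explicit map could help with relative-distance questions.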
Li Feifei said that spatial intelligence is the key ability of AI to understand the physical world and is crucial to the realization of artificial general intelligence (AGI). She believes that spatial intelligence will become the next cutting-edge technology direction in the field of AI, and is even expected to achieve important breakthroughs in 2025.
In September this year, World Labs, a company founded by Li Feifei, announced its official launch, with a focus on developing AI models with spatial intelligence. The company has received investment from well-known institutions including Nvidia, a16z and Adobe, and its current valuation exceeds US$1 billion.
This research and its applications mark a key advance in AI technology from two-dimensional information processing to three-dimensional spatial perception. In the future, it is expected to be widely used in navigation, robot interaction, augmented reality and other fields, opening up a new path for the further development of artificial intelligence.
The results of this research are exciting and point out a new direction for the development of AI. In the future, with the continuous advancement of technology, AI applications based on spatial intelligence will profoundly change our lives and bring more convenience and possibilities to human society.