Waymo is using Gemini, Google's multimodal large language model, to change how it trains its self-driving taxis. At the heart of this work is an end-to-end multimodal model called EMMA, which processes sensor data more efficiently in order to predict a vehicle's future driving trajectory more accurately. Waymo presents the move as a way to make its autonomous driving system smarter and safer. It also marks a notable step for large language models in autonomous driving, suggesting that the technology may move beyond traditional modular design toward smarter, more independent systems.
Waymo has recently taken another important step in autonomous driving. The company has long viewed its collaboration with Google DeepMind as a competitive advantage, and it is now using Google's multimodal large language model, Gemini, to improve the training of its self-driving taxis.
Waymo has released a new research paper introducing an "end-to-end multimodal model" called EMMA, which processes sensor data to generate the future driving trajectory of an autonomous vehicle. The aim is for Waymo's driverless vehicles to make driving decisions more intelligently and avoid obstacles effectively.
The technology matters not only because it is new, but also because it could broaden what large language models are used for. Waymo wants to treat the multimodal large language model (MLLM) as a "first-class citizen" of its autonomous driving system, which would take these models well beyond their current uses in chatbots and image generators.
In the paper, Waymo notes that traditional autonomous driving systems usually rely on separate "modules" for individual functions, including perception, mapping, prediction and planning. This approach has made progress over the past few years, but its limitations are also clear, especially in new and complex environments. Waymo believes that MLLMs like Gemini can address these problems because they have extensive "world knowledge" and can perform "chain-of-thought reasoning" that mimics human logical reasoning.
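The difference between the two designs can be illustrated with a minimal sketch. This is not Waymo's code: every name here (SensorFrame, modular_pipeline, end_to_end and the stand-in callables) is hypothetical and serves only to contrast a chain of separate modules with a single model that maps sensor input directly to a trajectory.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types for illustration only; none of this is Waymo's API.
Waypoint = Tuple[float, float]  # (x, y) position in the vehicle's frame


@dataclass
class SensorFrame:
    camera_images: List[bytes]   # raw camera frames
    route_instruction: str       # e.g. "go straight"


def modular_pipeline(
    frame: SensorFrame,
    perceive: Callable,
    build_map: Callable,
    predict: Callable,
    plan: Callable,
) -> List[Waypoint]:
    """Traditional design: each function is a separately built module."""
    objects = perceive(frame)                # perception
    local_map = build_map(frame, objects)    # mapping
    forecasts = predict(objects, local_map)  # prediction of other road users
    return plan(local_map, forecasts)        # planning


def end_to_end(frame: SensorFrame, model: Callable) -> List[Waypoint]:
    """End-to-end design: one model maps sensor input to a trajectory."""
    return model(frame)


def demo() -> Tuple[List[Waypoint], List[Waypoint]]:
    """Run both designs with trivial stand-in functions."""
    frame = SensorFrame(camera_images=[b""], route_instruction="go straight")
    straight = [(float(i), 0.0) for i in range(1, 4)]
    modular = modular_pipeline(
        frame,
        perceive=lambda f: [],
        build_map=lambda f, objs: {},
        predict=lambda objs, m: [],
        plan=lambda m, fc: straight,
    )
    e2e = end_to_end(frame, model=lambda f: straight)
    return modular, e2e
```

In the modular version, each stage only sees what the previous stage passes along, while the end-to-end version leaves all intermediate reasoning to a single model.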
The EMMA model was developed to help Waymo's self-driving taxis navigate complex environments. For example, when a vehicle encounters animals or road construction, EMMA can help it find the best driving path. However, Waymo also acknowledges that EMMA has limitations, such as its current inability to process 3D sensor inputs from lidar or radar.
Waymo says its research in this area needs to go further, but the company hopes the results will inspire more work to address these problems and advance autonomous driving technology.
Key points:
Waymo is using Google's Gemini model to develop EMMA, a new model intended to improve the training and decision-making of its autonomous taxis.
The EMMA model is able to process complex sensor data, helping driverless vehicles intelligently avoid obstacles.
While EMMA has potential, Waymo acknowledges that further research is still needed to overcome its existing limitations.
Waymo's EMMA model is a significant step for autonomous driving technology: it uses a large language model to process multimodal data, which could pave the way for safer and smarter autonomous driving systems. Challenges remain, but the study gives the field a promising new direction.