The editor of Downcodes has learned that Google is using its Gemini AI to improve robot navigation and task execution. The latest research from the DeepMind team shows that the long context window of Gemini 1.5 Pro significantly improves natural language interaction between users and the RT-2 robot. The robot can learn an environment by watching a video of it and then carry out complex tasks on instruction, such as guiding a user to a power outlet for charging, which demonstrates the potential of artificial intelligence in robotics.
Google is training its robots with Gemini AI to improve navigation and task completion.
In a new research paper, the DeepMind Robotics team explains in detail how it uses the long context window of Gemini 1.5 Pro to make it easier for users to interact with the RT-2 robot through natural language instructions. The researchers recorded a video tour of a designated area and used Gemini 1.5 Pro to let the robot "watch" the video and learn the environment. The robot can then carry out commands based on what it observed, such as guiding the user to a power outlet for charging.
DeepMind said that the Gemini-equipped robot achieved a 90% success rate across more than 50 user instructions in an operating area of more than 9,000 square feet.
The researchers also found that Gemini 1.5 Pro lets the robot plan how to carry out an instruction, not just navigate. For example, when a user with lots of Coke cans on their desk asks the robot whether their favorite drink is available, Gemini lets the robot know that it should head to the refrigerator to check and then report the result back to the user. DeepMind said it would investigate these results further.
The video demonstration provided by Google is impressive, but according to the research paper, the robot takes 10-30 seconds to process these instructions. It may be some time before we share our homes with more advanced environment-mapping robots, but at least these robots might be able to help us find our lost keys or wallet.
Highlights:
Gemini AI trains robots to improve navigation and task completion capabilities
Gemini 1.5 Pro enables robots to execute natural language instructions
Gemini enables robots to plan and execute instructions beyond navigation, study finds
This research points to rapid progress in robot technology. Applying Gemini AI should make robots considerably more capable and better able to serve people in daily life. Although some technical bottlenecks remain, I believe that more advanced robots will enter our lives in the near future and bring us more convenience. The editor of Downcodes will continue to follow the latest developments in this field.