NVIDIA has announced that its research team developed HOVER (Humanoid Versatile Controller), a compact neural network with only 1.5 million parameters dedicated to controlling the movement and operation of humanoid robots. What sets HOVER apart is its ability to capture the subconscious processes behind human movement, allowing robots to perform complex tasks without cumbersome programming, which marks a significant advance in the field of robot control. Its training is also efficient: the equivalent of one year of training in a virtual environment took only about 50 minutes of real time.
The research team reported its progress on HOVER (Humanoid Versatile Controller), a neural network with 1.5 million parameters that is specifically designed to coordinate the movement and operation of humanoid robots.
NVIDIA researcher Jim Fan noted that not all foundation models need to be huge: the 1.5-million-parameter network the team trained is designed to control the body of a humanoid robot. He explained that HOVER captures the subconscious processing behind human motion, which allows the robot to perform complex tasks without cumbersome programming. As he put it, "humans need a lot of subconscious processing when walking, maintaining balance, and manipulating their limbs flexibly."
HOVER was trained on NVIDIA's Isaac simulation platform, which can run physics simulation up to 10,000 times faster than real time.
Jim Fan revealed that the model received the equivalent of one year of training in a virtual environment in only about 50 minutes of real time, on a single GPU. He added that the resulting neural network transfers smoothly to real-world applications without fine-tuning.
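The two figures quoted above (one simulated year in about 50 minutes, and a roughly 10,000x simulation speedup) can be checked against each other with simple arithmetic. The snippet below is only a back-of-the-envelope sketch: it assumes a 365-day year and treats "about 50 minutes" as exactly 50, and the function name is made up for illustration.

```python
# Back-of-the-envelope check, not part of HOVER or Isaac.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def implied_speedup(simulated_minutes: float, wall_clock_minutes: float) -> float:
    """Return how many times faster than real time the simulation ran."""
    return simulated_minutes / wall_clock_minutes


speedup = implied_speedup(MINUTES_PER_YEAR, 50)
print(f"Implied speedup: {speedup:,.0f}x")  # prints "Implied speedup: 10,512x"
```

The result, a little over 10,000x, matches the order of magnitude stated for the simulator.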
HOVER can respond to a variety of high-level motion instructions: head and hand poses from XR devices such as Apple's Vision Pro, full-body poses from motion capture or RGB cameras, joint angles from exoskeletons, or root velocity commands from a joystick. Fan stressed that HOVER provides a unified interface for controlling robots with different input devices, which makes it easier to collect teleoperation data for training.
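To make the idea of a unified interface more concrete, here is a minimal sketch of one way different input devices could feed a single fixed-size command vector, with a mask marking which entries are active. This is not HOVER's actual API: the class name, field names, slot sizes and the `encode` function are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical slot sizes, picked only to keep the example small.
SLOTS = {
    "head_hand_pose": 9,  # e.g. 3D positions of head and both hands (XR headset)
    "body_pose": 6,       # e.g. a few full-body keypoints (mocap or RGB camera)
    "joint_angles": 4,    # e.g. joint angles read from an exoskeleton
    "root_velocity": 3,   # e.g. vx, vy, yaw rate from a joystick
}


@dataclass
class MotionCommand:
    """One command object for every input device; unused fields stay None."""
    head_hand_pose: Optional[Sequence[float]] = None
    body_pose: Optional[Sequence[float]] = None
    joint_angles: Optional[Sequence[float]] = None
    root_velocity: Optional[Sequence[float]] = None


def encode(cmd: MotionCommand) -> Tuple[List[float], List[int]]:
    """Flatten a command into a fixed-size vector plus a 0/1 mask."""
    values: List[float] = []
    mask: List[int] = []
    for name, size in SLOTS.items():
        data = getattr(cmd, name)
        if data is None:
            # Device not in use: zero-fill the slot and mark it inactive.
            values.extend([0.0] * size)
            mask.extend([0] * size)
        else:
            if len(data) != size:
                raise ValueError(f"{name} expects {size} values, got {len(data)}")
            values.extend(float(x) for x in data)
            mask.extend([1] * size)
    return values, mask


# Usage: a joystick supplies only a root velocity command.
joystick_cmd = MotionCommand(root_velocity=[0.5, 0.0, 0.1])
values, mask = encode(joystick_cmd)
print(len(values), sum(mask))
```

Under this assumed design, a controller always receives an input of the same shape regardless of which device produced the command, which is one plausible reading of what a unified interface offers.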
In addition, HOVER can be integrated with upstream vision-language-action models, converting their motion instructions into low-level motor signals at high frequency. The model is compatible with any humanoid robot that can be simulated in Isaac, making it easy for users to bring a robot to life.
Earlier this year, NVIDIA also announced Project GR00T, a general-purpose model designed for humanoid robots. Robots powered by GR00T (Generalist Robot 00 Technology) can understand natural language and mimic human movements by observing actions, allowing them to quickly learn the coordination, dexterity and other skills needed to interact effectively in the real world.
Paper URL: https://arxiv.org/pdf/2410.21229
Key points:
- NVIDIA launches HOVER, a 1.5 million parameter neural network designed to control the movement and operation of humanoid robots.
- HOVER received the equivalent of one year of training in a virtual environment in only about 50 minutes of real time, and transfers to real-world use without fine-tuning.
- HOVER supports a variety of high-level motion instructions, can work in collaboration with different input devices, and provides a unified interface for robot control.
The emergence of HOVER marks a major leap in humanoid robot control technology. Its efficient training and versatile control capabilities lay a solid foundation for the widespread application of humanoid robots, and we look forward to the applications and innovations this technology will bring.