How performance scales with data has long been an open question in robot learning, and robot manipulation in particular. Most existing work on data scaling focuses on natural language processing and computer vision; manipulation has received far less attention. This article introduces recent results from a research team at Tsinghua University, who uncovered data scaling laws in robot imitation learning and proposed an efficient data collection strategy that significantly improves the generalization ability of learned robot policies.
The rapid progress of deep learning has been driven by scaling up datasets, models, and compute. In natural language processing and computer vision, researchers have observed power-law relationships between model performance and data scale. Robotics, and robot manipulation especially, has yet to establish comparable scaling laws.
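For reference, these scaling laws are usually written in a simple power-law form. The expression below is a generic illustration from the language-model scaling literature, not a formula from this paper; the symbols are assumptions:

```latex
% Generic power-law scaling: test loss L decreases as a power of
% dataset size D, with fitted constants D_c and \alpha.
L(D) \approx \left( \frac{D_c}{D} \right)^{\alpha}
```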
A Tsinghua University research team recently published a paper exploring data scaling in robot imitation learning and proposed an efficient data collection strategy: with data gathered in a single afternoon, the resulting policy achieves a success rate of roughly 90% in new environments and on new objects.
The researchers split generalization into two dimensions, environment generalization and object generalization, used handheld grippers to collect human demonstrations across a variety of environments and objects, and modeled the data with diffusion policies. They first focused on two tasks, pouring water and arranging a mouse. By analyzing how policy performance in new environments or on new objects changes as the number of training environments and objects grows, they derived the data scaling laws.
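To make "modeled the data with diffusion policies" concrete, here is a minimal, hypothetical sketch of a DDPM-style training step for a diffusion policy in PyTorch. This is not the authors' implementation: the dimensions, noise schedule, and network below are placeholder assumptions, and a real diffusion policy would condition on visual features and use a far stronger denoising network.

```python
# Minimal DDPM-style diffusion-policy loss (illustrative placeholder).
import torch
import torch.nn as nn

OBS_DIM, ACTION_DIM, HORIZON, T = 64, 7, 16, 100  # assumed sizes

# Linear noise schedule and its cumulative product (standard DDPM).
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

# Noise-prediction network conditioned on the observation, the noisy
# action chunk, and the diffusion timestep (an MLP stand-in).
net = nn.Sequential(
    nn.Linear(OBS_DIM + ACTION_DIM * HORIZON + 1, 256),
    nn.ReLU(),
    nn.Linear(256, ACTION_DIM * HORIZON),
)

def diffusion_loss(obs, actions):
    """obs: (B, OBS_DIM); actions: (B, HORIZON, ACTION_DIM) demo chunks."""
    b = obs.shape[0]
    t = torch.randint(0, T, (b,))                       # random diffusion step
    eps = torch.randn_like(actions)                     # Gaussian noise
    a = alpha_bar[t].view(b, 1, 1)
    noisy = a.sqrt() * actions + (1 - a).sqrt() * eps   # forward process
    inp = torch.cat([obs, noisy.flatten(1), t.float().unsqueeze(1) / T], dim=1)
    return nn.functional.mse_loss(net(inp), eps.flatten(1))
```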
The research results show that:
The generalization ability of a policy to a new object, a new environment, or both follows a power law with the number of training objects, training environments, or environment-object pairs, respectively (a sketch of fitting such a power law follows this list).
Increasing the diversity of environments and objects is more effective than increasing the number of demonstrations for each environment or object.
Collecting data in as many environments as possible (for example, 32 environments), each with a single unique manipulation object and 50 demonstrations, is sufficient to train a policy with strong generalization (roughly a 90% success rate) in new environments and on new objects.
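As a concrete illustration of the first finding, the sketch below fits a power law S = a·N^k to success rate versus the number of training environments with an ordinary log-log linear fit. The measurements here are invented for illustration; they are not the paper's data.

```python
# Sketch: fit S = a * N^k via linear regression in log-log space.
import numpy as np

n_envs = np.array([1, 2, 4, 8, 16, 32])                    # training environments
success = np.array([0.18, 0.30, 0.45, 0.62, 0.78, 0.90])   # invented success rates

# log S = k * log N + log a, so an ordinary least-squares fit suffices.
k, log_a = np.polyfit(np.log(n_envs), np.log(success), 1)
print(f"fitted exponent k = {k:.2f}, scale a = {np.exp(log_a):.2f}")

# Interpolate at an unseen environment count within the fitted range.
predict = lambda n: np.exp(log_a) * n ** k
print(f"predicted success with 24 environments: {predict(24):.2f}")
```

Note that a raw power law is unbounded, so such a fit is only meaningful within the measured range; success rates saturate at 1.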
Based on these data scaling laws, the researchers proposed an efficient data collection strategy: gather data in as many distinct environments as possible, using a single unique object in each. Once the total number of environment-object pairs reaches 32, that is usually enough to train a policy that can operate in a new environment and interact with previously unseen objects. For each environment-object pair, they recommend collecting 50 demonstrations.
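Under this recipe, the total collection budget is straightforward (a trivial sketch; the constants simply mirror the recommendation above):

```python
# Collection budget implied by the recommended recipe.
num_environments = 32   # distinct environments, one unique object each
demos_per_pair = 50     # demonstrations per environment-object pair

total_demos = num_environments * demos_per_pair
print(f"total demonstrations to collect: {total_demos}")  # 1600
```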
To verify that the data collection strategy applies broadly, the researchers used it on two new tasks: folding a towel and unplugging a charger. The results show that the strategy also yields policies with strong generalization on these tasks.
The study shows that, with a relatively modest investment of time and resources, one can learn a single-task policy that deploys to nearly any environment and object. To support further work in this direction, the Tsinghua team released their code, data, and models, hoping to inspire follow-up research and ultimately enable general-purpose robots that solve complex, open-world problems.
Paper address: https://arxiv.org/pdf/2410.18647
This study offers both theoretical guidance and practical methodology for robot imitation learning, laying groundwork for more general robot intelligence. The open-sourced code, data, and models also give other researchers valuable resources to advance the field.