“AW-Opt: Learning Robotic Skills With Imitation and Reinforcement at Scale”, 2021-11-09:
Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amounts of autonomously collected experience. Both methods have complementary strengths and weaknesses: RL can reach a high level of performance, but requires exploration, which can be very time consuming and unsafe; IL does not require exploration, but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both approaches? A number of prior methods have aimed to address this question, proposing a variety of techniques that integrate elements of IL and RL. However, scaling up such methods to complex robotic skills that integrate diverse offline data and generalize meaningfully to real-world scenarios still presents a major challenge.
In this paper, our aim is to test the scalability of prior IL + RL algorithms and devise a system based on detailed empirical experimentation that combines existing components in the most effective and scalable way. To that end, we present a series of experiments aimed at understanding the implications of each design decision, so as to develop a combined approach that can use demonstrations and heterogeneous prior data to attain the best performance on a range of real-world and realistic simulated robotic problems.
Our complete method, which we call AW-Opt, combines elements of advantage-weighted regression [1, 2] and QT-Opt [3], providing a unified approach for integrating demonstrations and offline data for robotic manipulation.
Please see https://awopt.github.io/ for more details.
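For readers unfamiliar with advantage-weighted regression, one of the two components the abstract names, the general idea can be sketched as follows: actions from the dataset are imitated, but each one is weighted by an exponentiated advantage estimate, so better-than-average actions count for more. This is a minimal illustration of that general technique, not the AW-Opt implementation. The function names, the `temperature` and `max_weight` parameters, and the toy numbers are all assumptions made for this example; in a real system the Q-values, state values, and log-probabilities would come from learned networks.

```python
import math

def advantage_weights(q_values, state_values, temperature=1.0, max_weight=20.0):
    """Return w = min(exp((Q - V) / temperature), max_weight) per sample.

    `temperature` and `max_weight` are illustrative hyperparameters
    (assumptions for this sketch, not values taken from the abstract).
    """
    log_cap = math.log(max_weight)
    weights = []
    for q, v in zip(q_values, state_values):
        advantage = q - v
        # Cap in log space so very large advantages cannot overflow exp().
        weights.append(math.exp(min(advantage / temperature, log_cap)))
    return weights

def weighted_regression_loss(log_probs, weights):
    """Negative advantage-weighted log-likelihood of the dataset actions."""
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)

# Toy batch (hypothetical numbers): three dataset actions with their
# estimated Q-values, state values, and policy log-probabilities.
q = [1.0, 0.5, 0.0]
v = [0.5, 0.5, 0.5]
log_p = [-1.0, -1.0, -1.0]

w = advantage_weights(q, v)
loss = weighted_regression_loss(log_p, w)
```

In this toy batch the first action has a positive advantage and receives a weight above 1, the second has zero advantage and a weight of exactly 1, and the third is down-weighted, so minimizing the loss pulls the policy hardest toward the first action.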