“AW-Opt: Learning Robotic Skills With Imitation and Reinforcement at Scale”, Yao Lu, Karol Hausman, Yevgen Chebotar, Mengyuan Yan, Eric Jang, Alexander Herzog, Ted Xiao, Alex Irpan, Mohi Khansari, Dmitry Kalashnikov, Sergey Levine (2021-11-09):

Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amounts of autonomously collected experience. Both methods have complementary strengths and weaknesses: RL can reach a high level of performance, but requires exploration, which can be very time consuming and unsafe; IL does not require exploration, but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both approaches? A number of prior methods have aimed to address this question, proposing a variety of techniques that integrate elements of IL and RL. However, scaling up such methods to complex robotic skills that integrate diverse offline data and generalize meaningfully to real-world scenarios still presents a major challenge.

In this paper, our aim is to test the scalability of prior IL + RL algorithms and devise a system, based on detailed empirical experimentation, that combines existing components in the most effective and scalable way. To that end, we present a series of experiments aimed at understanding the implications of each design decision, so as to develop a combined approach that can use demonstrations and heterogeneous prior data to attain the best performance on a range of real-world and realistic simulated robotic problems.

Our complete method, which we call AW-Opt, combines elements of advantage-weighted regression [1, 2] and QT-Opt [3], providing a unified approach for integrating demonstrations and offline data for robotic manipulation.
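To make the advantage-weighted component concrete, here is a minimal NumPy sketch of the exponentiated-advantage weighting used in advantage-weighted regression [1, 2]: demonstration (or replay) transitions are reweighted by how much better their action is than the baseline, and the imitation loss is scaled accordingly. The function names, the temperature `beta`, and the clip value `w_max` are illustrative choices for this sketch, not AW-Opt's actual implementation.

```python
import numpy as np

def awr_weights(q_values, v_baseline, beta=1.0, w_max=20.0):
    """Exponentiated-advantage weights, as in advantage-weighted regression.

    adv = Q(s, a) - V(s); transitions whose actions outperform the baseline
    get weight > 1, worse-than-baseline transitions get weight < 1. Weights
    are clipped at w_max to keep a few high-advantage samples from dominating.
    """
    adv = np.asarray(q_values) - np.asarray(v_baseline)
    return np.minimum(np.exp(adv / beta), w_max)

def weighted_bc_loss(pred_actions, demo_actions, weights):
    """Advantage-weighted behavioral-cloning loss (squared error per action),
    so imitation is biased toward the higher-advantage transitions."""
    sq_err = np.sum((np.asarray(pred_actions) - np.asarray(demo_actions)) ** 2,
                    axis=-1)
    return float(np.mean(weights * sq_err))
```

In this scheme, plain behavioral cloning is recovered when all weights equal 1, while larger advantages push the update toward RL-style improvement over the demonstrator.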

Please see https://awopt.github.io/ for more details.