“Director: Deep Hierarchical Planning from Pixels”, 2022-06-08 ():
[homepage; blog; cf. Socratic/SayCan] Intelligent agents need to select long sequences of actions to solve complex tasks. While humans easily break down tasks into subgoals and reach them through millions of muscle commands, current artificial intelligence is limited to tasks with horizons of a few hundred decisions, despite large compute budgets. Research on hierarchical reinforcement learning aims to overcome this limitation but has proven challenging: current methods rely on manually specified goal spaces or subtasks, and no general solution exists.
We introduce Director, a practical method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model [using DreamerV2]. The high-level policy maximizes task and exploration rewards by selecting latent goals, and the low-level policy learns to achieve those goals. Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
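The manager/worker split described above can be sketched in a few lines. This is a minimal illustrative toy, not Director's actual implementation: the names (`manager_policy`, `worker_policy`, `GOAL_EVERY_K`, `LATENT_DIM`) and the stand-in linear "latent dynamics" are assumptions for illustration; in the real agent both policies are learned inside DreamerV2's world model, and the manager maximizes task plus exploration reward rather than picking random goals.

```python
import numpy as np

# Hypothetical sketch of the two-timescale hierarchy: the manager picks a
# latent goal every K steps; the worker acts at every step to move the
# latent state toward that goal. All names and dynamics are illustrative.

LATENT_DIM = 8      # size of the world model's latent state (assumed)
GOAL_EVERY_K = 4    # manager acts at a coarser timescale than the worker

rng = np.random.default_rng(0)

def manager_policy(latent):
    """Select a latent goal (here: a random offset; in Director, a learned
    high-level policy maximizing task and exploration rewards)."""
    return latent + rng.normal(size=LATENT_DIM)

def worker_policy(latent, goal):
    """Pick an action reducing the distance to the goal (here: a step along
    the difference vector; in Director, a learned goal-conditioned policy)."""
    return 0.1 * (goal - latent)

def world_model_step(latent, action):
    """Stand-in latent dynamics: the action shifts the latent state directly."""
    return latent + action

latent = np.zeros(LATENT_DIM)
goal = None
for t in range(12):
    if t % GOAL_EVERY_K == 0:               # manager decision point
        goal = manager_policy(latent)
        dist_at_goal_set = np.linalg.norm(goal - latent)
    action = worker_policy(latent, goal)    # worker acts every step
    latent = world_model_step(latent, action)

final_dist = np.linalg.norm(goal - latent)  # worker has closed in on the goal
```

With these toy dynamics each worker step shrinks the goal distance by a fixed factor, so the latent state provably approaches the most recent goal between manager decisions; the real agent instead rewards the worker for latent-state similarity to the decoded goal.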
Director outperforms exploration methods on tasks with sparse rewards, including 3D maze traversal with a quadruped robot from an egocentric camera and proprioception, without access to the global position or top-down view that was used by prior work. Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
…The computation time of Director is 20% longer than that of DreamerV2. Each training run used a single V100 GPU with XLA and mixed precision enabled and completed in less than 24 hours…The results of this experiment are summarized in Appendix A due to space constraints, with the full training curves for Atari and the Control Suite included in Appendices J & K. We observe that Director indeed learns successfully across many environments, demonstrating broader applicability than most prior hierarchical reinforcement learning methods. Providing the task reward to the worker also matters less than expected: the hierarchy solves a wide range of tasks purely by following goals at the low level. However, additionally providing the task reward to the worker closes the gap to the state-of-the-art DreamerV2 agent entirely. Figure K.1 in the appendix further shows that Director achieves a higher human-normalized median score than Dreamer on the 55 Atari games.