“Procedural Generalization by Planning With Self-Supervised World Models”, 2021-11-02:
One of the key promises of model-based reinforcement learning is the ability to generalize using an internal model of the world to make predictions in novel environments and tasks. However, the generalization ability of model-based agents is not well understood because existing work has focused on model-free agents when benchmarking generalization.
Here, we explicitly measure the generalization ability of model-based agents in comparison to their model-free counterparts. We focus our analysis on MuZero (Schrittwieser et al 2020), a powerful model-based agent, and evaluate its performance on both procedural and task generalization. We identify 3 factors of procedural generalization—planning, self-supervised representation learning, and procedural data diversity—and show that by combining these techniques,
we achieve state-of-the-art generalization performance and data efficiency on Procgen (Cobbe et al 2019). However, we find that these factors do not always provide the same benefits for the task-generalization benchmarks in Meta-World (Yu et al 2019), indicating that transfer remains a challenge and may require different approaches than procedural generalization.
Overall, we suggest that building generalizable agents requires moving beyond the single-task, model-free paradigm and towards self-supervised model-based agents that are trained in rich, procedural, multi-task environments.
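The abstract's three factors pair a MuZero-style learned world model with an auxiliary self-supervised objective on its latent states. As a rough illustration only (not the paper's actual architecture, losses, or code), the sketch below unrolls a toy latent dynamics model for a few steps and penalises disagreement with the encodings of the actually observed future frames, in the spirit of latent self-prediction losses; every name here (`represent`, `dynamics`, `W_proj`, the tiny linear "networks") is a hypothetical placeholder, and a real implementation would use deep networks, batches, and a stop-gradient on the target branch.

```python
import numpy as np

# Toy linear "networks" standing in for the encoder, dynamics model, and
# projection head of a MuZero-style agent; all shapes are illustrative.
rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, ACTION_DIM = 16, 8, 4

W_repr = rng.normal(scale=0.1, size=(OBS_DIM, LATENT_DIM))                  # h: obs -> latent
W_dyn  = rng.normal(scale=0.1, size=(LATENT_DIM + ACTION_DIM, LATENT_DIM))  # g: (latent, action) -> next latent
W_proj = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))               # projection head for the SSL target

def represent(obs):
    return np.tanh(obs @ W_repr)

def dynamics(latent, action_onehot):
    return np.tanh(np.concatenate([latent, action_onehot]) @ W_dyn)

def cosine_distance(a, b):
    a = a / (np.linalg.norm(a) + 1e-8)
    b = b / (np.linalg.norm(b) + 1e-8)
    return 1.0 - float(a @ b)

def self_supervised_loss(obs_seq, action_seq, unroll_steps=3):
    """Unroll the learned dynamics from the first observation and penalise
    disagreement with encodings of the observed future frames."""
    latent = represent(obs_seq[0])
    loss = 0.0
    for k in range(unroll_steps):
        latent = dynamics(latent, action_seq[k])       # predicted latent at step k+1
        target = represent(obs_seq[k + 1]) @ W_proj    # (stop-gradient here in a real impl)
        loss += cosine_distance(latent @ W_proj, target)
    return loss / unroll_steps

# Toy rollout: random observations and one-hot actions.
obs_seq = rng.normal(size=(4, OBS_DIM))
action_seq = np.eye(ACTION_DIM)[rng.integers(ACTION_DIM, size=3)]
print("self-supervised consistency loss:", self_supervised_loss(obs_seq, action_seq))
```

In a full agent this auxiliary loss would be added to MuZero's reward, value, and policy losses; the stop-gradient on the target branch is the usual trick for preventing the trivial solution where all latents collapse to a constant.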