“Job Hunt As a PhD in RL: How It Actually Happens § Reinforcement Learning Reflections”, Nathan Lambert, 2022-07-05:

…“We solved perception, but our bots keep crashing.” There is growing optimism about, or more realistically a surprising need for, reinforcement learning expertise. The most expansive robotic platform in the real world today is autonomous vehicles. In these labs, very good perception APIs have been handed off to control teams, and those control engineers have been struggling to extend existing stacks to cover the long tail of problems one encounters when driving in SF. RL practitioners step in with the mindsets and toolsets for creating machine-learning-based decision-making systems.

In many cases I think the technology would involve integrating targeted model learning into the areas where control performance is confusing, with a high value placed on robustness. With this in mind, most companies happily entertained far-out ideas such as “AlphaZero for AVs” or “RL as an adversary for reliability testing.” One of my evaluation metrics for a company was how they responded to the question of how they intend to deal with new and continuous data from partners / deployed robots — my view that all real-world data eventually becomes ML.
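To make the “RL as an adversary for reliability testing” idea concrete, here is a minimal sketch of the core loop: an adversary searches over scenario parameters to find cases where a driving policy's safety margin collapses. Everything here is hypothetical and toy-scale — `simulate` stands in for a real AV simulator, the “controller” is just a braking threshold, and a real adversary would use a learned proposal distribution rather than random search.

```python
import random

def simulate(controller, scenario):
    # Toy stand-in for an AV simulator. Returns a safety margin:
    # the ego vehicle's closest approach to the lead car (meters).
    # controller: a braking threshold (higher = brakes later).
    # scenario: (initial_gap_m, lead_decel_mps2). All hypothetical.
    gap, lead_decel = scenario
    # Closest approach shrinks as the lead car brakes harder and
    # the ego reacts later; negative margin means a collision.
    return gap - lead_decel * controller

def adversary_search(controller, n_iters=200, seed=0):
    """Random-search adversary: propose scenarios and keep the one
    that minimizes the controller's safety margin, i.e. the scenario
    closest to (or past) failure."""
    rng = random.Random(seed)
    worst_scenario, worst_margin = None, float("inf")
    for _ in range(n_iters):
        scenario = (rng.uniform(5.0, 50.0),   # initial gap (m)
                    rng.uniform(0.0, 9.0))    # lead deceleration (m/s^2)
        margin = simulate(controller, scenario)
        if margin < worst_margin:
            worst_scenario, worst_margin = scenario, margin
    return worst_scenario, worst_margin

# Usage: a negative margin flags a discovered failure case that
# the long-tail test suite should cover.
worst, margin = adversary_search(controller=4.0)
```

The appeal for reliability teams is that the adversary automates the hunt for long-tail failures instead of waiting for them to show up on the road; swapping random search for an RL policy over scenario generation is the step that earns the “RL as adversary” name.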

We’ll see in the next few years how these ideas play out, but big players are moving fast in the space. Tesla is growing their RL team, and I do think they generally try to build things that “work” (though their work often leaves a lot to be desired in terms of evaluating its impact and being transparent about its capabilities). DeepMind seemingly continues to hire everyone I’ve looked up to who comes onto the job market in RL, so their big projects won’t slow at all.

The billion+ dollar question is how much data and training scale is needed for something like MuZero to work. We’ve seen that at maximum scale it is amazing. EfficientZero started to peel that back. So many companies have wanted to try MuZero that I think we’ll know in a couple of years.