AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
Job Hunt as a PhD in RL: How it Actually Happens § Reinforcement learning reflections
Stochastic MuZero: Planning in Stochastic Environments with a Learned Model
MuZero with Self-competition for Rate Control in VP9 Video Compression
Procedural Generalization by Planning with Self-Supervised World Models
Podracer architectures for scalable Reinforcement Learning
MuZero Unplugged: Online and Offline Reinforcement Learning by Planning with a Learned Model
Playing Nondeterministic Games through Planning with a Learned Model
Combining Off and On-Policy Training in Model-Based Reinforcement Learning
Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision
On the role of planning in model-based deep reinforcement learning
The Value Equivalence Principle for Model-Based Reinforcement Learning
Measuring Progress in Deep Reinforcement Learning Sample Efficiency
Monte-Carlo Tree Search as Regularized Policy Optimization
Continuous Control for Searching and Planning with a Learned Model
MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Surprising Negative Results for Generative Adversarial Tree Search
TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning
A Clean Implementation of MuZero and AlphaZero following the AlphaZero General Framework. Train and Pit Both Algorithms against Each Other, and Investigate Reliability of Learned MuZero MDP Models.
https://www.reddit.com/r/reinforcementlearning/comments/zqxc12/muzero_learns_to_play_teamfight_tactics/
https://arxiv.org/abs/2206.05314#deepmind
https://openreview.net/forum?id=0ZbPmmB61g#google
https://openreview.net/forum?id=bERaNdoegnO#deepmind
Procedural Generalization by Planning with Self-Supervised World Models
https://arxiv.org/abs/2111.01587#deepmind
https://arxiv.org/abs/2106.10316#deepmind
https://arxiv.org/abs/2106.04615#deepmind
Podracer architectures for scalable Reinforcement Learning
https://arxiv.org/abs/2104.06272#deepmind
MuZero Unplugged: Online and Offline Reinforcement Learning by Planning with a Learned Model
https://arxiv.org/abs/2104.06294#deepmind
The Value Equivalence Principle for Model-Based Reinforcement Learning
https://arxiv.org/abs/2011.03506#deepmind
Measuring Progress in Deep Reinforcement Learning Sample Efficiency
Continuous Control for Searching and Planning with a Learned Model
https://deepmind.google/discover/blog/agent57-outperforming-the-human-atari-benchmark/
TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning