‘MuZero’ directory

Gwern

Links

“AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning”, Mathieu et al 2023

“Job Hunt As a PhD in RL: How It Actually Happens § Reinforcement Learning Reflections”, Lambert 2022

“Large-Scale Retrieval for Reinforcement Learning”, Humphreys et al 2022

“Boosting Search Engines With Interactive Agents”, Ciaramita et al 2022

“Stochastic MuZero: Planning in Stochastic Environments With a Learned Model”, Antonoglou et al 2022

“Policy Improvement by Planning With Gumbel”, Danihelka et al 2022

“MuZero With Self-Competition for Rate Control in VP9 Video Compression”, Mandhane et al 2022

“Procedural Generalization by Planning With Self-Supervised World Models”, Anand et al 2021

“Mastering Atari Games With Limited Data”, Ye et al 2021

“Proper Value Equivalence”, Grimm et al 2021

“Vector Quantized Models for Planning”, Ozair et al 2021

“Muesli: Combining Improvements in Policy Optimization”, Hessel et al 2021

“Podracer Architectures for Scalable Reinforcement Learning”, Hessel et al 2021

“MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, Schrittwieser et al 2021

“Learning and Planning in Complex Action Spaces”, Hubert et al 2021

“Playing Nondeterministic Games through Planning With a Learned Model”, Willkens & Pollack 2021

“Visualizing MuZero Models”, de Vries et al 2021

“Combining Off and On-Policy Training in Model-Based Reinforcement Learning”, Borges & Oliveira 2021

“Improving Model-Based Reinforcement Learning With Internal State Representations through Self-Supervision”, Scholz et al 2021

“On the Role of Planning in Model-Based Deep Reinforcement Learning”, Hamrick et al 2020

“The Value Equivalence Principle for Model-Based Reinforcement Learning”, Grimm et al 2020

“Measuring Progress in Deep Reinforcement Learning Sample Efficiency”, Anonymous 2020

“Monte-Carlo Tree Search As Regularized Policy Optimization”, Grill et al 2020

“Continuous Control for Searching and Planning With a Learned Model”, Yang et al 2020

“Agent57: Outperforming the Human Atari Benchmark”, Puigdomènech et al 2020

“MuZero: Mastering Atari, Go, Chess and Shogi by Planning With a Learned Model”, Schrittwieser et al 2019

“Surprising Negative Results for Generative Adversarial Tree Search”, Azizzadenesheli et al 2018

“TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning”, Farquhar et al 2017

“Monte Carlo Tree Search in JAX”

“A Clean Implementation of MuZero and AlphaZero following the AlphaZero General Framework. Train and Pit Both Algorithms against Each Other, and Investigate Reliability of Learned MuZero MDP Models.”

“MuZero”

“Learning to Search With MCTSnets”

https://proceedings.mlr.press/v80/guez18a.html

“MuZero Intuition”

/doc/www/www.furidamu.org/b6c1341b8703a5b0a1e44ede5f00d4e5a0b02354.html

“Remaking EfficientZero (As Best I Can)”

https://www.lesswrong.com/posts/bPa6AzRgGZGmxbq6n/remaking-efficientzero-as-best-i-can

“EfficientZero: How It Works”

https://www.lesswrong.com/posts/mRwJce3npmzbKfxws/efficientzero-how-it-works

“MuZero”

https://www.youtube.com/watch?v=L0A86LmH7Yw#deepmind

Wikipedia

Monte Carlo tree search
MuZero: https://en.wikipedia.org/wiki/MuZero
Tensor processing unit

Miscellaneous

Bibliography

https://arxiv.org/abs/2206.05314#deepmind: “Large-Scale Retrieval for Reinforcement Learning”, Peter C. Humphreys, Arthur Guez, Olivier Tieleman …, Laurent Sifre, Théophane Weber, Timothy Lillicrap
https://openreview.net/forum?id=0ZbPmmB61g#google: “Boosting Search Engines With Interactive Agents”, Massimiliano Ciaramita, Leonard Adolphs, Michelle Chen Huebscher …, Sascha Rothe, Christian Buck, Thomas Hofmann, Yannic Kilcher, Lasse Espeholt, Pier Giuseppe Sessa, Lierni Sestorain, Benjamin Börschinger
https://openreview.net/forum?id=bERaNdoegnO#deepmind: “Policy Improvement by Planning With Gumbel”, Ivo Danihelka, Arthur Guez, Julian Schrittwieser, David Silver
https://arxiv.org/abs/2111.01587#deepmind: “Procedural Generalization by Planning With Self-Supervised World Models”, Ankesh Anand, Jacob Walker, Yazhe Li …, Eszter Vértes, Julian Schrittwieser, Sherjil Ozair, Théophane Weber, Jessica B. Hamrick
https://arxiv.org/abs/2111.00210: “Mastering Atari Games With Limited Data”, Weirui Ye, Shaohuai Liu, Thanard Kurutach …, Pieter Abbeel, Yang Gao
https://arxiv.org/abs/2106.10316#deepmind: “Proper Value Equivalence”, Christopher Grimm, André Barreto, Gregory Farquhar …, David Silver, Satinder Singh
https://arxiv.org/abs/2106.04615#deepmind: “Vector Quantized Models for Planning”, Sherjil Ozair, Yazhe Li, Ali Razavi …, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals
https://arxiv.org/abs/2104.06272#deepmind: “Podracer Architectures for Scalable Reinforcement Learning”, Matteo Hessel, Manuel Kroiss, Aidan Clark …, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt
https://arxiv.org/abs/2104.06294#deepmind: “MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, Julian Schrittwieser, Thomas Hubert, Amol Mandhane …, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver
https://arxiv.org/abs/2102.12924: “Visualizing MuZero Models”, Joery A. de Vries, Ken S. Voskuil, Thomas M. Moerland, Aske Plaat
https://arxiv.org/abs/2011.03506#deepmind: “The Value Equivalence Principle for Model-Based Reinforcement Learning”, Christopher Grimm, André Barreto, Satinder Singh, David Silver
https://arxiv.org/abs/2102.04881: “Measuring Progress in Deep Reinforcement Learning Sample Efficiency”, Anonymous
https://arxiv.org/abs/2006.07430: “Continuous Control for Searching and Planning With a Learned Model”, Xuxi Yang, Werner Duvaud, Peng Wei
https://deepmind.google/discover/blog/agent57-outperforming-the-human-atari-benchmark/: “Agent57: Outperforming the Human Atari Benchmark”, Adrià Puigdomènech, Bilal Piot, Steven Kapturowski …, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell
https://arxiv.org/abs/1710.11417: “TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning”, Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson