Bibliography:

  1. ‘model-based RL’ tag

  2. ‘hidden-information game’ tag

  3. ‘offline RL’ tag

  4. AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

  5. Job Hunt as a PhD in RL: How it Actually Happens § Reinforcement learning reflections

  6. Large-Scale Retrieval for Reinforcement Learning

  7. Boosting Search Engines with Interactive Agents

  8. Stochastic MuZero: Planning in Stochastic Environments with a Learned Model

  9. Policy improvement by planning with Gumbel

  10. MuZero with Self-competition for Rate Control in VP9 Video Compression

  11. Procedural Generalization by Planning with Self-Supervised World Models

  12. Mastering Atari Games with Limited Data

  13. Proper Value Equivalence

  14. Vector Quantized Models for Planning

  15. Muesli: Combining Improvements in Policy Optimization

  16. Podracer architectures for scalable Reinforcement Learning

  17. MuZero Unplugged: Online and Offline Reinforcement Learning by Planning with a Learned Model

  18. Learning and Planning in Complex Action Spaces

  19. Scaling Scaling Laws with Board Games

  20. Playing Nondeterministic Games through Planning with a Learned Model

  21. Visualizing MuZero Models

  22. Combining Off and On-Policy Training in Model-Based Reinforcement Learning

  23. Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision

  24. On the role of planning in model-based deep reinforcement learning

  25. The Value Equivalence Principle for Model-Based Reinforcement Learning

  26. Measuring Progress in Deep Reinforcement Learning Sample Efficiency

  27. Monte-Carlo Tree Search as Regularized Policy Optimization

  28. Continuous Control for Searching and Planning with a Learned Model

  29. Agent57: Outperforming the human Atari benchmark

  30. MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

  31. Surprising Negative Results for Generative Adversarial Tree Search

  32. TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

  33. Monte Carlo Tree Search in JAX

  34. A clean implementation of MuZero and AlphaZero following the AlphaZero General framework; train and pit both algorithms against each other, and investigate the reliability of learned MuZero MDP models.

  35. MuZero

  36. Learning to Search With MCTSnets

  37. MuZero Intuition

  39. Remaking EfficientZero (as Best I Can)

  40. EfficientZero: How It Works

  41. MuZero

  42. Schrittwieser et al 2021, Figure 1: Ms. Pac-Man MuZero log reward scaling [figure]

  43. Anonymous 2020 (DRL sample efficiency), Figure 1: ALE scores and samples over time [figure]

  44. Anonymous 2020 (DRL sample efficiency), Figure 2: DQN-level sample efficiency over time [figure]

  45. https://github.com/opendilab/LightZero

  46. https://www.reddit.com/r/reinforcementlearning/comments/zqxc12/muzero_learns_to_play_teamfight_tactics/

  48. https://x.com/polynoamial/status/1676971503261454340

  49. Large-Scale Retrieval for Reinforcement Learning

  50. https://arxiv.org/abs/2206.05314#deepmind

  51. Boosting Search Engines with Interactive Agents

  52. https://openreview.net/forum?id=0ZbPmmB61g#google

  53. Policy improvement by planning with Gumbel

  54. Julian Schrittwieser

  55. https://openreview.net/forum?id=bERaNdoegnO#deepmind

  56. Procedural Generalization by Planning with Self-Supervised World Models

  57. Julian Schrittwieser

  58. Sherjil Ozair

  59. https://arxiv.org/abs/2111.01587#deepmind

  60. Mastering Atari Games with Limited Data

  61. https://arxiv.org/abs/2111.00210

  62. Proper Value Equivalence

  63. https://arxiv.org/abs/2106.10316#deepmind

  64. Vector Quantized Models for Planning

  65. Sherjil Ozair

  66. https://arxiv.org/abs/2106.04615#deepmind

  67. Podracer architectures for scalable Reinforcement Learning

  68. https://arxiv.org/abs/2104.06272#deepmind

  69. MuZero Unplugged: Online and Offline Reinforcement Learning by Planning with a Learned Model

  70. Julian Schrittwieser

  71. https://arxiv.org/abs/2104.06294#deepmind

  72. Visualizing MuZero Models

  73. https://arxiv.org/abs/2102.12924

  74. The Value Equivalence Principle for Model-Based Reinforcement Learning

  75. https://arxiv.org/abs/2011.03506#deepmind

  76. Measuring Progress in Deep Reinforcement Learning Sample Efficiency

  77. https://arxiv.org/abs/2102.04881

  78. Continuous Control for Searching and Planning with a Learned Model

  79. https://arxiv.org/abs/2006.07430

  80. Agent57: Outperforming the human Atari benchmark

  81. https://deepmind.google/discover/blog/agent57-outperforming-the-human-atari-benchmark/

  82. TreeQN & ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

  83. https://arxiv.org/abs/1710.11417