Bibliography:

  1. ‘model-based RL’ tag

  2. ‘NN sampling’ tag

  3. ‘AI scaling’ tag

  4. ‘RL exploration’ tag

  5. ‘MuZero’ tag

  6. Learning Formal Mathematics From Intrinsic Motivation

  7. Can Go AIs be adversarially robust?

  8. Artificial Intelligence for Retrosynthetic Planning Needs Both Data and Expert Knowledge

  9. Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A new startup called Cognition AI can turn a user’s prompt into a website or video game

  10. Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (Searchformer)

  11. Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation

  12. Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero

  13. Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)

  14. Self-play reinforcement learning guides protein engineering

  15. Evaluating Superhuman Models with Consistency Checks

  16. BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations

  17. Who Will You Be After ChatGPT Takes Your Job? Generative AI is coming for white-collar roles. If your sense of worth comes from work—what’s left to hold on to?

  18. AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong

  19. Solving math word problems with process & outcome-based feedback

  20. Are AlphaZero-like Agents Robust to Adversarial Perturbations?

  21. Adversarial Policies Beat Superhuman Go AIs

  22. Large-Scale Retrieval for Reinforcement Learning

  23. Newton’s method for reinforcement learning and model predictive control

  24. HTPS: HyperTree Proof Search for Neural Theorem Proving

  25. CrossBeam: Learning to Search in Bottom-Up Program Synthesis

  26. Policy improvement by planning with Gumbel

  27. Formal Mathematics Statement Curriculum Learning

  28. Player of Games

  29. ν-SDDP: Neural Stochastic Dual Dynamic Programming

  30. Acquisition of Chess Knowledge in AlphaZero

  31. Evaluating model-based planning and planner amortization for continuous control

  32. Scalable Online Planning via Reinforcement Learning Fine-Tuning

  33. Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control

  34. How Does AI Improve Human Decision-Making? Evidence from the AI-Powered Go Program

  35. Train on Small, Play the Large: Scaling Up Board Games with AlphaZero and GNN

  36. Neural Tree Expansion for Multi-Robot Planning in Non-Cooperative Environments

  37. Scaling Scaling Laws with Board Games

  38. OLIVAW: Mastering Othello without Human Knowledge, nor a Fortune

  39. Transfer of Fully Convolutional Policy-Value Networks Between Games and Game Variants

  40. Investment vs. reward in a competitive knapsack problem

  41. Solving Mixed Integer Programs Using Neural Networks

  42. Monte-Carlo Graph Search for AlphaZero

  43. Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search

  44. Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess

  45. Learning Personalized Models of Human Behavior in Chess

  46. Learning Compositional Neural Programs for Continuous Control

  47. ReBeL: Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

  48. Monte-Carlo Tree Search as Regularized Policy Optimization

  49. Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning

  50. Aligning Superhuman AI with Human Behavior: Chess as a Model System

  51. Neural Machine Translation with Monte-Carlo Tree Search

  52. Real World Games Look Like Spinning Tops

  53. Approximate exploitability: Learning a best response in large games

  54. Accelerating and Improving AlphaZero Using Population Based Training

  55. Self-Play Learning Without a Reward Metric

  56. (Yonhap Interview) Go master Lee says he quits—unable to win over AI Go players

  57. MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

  58. Multiplayer AlphaZero

  59. Global optimization of quantum dynamics with AlphaZero deep exploration

  60. Learning Compositional Neural Programs with Recursive Tree Search and Planning

  61. π-IW: Deep Policies for Width-Based Planning in Pixel Domains

  62. Policy Gradient Search: Online Planning and Expert Iteration without Search Trees

  63. AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search

  64. Minigo: A Case Study in Reproducing Reinforcement Learning Research

  65. α-Rank: Multi-Agent Evaluation by Evolution

  66. Accelerating Self-Play Learning in Go

  67. ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero

  68. Bayesian Optimization in AlphaGo

  69. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

  70. Deep Reinforcement Learning

  71. AlphaSeq: Sequence Discovery with Deep Reinforcement Learning

  72. ExIt-OOS: Towards Learning from Planning in Imperfect Information Games

  73. Has dynamic programming improved decision making?

  74. Surprising Negative Results for Generative Adversarial Tree Search

  75. Improving width-based planning with compact policies

  76. Dual Policy Iteration

  77. Solving the Rubik’s Cube Without Human Knowledge

  78. Feedback-Based Tree Search for Reinforcement Learning

  79. A Tree Search Algorithm for Sequence Labeling

  80. Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations

  81. Sim-to-Real Optimization of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play

  82. Learning to Search with MCTSnets

  83. M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search

  84. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

  85. AlphaGo Zero: Mastering the game of Go without human knowledge

  86. Self-taught AI is best yet at strategy game Go

  87. DeepMind’s latest AI breakthrough is its most important yet: Google-owned DeepMind’s Go-playing artificial intelligence can now learn without human help… or data

  88. Learning Generalized Reactive Policies using Deep Neural Networks

  89. Learning to Plan Chemical Syntheses

  90. Thinking Fast and Slow with Deep Learning and Tree Search

  91. DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

  92. Mastering the game of Go with deep neural networks and tree search

  93. Giraffe: Using Deep Reinforcement Learning to Play Chess

  94. Algorithmic Progress in Six Domains

  95. Reinforcement Learning As Classification: Leveraging Modern Classifiers

  96. Deep-Learning the Hardest Go Problem in the World

  98. Learning From Scratch by Thinking Fast and Slow With Deep Learning and Tree Search

  100. Acquisition of Chess Knowledge in AlphaZero

  102. Leela Chess Zero: AlphaZero for the PC

  104. The Future Is Here – AlphaZero Learns Chess

  106. Trading Off Compute in Training and Inference

  107. Trading Off Compute in Training and Inference § MCTS Scaling

  108. Beyond the Board: Exploring AI Robustness Through Go

  109. Monte Carlo Tree Search in JAX

  110. An Open-Source Implementation of the AlphaGoZero Algorithm

  111. Adversarial Policies in Go

  112. The 3 Tricks That Made AlphaGo Zero Work

  114. AlphaGo Zero and the Foom Debate

  115. How to Build Your Own AlphaZero AI Using Python and Keras

  116. Reading the Tea Leaves: Expert End-Users Explaining the Unexplainable

  117. design#future-tag-features

  118. Zahavy et al 2023, Figure 7: Scaling of chess puzzle solutions with multiple AlphaZero agents & simulations

  119. Humphreys et al 2022, Figure 2: Retrieval-augmented MuZero Go agent architecture

  120. McGrath et al 2022, Figure 4: AlphaZero learning of human chess concepts over training history

  121. McGrath et al 2022, Figure 5(a): AlphaZero vs. human professional opening moves over history

  122. McGrath et al 2022, Figure 5(b): AlphaZero opening moves over training history

  123. McGrath et al 2022, Figure 7: AlphaZero rapidly discovers basic chess openings

  124. Choi et al 2021, Figure 2: Global Go player improvement due to Leela release

  125. Jones 2021, Figure 5: AlphaZero Hex scaling laws

  126. Jones 2021, Figure 6: Compute frontier by board size

  127. Silver et al 2017, Figure 3(b): AlphaGo Zero prediction of human expert Go moves vs. superhumanly accurate predictions

  128. Silver et al 2017, Figure 6: Performance of AlphaGo Zero — learning curves & Elo comparison

  129. http://cl-informatik.uibk.ac.at/cek/holstep/ckfccs-holstep-submitted.pdf

  131. http://www.incompleteideas.net/Talks/UBC-2016.pdf

  133. https://ai.facebook.com/blog/open-sourcing-polygames-a-new-framework-for-training-ai-bots-through-self-play/

  134. https://cacm.acm.org/magazines/2021/9/255049-playing-with-and-against-computers/abstract

  135. https://conversationswithtyler.com/episodes/vishy-anand/

  136. https://lczero.org/blog/2024/02/how-well-do-lc0-networks-compare-to-the-greatest-transformer-network-from-deepmind/

  137. https://proceedings.neurips.cc/paper/2014/file/8bb88f80d334b1869781beb89f7b73be-Paper.pdf

  139. https://research.google/blog/leveraging-machine-learning-for-game-development/

  140. https://web.stanford.edu/~surag/posts/alphazero.html

  141. https://www.deepmind.com/blog/alphagos-next-move/

  142. https://www.deepmind.com/blog/alphazero-shedding-new-light-grand-games-chess-shogi-and-go/

  143. https://www.deepmind.com/blog/exploring-mysteries-alphago/

  144. https://www.lesswrong.com/posts/FF8i6SLfKb4g7C4EL/inside-the-mind-of-a-superhuman-go-model-how-does-leela-zero-2

  145. https://www.nature.com/articles/s41598-019-45619-9#deepmind

  146. https://www.reddit.com/r/MachineLearning/comments/76xjb5/ama_we_are_david_silver_and_julian_schrittwieser/

  148. https://www.reddit.com/r/MachineLearning/comments/rdb1uw/p_utttai_alphazerolike_solution_for_playing/

  150. https://www.reddit.com/r/baduk/comments/qqjw64/shin_jinseo_ai_difference_shrinking/

  151. https://x.com/LeelaChessZero/status/1757502430495859103

  152. Artificial Intelligence for Retrosynthetic Planning Needs Both Data and Expert Knowledge

  153. /doc/reinforcement-learning/model/alphago/2024-striethkalthoff.pdf

  154. Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A new startup called Cognition AI can turn a user’s prompt into a website or video game

  155. https://www.bloomberg.com/news/articles/2024-03-12/cognition-ai-is-a-peter-thiel-backed-coding-assistant

  156. Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero

  157. https://arxiv.org/abs/2310.16410#deepmind

  158. Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)

  159. https://arxiv.org/abs/2308.09175#deepmind

  160. Who Will You Be After ChatGPT Takes Your Job? Generative AI is coming for white-collar roles. If your sense of worth comes from work—what’s left to hold on to?

  161. https://www.wired.com/story/status-work-generative-artificial-intelligence/

  162. Are AlphaZero-like Agents Robust to Adversarial Perturbations?

  163. https://arxiv.org/abs/2211.03769

  164. Adversarial Policies Beat Superhuman Go AIs

  165. Sergey Levine

  166. https://arxiv.org/abs/2211.00241

  167. Large-Scale Retrieval for Reinforcement Learning

  168. https://arxiv.org/abs/2206.05314#deepmind

  169. HTPS: HyperTree Proof Search for Neural Theorem Proving

  170. https://arxiv.org/abs/2205.11491#facebook

  171. Policy improvement by planning with Gumbel

  172. Julian Schrittwieser

  173. https://openreview.net/forum?id=bERaNdoegnO#deepmind

  174. Formal Mathematics Statement Curriculum Learning

  175. https://arxiv.org/abs/2202.01344#openai

  176. Player of Games

  177. https://arxiv.org/abs/2112.03178#deepmind

  178. Acquisition of Chess Knowledge in AlphaZero

  179. https://arxiv.org/abs/2111.09259#deepmind

  180. Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess

  181. https://arxiv.org/abs/2009.04374#deepmind

  182. Accelerating and Improving AlphaZero Using Population Based Training

  183. https://arxiv.org/abs/2003.06212

  184. Improving width-based planning with compact policies

  185. https://arxiv.org/abs/1806.05898

  186. AlphaGo Zero: Mastering the game of Go without human knowledge

  187. Julian Schrittwieser

  188. Karen Simonyan

  189. Lucas Baker (B.S. ’11)

  190. /doc/reinforcement-learning/model/alphago/2017-silver.pdf#deepmind