See Also
Links
“Are AlphaZero-like Agents Robust to Adversarial Perturbations?”, Et Al 2022 (2022-11-07)
“Large-Scale Retrieval for Reinforcement Learning”, Et Al 2022 (2022-06-10)
“HTPS: HyperTree Proof Search for Neural Theorem Proving”, Et Al 2022 (2022-05-23)
“CrossBeam: Learning to Search in Bottom-Up Program Synthesis”, Et Al 2022 (2022-03-20)
“Policy improvement by planning with Gumbel”, Et Al 2022 (2022-03-04)
“Player of Games”, Et Al 2021 (2021-12-06)
“ν-SDDP: Neural Stochastic Dual Dynamic Programming”, Et Al 2021 (2021-12-01)
“Acquisition of Chess Knowledge in AlphaZero”, Et Al 2021 (2021-11-17)
“Evaluating model-based planning and planner amortization for continuous control”, Et Al 2021 (2021-10-07)
“Scalable Online Planning via Reinforcement Learning Fine-Tuning”, Et Al 2021 (2021-09-30)
“Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control”, 2021-08-20
“How Does AI Improve Human Decision-Making? Evidence from the AI-Powered Go Program”, Et Al 2021 (2021-07-26)
“Train on Small, Play the Large: Scaling Up Board Games with AlphaZero and GNN”, Ben-Assayag & El-2021 (2021-07-18)
“Neural Tree Expansion for Multi-Robot Planning in Non-Cooperative Environments”, Et Al 2021 (2021-04-20)
“Scaling Scaling Laws with Board Games”, 2021-04-07
“OLIVAW: Mastering Othello without Human Knowledge, nor a Fortune”, 2021-03-31
“Transfer of Fully Convolutional Policy-Value Networks Between Games and Game Variants”, Et Al 2021 (2021-02-24)
“Investment vs. reward in a competitive knapsack problem”, 2021-01-26
“Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search”, Et Al 2020 (2020-12-14)
“Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess”, Et Al 2020 (2020-09-09)
“Learning Personalized Models of Human Behavior in Chess”, McIlroy-Et Al 2020 (2020-08-23)
“ReBeL: Combining Deep Reinforcement Learning and Search for Imperfect-Information Games”, Et Al 2020 (2020-07-27)
“Learning Compositional Neural Programs for Continuous Control”, Et Al 2020 (2020-07-27)
“Monte-Carlo Tree Search as Regularized Policy Optimization”, Et Al 2020 (2020-07-24)
“Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning”, Et Al 2020 (2020-06-14)
“Aligning Superhuman AI with Human Behavior: Chess as a Model System”, McIlroy-Et Al 2020 (2020-06-02)
“Neural Machine Translation with Monte-Carlo Tree Search”, 2020-04-27
“Approximate exploitability: Learning a best response in large games”, Et Al 2020 (2020-04-20)
“Real World Games Look Like Spinning Tops”, Et Al 2020 (2020-04-20)
“Accelerating and Improving AlphaZero Using Population Based Training”, Et Al 2020 (2020-03-13)
“Self-Play Learning Without a Reward Metric”, Et Al 2019 (2019-12-16)
“(Yonhap Interview) Go master Lee says he quits unable to win over AI Go players”, 2019-11-27
“MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model”, Et Al 2019 (2019-11-19)
“Multiplayer AlphaZero”, 2019-10-29
“Global optimization of quantum dynamics with AlphaZero deep exploration”, Et Al 2019 (2019-07-12)
“Learning Compositional Neural Programs with Recursive Tree Search and Planning”, Et Al 2019 (2019-05-30)
“π-IW: Deep Policies for Width-Based Planning in Pixel Domains”, Et Al 2019 (2019-04-12)
“Policy Gradient Search: Online Planning and Expert Iteration without Search Trees”, Et Al 2019 (2019-04-07)
“AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search”, Et Al 2019 (2019-03-26)
“Minigo: A Case Study in Reproducing Reinforcement Learning Research”, 2019-03-06
“α-Rank: Multi-Agent Evaluation by Evolution”, Et Al 2019 (2019-03-04)
“Accelerating Self-Play Learning in Go”, 2019-02-27
“ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero”, Et Al 2019 (2019-02-12)
“Bayesian Optimization in AlphaGo”, Et Al 2018 (2018-12-17)
“A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play”, Et Al 2018 (2018-12-07)
“Deep Reinforcement Learning”, 2018-10-15
“AlphaSeq: Sequence Discovery with Deep Reinforcement Learning”, Et Al 2018 (2018-09-26)
“ExIt-OOS: Towards Learning from Planning in Imperfect Information Games”, 2018-08-30
“Has dynamic programming improved decision making?”, 2018-08-22
“Improving width-based planning with compact policies”, Et Al 2018 (2018-06-15)
“Surprising Negative Results for Generative Adversarial Tree Search”, Et Al 2018 (2018-06-15)
“Dual Policy Iteration”, Et Al 2018 (2018-05-28)
“Solving the Rubik’s Cube Without Human Knowledge”, Et Al 2018 (2018-05-18)
“Feedback-Based Tree Search for Reinforcement Learning”, Et Al 2018 (2018-05-15)
“A Tree Search Algorithm for Sequence Labeling”, Et Al 2018 (2018-04-29)
“Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations”, 2018-04-12
“Sim-to-Real Optimization of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play”, Et Al 2018 (2018-02-18)
“Learning to Search with MCTSnets”, Et Al 2018 (2018-02-13)
“M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search”, Et Al 2018 (2018-02-12)
“Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”, Et Al 2017 (2017-12-05)
“AlphaGo Zero: Mastering the game of Go without human knowledge”, Et Al 2017 (2017-10-19)
“DeepMind’s latest AI breakthrough is its most important yet: Google-owned DeepMind’s Go-playing artificial intelligence can now learn without human help… or data”, 2017-10-18
“Self-taught AI is best yet at strategy game Go”, 2017-10-18
“Learning Generalized Reactive Policies using Deep Neural Networks”, Et Al 2017 (2017-08-24)
“Learning to Plan Chemical Syntheses”, Et Al 2017 (2017-08-14)
“Thinking Fast and Slow with Deep Learning and Tree Search”, Et Al 2017 (2017-05-23)
“DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker”, Et Al 2017 (2017-01-06)
“Mastering the game of Go with deep neural networks and tree search”, Et Al 2016 (2016-01-28)
“Giraffe: Using Deep Reinforcement Learning to Play Chess”, 2015-09-04
“Algorithmic Progress in Six Domains”, 2013-12-09
“Reinforcement Learning As Classification: Leveraging Modern Classifiers”, 2003
Wikipedia
Miscellaneous
- https://blog.janestreet.com/deep-learning-the-hardest-go-problem-in-the-world/
- https://cacm.acm.org/magazines/2021/9/255049-playing-with-and-against-computers/fulltext
- https://en.chessbase.com/post/acquisition-of-chess-knowledge-in-alphazero
- https://en.chessbase.com/post/the-future-is-here-alphazero-learns-chess
- https://hackernoon.com/the-3-tricks-that-made-alphago-zero-work-f3d47b6686ef
- https://proceedings.neurips.cc/paper/2014/file/8bb88f80d334b1869781beb89f7b73be-Paper.pdf
- https://www.deepmind.com/blog/alphazero-shedding-new-light-grand-games-chess-shogi-and-go/
Link Bibliography
- https://arxiv.org/abs/2211.03769: “Are AlphaZero-like Agents Robust to Adversarial Perturbations?”, Li-Cheng Lan, Huan Zhang, Ti-Rong Wu, Meng-Yu Tsai, I-Chen Wu, Cho-Jui Hsieh
- https://arxiv.org/abs/2206.05314#deepmind: “Large-Scale Retrieval for Reinforcement Learning”, Peter C. Humphreys, Arthur Guez, Olivier Tieleman, Laurent Sifre, Théophane Weber, Timothy Lillicrap
- https://openreview.net/forum?id=bERaNdoegnO#deepmind: “Policy Improvement by Planning With Gumbel”, Ivo Danihelka, Arthur Guez, Julian Schrittwieser, David Silver
- https://arxiv.org/abs/2112.03178#deepmind: “Player of Games”
- https://arxiv.org/abs/2111.09259#deepmind: “Acquisition of Chess Knowledge in AlphaZero”
- 2017-silver.pdf#deepmind: “AlphaGo Zero: Mastering the Game of Go without Human Knowledge”