Bibliography (5):

1. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
2. A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning
3. MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Wikipedia Bibliography:
1. Reinforcement learning
2. Gumbel distribution