Bibliography (5):

1. MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
2. The Value Equivalence Principle for Model-Based Reinforcement Learning
3. A Clean Implementation of MuZero and AlphaZero following the AlphaZero General Framework. Train and Pit Both Algorithms against Each Other, and Investigate Reliability of Learned MuZero MDP Models.

Wikipedia Bibliography:
1. Reinforcement learning
2. Latent and observable variables