Bibliography (7):

https://x.com/GhugareRaj/status/1572228478115934209
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
When to Trust Your Model: Model-Based Policy Optimization (MBPO)
Randomized Ensembled Double Q-Learning: Learning Fast Without a Model (REDQ)
Addressing Function Approximation Error in Actor-Critic Methods (TD3)

Wikipedia Bibliography:

1. Reinforcement learning
2. Latent and observable variables