Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
When to Trust Your Model: Model-Based Policy Optimization (MBPO)
Randomized Ensembled Double Q-Learning (REDQ): Learning Fast Without a Model
Addressing Function Approximation Error in Actor-Critic Methods (TD3)
Wikipedia Bibliography: