https://x.com/aviral_kumar2/status/1887764754539614648
Scaling laws for single-agent reinforcement learning
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control
Parallel Q-Learning (PQL): Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation