Bibliography (7):
Deep Reinforcement Learning without Experience Replay, Target Networks, or Batch Updates
MuJoCo (https://mujoco.org/)
dm_control: Software and Tasks for Continuous Control
The Arcade Learning Environment: An Evaluation Platform for General Agents
Wikipedia Bibliography:
Online machine learning
Q-learning
Temporal difference learning