Bibliography (17):

tank#alternative-examples
Proximal Policy Optimization Algorithms
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
https://github.com/aypan17/reward-misspecification
https://arxiv.org/pdf/2201.03544.pdf#page=6
Unsolved Problems in ML Safety
Flow: A Modular Learning Framework for Mixed Autonomy Traffic
Reinforcement Learning for Optimization of COVID-19 Mitigation Policies
Deep Reinforcement Learning for Closed-Loop Blood Glucose Control
openai/gym: A Toolkit for Developing and Comparing Reinforcement Learning Algorithms
https://arxiv.org/pdf/2201.03544.pdf#page=3
https://arxiv.org/pdf/2201.03544.pdf#page=4
https://arxiv.org/pdf/2201.03544.pdf#page=2
https://ai100.stanford.edu/gathering-strength-gathering-storms-one-hundred-year-study-artificial-intelligence-ai100-2021-study

Wikipedia Bibliography:
1. https://en.wikipedia.org/wiki/Proxy_(statistics)
2. https://en.wikipedia.org/wiki/River_Raid