Bibliography (7):
RL with KL penalties is better seen as Bayesian inference: https://www.lesswrong.com/posts/eoHbneGvqDu25Hasc/rl-with-kl-penalties-is-better-seen-as-bayesian-inference
GPT-3: Language Models are Few-Shot Learners
Deep reinforcement learning from human preferences
Wikipedia Bibliography:
Expected value
Kullback-Leibler divergence
Variational Bayesian methods
Bayesian statistics