Bibliography (5):
Language Models are Unsupervised Multitask Learners
Fine-Tuning GPT-2 from Human Preferences § Bugs can optimize for bad behavior
Learning to summarize from human feedback
Wikipedia Bibliography:
Convolutional neural network
Reinforcement learning