Bibliography (5):

  1. Language Models are Unsupervised Multitask Learners

  2. Fine-Tuning GPT-2 from Human Preferences § Bugs can optimize for bad behavior

  3. Learning to summarize from human feedback