Bibliography (10):

  1. WebGPT: Browser-assisted question-answering with human feedback

  2. https://x.com/dust4ai/status/1587104029712203778

  3. GPT-3: Language Models are Few-Shot Learners

  4. Learning from Human Preferences

  5. Fine-Tuning GPT-2 from Human Preferences

  6. https://openai.com/research/learning-to-summarize-with-human-feedback

  7. https://openai.com/research/summarizing-books

  8. TruthfulQA: Measuring How Models Mimic Human Falsehoods

  9. https://openai.com/research/debate

  10. Wikipedia Bibliography:

    1. Reinforcement learning