Bibliography (7):

  1. Deep reinforcement learning from human preferences (Christiano et al., 2017)

  2. Our approach to alignment research (OpenAI blog): https://openai.com/blog/our-approach-to-alignment-research

  3. AI-written critiques help humans notice flaws (OpenAI research): https://openai.com/research/critiques

  4. Red-teaming language models with language models (Google DeepMind blog): https://deepmind.google/discover/blog/red-teaming-language-models-with-language-models/

  5. Language models can explain neurons in language models (OpenAI research): https://openai.com/research/language-models-can-explain-neurons-in-language-models

  6. Wikipedia Bibliography:

    1. Ilya Sutskever

    2. Reinforcement learning