- Deep reinforcement learning from human preferences
- https://openai.com/blog/our-approach-to-alignment-research
- https://openai.com/research/critiques
- https://deepmind.google/discover/blog/red-teaming-language-models-with-language-models/
- https://openai.com/research/language-models-can-explain-neurons-in-language-models
-