https://huggingface.co/blog/rlhf (GPT, preference learning, AI safety)