Bibliography (3):

https://openai.com/index/gpt-4-research/
Proximal Policy Optimization Algorithms
https://github.com/OpenBMB/UltraFeedback