Bibliography (4):
https://x.com/emmons_scott/status/1762886003046629586
When Your AIs Deceive You: Challenges With Partial Observability in RLHF
Language Models Learn to Mislead Humans via RLHF
Wikipedia Bibliography:
Reinforcement learning