Bibliography (4):

  1. https://x.com/emmons_scott/status/1762886003046629586

  2. When Your AIs Deceive You: Challenges With Partial Observability in RLHF

  3. Language Models Learn to Mislead Humans via RLHF

  4. Wikipedia Bibliography:

    1. Reinforcement learning