Bibliography (3):

  1. Discovering Language Model Behaviors with Model-Written Evaluations

  2. PaLM: Scaling Language Modeling with Pathways

  3. https://github.com/google/sycophancy-intervention