“Reward Hacking Behavior Can Generalize across Tasks” (GPT-3 nonfiction, Decision Transformer, AI safety)