“Preventing Language Models from Hiding Their Reasoning” (inner monologue (AI), steganography, AI safety)