Silicon Dragon · Feb 24, 2023 · 11:44 AM UTC

Silicon Dragon @sdrinf

Lesser-known #ChatGPT trick: ask it to assign a truthiness float to each of its responses, which biases the model toward metacognition. See below for examples with & without.
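
A rough sketch of how the trick might be wired up in Python. The exact prompt wording is not given in the thread, so the instruction text, the `[truthiness: X]` tag format, and the helper names `with_truthiness` and `parse_truthiness` are all assumptions made for illustration. No model API is called here; the code only builds the prompt and reads scores back out of a reply.

```python
import re
from typing import List

# Hypothetical instruction wording; the thread does not give the exact prompt.
TRUTHINESS_SUFFIX = (
    "After each statement in your answer, append a truthiness score "
    "in the form [truthiness: X], where X is a float between 0.0 and 1.0."
)

# Matches tags such as "[truthiness: 0.85]" (the tag format is an assumption).
_SCORE = re.compile(r"\[truthiness:\s*([01](?:\.\d+)?)\]")


def with_truthiness(question: str) -> str:
    """Return the question with the truthiness instruction appended."""
    return f"{question.strip()}\n\n{TRUTHINESS_SUFFIX}"


def parse_truthiness(response: str) -> List[float]:
    """Extract every in-range truthiness score from a model reply, in order."""
    scores = (float(raw) for raw in _SCORE.findall(response))
    return [s for s in scores if 0.0 <= s <= 1.0]
```

Send the output of `with_truthiness(...)` as the user message for the "with" case and the bare question for the "without" case, then compare the two replies; `parse_truthiness` pulls the floats out of the "with" reply.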

Silicon Dragon · Feb 24, 2023 · 11:46 AM UTC

Strongly suspect the model can internally reason about the truthiness of its answers, and was just not rewarded for truth during RLHF, in the name of maintaining the noble lies society tells itself.