The "reversal curse": fine-tuning on "A is B" does not instill "B is A" at all.
A fantastic intuition builder for what SFT actually does: fine-tuning doesn't normally make a model conversant in new facts beyond their literal recitation. SFT isn't "know this," it's "be this."
Does a language model trained on "A is B" generalize to "B is A"?
E.g., when trained only on "George Washington was the first US president", can models automatically answer "Who was the first US president?"
Our new paper shows they cannot!
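To make the setup concrete, here's a minimal sketch of what that one-directional training data and the reverse-direction probes look like. The fact list, the `query` helper, and the fine-tuned model are hypothetical placeholders for illustration, not the paper's actual code or data.

```python
# Sketch of the reversal-curse setup: train only in the "A is B" direction,
# then probe the held-out "B is A" direction.
# `query(model, prompt)` is a hypothetical helper returning a model completion.

def make_finetune_examples(facts):
    """Render each (entity, description) pair only in the 'A is B' direction."""
    return [f"{entity} was {description}." for entity, description in facts]

def make_reverse_probes(facts):
    """Ask about the same facts in the 'B is A' direction the model never saw."""
    return [(f"Who was {description}?", entity) for entity, description in facts]

facts = [
    ("George Washington", "the first US president"),
    ("Valentina Tereshkova", "the first woman to travel to space"),
]

train_texts = make_finetune_examples(facts)  # used for SFT
probes = make_reverse_probes(facts)          # evaluated after fine-tuning

# The paper's finding is that a model fine-tuned on train_texts answers the
# forward direction fine but stays near chance on these reverse probes, e.g.:
# for question, answer in probes:
#     print(question, "->", query(finetuned_model, question), "| expected:", answer)
```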