Bibliography (4):

  1. https://www.lesswrong.com/posts/HHSuvG2hqAnGT5Wzp/no-convincing-evidence-for-gradient-descent-in-activation#Transformers_Learn_in_Context_by_Gradient_Descent__van_Oswald_et_al__2022_

  2. Attention Is All You Need

  3. In-Context Learning and Induction Heads

  4. https://github.com/google-research/self-organising-systems/tree/master/transformers_learn_icl_by_gd