Bibliography (7):

  1. A Mathematical Framework for Transformer Circuits

  2. Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

  3. Explaining grokking through circuit efficiency