Bibliography (4):

  1. A Theory for Emergence of Complex Skills in Language Models

  2. The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

  3. ‘MLP NN’ directory

  4. Wikipedia Bibliography:

    1. Cross-entropy