Bibliography (4):
A Theory for Emergence of Complex Skills in Language Models
The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
‘MLP NN’ directory
Wikipedia Bibliography:
Cross-entropy