Grokking phase transitions in learning local rules with gradient descent
The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
The large learning rate phase of deep learning: the catapult mechanism
Understanding the Role of Training Regimes in Continual Learning
Qualitatively characterizing neural network optimization problems
Wikipedia Bibliography: