Bibliography (10):
https://github.com/ironjr/grokfast
https://arxiv.org/pdf/2405.20233#page=3
https://arxiv.org/pdf/2405.20233#page=12
Decoupled Weight Decay Regularization
Omnigrok: Grokking Beyond Algorithmic Data
https://arxiv.org/pdf/2405.20233#page=2
Wikipedia Bibliography:
Gradient descent
Stochastic gradient descent
Stochastic gradient descent § Adam:
https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam
Ridge regression § Tikhonov regularization:
https://en.wikipedia.org/wiki/Ridge_regression#Tikhonov_regularization