Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
Deep Grokking: Would Deep Neural Networks Generalize Better?
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Critical Data Size of Language Models from a Grokking Perspective
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
Grokking in Linear Estimators—A Solvable Model that Groks without Understanding
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
Grokking as the Transition from Lazy to Rich Training Dynamics
PassUntil: Predicting Emergent Abilities with Infinite Resolution Evaluation
The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
Progress measures for grokking via mechanistic interpretability
Grokking phase transitions in learning local rules with gradient descent
The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
Towards Understanding Grokking: An Effective Theory of Representation Learning
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [paper]
Learning through atypical "phase transitions" in overparameterized neural networks
Knowledge distillation: A good teacher is patient and consistent
Grokking: Generalization Beyond Overfitting On Small Algorithmic Datasets
The large learning rate phase of deep learning: the catapult mechanism
Sea-Snell/grokking: Unofficial Re-Implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
Teddykoker/grokking: PyTorch Implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
Grokking: Generalization beyond Overfitting on Small Algorithmic Datasets (Paper Explained)
2024-charton-figure1a-repetitionoftrainingdatashowsemergenceaftereenoughrepetitionbutdoubledescent.png
2024-charton-figure1b-twosettrainingismoresampleefficientatlearningarithmeticoperations.png
2024-huang-figure10-manuallydisentanglingmemorizationfromadditionacceleratesgrokkinginmodelscalingduetolessinterference.jpg
2024-huang-figure8-memorizationcontaminationvastlydelaysgrokkinginscalingupmodels.png
2024-lee-figure7-weightdecaylargelyreplacesgrokfastoptimizerinspeedingupgrokking.png
2024-salvi-figure3-phasediagramofgrokkingwithlabelnoiseandregularization.png
2024-wang-figure13-increasedweightdecayregularizationacceleratesgrokking.jpg
2024-wang-figure2-ratioofdeducedtheoremstoaxiomschangesgrokkingspeed.png
2024-wang-figure4-grokkingphasetransitionofcompositionalcircuitintransformer.jpg
2024-wang-figure5-grokkingphasetransitionofcomparisoncircuitintransformer.png
2024-zhu-figure1-grokkingofmodulararithmeticacrossdatasetsizes.png
2024-zhu-figure11-yelptransformergrokkingshowingdegrokking.png
2024-zhu-figure4-imdbmoviereviewdatasetsizeneedstoincreasewithmodelsizeforgrokking.jpg
2024-zhu-figure5-phasetransitionsinmodelparametersduringgrokkingfrommemorizationtogeneralization.png
2022-liu-figure1-goldilockszoneofinitializationandrelationshiptogrokking.png
2022-liu-figure10-grokkinggeneralizationtimeimproveswithhigherweightdecay.png
2022-liu-figure7-transformergrokkingvsweightnormformodularaddition.png
https://www.lesswrong.com/s/5omSW4wNKbEvYsyje/p/GpSzShaaf8po4rcmA
https://www.lesswrong.com/posts/LncYobrn3vRr7qkZW/the-slingshot-helps-with-learning
https://arxiv.org/abs/2106.05237
/doc/ai/nn/fully-connected/2021-power.pdf
https://karpathy.github.io/2019/04/25/recipe/
Wikipedia Bibliography: