“‘Grokking (NN)’ Tag”, 2020-03-04
Bibliography for tag ai/scaling/emergence/grokking, most recent first: 40 annotations & 27 links (parent).
- See Also
- Gwern
- Links
- “Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond”, Jeffares et al 2024
- “The Slingshot Helps With Learning”, 2024
- “Emergent Properties With Repeated Examples”, Charton & Kempe 2024
- “Grokking Modular Polynomials”, et al 2024
- “Learning to Grok: Emergence of In-Context Learning and Skill Composition in Modular Arithmetic Tasks”, et al 2024
- “Grokfast: Accelerated Grokking by Amplifying Slow Gradients”, Lee et al 2024
- “Deep Grokking: Would Deep Neural Networks Generalize Better?”, et al 2024
- “Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization”, Wang et al 2024
- “Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition”, et al 2024
- “A Tale of Tails: Model Collapse As a Change of Scaling Laws”, Dohmatob et al 2024
- “Critical Data Size of Language Models from a Grokking Perspective”, Zhu et al 2024
- “Grokking Group Multiplication With Cosets”, et al 2023
- “Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking”, Lyu et al 2023
- “Outliers With Opposing Signals Have an Outsized Effect on Neural Network Optimization”, Rosenfeld & Risteski 2023
- “Grokking Beyond Neural Networks: An Empirical Exploration With Model Complexity”, et al 2023
- “Grokking in Linear Estimators—A Solvable Model That Groks without Understanding”, et al 2023
- “To Grok or Not to Grok: Disentangling Generalization and Memorization on Corrupted Algorithmic Datasets”, et al 2023
- “Grokking As the Transition from Lazy to Rich Training Dynamics”, Kumar et al 2023
- “PassUntil: Predicting Emergent Abilities With Infinite Resolution Evaluation”, et al 2023
- “Explaining Grokking through Circuit Efficiency”, Varma et al 2023
- “Latent State Models of Training Dynamics”, et al 2023
- “The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks”, Zhong et al 2023
- “Predicting Grokking Long Before It Happens: A Look into the Loss Landscape of Models Which Grok”, et al 2023
- “A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations”, Chughtai et al 2023
- “Progress Measures for Grokking via Mechanistic Interpretability”, Nanda et al 2023
- “Grokking Phase Transitions in Learning Local Rules With Gradient Descent”, Žunkovič & Ilievski 2022
- “Omnigrok: Grokking Beyond Algorithmic Data”, Liu et al 2022
- “The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon”, Thilak et al 2022
- “Towards Understanding Grokking: An Effective Theory of Representation Learning”, Liu et al 2022
- “Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [Paper]”, Power et al 2022
- “Learning through Atypical ‘Phase Transitions’ in Overparameterized Neural Networks”, Baldassi et al 2021
- “Knowledge Distillation: A Good Teacher Is Patient and Consistent”, Beyer et al 2021
- “Grokking: Generalization Beyond Overfitting On Small Algorithmic Datasets”, Power et al 2021
- “The Large Learning Rate Phase of Deep Learning: the Catapult Mechanism”, Lewkowycz et al 2020
- “A Recipe for Training Neural Networks”, Karpathy 2019
- “Sea-Snell/grokking: Unofficial Re-Implementation of ‘Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets’”
- “openai/grok”
- “teddykoker/grokking: PyTorch Implementation of ‘Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets’”
- “Hypothesis: Gradient Descent Prefers General Circuits”
- “Grokking: Generalization beyond Overfitting on Small Algorithmic Datasets (Paper Explained)”
- Sort By Magic
- Miscellaneous
- Bibliography