Bibliography:

  1. ‘AI emergence’ tag

  2. Hardware Hedging Against Scaling Regime Shifts

  3. The Complexity Dynamics of Grokking

  4. Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond

  5. The slingshot helps with learning

  6. Emergent properties with repeated examples

  7. Grokking Modular Polynomials

  8. Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

  9. Grokfast: Accelerated Grokking by Amplifying Slow Gradients

  10. Deep Grokking: Would Deep Neural Networks Generalize Better?

  11. Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

  12. Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition

  13. A Tale of Tails: Model Collapse as a Change of Scaling Laws

  14. Critical Data Size of Language Models from a Grokking Perspective

  15. Grokking Group Multiplication with Cosets

  16. Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

  17. Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization

  18. Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

  19. Grokking in Linear Estimators—A Solvable Model that Groks without Understanding

  20. To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

  21. Grokking as the Transition from Lazy to Rich Training Dynamics

  22. PassUntil: Predicting Emergent Abilities with Infinite Resolution Evaluation

  23. Explaining grokking through circuit efficiency

  24. Latent State Models of Training Dynamics

  25. The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

  26. Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

  27. A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations

  28. Progress measures for grokking via mechanistic interpretability

  29. Grokking phase transitions in learning local rules with gradient descent

  30. Omnigrok: Grokking Beyond Algorithmic Data

  31. The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

  32. Towards Understanding Grokking: An Effective Theory of Representation Learning

  33. Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [paper]

  34. Learning through atypical "phase transitions" in overparameterized neural networks

  35. Knowledge distillation: A good teacher is patient and consistent

  36. Grokking: Generalization Beyond Overfitting On Small Algorithmic Datasets

  37. The large learning rate phase of deep learning: the catapult mechanism

  38. A Recipe for Training Neural Networks

  39. The Complexity Dynamics of Grokking [Blog]

  40. Sea-Snell/grokking: Unofficial Re-Implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"

  41. Openai/grok

  42. Teddykoker/grokking: PyTorch Implementation of "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"

  43. Hypothesis: Gradient Descent Prefers General Circuits

  44. Grokking: Generalization beyond Overfitting on Small Algorithmic Datasets (Paper Explained)

  45. design#future-tag-features


  46. 2024-charton-figure1a-repetitionoftrainingdatashowsemergenceaftereenoughrepetitionbutdoubledescent.png

  47. 2024-charton-figure1b-twosettrainingismoresampleefficientatlearningarithmeticoperations.png

  48. 2024-fan-figure2-grokkingincreaseswithmlpdepth.jpg

  49. 2024-huang-figure1-phasediagramofregimesforgrokking.jpg

  50. 2024-huang-figure10-manuallydisentanglingmemorizationfromadditionacceleratesgrokkinginmodelscalingduetolessinterference.jpg

  51. 2024-huang-figure3-modelscalingincreasesgrokking.png

  52. 2024-huang-figure8-memorizationcontaminationvastlydelaysgrokkinginscalingupmodels.png

  53. 2024-lee-figure7-weightdecaylargelyreplacesgrokfastoptimizerinspeedingupgrokking.png

  54. 2024-salvi-figure3-phasediagramofgrokkingwithlabelnoiseandregularization.png

  55. 2024-wang-figure1-grokkingforimplicitreasoning.png

  56. 2024-wang-figure13-increasedweightdecayregularizationacceleratesgrokking.jpg

  57. 2024-wang-figure2-ratioofdeducedtheoremstoaxiomschangesgrokkingspeed.png

  58. 2024-wang-figure4-grokkingphasetransitionofcompositionalcircuitintransformer.jpg

  59. 2024-wang-figure5-grokkingphasetransitionofcomparisoncircuitintransformer.png

  60. 2024-zhu-figure1-grokkingofmodulararithmeticacrossdatasetsizes.png

  61. 2024-zhu-figure11-yelptransformergrokkingshowingdegrokking.png

  62. 2024-zhu-figure3-yelpgrokkingresults.png

  63. 2024-zhu-figure4-imdbmoviereviewdatasetsizeneedstoincreasewithmodelsizeforgrokking.jpg

  64. 2024-zhu-figure5-phasetransitionsinmodelparametersduringgrokkingfrommemorizationtogeneralization.png

  65. 2022-liu-figure1-goldilockszoneofinitializationandrelationshiptogrokking.png

  66. 2022-liu-figure10-grokkinggeneralizationtimeimproveswithhigherweightdecay.png

  67. 2022-liu-figure6-phasediagramsofgrokking.png

  68. 2022-liu-figure7-transformergrokkingvsweightnormformodularaddition.png

  69. 2021-power-figure1-grokkinglearningcurves.jpg

  70. 2021-power-poster.png#openai

  71. https://www.lesswrong.com/s/5omSW4wNKbEvYsyje/p/GpSzShaaf8po4rcmA

  72. https://x.com/AbdullahSabry42/status/1543195805741350917

  73. The slingshot helps with learning

  74. https://www.lesswrong.com/posts/LncYobrn3vRr7qkZW/the-slingshot-helps-with-learning

  75. Grokfast: Accelerated Grokking by Amplifying Slow Gradients

  76. https://arxiv.org/abs/2405.20233

  77. Deep Grokking: Would Deep Neural Networks Generalize Better?

  78. https://sites.google.com/view/razp/home

  79. https://arxiv.org/abs/2405.19454

  80. Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

  81. https://arxiv.org/abs/2405.15071

  82. Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition


  84. https://arxiv.org/abs/2402.15175

  85. A Tale of Tails: Model Collapse as a Change of Scaling Laws

  86. https://arxiv.org/abs/2402.07043

  87. Critical Data Size of Language Models from a Grokking Perspective

  88. https://arxiv.org/abs/2401.10463

  89. To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

  90. https://arxiv.org/abs/2310.13061

  91. PassUntil: Predicting Emergent Abilities with Infinite Resolution Evaluation


  93. Ning Ding

  94. https://arxiv.org/abs/2310.03262

  95. Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

  96. https://arxiv.org/abs/2306.13253

  97. Progress measures for grokking via mechanistic interpretability

  98. Jacob Steinhardt

  99. https://arxiv.org/abs/2301.05217

  100. Grokking phase transitions in learning local rules with gradient descent

  101. https://arxiv.org/abs/2210.15435

  102. Omnigrok: Grokking Beyond Algorithmic Data

  103. https://arxiv.org/abs/2210.01117

  104. Towards Understanding Grokking: An Effective Theory of Representation Learning

  105. https://arxiv.org/abs/2205.10343

  106. Learning through atypical "phase transitions" in overparameterized neural networks

  107. https://arxiv.org/abs/2110.00683

  108. Knowledge distillation: A good teacher is patient and consistent

  109. Lucas Beyer

  110. https://arxiv.org/abs/2106.05237#google

  111. Grokking: Generalization Beyond Overfitting On Small Algorithmic Datasets

  112. Vedant Misra

  113. /doc/ai/nn/fully-connected/2021-power.pdf#openai

  114. A Recipe for Training Neural Networks

  115. https://karpathy.github.io/2019/04/25/recipe/

  116. Wikipedia Bibliography:

    1. Jess Smith

    2. Max Tegmark

    3. Andrej Karpathy