https://gwern.net/doc/ai/nn/fully-connected/2021-power.pdf#openai
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [paper]
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Ray Interference: a Source of Plateaus in Deep Reinforcement Learning
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
PassUntil: Predicting Emergent Abilities with Infinite Resolution Evaluation
Progress measures for grokking via mechanistic interpretability
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Wikipedia Bibliography: