- See Also
- Gwern
- “Research Ideas”, Gwern 2017
- “Review Of Crumb”, Gwern 2024
- “Miscellaneous”, Gwern 2009
- “Movie Reviews”, Gwern 2014
- “Anime Reviews”, Gwern 2010
- “Utext: Rich Unicode Documents”, Gwern 2023
- “Novelty Nets: Classifier Anti-Guidance”, Gwern 2024
- “InvertOrNot.com Proposal”, Gwern 2021
- “The Second Apocalypse: Freedom In An Unfree Universe”, Gwern 2017
- “Review Of The Quantum Thief Trilogy”, Gwern 2022
- Links
- “No Physics? No Problem. AI Weather Forecasting Is Already Making Huge Strides.”
- “Asymptotic Learning Curves of Kernel Methods: Empirical Data versus Teacher-Student Paradigm”, Spigler et al 2019
- “LGE: Cell-Free Latent Go-Explore”, Gallouédec & Dellandréa 2022
- “A Solvable Model of Neural Scaling Laws”, Maloney et al 2022
- “Reflexion: Language Agents With Verbal Reinforcement Learning”, Shinn et al 2023
- “Intelligent Go-Explore (IGE): Standing on the Shoulders of Giant Foundation Models”, Lu et al 2024
- “From Bare Metal to a 70B Model: Infrastructure Set-Up and Scripts”
- “Parameter Counts in Machine Learning”
- “Carl Schmitt’s Ultimate Emergency: The Night of the Long Knives”, Vagts 2012
- “BADGE: Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds”, Ash et al 2019
- “Stochastic Batch Acquisition: A Simple Baseline for Deep Active Learning”, Kirsch et al 2021
- “Position: Understanding LLMs Requires More Than Statistical Generalization”, Reizinger et al 2024
- “Diffusion On Syntax Trees For Program Synthesis”, Kapur et al 2024
- “Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models”, Ankner et al 2024
- “State Soup: In-Context Skill Learning, Retrieval and Mixing”, Pióro et al 2024
- “Data Curation via Joint Example Selection Further Accelerates Multimodal Learning”, Evans et al 2024
- “Introducing AuraSR—An Open Reproduction of the GigaGAN Upscaler”
- “A Potpourri of Cool-Looking Scripts”
- “The Super Effectiveness of Pokémon Embeddings Using Only Raw JSON and Images”
- “Solving a Rubik's Cube in Record Time”
- “Why I Attack”, Carlini 2024
- “Nicolas Heess”
- “Stefano Ermon”
- “Sherjil Ozair”
- “Jared Kaplan”
- “Curious about You”, translucentaudiosynthesis319 2024
- “Jianfeng Gao at Microsoft Research”
- “Cuboth”
- “Belief in God: A Game-Theoretic Paradox”, Brams 1982
- “Association between Prescription Medications and Falls at Home among Young and Middle-Aged Adults”, Kool et al 2012
- “The Benefits of Frequent Positive Affect: Does Happiness Lead to Success?”, Lyubomirsky et al 2005
- “DLA—Diffusion Limited Aggregation”
- “Daniel Levy”
- “On the Number of Response Regions of Deep Feed Forward Networks With Piece-Wise Linear Activations”, Pascanu et al 2013
- “Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model”, Wang et al 2016
- “MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book”, Tanzer et al 2023
- “Grokking Group Multiplication With Cosets”, Stander et al 2023
- “Neural Networks Learn Statistics of Increasing Complexity”, Belrose et al 2024
- “How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad”, Abbe et al 2024
- “An Amazing Journey With Claude 3.5 and ChatGPT-4o Who Helped Me Backwards Engineer an Econometrics Theory Paper and Taught Me a Lot More in the Process”
- “David Warde-Farley”
- “Upper Limits of Common Lisp [on the Norwegian Oil Industry]”
- “Circumstances of Fall-Related Injuries by Age and Gender among Community-Dwelling Adults in the United States”
- “Is Wikipedia Politically Biased?”
- “Claude Sonnet 3.5, Economist”
- “Mike Lewis”
- “Timothy Frayling”
- “Jakob Grove”
- “Jouke-Jan Hottenga”
- “Is Bronny James Underrated? Inside the Phenomenon of the NBA Bloodline”
- “Veikko Salomaa”
- “Anthropic CEO Dario Amodei on Being an Underdog, AI Safety, and Economic Inequality”, Perrigo 2024
- “Lili Milani CV, ETIS”
- “Is Software Eating the World?”
- “Falls and Fall-Related Injuries among Community-Dwelling Adults in the United States”
- “Interview: Hannu Rajaniemi; SciFiNow Sits down With a Rising Star of Science Fiction”, Bandah 2010
- “Animation Platforms: Yoshiyama Yū, Tropical Rouge! Pretty Cure, and Sakuga As New Media”, Tai 2024
- “Monitoring Technologies, Environmental Performance, and Health Outcomes: Evidence from China”, Hu et al 2023
- “Maxout Networks”, Goodfellow et al 2013
- “Larger Language Models Do In-Context Learning Differently”, Wei et al 2023
- “Simple Synthetic Data Reduces Sycophancy in Large Language Models”, Wei et al 2023
- “Motif: Intrinsic Motivation from Artificial Intelligence Feedback”, Klissarov et al 2023
- “Diff History for Neural Language Agents”, Piterbarg et al 2023
- “Playing NetHack With LLMs: Potential & Limitations As Zero-Shot Agents (NetPlay)”, Jeurissen et al 2024
- “Probing the Decision Boundaries of In-Context Learning in Large Language Models”, Zhao et al 2024
- “Gamers Have Become Less Interested in Strategic Thinking and Planning”
- “Evolutionary-Scale Prediction of Atomic Level Protein Structure With a Language Model”, Lin et al 2022
- “Training Compute-Optimal Protein Language Models”, Cheng et al 2024
- “Etched Is Making the Biggest Bet in AI”
- “An Evolved Circuit, Intrinsic in Silicon, Entwined With Physics”, Thompson 1997
- “On the Number of Linear Regions of Deep Neural Networks”, Montúfar et al 2014
- “Qualitatively Characterizing Neural Network Optimization Problems”, Goodfellow et al 2014
- “The Goldilocks Zone: Towards Better Understanding of Neural Network Loss Landscapes”, Fort & Scherlis 2018
- “On Lazy Training in Differentiable Programming”, Chizat et al 2018
- “Fantastic Generalization Measures and Where to Find Them”, Jiang et al 2019
- “Understanding the Role of Training Regimes in Continual Learning”, Mirzadeh et al 2020
- “Sharpness-Aware Minimization (SAM) for Efficiently Improving Generalization”, Foret et al 2020
- “The Modern Mathematics of Deep Learning”, Berner et al 2021
- “Adaptive Gradient Methods at the Edge of Stability”, Cohen et al 2022
- “Grokking Phase Transitions in Learning Local Rules With Gradient Descent”, Žunkovič & Ilievski 2022
- “Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4”, Chang et al 2023
- “Predicting Grokking Long Before It Happens: A Look into the Loss Landscape of Models Which Grok”, Notsawo et al 2023
- “To Grok or Not to Grok: Disentangling Generalization and Memorization on Corrupted Algorithmic Datasets”, Doshi et al 2023
- “Characterizing Mechanisms for Factual Recall in Language Models”, Yu et al 2023
- Wikipedia
- Miscellaneous
- Link Bibliography
Link Bibliography
- “Asymptotic Learning Curves of Kernel Methods: Empirical Data versus Teacher-Student Paradigm”: https://arxiv.org/abs/1905.10843
- “A Solvable Model of Neural Scaling Laws”: https://arxiv.org/abs/2210.16859
- “Intelligent Go-Explore (IGE): Standing on the Shoulders of Giant Foundation Models”: https://arxiv.org/abs/2405.15143
- “Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models”: https://arxiv.org/abs/2405.20541
- “Interview: Hannu Rajaniemi; SciFiNow Sits down With a Rising Star of Science Fiction”: https://www.scifinow.co.uk/news/interview-hannu-rajaniemi/
- “Animation Platforms: Yoshiyama Yū, Tropical Rouge! Pretty Cure, and Sakuga As New Media”: 2024-tai.pdf
- “Larger Language Models Do In-Context Learning Differently”: https://arxiv.org/abs/2303.03846#google
- “Simple Synthetic Data Reduces Sycophancy in Large Language Models”: https://arxiv.org/abs/2308.03958#deepmind
- “Probing the Decision Boundaries of In-Context Learning in Large Language Models”: https://arxiv.org/abs/2406.11233
- “Training Compute-Optimal Protein Language Models”: https://www.biorxiv.org/content/10.1101/2024.06.06.597716.full
- “On the Number of Linear Regions of Deep Neural Networks”: https://arxiv.org/abs/1402.1869
- “Fantastic Generalization Measures and Where to Find Them”: https://arxiv.org/abs/1912.02178
- “Sharpness-Aware Minimization (SAM) for Efficiently Improving Generalization”: https://arxiv.org/abs/2010.01412#google
- “Grokking Phase Transitions in Learning Local Rules With Gradient Descent”: https://arxiv.org/abs/2210.15435
- “Predicting Grokking Long Before It Happens: A Look into the Loss Landscape of Models Which Grok”: https://arxiv.org/abs/2306.13253
- “To Grok or Not to Grok: Disentangling Generalization and Memorization on Corrupted Algorithmic Datasets”: https://arxiv.org/abs/2310.13061