Foundational Challenges in Assuring Alignment and Safety of Large Language Models
A phase transition between positional and semantic learning in a solvable model of dot-product attention
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Training Dynamics of Contextual N-Grams in Language Models
Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
A Theory for Emergence of Complex Skills in Language Models
Schema-learning and rebinding as mechanisms of in-context learning and emergence
Toolformer: Language Models Can Teach Themselves to Use Tools
Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation with Interaction
Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them
Language Models are Multilingual Chain-of-Thought Reasoners
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
A General Language Assistant as a Laboratory for Alignment
Observed Universality of Phase Transitions in High-Dimensional Geometry, with Implications for Modern Data Analysis and Signal Processing
The Phase Transition In Human Cognition § Phase Transitions in Language Processing
Figure 1 (Hu 2023): Zooming in on Sorscher et al. 2022 flat scaling: using brute-force sampling to get nonzero results shows smooth scaling hidden by the floor bias.
Figure 4 (Lee 2023): NanoGPT emerges perfect arithmetic with reverse-digit numbers but converges poorly with regular-digit numbers.
Figure 5 (Lee 2023): Matrix completion algorithm exhibiting emergence on addition, similar to NanoGPT.
Figure 1 (Pan 2022): Abrupt switch in decision leading to reward hacking of car highway-merging task.
Figure 2 (Pan 2022): Larger NN models are better at reward hacking.
Table 1 (Pan 2022): 9 kinds of misspecifications and resulting kinds of reward hacking.
https://cse-robotics.engr.tamu.edu/dshell/cs689/papers/anderson72more_is_different.pdf
https://www.quantamagazine.org/the-unpredictable-abilities-emerging-from-large-ai-models-20230316/
https://www.reddit.com/r/mlscaling/comments/sjzvl0/d_instances_of_nonlog_capability_spikes_or/
A Theory for Emergence of Complex Skills in Language Models
Schema-learning and rebinding as mechanisms of in-context learning and emergence
https://arxiv.org/abs/2307.01201#deepmind
https://arxiv.org/abs/2210.11399#google
Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them
https://arxiv.org/abs/2210.09261#google
Language Models are Multilingual Chain-of-Thought Reasoners
https://arxiv.org/abs/2210.03057#google
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
https://arxiv.org/abs/2204.02311#google
https://arxiv.org/abs/2202.07785#anthropic
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
https://arxiv.org/abs/2112.11446#deepmind
A General Language Assistant as a Laboratory for Alignment
https://arxiv.org/abs/2112.00861#anthropic
https://openreview.net/forum?id=gJcEM8sxHK
https://onlinelibrary.wiley.com/doi/full/10.1111/j.1756-8765.2010.01116.x
/doc/psychology/neuroscience/1991-vangeert.pdf