See Also
Links
“Training Dynamics of Contextual N-Grams in Language Models”, Quirke et al 2023
“A Theory for Emergence of Complex Skills in Language Models”, Arora & Goyal 2023
“Teaching Arithmetic to Small Transformers”, Lee et al 2023
“8 Things to Know about Large Language Models”, Bowman 2023
“Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation With Interaction”, Pilault et al 2023
“U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”, Tay et al 2022
“Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them”, Suzgun et al 2022
“Language Models Are Multilingual Chain-of-Thought Reasoners”, Shi et al 2022
“Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit”, Barak et al 2022
“Emergent Abilities of Large Language Models”, Wei et al 2022
“Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models”, Srivastava et al 2022
“Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers”, Chan et al 2022
“PaLM: Scaling Language Modeling With Pathways”, Chowdhery et al 2022
“In-context Learning and Induction Heads”, Olsson et al 2022
“Predictability and Surprise in Large Generative Models”, Ganguli et al 2022
“The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models”, Pan et al 2022
“A Mathematical Framework for Transformer Circuits”, Elhage et al 2021
“Scaling Language Models: Methods, Analysis & Insights from Training Gopher”, Rae et al 2021
“A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
“Mapping Language Models to Grounded Conceptual Spaces”, Patel & Pavlick 2021
“Program Synthesis With Large Language Models”, Austin et al 2021
“MMLU: Measuring Massive Multitask Language Understanding”, Hendrycks et al 2020
“GPT-3: Language Models Are Few-Shot Learners”, Brown et al 2020
“Emergence in Cognitive Science”, McClelland 2010
“The Phase Transition In Human Cognition § Phase Transitions in Language Processing”, Spivey et al 2009 (page 13)
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, yielding a progression of topics; a minimal sketch of this ordering appears after the tag list below.
- cognitive-emergence
- minitrans-arithmetic
- upalm
- languagemodels
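As a rough illustration of the nearest-neighbor ordering described above, here is a minimal Python sketch. It is not the site's actual implementation: the `sort_by_magic` name, the use of cosine similarity, and the assumption that row 0 holds the newest annotation's embedding are all illustrative assumptions.

```python
# Minimal sketch (not the site's actual code) of the greedy nearest-neighbor
# ordering described above: start from the newest annotation and repeatedly
# append the most similar remaining annotation, producing a topic progression.
import numpy as np

def sort_by_magic(embeddings: np.ndarray) -> list[int]:
    """Greedy nearest-neighbor ordering.

    `embeddings` is an (n, d) array; row 0 is assumed to be the newest
    annotation. Returns a permutation of the indices 0..n-1.
    """
    # Normalize rows so dot products are cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order = [0]
    remaining = set(range(1, len(unit)))
    while remaining:
        last = unit[order[-1]]
        # Pick the remaining annotation most similar to the one just placed.
        nxt = max(remaining, key=lambda i: float(unit[i] @ last))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

Greedy chaining like this keeps each annotation next to its most similar neighbor, which is what produces a gradual progression of topics rather than hard category boundaries; the clusters can then be cut and auto-labeled into sections.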
Miscellaneous
- /doc/ai/scaling/emergence/2022-pan-figure2-largernnmodelsarebetteratrewardhacking.png
- https://cse-robotics.engr.tamu.edu/dshell/cs689/papers/anderson72more_is_different.pdf
- https://www.lesswrong.com/s/5omSW4wNKbEvYsyje/p/GpSzShaaf8po4rcmA
- https://www.quantamagazine.org/the-unpredictable-abilities-emerging-from-large-ai-models-20230316/
- https://www.reddit.com/r/mlscaling/comments/sjzvl0/d_instances_of_nonlog_capability_spikes_or/
Link Bibliography
- https://arxiv.org/abs/2307.15936: “A Theory for Emergence of Complex Skills in Language Models”, Sanjeev Arora, Anirudh Goyal
- https://arxiv.org/abs/2307.03381: “Teaching Arithmetic to Small Transformers”, Nayoung Lee, Kartik Sreenivasan, Jason D. Lee, Kangwook Lee, Dimitris Papailiopoulos
- https://arxiv.org/abs/2210.11399#google: “U-PaLM: Transcending Scaling Laws With 0.1% Extra Compute”
- https://arxiv.org/abs/2210.09261#google: “Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them”
- https://arxiv.org/abs/2210.03057#google: “Language Models Are Multilingual Chain-of-Thought Reasoners”
- https://arxiv.org/abs/2204.02311#google: “PaLM: Scaling Language Modeling With Pathways”
- https://arxiv.org/abs/2202.07785#anthropic: “Predictability and Surprise in Large Generative Models”
- https://arxiv.org/abs/2201.03544: “The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models”, Alexander Pan, Kush Bhatia, Jacob Steinhardt
- https://arxiv.org/abs/2112.11446#deepmind: “Scaling Language Models: Methods, Analysis & Insights from Training Gopher”
- https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment”
- https://openreview.net/forum?id=gJcEM8sxHK: “Mapping Language Models to Grounded Conceptual Spaces”, Roma Patel, Ellie Pavlick
- https://arxiv.org/abs/2009.03300: “MMLU: Measuring Massive Multitask Language Understanding”, Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
- https://onlinelibrary.wiley.com/doi/full/10.1111/j.1756-8765.2010.01116.x: “Emergence in Cognitive Science”, James L. McClelland