Bibliography:

  1. ‘AI scaling’ tag

  2. ‘grokking (NN)’ tag

  3. ‘inner monologue (AI)’ tag

  4. ‘PaLM’ tag

  5. ‘text style transfer’ tag

  6. Foundational Challenges in Assuring Alignment and Safety of Large Language Models

  7. A phase transition between positional and semantic learning in a solvable model of dot-product attention

  8. Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

  9. Training Dynamics of Contextual N-Grams in Language Models

  10. Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

  11. A Theory for Emergence of Complex Skills in Language Models

  12. Teaching Arithmetic to Small Transformers

  13. Schema-learning and rebinding as mechanisms of in-context learning and emergence

  14. 8 Things to Know about Large Language Models

  15. The Quantization Model of Neural Scaling

  16. Toolformer: Language Models Can Teach Themselves to Use Tools

  17. Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation with Interaction

  18. Broken Neural Scaling Laws

  19. U-PaLM: Transcending Scaling Laws with 0.1% Extra Compute

  20. Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them

  21. Language Models are Multilingual Chain-of-Thought Reasoners

  22. Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

  23. Emergent Abilities of Large Language Models

  24. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

  25. Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers

  26. PaLM: Scaling Language Modeling with Pathways

  27. In-Context Learning and Induction Heads

  28. Predictability and Surprise in Large Generative Models

  29. The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models

  30. A Mathematical Framework for Transformer Circuits

  31. Scaling Language Models: Methods, Analysis & Insights from Training Gopher

  32. A General Language Assistant as a Laboratory for Alignment

  33. Mapping Language Models to Grounded Conceptual Spaces

  34. Program Synthesis with Large Language Models

  35. MMLU: Measuring Massive Multitask Language Understanding

  36. GPT-3: Language Models are Few-Shot Learners

  37. Emergence in Cognitive Science

  38. Observed Universality of Phase Transitions in High-Dimensional Geometry, with Implications for Modern Data Analysis and Signal Processing

  39. The Phase Transition In Human Cognition § Phase Transitions in Language Processing

  40. A dynamic systems model of cognitive and language growth

  41. design#future-tag-features


  42. Hu et al 2023, Figure 1: zooming in on Sorscher et al 2022's flat scaling; using brute-force sampling to get nonzero results shows smooth scaling hidden by the floor bias

  43. Lee et al 2023, Figure 4: NanoGPT emerges perfect arithmetic with reversed-digit numbers but converges poorly with regular-digit numbers

  44. Lee et al 2023, Figure 5: a matrix-completion algorithm exhibiting emergence on addition similar to NanoGPT

  45. Pan et al 2022, Figure 1: abrupt switch in decision leading to reward hacking of a car highway-merging task

  46. Pan et al 2022, Figure 2: larger NN models are better at reward hacking

  47. Pan et al 2022, Table 1: 9 kinds of misspecification and the resulting kinds of reward hacking

  48. https://cse-robotics.engr.tamu.edu/dshell/cs689/papers/anderson72more_is_different.pdf

  49. https://radiolab.org/podcast/91725-words/transcript

  50. https://www.pnas.org/doi/full/10.1073/pnas.2317967121

  51. https://www.quantamagazine.org/the-unpredictable-abilities-emerging-from-large-ai-models-20230316/

  52. https://www.reddit.com/r/mlscaling/comments/sjzvl0/d_instances_of_nonlog_capability_spikes_or/

  53. https://x.com/_jasonwei/status/1635338409370865665

  54. https://x.com/sea_snell/status/1720926670704746503

  55. A Theory for Emergence of Complex Skills in Language Models

  56. https%253A%252F%252Farxiv.org%252Fabs%252F2307.15936.html

  57. Teaching Arithmetic to Small Transformers

  58. https%253A%252F%252Farxiv.org%252Fabs%252F2307.03381.html

  59. Schema-learning and rebinding as mechanisms of in-context learning and emergence

  60. https%253A%252F%252Farxiv.org%252Fabs%252F2307.01201%2523deepmind.html

  61. The Quantization Model of Neural Scaling

  62. https%253A%252F%252Farxiv.org%252Fabs%252F2303.13506.html

  63. U-PaLM: Transcending Scaling Laws with 0.1% Extra Compute

  64. Yi Tay

  65. Jason Wei

  66. Neil Houlsby

  67. https%253A%252F%252Farxiv.org%252Fabs%252F2210.11399%2523google.html

  68. Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them

  69. Yi Tay

  70. Jason Wei

  71. https%253A%252F%252Farxiv.org%252Fabs%252F2210.09261%2523google.html

  72. Language Models are Multilingual Chain-of-Thought Reasoners

  73. Yi Tay

  74. Jason Wei

  75. https%253A%252F%252Farxiv.org%252Fabs%252F2210.03057%2523google.html

  76. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

  77. About Me

  78. Andrea Santilli

  79. Andy Zou

  80. Barret Zoph

  81. Behnam Neyshabur

  82. Colin Raffel

  83. https://people.eecs.berkeley.edu/~hendrycks/

  84. Daniel Levy

  85. Eric Tang

  86. Hannaneh Hajishirzi—University of Washington

  87. Jacob Hilton's Homepage

  88. Jared Kaplan

  89. Jascha Sohl-Dickstein

  90. Jason Wei

  91. Leo Gao

  92. Luke Metz

  93. Mantas Mazeika

  94. Mohit Bansal

  95. Nikita Nangia

  96. Omer Levy

  97. Owain Evans, AI Alignment Researcher

  98. Percy Liang

  99. Sam Bowman

  100. Stefano Ermon

  101. Stella Biderman

  102. Steven T. Piantadosi

  103. Vedant Misra

  104. https%253A%252F%252Farxiv.org%252Fabs%252F2206.04615.html

  105. PaLM: Scaling Language Modeling with Pathways

  106. Yi Tay

  107. https://x.com/jekbradbury

  108. Vedant Misra

  109. Barret Zoph

  110. Jason Wei

  111. https%253A%252F%252Farxiv.org%252Fabs%252F2204.02311%2523google.html

  112. Predictability and Surprise in Large Generative Models

  113. Andy Jones

  114. About Me

  115. Jared Kaplan

  116. Sam McCandlish

  117. https://jack-clark.net/about/

  118. https%253A%252F%252Farxiv.org%252Fabs%252F2202.07785%2523anthropic.html

  119. The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models

  120. Jacob Steinhardt

  121. https%253A%252F%252Farxiv.org%252Fabs%252F2201.03544.html

  122. Scaling Language Models: Methods, Analysis & Insights from Training Gopher

  123. Karen Simonyan

  124. https://x.com/jekbradbury

  125. Koray Kavukcuoglu

  126. https%253A%252F%252Farxiv.org%252Fabs%252F2112.11446%2523deepmind.html

  127. A General Language Assistant as a Laboratory for Alignment

  128. About Me

  129. Andy Jones

  130. https://jack-clark.net/about/

  131. Sam McCandlish

  132. Jared Kaplan

  133. https%253A%252F%252Farxiv.org%252Fabs%252F2112.00861%2523anthropic.html

  134. Mapping Language Models to Grounded Conceptual Spaces

  135. https%253A%252F%252Fopenreview.net%252Fforum%253Fid%253DgJcEM8sxHK.html

  136. MMLU: Measuring Massive Multitask Language Understanding

  137. https://people.eecs.berkeley.edu/~hendrycks/

  138. Steven's Web Thoughts

  139. Andy Zou

  140. Mantas Mazeika

  141. Jacob Steinhardt

  142. https%253A%252F%252Farxiv.org%252Fabs%252F2009.03300.html

  143. Emergence in Cognitive Science

  144. https%253A%252F%252Fonlinelibrary.wiley.com%252Fdoi%252Ffull%252F10.1111%252Fj.1756-8765.2010.01116.x.html

  145. A dynamic systems model of cognitive and language growth

  146. %252Fdoc%252Fpsychology%252Fneuroscience%252F1991-vangeert.pdf.html