Bibliography:

  1. backstop#deep-bayes

  2. ML Scaling subreddit

  3. It Looks Like You’re Trying To Take Over The World

  4. GPT-3: Language Models are Few-Shot Learners

  5. GPT-3 paper § Figure F.1: Four uncurated completions from a context suggesting the model compose a poem in the style of Wallace Stevens with the title ‘Shadows on the Way’

  6. GPT-3 Creative Fiction

  7. GPT-2 Neural Network Poetry

  8. GPT-3 GitHub JSON Dump Reformatted to Readable HTML

  9. OpenAI API

  10. Better Language Models and Their Implications

  11. GPT-3 Creative Fiction § BPEs

  12. Using Fast Weights to Attend to the Recent Past

  13. https://www.reddit.com/r/reinforcementlearning/search/?q=flair%3AMetaRL&include_over_18=on&restrict_sr=on&sort=top

  14. One-shot Learning with Memory-Augmented Neural Networks

  15. Prefrontal cortex as a meta-reinforcement learning system

  16. Matt Botvinick on the spontaneous emergence of learning algorithms

  17. Reinforcement Learning, Fast and Slow

  18. AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

  19. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

  20. One Big Net For Everything

  21. Meta-Learning: Learning to Learn Fast

  22. Meta Reinforcement Learning

  23. Jukebox: We’re introducing Jukebox, a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles. We’re releasing the model weights and code, along with a tool to explore the generated samples.

  24. GPT-1: Improving Language Understanding with Unsupervised Learning

  25. I Recently Came across https://arxiv.org/abs/2004.08900, Which ‘Assumes 2-3 Runs’ of T5-11B. In Fact, We Trained T5-11B Once. That’s Why We Spend 35 Pages Figuring out How We Should Train Before We Start Training. You Don’t Want to Mess up a Training Run That Big.

  26. $1970

  27. $1972

  28. $1997

  29. CERN makes bold push to build €21-billion supercollider: European particle-physics lab will pursue a 100-kilometre machine to uncover the Higgs boson’s secrets—but it doesn’t yet have the funds

  30. $2020

  31. $2010

  32. $1993

  33. Whole Brain Emulation: A Roadmap

  34. 2019 recent trends in GPU price per FLOPS

  35. Dota 2 With Large Scale Deep Reinforcement Learning § Pg11

  36. D.5: Context Dependence

  37. Efficient Attention: Breaking The Quadratic Transformer Bottleneck

  38. WBE and DRL: a Middle Way of imitation learning from the human brain

  39. LHOPT: A Generalizable Approach to Learning Optimizers

  40. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

  41. GPT-3 random sample dump: JavaScript tutorial

  42. On the Measure of Intelligence

  43. Deep Learning Hardware: Past, Present, & Future § Pg60

  44. Technology Forecasting: The Garden of Forking Paths

  45. GPT-3: Language Models are Few-Shot Learners § 5. Limitations

  46. CTRL: A Conditional Transformer Language Model For Controllable Generation

  47. Towards a Human-like Open-Domain Chatbot

  48. MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism

  49. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  50. Turing-NLG: A 17-billion-parameter language model by Microsoft

  51. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

  52. Machine Learning Scaling

  53. Extracting Training Data from Large Language Models

  54. Does Learning Require Memorization? A Short Tale about a Long Tail

  55. The Computational Limits of Deep Learning

  56. The Unreasonable Effectiveness of Data

  57. Scaling to Very Very Large Corpora for Natural Language Disambiguation

  58. Large Language Models in Machine Translation

  59. Koehn 2017, Figure 3: BLEU scores with varying amounts of training data

  60. WinoGrande: An Adversarial Winograd Schema Challenge at Scale

  61. Total Compute Used to Train Language Models: Table D.1

  62. AI and Compute

  63. OpenAI’s GPT-3 Language Model: A Technical Overview

  64. $1946

  65. People I Know at OpenAI Say V4 Is around the Corner and Easily Doable, And...will Be Here Soon (not Months but Year or So). And They Are Confident It Will Scale and Be around 100--1000×.

  66. Microsoft announces new supercomputer, lays out vision for future AI work

  67. Scaling Laws for Neural Language Models

  68. Scaling Laws for Neural Language Models: Figure 1: Language Modeling Performance Improves Smoothly As We Increase the Model Size, Dataset Size, and Amount of Compute Used for Training.

  69. Scaling Laws for Neural Language Models: Figure 15: Far beyond the Model Sizes We Study Empirically, We Find a Contradiction between Our Equations § Pg17

  70. https://arxiv.org/pdf/2005.14165.pdf#page=11&org=openai

  71. Table 2.2: Datasets Used to Train GPT-3. ‘Weight in Training Mix’ Refers to the Fraction of Examples during Training That Are Drawn from a given Dataset, Which We Intentionally Do Not Make Proportional to the Size of the Dataset. As a Result, When We Train for 300 Billion Tokens, Some Datasets Are Seen up to 3.4 times during Training While Other Datasets Are Seen Less Than Once.

  72. Adiwardana et al 2020 (Meena), Figure 1: human ratings vs. likelihood

  73. Brown et al 2020 (GPT-3), Figure 3.13: human ability to detect model-generated news stories

  74. Hendrycks et al 2020, Figure 1(b): GPT-3 Q&A scaling

  75. MMLU: Measuring Massive Multitask Language Understanding

  76. https://x.com/geoffreyhinton/status/1270814602931187715

  77. The Bitter Lesson

  78. Image GPT (iGPT): We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples

  79. Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

  80. Generative Language Modeling for Automated Theorem Proving

  81. The neural architecture of language: Integrative reverse-engineering converges on a model for predictive processing

  82. Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

  83. Fully-Connected Neural Nets

  84. A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes

  85. Dota 2 With Large Scale Deep Reinforcement Learning § 4.3: Batch Size

  86. How AI Training Scales

  87. BigGAN: Large Scale GAN Training For High Fidelity Natural Image Synthesis § 5.2 Additional Evaluation On JFT-300M

  88. Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images

  89. NVAE: A Deep Hierarchical Variational Autoencoder

  90. Big Transfer (BiT): General Visual Representation Learning

  91. Are we done with ImageNet?

  92. On Robustness and Transferability of Convolutional Neural Networks

  93. Robustness properties of Facebook’s ResNeXt WSL models

  94. Self-training with Noisy Student improves ImageNet classification

  95. Measuring Robustness to Natural Distribution Shifts in Image Classification

  96. Understanding Robustness of Transformers for Image Classification

  97. Distilling the Knowledge in a Neural Network

  98. Smooth Adversarial Training

  99. 12-in-1: Multi-Task Vision and Language Representation Learning

  100. VideoBERT: A Joint Model for Video and Language Representation Learning

  101. The messy, secretive reality behind OpenAI’s bid to save the world: The AI moonshot was founded in the spirit of transparency. This is the inside story of how competitive pressure eroded that idealism

  102. High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

  103. Grandmaster level in StarCraft II using multi-agent reinforcement learning

  104. One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

  105. A Style-Based Generator Architecture for Generative Adversarial Networks

  106. A simple neural network module for relational reasoning

  107. Neural scene representation and rendering

  108. Transformers as Soft Reasoners over Language

  109. Environmental drivers of systematicity and generalization in a situated agent

  110. Gated-Attention Architectures for Task-Oriented Language Grounding

  111. Interactive Grounded Language Acquisition and Generalization in a 2D World

  112. Compositional generalization through meta sequence-to-sequence learning

  113. Imitating Interactive Intelligence

  114. Solving Rubik’s Cube with a Robot Hand

  115. Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

  116. DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

  117. Procgen Benchmark: We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills

  118. Understanding RL Vision: With diverse environments, we can analyze, diagnose and edit deep reinforcement learning models using attribution

  119. Muppet: Massive Multi-task Representations with Pre-Finetuning

  120. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

  121. MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

  122. Reflections After Refereeing Papers for NIPS

  123. Understanding deep learning requires rethinking generalization

  124. Deep Double Descent: We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time

  125. Understanding the generalization of ‘lottery tickets’ in neural networks

  126. Bayesian Deep Learning and a Probabilistic Perspective of Generalization

  127. On Linear Identifiability of Learned Representations

  128. Zoom In: An Introduction to Circuits—By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks

  129. Neural Networks, Manifolds, and Topology

  130. Logarithmic Pruning is All You Need

  131. Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks

  132. The Shape of Learning Curves: a Review: 6. Ill-Behaved Learning Curves: 6.1. Phase Transitions

  133. The Brain as a Universal Learning Machine

  134. The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost

  135. Jumping NLP Curves: A Review of Natural Language Processing Research [Review Article]

  136. difference#efficient-natural-languages

  137. Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers

  138. The Legacy of Hiroshima

  139. Hopfield Networks is All You Need

  140. Radford et al 2019, Figure 4: GPT-2 validation loss

  141. Brown et al 2020, Figure 3.1: GPT-3 scaling

  142. Building a Large Annotated Corpus of English: The Penn Treebank

  143. One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling

  144. The LAMBADA dataset: Word prediction requiring a broad discourse context

  145. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf#page=5

  146. Estimation of Gap Between Current Language Models and Human Performance

  147. https://arxiv.org/pdf/2005.14165.pdf&org=openai#page=12

  148. Ilya Sutskever: Deep Learning

  149. If You Want to Solve a Hard Problem in Reinforcement Learning, You Just Scale. It's Just Gonna Work Just like Supervised Learning. It's the Same, the Same Story Exactly. It Was Kind of Hard to Believe That Supervised Learning Can Do All Those Things, but It's Not Just Vision, It's Everything and the Same Thing Seems to Hold for Reinforcement Learning Provided You Have a Lot of Experience.

  150. What Could Make AI Conscious?

  151. https://wandb.ai/wandb_fc/gradient-dissent/reports/What-could-make-AI-conscious-with-Wojciech-Zaremba-co-founder-of-OpenAI--Vmlldzo3NDk3MDI

  152. Evolution Strategies as a Scalable Alternative to Reinforcement Learning

  153. Proximal Policy Optimization Algorithms

  154. Are we in an AI overhang?

  155. ‘MoE NN’ tag

  156. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

  157. Why didn’t DeepMind build GPT-3?

  158. Tick, tock, tick, tock… BING

  159. The Teenies

  160. Google DeepMind founder and leader in artificial intelligence returns to Hamilton

  161. Goodbye 2010

  162. When Will the First Artificial General Intelligence System Be Devised, Tested, and Publicly Known Of?

  163. Will AI Progress Surprise Us?

  164. Agent57: Outperforming the human Atari benchmark

  165. ‘How GPT-3 Is Shaping Our AI Future’ With Sam Altman/Azeem Azhar (The Exponential View), Wednesday 7 October 2020

  166. DeepMind Lab

  167. June 2020 News § Companies House

  168. Deep Learning Scaling is Predictable, Empirically

  169. Is Science Slowing Down?

  170. Trust Algorithms? The Army Doesn’t Even Trust Its Own AI Developers

  171. ZeRO-2 & DeepSpeed: Shattering barriers of deep learning speed & scale

  172. DeepSpeed: Extreme-scale model training for everyone

  173. When will computer hardware match the human brain?

  174. Ilya Sutskever: Deep Learning | AI Podcast #94 With Lex Fridman

  175. What Next? A Dozen Information-Technology Research Goals: 3. Turing’s Vision of Machine Intelligence

  176. Exascale Deep Learning for Scientific Inverse Problems

  177. Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning

  178. ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing

  179. Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos

  180. $1998

  181. Peter Norvig, Google’s Director of Research—Singularity Is in the Eye of the Beholder: We’re Thrilled to Have Peter Norvig, Who Joins Us to Talk about the Evolution of Deep Learning, His Industry-Defining Book, His Work at Google, and What He Thinks the Future Holds for Machine Learning Research (2020-11-20)

  182. $2012

  183. The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design

  184. OpenAI Built Gaming Bots That Can Work As a Team With Inhuman Precision

  185. Can a Machine Learn to Write for The New Yorker? Extraordinary Advances in Machine Learning in Recent Years Have Resulted in A.I.s That Can Write for You.

  186. https://news.ycombinator.com/item?id=9109140

  187. TTTTTackling WinoGrande Schemas

  188. A Review of Winograd Schema Challenge Datasets and Approaches

  189. The Defeat of the Winograd Schema Challenge

  190. One Man’s Modus Ponens

  191. There’s No Fire Alarm for Artificial General Intelligence

  192. Appendix F: Personal Observations on the Reliability of the Shuttle

  193. 2019 News § What Progress?

  194. Don’t Worry—It Can’t Happen

  195. Ra

  196. Reward is enough

  197. gpt-3#roleplaying

  198. Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances

  199. Why Tool AIs Want to Be Agent AIs

  200. Simulators

  201. Here’s Another Stabilized Sky Timelapse, This Time at Crater Lake, Oregon. The Water Was Still for Most of It, Which Created a Nice Mirror for the Stars. I Also Got My Astro-Modified Camera Working, Which Provides More Vibrancy in the Nebulae in the Milky Way. #EppurSiMuove

  202. Star Timelapse Revealing the Earth’s Rotation

  203. ‘Story Of Your Life’ Is Not A Time-Travel Story

  204. Surprisingly Turing-Complete