Bibliography (241):

  1. backstop#deep-bayes

  2. ML Scaling subreddit

  3. It Looks Like You’re Trying To Take Over The World

  4. GPT-3: Language Models are Few-Shot Learners

  5. GPT-3 paper § Figure F.1: Four uncurated completions from a context suggesting the model compose a poem in the style of Wallace Stevens with the title ‘Shadows on the Way’

  6. GPT-3 Creative Fiction

  7. GPT-2 Neural Network Poetry

  8. GPT-3 Github JSON Dump Reformatted to Readable HTML

  9. OpenAI API

  10. Better Language Models and Their Implications

  11. GPT-3 Creative Fiction § BPEs

  12. Using Fast Weights to Attend to the Recent Past

  13. /r/reinforcementlearning: posts flaired ‘MetaRL’, sorted by top: https://www.reddit.com/r/reinforcementlearning/search/?q=flair%3AMetaRL&include_over_18=on&restrict_sr=on&sort=top

  14. One-shot Learning with Memory-Augmented Neural Networks

  15. Prefrontal cortex as a meta-reinforcement learning system

  16. Matt Botvinick on the spontaneous emergence of learning algorithms

  17. Reinforcement Learning, Fast and Slow

  18. AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

  19. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

  20. One Big Net For Everything

  21. Meta-Learning: Learning to Learn Fast

  22. Meta Reinforcement Learning

  23. Jukebox: We’re introducing Jukebox, a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles. We’re releasing the model weights and code, along with a tool to explore the generated samples.

  24. GPT-1: Improving Language Understanding with Unsupervised Learning

  25. I recently came across https://arxiv.org/abs/2004.08900, which ‘assumes 2-3 runs’ of T5-11B. In fact, we trained T5-11B once. That’s why we spend 35 pages figuring out how we should train before we start training. You don’t want to mess up a training run that big.

  26. CERN makes bold push to build €21-billion supercollider: European particle-physics lab will pursue a 100-kilometre machine to uncover the Higgs boson’s secrets—but it doesn’t yet have the funds

  27. Whole Brain Emulation: A Roadmap

  28. 2019 recent trends in GPU price per FLOPS

  29. Measuring the Algorithmic Efficiency of Neural Networks

  30. Dota 2 With Large Scale Deep Reinforcement Learning § pg11

  31. D.5: Context Dependence

  32. ‘self-attention’ directory

  33. WBE & DRL: a Middle Way of imitation learning on brains

  34. LHOPT: A Generalizable Approach to Learning Optimizers

  35. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

  36. GPT-3 random sample dump: JavaScript tutorial

  37. On the Measure of Intelligence

  38. Deep Learning Hardware: Past, Present, & Future § pg60

  39. Technology Forecasting: The Garden of Forking Paths

  40. GPT-3: Language Models Are Few-Shot Learners: 5. Limitations

  41. CTRL: A Conditional Transformer Language Model For Controllable Generation

  42. Towards a Human-like Open-Domain Chatbot

  43. MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism

  44. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  45. Turing-NLG: A 17-billion-parameter language model by Microsoft

  46. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

  47. Extracting Training Data from Large Language Models

  48. Does Learning Require Memorization? A Short Tale about a Long Tail

  49. The Computational Limits of Deep Learning

  50. The Unreasonable Effectiveness of Data

  51. Scaling to Very Very Large Corpora for Natural Language Disambiguation

  52. Large Language Models in Machine Translation

  53. Koehn 2017, Figure 3: BLEU scores with varying amounts of training data

  54. WinoGrande: An Adversarial Winograd Schema Challenge at Scale

  55. Total Compute Used to Train Language Models: Table D.1

  56. AI and Compute

  57. OpenAI’s GPT-3 Language Model: A Technical Overview

  58. People I know at OpenAI say v4 is around the corner and easily doable, and...will be here soon (not months but year or so). And they are confident it will scale and be around 100–1000×.

  59. Microsoft announces new supercomputer, lays out vision for future AI work

  60. Scaling Laws for Neural Language Models

  61. Scaling Laws for Neural Language Models: Figure 1: Language modeling performance improves smoothly as we increase the model size, dataset size, and amount of compute used for training.

  62. Scaling Laws for Neural Language Models: Figure 15: Far beyond the model sizes we study empirically, we find a contradiction between our equations § pg17

  63. GPT-3: Language Models are Few-Shot Learners § pg11: https://arxiv.org/pdf/2005.14165.pdf#page=11&org=openai

  64. Table 2.2: Datasets used to train GPT-3. ‘Weight in training mix’ refers to the fraction of examples during training that are drawn from a given dataset, which we intentionally do not make proportional to the size of the dataset. As a result, when we train for 300 billion tokens, some datasets are seen up to 3.4 times during training while other datasets are seen less than once.

  65. Adiwardana et al 2020 (Meena), Figure 1: Human ratings vs. likelihood

  66. Brown et al 2020 (GPT-3), Figure 3.13: Human ability to detect model-generated news stories

  67. Hendrycks et al 2020, Figure 1(b): GPT-3 Q&A scaling

  68. MMLU: Measuring Massive Multitask Language Understanding

  69. Geoffrey Hinton (Twitter): https://x.com/geoffreyhinton/status/1270814602931187715

  70. The Bitter Lesson

  71. Image GPT (iGPT): We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples

  72. Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

  73. Generative Language Modeling for Automated Theorem Proving

  74. The neural architecture of language: Integrative reverse-engineering converges on a model for predictive processing

  75. Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

  76. ‘MLP NN’ directory

  77. A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes

  78. Dota 2 With Large Scale Deep Reinforcement Learning: §4.3: Batch Size

  79. How AI Training Scales

  80. BigGAN: Large Scale GAN Training For High Fidelity Natural Image Synthesis § 5.2 Additional Evaluation On JFT-300M

  81. Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images

  82. NVAE: A Deep Hierarchical Variational Autoencoder

  83. Big Transfer (BiT): General Visual Representation Learning

  84. Are we done with ImageNet?

  85. On Robustness and Transferability of Convolutional Neural Networks

  86. Robustness properties of Facebook’s ResNeXt WSL models

  87. Self-training with Noisy Student improves ImageNet classification

  88. Measuring Robustness to Natural Distribution Shifts in Image Classification

  89. Understanding Robustness of Transformers for Image Classification

  90. Distilling the Knowledge in a Neural Network

  91. Smooth Adversarial Training

  92. 12-in-1: Multi-Task Vision and Language Representation Learning

  93. VideoBERT: A Joint Model for Video and Language Representation Learning

  94. The messy, secretive reality behind OpenAI’s bid to save the world: The AI moonshot was founded in the spirit of transparency. This is the inside story of how competitive pressure eroded that idealism

  95. High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

  96. Grandmaster level in StarCraft II using multi-agent reinforcement learning

  97. One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

  98. A Style-Based Generator Architecture for Generative Adversarial Networks

  99. A simple neural network module for relational reasoning

  100. Neural scene representation and rendering

  101. Transformers as Soft Reasoners over Language

  102. Environmental drivers of systematicity and generalization in a situated agent

  103. Gated-Attention Architectures for Task-Oriented Language Grounding

  104. Interactive Grounded Language Acquisition and Generalization in a 2D World

  105. Compositional generalization through meta sequence-to-sequence learning

  106. Imitating Interactive Intelligence

  107. Solving Rubik’s Cube with a Robot Hand

  108. Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

  109. DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

  110. Procgen Benchmark: We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills

  111. Understanding RL Vision: With diverse environments, we can analyze, diagnose and edit deep reinforcement learning models using attribution

  112. Muppet: Massive Multi-task Representations with Pre-Finetuning

  113. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

  114. MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

  115. Reflections After Refereeing Papers for NIPS

  116. Understanding deep learning requires rethinking generalization

  117. Deep Double Descent: We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time

  118. Understanding the generalization of ‘lottery tickets’ in neural networks

  119. Bayesian Deep Learning and a Probabilistic Perspective of Generalization

  120. On Linear Identifiability of Learned Representations

  121. Zoom In: An Introduction to Circuits—By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks

  122. Neural Networks, Manifolds, and Topology

  123. Logarithmic Pruning is All You Need

  124. Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks

  125. The Shape of Learning Curves: a Review: 6. Ill-Behaved Learning Curves: 6.1. Phase Transitions

  126. The Brain as a Universal Learning Machine

  127. The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost

  128. Jumping NLP Curves: A Review of Natural Language Processing Research [Review Article]

  129. difference#efficient-natural-languages

  130. Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers

  131. The Legacy of Hiroshima

  132. Hopfield Networks is All You Need

  133. Radford et al 2019 (GPT-2), Figure 4: GPT-2 validation loss

  134. Brown et al 2020 (GPT-3), Figure 3.1: GPT-3 scaling

  135. Building a Large Annotated Corpus of English: The Penn Treebank

  136. One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling

  137. The LAMBADA dataset: Word prediction requiring a broad discourse context

  138. GPT-2: Language Models are Unsupervised Multitask Learners § pg5: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf#page=5

  139. Estimation of Gap Between Current Language Models and Human Performance

  140. GPT-3: Language Models are Few-Shot Learners § pg12: https://arxiv.org/pdf/2005.14165.pdf&org=openai#page=12

  141. Ilya Sutskever: Deep Learning

  142. If you want to solve a hard problem in reinforcement learning, you just scale. It’s just gonna work just like supervised learning. It’s the same, the same story exactly. It was kind of hard to believe that supervised learning can do all those things, but it’s not just vision, it’s everything and the same thing seems to hold for reinforcement learning provided you have a lot of experience.

  143. What Could Make AI Conscious?

  144. Gradient Dissent podcast: ‘What could make AI conscious? with Wojciech Zaremba, co-founder of OpenAI’: https://wandb.ai/wandb_fc/gradient-dissent/reports/What-could-make-AI-conscious-with-Wojciech-Zaremba-co-founder-of-OpenAI--Vmlldzo3NDk3MDI

  145. Evolution Strategies as a Scalable Alternative to Reinforcement Learning

  146. Proximal Policy Optimization Algorithms

  147. Are we in an AI overhang?

  148. ‘MoE NN’ directory

  149. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

  150. Why didn’t DeepMind build GPT-3?

  151. Tick, tock, tick, tock… BING

  152. The Teenies

  153. Google DeepMind founder and leader in artificial intelligence returns to Hamilton

  154. Goodbye 2010

  155. When Will the First Artificial General Intelligence System Be Devised, Tested, and Publicly Known Of?

  156. Will AI Progress Surprise Us?

  157. Agent57: Outperforming the human Atari benchmark

  158. ‘How GPT-3 Is Shaping Our AI Future’ with Sam Altman/Azeem Azhar (The Exponential View), Wednesday 7 October 2020

  159. DeepMind Lab

  160. June 2020 News § Companies House

  161. Deep Learning Scaling is Predictable, Empirically

  162. Is Science Slowing Down?

  163. Trust Algorithms? The Army Doesn’t Even Trust Its Own AI Developers

  164. ZeRO-2 & DeepSpeed: Shattering barriers of deep learning speed & scale

  165. DeepSpeed: Extreme-scale model training for everyone

  166. When will computer hardware match the human brain?

  167. Ilya Sutskever: Deep Learning | AI Podcast #94 With Lex Fridman

  168. What Next? A Dozen Information-Technology Research Goals: 3. Turing’s Vision of Machine Intelligence

  169. Exascale Deep Learning for Scientific Inverse Problems

  170. Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning

  171. ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing

  172. Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos

  173. Peter Norvig, Google’s Director of Research—Singularity Is in the Eye of the Beholder: We’re thrilled to have Peter Norvig join us to talk about the evolution of deep learning, his industry-defining book, his work at Google, and what he thinks the future holds for machine learning research (2020-11-20)

  174. The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design

  175. OpenAI Built Gaming Bots That Can Work As a Team With Inhuman Precision

  176. Can a Machine Learn to Write for The New Yorker? Extraordinary advances in machine learning in recent years have resulted in AIs that can write for you.

  177. Hacker News comment: https://news.ycombinator.com/item?id=9109140

  178. TTTTTackling WinoGrande Schemas

  179. A Review of Winograd Schema Challenge Datasets and Approaches

  180. The Defeat of the Winograd Schema Challenge

  181. One Man’s Modus Ponens

  182. There’s No Fire Alarm for Artificial General Intelligence

  183. Appendix F: Personal Observations on the Reliability of the Shuttle

  184. 2019 News § What Progress?

  185. Don’t Worry—It Can’t Happen

  186. Ra

  187. Reward is enough

  188. gpt-3#roleplaying

  189. Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances

  190. Why Tool AIs Want to Be Agent AIs

  191. Simulators

  192. Here’s another stabilized sky timelapse, this time at Crater Lake, Oregon. The water was still for most of it, which created a nice mirror for the stars. I also got my astro-modified camera working, which provides more vibrancy in the nebulae in the Milky Way. #EppurSiMuove

  193. Star Timelapse Revealing the Earth’s Rotation

  194. ‘Story Of Your Life’ Is Not A Time-Travel Story

  195. Surprisingly Turing-Complete

  196. Wikipedia Bibliography:

    1. PDP-11

    2. Lisp machine

    3. ITER

    4. Superconducting Super Collider

    5. Experience curve effects

    6. OpenAI Five

    7. Neural scaling law

    8. Winograd schema challenge

    9. Curse of dimensionality § Blessing of dimensionality

    10. Great Oxidation Event

    11. Niels Bohr

    12. Edward Teller

    13. Brown Corpus

    14. Norbert Wiener

    15. The Human Use of Human Beings

    16. Wojciech Zaremba

    17. Demis Hassabis

    18. Shane Legg

    19. Google DeepMind

    20. Summit (supercomputer)

    21. AlexNet

    22. Peter Norvig

    23. Lukas Biewald

    24. ImageNet

    25. Fei-Fei Li

    26. Activation function

    27. Sigmoid function

    28. Rectifier (neural networks)

    29. Stochastic gradient descent

    30. Dilution (neural networks)

    31. Geoffrey Hinton

    32. Exclusive or

    33. Intel 8087

    34. Coprocessor

    35. Ampere (microarchitecture)

    36. Shaka

    37. Daniel Dennett

    38. Intentional stance

    39. Principle of minimum energy

    40. Fermat’s principle

    41. Variational principle

    42. Cellular automaton

    43. Conway’s Game of Life

    44. Chunking (psychology)

    45. Glider (Conway’s Game of Life)

    46. Still life (cellular automaton)