Bibliography (330):

  1. scaling-hypothesis#blessings-of-scale

  2. https://cse-robotics.engr.tamu.edu/dshell/cs689/papers/anderson72more_is_different.pdf

  3. Appendix F: Personal Observations on the Reliability of the Shuttle

  4. unseeing#confirmation-bias

  5. AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

  6. BMT: Binarized Neural Machine Translation

  7. Absolute Unit NNs: Regression-Based MLPs for Everything

  8. This section presents an expanded (but still quite compact) version of the terse ConvMixer implementation that we presented in the paper. The code is given in Figure 7. We also present an even more terse implementation in Figure 8, which to the best of our knowledge is the first model that achieves the elusive dual goals of 80%+ ImageNet top-1 accuracy while also fitting into a tweet.

  9. Rip van Winkle’s Razor, a Simple New Estimate for Adaptive Data Analysis

  10. Reward is enough

  11. backstop#clune-2019

  12. ‘meta-learning’ directory

  13. Meta Learning Backpropagation And Improving It

  14. BLUR: Meta-Learning Bidirectional Update Rules

  15. PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management

  16. Pathways: Asynchronous Distributed Dataflow for ML

  17. DeepSpeed: Accelerating Large-Scale Model Inference and Training via System Optimizations and Compression

  18. ZeRO-Infinity and DeepSpeed: Unlocking Unprecedented Model Scale for Deep Learning Training

  19. GSPMD: General and Scalable Parallelization for ML Computation Graphs

  20. Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines

  21. Efficient Large-Scale Language Model Training on GPU Clusters

  22. TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

  23. There’s plenty of room at the Top: What will drive computer performance after Moore’s law?

  24. Moore’s Law, AI, and the pace of Progress

  25. Effect of scale on catastrophic forgetting in neural networks

  26. Slowing Moore’s Law: How It Could Happen

  27. Pony Preservation Project Panel 2021—FULL

  28. SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

  29. Distributed Deep Learning in Open Collaborations

  30. DynamicEmbedding: Extending TensorFlow for Colossal-Scale Applications

  31. ‘experience curve’ directory

  32. AI and Efficiency: We’re releasing an analysis showing that since 2012 the amount of compute needed to train a neural net to the same performance on ImageNet classification has been decreasing by a factor of 2 every 16 months

  33. Measuring the Algorithmic Efficiency of Neural Networks

  34. Robert Oppenheimer

  35. DeepMind and Google: the battle to control artificial intelligence. Demis Hassabis founded a company to build the world’s most powerful AI. Then Google bought him out. Hal Hodson asks who is in charge

  36. An Empirical Model of Large-Batch Training

  37. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

  38. https://www.lesswrong.com/posts/65qmEJHDw3vw69tKm/proposal-scaling-laws-for-rl-generalization#bdzbeD9YvarEEopCq

  39. One Big Net For Everything

  40. Scaling Laws for Language Transfer Learning

  41. Scaling Laws for Transfer

  42. Why Tool AIs Want to Be Agent AIs

  43. Risks from Learned Optimization in Advanced Machine Learning Systems

  44. index#decisiontransformer-blog-section

  45. https://x.com/arankomatsuzaki/status/1399471244760649729

  46. Codex: Evaluating Large Language Models Trained on Code: Figure 14: When the prompt includes subtle bugs, Codex tends to produce worse code than it is capable of producing. This gap increases with model size. Including an instruction to write correct code helps a little but does not fix the problem. Even with no examples in the context, Codex produces substantially worse code than it is capable of.

  47. gpt-3#roleplaying

  48. The Basic AI Drives

  49. Human-level performance in 3D multiplayer games with population-based reinforcement learning

  50. Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

  51. Gato: A Generalist Agent

  52. GPT-3: Language Models are Few-Shot Learners

  53. DALL·E 1: Creating Images from Text: We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language

  54. BEiT: BERT Pre-Training of Image Transformers

  55. WebGPT: Browser-assisted question-answering with human feedback

  56. Boosting Search Engines with Interactive Agents

  57. A data-driven approach for learning to control computers

  58. Player of Games

  59. Open-Ended Learning Leads to Generally Capable Agents

  60. From Motor Control to Team Play in Simulated Humanoid Football

  61. https://deepmind.google/discover/blog/learning-robust-real-time-cultural-transmission-without-human-data/

  62. Grounded Language Learning Fast and Slow

  63. Imitating Interactive Intelligence

  64. Learning to Ground Multi-Agent Communication with Autoencoders

  65. Collaborating with Humans without Human Data

  66. Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria

  67. Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination

  68. Off-Belief Learning

  69. Multitasking Inhibits Semantic Drift

  70. What Are Bayesian Neural Network Posteriors Really Like?

  71. ‘Codex’ directory

  72. Competitive Programming With AlphaCode

  73. ‘MuZero’ directory

  74. Evolving Normalization-Activation Layers

  75. index#mlp-mixer-why-now

  76. NVAE: A Deep Hierarchical Variational Autoencoder

  77. Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images

  78. R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning

  79. Cores that don’t count

  80. Microsoft researchers win ImageNet computer vision challenge

  81. Deep Residual Learning for Image Recognition

  82. Learning To Tell Two Spirals Apart

  83. A Recipe for Training Neural Networks

  84. Fine-Tuning GPT-2 from Human Preferences § Bugs can optimize for bad behavior

  85. Grokking: Generalization Beyond Overfitting On Small Algorithmic Datasets

  86. The Shape of Learning Curves: a Review

  87. The Phase Transition In Human Cognition

  88. https://www.reddit.com/r/mlscaling/comments/sjzvl0/d_instances_of_nonlog_capability_spikes_or/

  89. In-Context Learning and Induction Heads

  90. The Bitter Lesson

  91. ‘MLP NN’ directory

  92. The Brain as a Universal Learning Machine

  93. Magna Alta Doctrina

  94. Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

  95. ‘inner monologue (AI)’ directory

  96. HyperNetworks

  97. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

  98. https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned?commentId=AAC8jKeDp6xqsZK2K

  99. Shaking the foundations: delusions in sequence models for interaction and control

  100. https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post

  101. ‘preference learning’ directory

  102. tank#alternative-examples

  103. 2015-01-28-spidermanandthexmen-vol1-no2-sauron-cancerdinosaurs.jpg

  104. Friendship Is Optimal (Fanfic)

  105. Mathematics on a Distant Planet

  106. Don’t Worry—It Can’t Happen

  107. Now You Can (Try To) Serve Five Terabytes, Too

  108. I Just Want to Serve 5 Terabytes

  109. L2L: Training Large Neural Networks with Constant Memory using a New Execution Algorithm

  110. Blowing the Lid off the CryptoNote/Bytecoin Scam (With the Exception of Monero)

  111. Rekt - Value DeFi

  112. Really Stupid ‘Smart Contract’ Bug Let Hackers Steal $31 Million in Digital Coin

  113. Crypto Firm Nomad Loses Nearly $200 Million in Bridge Hack

  114. Today’s LiFi hack happened because its internal swap() function would call out to any address using whatever message the attacker passed in

  115. https://milksad.info/disclosure.html

  116. AI Accelerators, Part IV: The Very Rich Landscape

  117. TensorFlow Research Cloud (TRC): Accelerate your cutting-edge machine learning research with free Cloud TPUs

  118. It’s All about the Benjamins: An Empirical Study on Incentivizing Users to Ignore Security Advice

  119. A Style-Based Generator Architecture for Generative Adversarial Networks

  120. Scammers Created an AI Hologram of Me to Scam Unsuspecting Projects

  121. A Field Guide to Federated Optimization

  122. Net2Net: Accelerating Learning via Knowledge Transfer

  123. M6–10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining

  124. Dota 2 With Large Scale Deep Reinforcement Learning § Pg11

  125. Scaling Scaling Laws with Board Games

  126. https://openai.com/research/formal-math

  127. A Universal Law of Robustness via Isoperimetry

  128. The Dirty Pipe Vulnerability

  129. Surprisingly Turing-Complete

  130. CVE-2022-21449: Psychic Signatures in Java

  131. It Is Nevertheless Funny That There Is a Wycheproof Test for This Bug (Of Course There Is, It’s the Most Basic Implementation Check in ECDSA) and Nobody Bothered to Run It against One of the Most Important ECDSA’s Until Now.

  132. dnm-archive#logout

  133. https://x.com/rombulow/status/990684453734203392

  134. How Many Computers Are In Your Computer?

  135. https://msrc.microsoft.com/update-guide/en-US/vulnerability/CVE-2022-34718

  136. Indefinite survival through backup copies

  137. 2004-perry.html

  138. ‘NN sparsity’ directory

  139. On the Predictability of Pruning Across Scales

  140. Knowledge distillation: A good teacher is patient and consistent

  141. https://x.com/thiteanish/status/1635188333705043969

  142. Community Alert: Ronin Validators Compromised

  143. complexity#control

  144. July 2020 News § ‘Modeling the Human Trajectory’

  145. https://bullfrogreview.substack.com/p/honey-i-hacked-the-empathy-machine

  146. Hackers Gaining Power of Subpoena Via Fake ‘Emergency Data Requests’

  147. Apple and Meta Gave User Data to Hackers Who Used Forged Legal Requests: Hackers compromised the emails of law enforcement agencies; Data was used to enable harassment, may aid financial fraud

  148. GPT-3 Creative Fiction § Literary Parodies

  149. Uber Apparently Hacked by Teen, Employees Thought It Was a Joke: ‘I Think IT Would Appreciate Less Memes While They Handle the Breach’

  150. The Radicalization Risks of GPT-3 and Advanced Neural Language Models

  151. Computer Optimization: Your Computer Is Faster Than You Think

  152. OpenAI Five: 2016–2019

  153. Grandmaster level in StarCraft II using multi-agent reinforcement learning

  154. scaling-hypothesis#meta-learning

  155. The Billion Dollar AI Problem That Just Keeps Scaling

  156. Fermi Estimate of Future Training Runs

  157. https://cset.georgetown.edu/wp-content/uploads/AI-and-Compute-How-Much-Longer-Can-Computing-Power-Drive-Artificial-Intelligence-Progress.pdf

  158. Factored Cognition

  159. CycleGAN, a Master of Steganography

  160. The Toxoplasma Of Rage

  161. Duty Calls

  162. Sort By Controversial

  163. Specialist Ukrainian Drone Unit Picks off Invading Russian Forces As They Sleep

  164. https://www.amazon.com/Genius-Makers-Mavericks-Brought-Facebook/dp/1524742678

  165. CoreWeave

  166. Target Hackers Broke in Via HVAC Company

  167. Chinese Spies Hacked a Livestock App to Breach US State Networks: Vulnerabilities in Animal Tracking Software USAHERDS and Log4j Gave the Notorious APT41 Group a Foothold in Multiple Government Systems.

  168. Supply chain attacks

  169. China Has Already Reached Exascale—On Two Separate Systems

  170. https://x.com/ID_AA_Carmack/status/1300280139717189640

  171. NYU Accidentally Exposed Military Code-Breaking Computer Project to Entire Internet

  172. Is Programmable Overhead Worth The Cost? How much do we pay for a system to be programmable? It depends upon who you ask

  173. Fast Stencil-Code Computation on a Wafer-Scale Processor

  174. UL2: Unifying Language Learning Paradigms

  175. Chinchilla: Training Compute-Optimal Large Language Models

  176. Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

  177. Scaling Laws for Deep Learning

  178. Extrapolating GPT-N performance

  179. https://www.quantamagazine.org/computer-scientists-achieve-crown-jewel-of-cryptography-20201110/

  180. ‘tech economics’ directory

  181. AI and Compute

  182. https://arxiv.org/pdf/2108.07686.pdf#page=85

  183. PILCO: A Model-Based and Data-Efficient Approach to Policy Search

  184. Agile Locomotion via Model-Free Learning

  185. Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World

  186. Solving Rubik’s Cube with a Robot Hand

  187. Learning agile and dynamic motor skills for legged robots

  188. Learning robust perceptive locomotion for quadrupedal robots in the wild

  189. Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam

  190. Decoupled Neural Interfaces using Synthetic Gradients

  191. Cerebro-cerebellar networks facilitate learning through feedback decoupling

  192. ‘end-to-end’ directory

  193. Russia Will Probably Legalize Some Software Piracy to Mitigate Sanctions

  194. Russian Government Rolls Back Intellectual Property Rights in Response to Western Sanctions

  195. Complexity no Bar to AI

  196. Cellular automata as convolutional neural networks

  197. Differentiable Self-Organizing Systems

  198. Self-Organising Textures

  199. Growing Neural Cellular Automata: Differentiable Model of Morphogenesis

  200. Adversarial Reprogramming of Neural Cellular Automata

  201. Regenerating Soft Robots through Neural Cellular Automata

  202. Growing 3D Artefacts and Functional Machines with Neural Cellular Automata

  203. Texture Generation with Neural Cellular Automata

  204. Variational Neural Cellular Automata

  205. The Future of Artificial Intelligence Is Self-Organizing and Self-Assembling

  206. 𝜇NCA: Texture Generation with Ultra-Compact Neural Cellular Automata

  207. Bioelectric Networks: Taming the Collective Intelligence of Cells for Regenerative Medicine

  208. On Having No Head: Cognition throughout Biological Systems

  209. https://www.quantamagazine.org/flying-fish-and-aquarium-pets-yield-secrets-of-evolution-20220105/

  210. Synthetic living machines: A new window on life

  211. Fundamental behaviors emerge from simulations of a living minimal cell

  212. An Account of Electricity and the Body, Reviewed

  213. Is Bioelectricity the Key to Limb Regeneration?

  214. ‘Amazing Science’: Researchers Find Xenobots Can Give Rise to Offspring Science

  215. Perceptein: A synthetic protein-level neural network in mammalian cells

  216. Living Robots Made from Frog Cells Can Replicate Themselves in a Dish

  217. Cells Form Into ‘Xenobots’ on Their Own: Embryonic cells can self-assemble into new living forms that don’t resemble the bodies they usually generate, challenging old ideas of what defines an organism

  218. 9 Missile Commanders Fired, Others Disciplined In Air Force Scandal

  219. Security Troops on US Nuclear Missile Base Took LSD

  220. Amazing Details from the Drunken Moscow Bender That Got an Air Force General Fired

  221. Joan Rohlfing on how to avoid catastrophic nuclear blunders: The interaction between nuclear weapons and cybersecurity

  222. Hacking the Bomb: Cyber Threats and Nuclear Weapons

  223. The Curious Case of the Accidental Indian Missile Launch

  224. ‘illusion-of-depth bias’ directory

  225. https://arxiv.org/pdf/2109.01517#page=12

  226. Colab Notebook: HQU-V3.4-Light (Jax TPU)

  227. Clippy Desktop Assistant

  228. https://www.aleph.se/papers/Spamming%20the%20universe.pdf

  229. Advantages of Artificial Intelligences, Uploads, and Digital Minds

  230. Intelligence Explosion Microeconomics

  231. There is plenty of time at the bottom: the economics, risk and ethics of time compression

  232. AI Takeoff Tag

  233. Fiction Relevant to AI Futurism

  234. Understand

  235. Slow Tuesday Night

  236. That Alien Message

  237. https://www.ssec.wisc.edu/~billh/g/mcnrsts.html

  238. AI Takeoff Story: a Continuation of Progress by Other Means

  239. Optimality Is the Tiger, and Agents Are Its Teeth

  240. https://press.asimov.com/resources/tinker

  241. https://x.com/robbensinger/status/1503220020175769602

  242. AGI Ruin: A List of Lethalities

  243. Without Specific Countermeasures, the Easiest Path to Transformative AI Likely Leads to AI Takeover

  244. http://skynetsimulator.com/

  245. How AI Takeover Might Happen in 2 Years § Pandora’s 1 GW Box

  246. ML Scaling subreddit

  247. It Looks Like You'Re Trying To Take Over The World

  248. Shah and Yudkowsky on Alignment Failures

  249. https://www.reddit.com/r/slatestarcodex/comments/tag4lm/it_looks_like_youre_trying_to_take_over_the_world/

  250. https://www.reddit.com/r/rational/comments/ta57ag/it_looks_like_youre_trying_to_take_over_the_world/

  251. https://news.ycombinator.com/item?id=30818895

  252. https://news.ycombinator.com/item?id=34808718#34809360

  253. Wikipedia Bibliography:

    1. Neural scaling law

    2. Defamiliarization

    3. All your base are belong to us

    4. Variance

    5. Backpropagation

    6. System accident

    7. Common Crawl

    8. Office Assistant

    9. Universal Paperclips

    10. Evidential decision theory

    11. Wirehead (science fiction)

    12. Expected value

    13. Logit

    14. SQL

    15. SQL injection

    16. Metasploit

    17. Tornado Cash

    18. Panama Papers

    19. Paradise Papers

    20. Pandora Papers

    21. IOTA (technology)

    22. Social engineering (security)

    23. Federated learning

    24. Diminishing returns

    25. Elo rating system

    26. Log4Shell

    27. JASBUG

    28. Spectre (security vulnerability)

    29. Unreachable code § goto fail bug

    30. Random number generator attack § Debian OpenSSL

    31. Heartbleed

    32. Shellshock (software bug)

    33. Teleprinter § Teleprinters in computing

    34. Phishing § Spear phishing

    35. Great Oxidation Event

    36. Human evolution

    37. Neolithic Revolution

    38. Industrial Revolution

    39. Warhol worm

    40. Storm oil

    41. Brandolini’s law

    42. Mirai (malware)

    43. 2020 Twitter account hijacking

    44. Amdahl’s law

    45. Iran–U.S. RQ-170 incident

    46. Supply chain attack

    47. Fluorinert

    48. Seymour Cray

    49. Chudnovsky brothers

    50. Renaissance Technologies

    51. Jim Simons (mathematician)

    52. Flatiron Institute

    53. Cerebras § Technology

    54. Static random-access memory

    55. Entropy (information theory)

    56. Experience curve effects

    57. Autonomous system (Internet)

    58. Operation Barbarossa § Soviet preparations

    59. Decentralized autonomous organization

    60. Decentralized finance

    61. Elden Ring

    62. Demoscene

    63. Fat Leonard scandal

    64. Stuxnet

    65. Strava § Privacy concerns

    66. Permissive action link

    67. Nuclear close calls § 25 October 1962

    68. Russian invasion of Ukraine

    69. Nuclear close calls § 9 November 1979

    70. 1983 Soviet nuclear false alarm incident

    71. 2018 Hawaii false missile alert § The alert

    72. 2017–2018 North Korea crisis

    73. Launch on warning § History

    74. Dead Hand

    75. Self-replicating spacecraft

    76. R. A. Lafferty

    77. Accelerando

    78. The Last Question