Bibliography:

  1. ‘AI scaling’ tag

  2. ‘reduced-precision NNs’ tag

  3. Hardware Hedging Against Scaling Regime Shifts

  4. Computer Optimization: Your Computer Is Faster Than You Think

  5. Slowing Moore’s Law: How It Could Happen

  6. Why a US AI “Manhattan Project” could backfire: notes from conversations in China

  7. Getting AI Datacenters in the UK: Why the UK Needs to Create Special Compute Zones; and How to Do It

  8. The Future of Compute: Nvidia’s Crown Is Slipping

  9. Jake Sullivan: The American Who Waged a Tech War on China

  10. Nvidia’s AI chips are cheaper to rent in China than US: Supply of processors helps Chinese start-ups advance artificial intelligence technology despite Washington’s restrictions

  11. Benchmarking the Performance of Large Language Models on the Cerebras Wafer Scale Engine

  12. Chips or Not, Chinese AI Pushes Ahead: A host of Chinese AI startups are attempting to write more efficient code for large language models

  13. Can AI Scaling Continue Through 2030?

  15. UK Government Shelves £1.3bn UK Tech and AI Plans

  16. OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training

  17. Huawei Faces Production Challenges with 20% Yield Rate for AI Chip

  18. RAM is practically endless now

  19. Huawei ‘Unable to Secure 3.5 Nanometer Chips’

  20. China Is Losing the Chip War: Xi Jinping picked a fight over semiconductor technology—one he can’t win

  21. Scalable Matmul-free Language Modeling

  22. Elon Musk ordered Nvidia to ship thousands of AI chips reserved for Tesla to Twitter/xAI

  23. Earnings call: Tesla Discusses Q1 2024 Challenges and AI Expansion

  24. Microsoft, OpenAI plan $100 billion data-center project, media report says

  25. AI and Memory Wall

  26. Singapore’s Temasek in discussions to invest in OpenAI: State-backed group in talks with ChatGPT maker’s chief Sam Altman who is seeking funding to build chips business

  27. China’s military and government acquire Nvidia chips despite US ban

  28. Generative AI Beyond LLMs: System Implications of Multi-Modal Generation

  29. Real-Time AI & The Future of AI Hardware

  30. OpenAI Agreed to Buy $51 Million of AI Chips From a Startup Backed by CEO Sam Altman

  31. How Jensen Huang’s Nvidia Is Powering the AI Revolution: The company’s CEO bet it all on a new kind of chip. Now that Nvidia is one of the biggest companies in the world, what will he do next?

  32. Microsoft Swallows OpenAI’s Core Team § Compute Is King

  33. Altman Sought Billions For Chip Venture Before OpenAI Ouster: Altman was fundraising in the Middle East for new chip venture; The project, code-named Tigris, is intended to rival Nvidia

  34. DiLoCo: Distributed Low-Communication Training of Language Models

  35. LSS Transformer: Ultra-Long Sequence Distributed Transformer

  36. ChipNeMo: Domain-Adapted LLMs for Chip Design

  37. GPT-5 hardware rumor

  38. Saudi-China collaboration raises concerns about access to AI chips: Fears grow at Gulf kingdom’s top university that ties to Chinese researchers risk upsetting US government

  39. Efficient Video and Audio processing with Loihi 2

  40. Biden Is Beating China on Chips. It May Not Be Enough.

  41. DeepMind’s chief on AI’s dangers—and the UK’s £900 million supercomputer: Demis Hassabis says we shouldn’t let AI fall into the wrong hands and the government’s plan to build a supercomputer for AI is likely to be out of date before it has even started

  42. Inflection AI announces $1.3 billion of funding led by current investors, Microsoft, and NVIDIA

  43. U.S. Considers New Curbs on AI Chip Exports to China: Restrictions come amid concerns that China could use AI chips from Nvidia and others for weapon development and hacking

  44. Unleashing True Utility Computing with Quicksand

  45. The AI Boom Runs on Chips, but It Can’t Get Enough: ‘It’s like toilet paper during the pandemic.’ Startups, investors scrounge for computational firepower

  46. Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing

  47. Context on the NVIDIA ChatGPT opportunity—and ramifications of large language model enthusiasm

  48. SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

  49. Microsoft and OpenAI extend partnership

  50. A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference

  51. Efficiently Scaling Transformer Inference

  52. Reserve Capacity of NVIDIA HGX H100s on CoreWeave Now: Available at Scale in Q1 2023 Starting at $2.23/hr

  54. Petals: Collaborative Inference and Fine-tuning of Large Models

  55. Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training

  56. Is Integer Arithmetic Enough for Deep Learning Training?

  57. Efficient NLP Inference at the Edge via Elastic Pipelining

  58. Training Transformers Together

  59. Tutel: Adaptive Mixture-of-Experts at Scale

  60. 8-bit Numerical Formats for Deep Neural Networks

  61. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

  62. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

  63. A Low-latency Communication Design for Brain Simulations

  64. Reducing Activation Recomputation in Large Transformer Models

  65. What Language Model to Train if You Have One Million GPU Hours?

  66. Monarch: Expressive Structured Matrices for Efficient and Accurate Training

  67. Pathways: Asynchronous Distributed Dataflow for ML

  68. LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models

  69. Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

  70. Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam

  71. Introducing the AI Research SuperCluster—Meta’s cutting-edge AI supercomputer for AI research

  72. Is Programmable Overhead Worth The Cost? How much do we pay for a system to be programmable? It depends upon who you ask

  73. Spiking Neural Networks and Their Applications: A Review

  74. On the Working Memory of Humans and Great Apes: Strikingly Similar or Remarkably Different?

  75. Sustainable AI: Environmental Implications, Challenges and Opportunities

  76. China Has Already Reached Exascale—On Two Separate Systems

  77. The Efficiency Misnomer

  78. Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning

  79. WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU

  80. Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

  81. PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management

  82. Demonstration of Decentralized, Physics-Driven Learning

  83. Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines

  84. First-Generation Inference Accelerator Deployment at Facebook

  85. Single-chip photonic deep neural network for instantaneous image classification

  86. Distributed Deep Learning in Open Collaborations

  87. Ten Lessons From Three Generations Shaped Google’s TPUv4i

  88. Maximizing 3-D Parallelism in Distributed Training for Huge Neural Networks

  89. 2.5-dimensional distributed model training

  90. A Full-stack Accelerator Search Technique for Vision Applications

  91. ChinAI #141: The PanGu Origin Story: Notes from an informative Zhihu Thread on PanGu

  92. GSPMD: General and Scalable Parallelization for ML Computation Graphs

  93. PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

  94. ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning

  95. How to Train BERT with an Academic Budget

  96. Podracer architectures for scalable Reinforcement Learning

  97. High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models (DLRMs)

  98. An Efficient 2D Method for Training Super-Large Deep Learning Models

  99. Efficient Large-Scale Language Model Training on GPU Clusters

  100. Large Batch Simulation for Deep Reinforcement Learning

  101. Warehouse-Scale Video Acceleration (Argos): Co-design and Deployment in the Wild

  102. TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

  103. PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers

  104. ZeRO-Offload: Democratizing Billion-Scale Model Training

  105. The Design Process for Google’s Training Chips: TPUv2 and TPUv3

  106. Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment

  107. Parallel Training of Deep Networks with Local Updates

  108. Exploring the limits of Concurrency in ML Training on Google TPUs

  109. BytePS: A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters

  110. Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour

  111. Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

  112. Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures

  113. L2L: Training Large Neural Networks with Constant Memory using a New Execution Algorithm

  114. Interlocking Backpropagation: Improving depthwise model-parallelism

  115. DeepSpeed: Extreme-scale model training for everyone

  116. Measuring hardware overhang

  117. The Node Is Nonsense: There are better ways to measure progress than the old Moore’s law metric

  118. Are we in an AI overhang?

  119. HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks

  120. The Computational Limits of Deep Learning

  121. Data Movement Is All You Need: A Case Study on Optimizing Transformers

  122. PyTorch Distributed: Experiences on Accelerating Data Parallel Training

  123. Japanese Supercomputer Is Crowned World’s Speediest: In the race for the most powerful computers, Fugaku, a Japanese supercomputer, recently beat American and Chinese machines

  124. Sample Factory: Egocentric 3D Control from Pixels at 100,000 FPS with Asynchronous Reinforcement Learning

  125. PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training

  126. There’s plenty of room at the Top: What will drive computer performance after Moore’s law?

  127. A domain-specific supercomputer for training deep neural networks

  128. Microsoft announces new supercomputer, lays out vision for future AI work

  129. AI and Efficiency: We’re releasing an analysis showing that since 2012 the amount of compute needed to train a neural net to the same performance on ImageNet classification has been decreasing by a factor of 2 every 16 months

  130. Computation in the human cerebral cortex uses less than 0.2 watts yet this great expense is optimal when considering communication costs

  131. Startup Tenstorrent shows AI is changing computing and vice versa: Tenstorrent is one of the rush of AI chip makers founded in 2016 and finally showing product. The new wave of chips represents a substantial departure from how traditional computer chips work, but also points to ways that neural network design may change in the years to come

  132. AI Chips: What They Are and Why They Matter—An AI Chips Reference

  133. 2019 recent trends in GPU price per FLOPS

  134. Pipelined Backpropagation at Scale: Training Large Models without Batches

  135. Ultrafast Machine Vision With 2D Material Neural Network Image Sensors

  136. Towards spike-based machine intelligence with neuromorphic computing

  137. Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

  138. Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos

  139. Energy and Policy Considerations for Deep Learning in NLP

  140. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

  141. GAP: Generalizable Approximate Graph Partitioning Framework

  142. An Empirical Model of Large-Batch Training

  143. Bayesian Layers: A Module for Neural Network Uncertainty

  144. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

  145. Measuring the Effects of Data Parallelism on Neural Network Training

  146. Mesh-TensorFlow: Deep Learning for Supercomputers

  147. There is plenty of time at the bottom: the economics, risk and ethics of time compression

  148. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in 4 Minutes

  149. AI and Compute

  150. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

  151. Loihi: A Neuromorphic Manycore Processor with On-Chip Learning

  152. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

  153. Mixed Precision Training

  154. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

  155. Training Deep Nets with Sublinear Memory Cost

  156. GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server

  157. Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing

  158. Communication-Efficient Learning of Deep Networks from Decentralized Data

  159. Persistent RNNs: Stashing Recurrent Weights On-Chip

  160. The Brain as a Universal Learning Machine

  161. Scaling Distributed Machine Learning with the Parameter Server

  162. Multi-column deep neural network for traffic sign classification

  163. Multi-column Deep Neural Networks for Image Classification

  164. Building high-level features using large scale unsupervised learning

  165. Implications of Historical Trends in the Electrical Efficiency of Computing

  166. HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

  167. DanNet: Flexible, High Performance Convolutional Neural Networks for Image Classification

  168. Goodbye 2010

  169. Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition

  170. The cat is out of the bag: cortical simulations with 10⁹ neurons, 10¹³ synapses

  171. Large-scale deep unsupervised learning using graphics processors

  172. Bandwidth optimal all-reduce algorithms for clusters of workstations

  173. Whole Brain Emulation: A Roadmap

  174. Moore’s Law and the Technology S-Curve

  175. DARPA and the Quest for Machine Intelligence, 1983–1993

  176. Ultimate physical limits to computation

  177. Matrioshka Brains

  178. When will computer hardware match the human brain?

  179. Superhumanism: According to Hans Moravec § AI Scaling

  180. A Sociological Study of the Official History of the Perceptrons Controversy [1993]

  181. Intelligence As an Emergent Behavior; Or, The Songs of Eden

  182. The Role Of RAW POWER In INTELLIGENCE

  183. Brain Performance in FLOPS

  185. Google Demonstrates Leading Performance in Latest MLPerf Benchmarks

  187. H100 GPUs Set Standard for Gen AI in Debut MLPerf Benchmark

  188. Introducing Cerebras Inference: AI at Instant Speed

  190. Llama-3.1-405B Now Runs at 969 Tokens/s on Cerebras Inference

  191. NVIDIA Hopper Architecture In-Depth

  192. Trends in GPU Price-Performance

  194. NVIDIA/Megatron-LM: Ongoing Research Training Transformer Models at Scale

  195. 12 Hours Later, Groq Deploys Llama-3-Instruct (8 & 70B)

  197. The Technology Behind BLOOM Training

  198. From Bare Metal to a 70B Model: Infrastructure Set-Up and Scripts

  199. AI Accelerators, Part IV: The Very Rich Landscape

  200. NVIDIA Announces DGX H100 Systems – World’s Most Advanced Enterprise AI Infrastructure

  201. NVIDIA Launches UK’s Most Powerful Supercomputer, for Research in AI and Healthcare

  202. Perlmutter, Said to Be the World’s Fastest AI Supercomputer, Comes Online

  204. TensorFlow Research Cloud (TRC): Accelerate your cutting-edge machine learning research with free Cloud TPUs

  205. Cerebras’ Tech Trains “Brain-Scale” AIs

  207. Fugaku Holds Top Spot, Exascale Remains Elusive

  209. 342 Transistors for Every Person In the World: Cerebras 2nd Gen Wafer Scale Engine Teased

  211. Jim Keller Becomes CTO at Tenstorrent: “The Most Promising Architecture Out There”

  213. NVIDIA Unveils Grace: A High-Performance Arm Server CPU For Use In Big AI Systems

  215. Cerebras Unveils Wafer Scale Engine Two (WSE2): 2.6 Trillion Transistors, 100% Yield

  217. AMD Announces Instinct MI200 Accelerator Family: Taking Servers to Exascale and Beyond

  219. NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder

  221. Biological Anchors: A Trick That Might Or Might Not Work

  222. Scaling Up and Out: Training Massive Models on Cerebras Systems Using Weight Streaming

  223. Fermi Estimate of Future Training Runs

  225. Carl Shulman #2: AI Takeover, Bio & Cyber Attacks, Detecting Deception, & Humanity’s Far Future

  226. Etched Is Making the Biggest Bet in AI

  228. The Emerging Age of AI Diplomacy: To Compete With China, the United States Must Walk a Tightrope in the Gulf

  229. The Resilience Myth: Fatal Flaws in the Push to Secure Chip Supply Chains

  231. Compute Funds and Pre-Trained Models

  233. The Next Big Thing: Introducing IPU-POD128 and IPU-POD256

  235. The WoW Factor: Graphcore Systems Get Huge Power and Efficiency Boost

  237. AWS Enables 4,000-GPU UltraClusters With New P4 A100 Instances

  239. Estimating Training Compute of Deep Learning Models

  240. The Colliding Exponentials of AI

  241. Moore’s Law, AI, and the pace of Progress

  242. How Fast Can We Perform a Forward Pass?

  243. Two Interviews With the Founder of DeepSeek

  245. "AI and Compute" Trend Isn't Predictive of What Is Happening

  246. Brain Efficiency: Much More Than You Wanted to Know

  247. DeepSpeed: Accelerating Large-Scale Model Inference and Training via System Optimizations and Compression

  249. ZeRO-Infinity and DeepSpeed: Unlocking Unprecedented Model Scale for Deep Learning Training

  251. The World’s Largest Computer Chip

  252. The Billion Dollar AI Problem That Just Keeps Scaling

  254. TSMC Confirms 3nm Tech for 2022, Could Enable Epic 80 Billion Transistor GPUs

  256. ORNL’s Frontier First to Break the Exaflop Ceiling

  258. Returning to Google DeepMind

  259. How to Accelerate Innovation With AI at Scale

  260. Tesla AI Day 2021 (video): 48:44—Tesla Vision · 1:13:12—Planning and Control · 1:24:35—Manual Labeling · 1:28:11—Auto Labeling · 1:35:15—Simulation · 1:42:10—Hardware Integration · 1:45:40—Dojo

  261. We Ran MoE (2048E,60L) With Bfloat16 Activations With Total of 1 Trillion Model Weights. Although Trainable With Manual Diagnostics, With Deep 1 Trillion Model We Encountered Several Trainability Issues With Numerical Stability. Will Follow Up.

  262. Design § future-tag-features

  263. https://www.reuters.com/technology/artificial-intelligence/openai-builds-first-chip-with-broadcom-tsmc-scales-back-foundry-ambition-2024-10-29/

  264. Jensen Huang, GTC 2021 keynote (2021-04-12): transcript

  265. Jouppi et al 2021, Table 1: key characteristics of TPUs

  266. Ren et al 2021 (ZeRO-Offload): CPU/GPU dataflow

  267. Hippke 2020, “Measuring hardware overhang”: chess scaling, 1990–2020

  268. Kumar et al 2020, Figure 11: TPU multi-pod speedups

  269. Jouppi et al 2017 (PDF)

  270. Moravec 1998, Figure 2: evolution of computer power/cost, 1900–1998 (data)

  271. Moravec 1998, Figure 2: evolution of computer power/cost, 1900–1998 (figure)

  272. Moravec 1998, Figure 3: peak compute use in AI, 1950–1998 (figure)

  273. https://ai.facebook.com/blog/meta-training-inference-accelerator-AI-MTIA/

  274. https://ai.meta.com/blog/next-generation-meta-training-inference-accelerator-AI-MTIA/

  276. https://andromeda.ai/

  277. https://apps.fz-juelich.de/jsc/hps/juwels/configuration.html#hardware-configuration-of-the-system-name-booster-module

  279. https://austinvernon.site/blog/datacenterpv.html

  281. https://blogs.nvidia.com/blog/2021/04/12/cpu-grace-cscs-alps/

  282. https://blogs.nvidia.com/blog/2022/09/08/hopper-mlperf-inference/

  283. https://carnegieendowment.org/2022/11/22/after-chips-act-limits-of-reshoring-and-next-steps-for-u.s.-semiconductor-policy-pub-88439

  285. https://caseyhandmer.wordpress.com/2024/03/12/how-to-feed-the-ais/

  287. https://chipsandcheese.com/2023/07/02/nvidias-h100-funny-l2-and-tons-of-bandwidth/

  288. https://cloud.google.com/blog/products/compute/the-worlds-largest-distributed-llm-training-job-on-tpu-v5e

  289. https://cloud.google.com/blog/topics/systems/the-evolution-of-googles-jupiter-data-center-network

  290. https://cset.georgetown.edu/wp-content/uploads/AI-and-Compute-How-Much-Longer-Can-Computing-Power-Drive-Artificial-Intelligence-Progress.pdf

  292. https://edoras.sdsu.edu/~vinge/misc/singularity.html

  293. https://evabehrens.substack.com/p/the-agi-race-between-the-us-and-china

  294. https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/

  296. https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/#how-do-the-big-clouds-compare

  298. https://groq.com/wp-content/uploads/2020/06/ISCA-TSP.pdf

  300. https://hpc.stability.ai/

  301. https://newsletter.pragmaticengineer.com/p/scaling-chatgpt#%C2%A7five-scaling-challenges

  303. https://openai.com/blog/techniques-for-training-large-neural-networks/

  304. https://openai.com/research/scaling-kubernetes-to-7500-nodes

  305. https://research.google/blog/tensorstore-for-high-performance-scalable-array-storage/

  306. https://restofworld.org/2024/tsmc-arizona-expansion/

  308. https://rhg.com/research/running-on-ice/

  309. https://siboehm.com/articles/22/CUDA-MMM

  311. https://spectrum.ieee.org/computing/hardware/the-future-of-deep-learning-is-photonic

  313. https://spectrum.ieee.org/generative-ai-training

  314. https://thechipletter.substack.com/p/googles-first-tpu-architecture

  315. https://thezvi.substack.com/p/on-the-executive-order

  316. https://venturebeat.com/2020/11/17/cerebras-wafer-size-chip-is-10000-times-faster-than-a-gpu/

  318. https://warontherocks.com/2024/04/how-washington-can-save-its-semiconductor-controls-on-china/

  320. https://www.abortretry.fail/p/the-rise-and-fall-of-silicon-graphics

  322. https://www.bloomberg.com/news/articles/2022-10-10/china-chip-stocks-drop-as-biden-tightens-rules-on-us-tech-access

  323. https://www.businesswire.com/news/home/20241015910376/en/Crusoe-Blue-Owl-Capital-and-Primary-Digital-Infrastructure-Enter-3.4-billion-Joint-Venture-for-AI-Data-Center-Development

  324. https://www.cerebras.net/blog/introducing-gigagpt-gpt-3-sized-models-in-565-lines-of-code

  326. https://www.cerebras.net/press-release/cerebras-announces-third-generation-wafer-scale-engine

  328. https://www.chinatalk.media/p/new-chip-export-controls-explained

  330. https://www.chinatalk.media/p/new-sexport-controls-semianalysis

  331. https://www.ft.com/content/25337df3-5b98-4dd1-b7a9-035dcc130d6a

  333. https://www.fujitsu.com/global/about/resources/news/press-releases/2024/0510-01.html

  335. https://www.ibm.com/blogs/research/2020/12/ibm-ai-edge/

  337. https://www.lesswrong.com/posts/KsKfvLx7nFBZnWtEu/no-human-brains-are-not-much-more-efficient-than-computers

  338. https://www.lesswrong.com/posts/YKfNZAmiLdepDngwi/gpt-175bee

  339. https://www.lesswrong.com/posts/cB2Rtnp7DBTpDy3ii/memory-bandwidth-constraints-imply-economies-of-scale-in-ai

  340. https://www.nytimes.com/2022/10/13/us/politics/biden-china-technology-semiconductors.html

  341. https://www.nytimes.com/2023/07/12/magazine/semiconductor-chips-us-china.html

  342. https://www.reddit.com/r/MachineLearning/comments/1dlsogx/d_academic_ml_labs_how_many_gpus/

  344. https://www.reuters.com/technology/coreweave-raises-23-billion-debt-collateralized-by-nvidia-chips-2023-08-03/

  346. https://www.reuters.com/technology/inside-metas-scramble-catch-up-ai-2023-04-25/

  347. https://www.theinformation.com/articles/microsoft-and-openai-plot-100-billion-stargate-ai-supercomputer

  348. https://www.theregister.com/2023/11/07/bing_gpu_oracle/

  349. https://www.yitay.net/blog/training-great-llms-entirely-from-ground-zero-in-the-wilderness

  350. https://x.com/Altimor/status/1668902393386237953

  351. https://x.com/DanHendrycks/status/1825926885370728881

  352. https://x.com/EMostaque/status/1674479761429504017

  353. https://x.com/EmilWallner/status/1591007449691336704

  354. https://x.com/elonmusk/status/1797382701541990841

  355. https://x.com/emollick/status/1759633391098732967

  356. https://x.com/jordanschnyc/status/1580889342402129921

  357. https://x.com/miolini/status/1634982361757790209

  358. https://x.com/pirroh/status/1694516986561307022

  359. https://x.com/ptrschmdtnlsn/status/1669590814329036803

  360. https://x.com/sama/status/1739360234832052641

  361. https://x.com/thiteanish/status/1635188333705043969

  362. https://x.com/transitive_bs/status/1628118163874516992

  363. https://x.com/ylecun/status/1612182019861094402
