See Also

Links
- “OpenAI Agreed to Buy $51 Million of AI Chips From a Startup Backed by CEO Sam Altman”, Dave 2023
- “Microsoft Swallows OpenAI’s Core Team § Compute Is King”, Patel & Nishball 2023
- “Altman Sought Billions For Chip Venture Before OpenAI Ouster: Altman Was Fundraising in the Middle East for New Chip Venture; The Project, Code-named Tigris, Is Intended to Rival Nvidia”, Ludlow & Vance 2023
- “LSS Transformer: Ultra-Long Sequence Distributed Transformer”, Wang et al 2023
- “ChipNeMo: Domain-Adapted LLMs for Chip Design”, Liu et al 2023
- “Saudi-China Collaboration Raises Concerns about Access to AI Chips: Fears Grow at Gulf Kingdom’s Top University That Ties to Chinese Researchers Risk Upsetting US Government”, Kerr et al 2023
- “Biden Is Beating China on Chips. It May Not Be Enough.”, Wang 2023
- “Deep Mind’s Chief on AI’s Dangers—and the UK’s £900 Million Supercomputer: Demis Hassabis Says We Shouldn’t Let AI Fall into the Wrong Hands and the Government’s Plan to Build a Supercomputer for AI Is Likely to Be out of Date Before It Has Even Started”, Sellman 2023
- “U.S. Considers New Curbs on AI Chip Exports to China: Restrictions Come amid Concerns That China Could Use AI Chips from Nvidia and Others for Weapon Development and Hacking”, Fitch et al 2023
- “The AI Boom Runs on Chips, but It Can’t Get Enough: ‘It’s like Toilet Paper during the Pandemic.’ Startups, Investors Scrounge for Computational Firepower”, Seetharaman & Dotan 2023
- “Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing”, Mallasén et al 2023
- davidtayar5 @ "2023-02-10": “Context on the NVIDIA ChatGPT Opportunity—and Ramifications of Large Language Model Enthusiasm”
- “Microsoft and OpenAI Extend Partnership”, Microsoft 2023
- “A 64-core Mixed-signal In-memory Compute Chip Based on Phase-change Memory for Deep Neural Network Inference”, Gallo et al 2022
- “Efficiently Scaling Transformer Inference”, Pope et al 2022
- “Reserve Capacity of NVIDIA HGX H100s on CoreWeave Now: Available at Scale in Q1 2023 Starting at $2.23/hr”, CoreWeave 2022
- “Petals: Collaborative Inference and Fine-tuning of Large Models”, Borzunov et al 2022
- “Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training”, You et al 2022
- “Is Integer Arithmetic Enough for Deep Learning Training?”, Ghaffari et al 2022
- “Efficient NLP Inference at the Edge via Elastic Pipelining”, Guo et al 2022
- “Training Transformers Together”, Borzunov et al 2022
- “Tutel: Adaptive Mixture-of-Experts at Scale”, Hwang et al 2022
- “8-bit Numerical Formats for Deep Neural Networks”, Noune et al 2022
- “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
- “FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, Dao et al 2022
- “A Low-latency Communication Design for Brain Simulations”, Du 2022
- “Reducing Activation Recomputation in Large Transformer Models”, Korthikanti et al 2022
- “What Language Model to Train If You Have One Million GPU Hours?”, Scao et al 2022
- “Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, Dao et al 2022
- “Pathways: Asynchronous Distributed Dataflow for ML”, Barham et al 2022
- “LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models”, Javaheripi et al 2022
- “Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads”, Shukla et al 2022
- “Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam”, Lu et al 2022
- “Introducing the AI Research SuperCluster—Meta’s Cutting-edge AI Supercomputer for AI Research”, Lee & Sengupta 2022
- “Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, Bailey 2022
- “Spiking Neural Networks and Their Applications: A Review”, Yamazaki et al 2022
- “On the Working Memory of Humans and Great Apes: Strikingly Similar or Remarkably Different?”, Read et al 2021
- “SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient”, Ryabinin et al 2021
- “Sustainable AI: Environmental Implications, Challenges and Opportunities”, Wu et al 2021
- “China Has Already Reached Exascale—On Two Separate Systems”, Hemsoth 2021
- “The Efficiency Misnomer”, Dehghani et al 2021
- “Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning”, Rudin et al 2021
- “WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU”, Lan et al 2021
- “Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning”, Makoviychuk et al 2021
- “PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management”, Fang et al 2021
- “Demonstration of Decentralized, Physics-Driven Learning”, Dillavou et al 2021
- “Chimera: Efficiently Training Large-Scale Neural Networks With Bidirectional Pipelines”, Li & Hoefler 2021
- “First-Generation Inference Accelerator Deployment at Facebook”, Anderson et al 2021
- “Single-chip Photonic Deep Neural Network for Instantaneous Image Classification”, Ashtiani et al 2021
- “Distributed Deep Learning in Open Collaborations”, Diskin et al 2021
- “Ten Lessons From Three Generations Shaped Google’s TPUv4i”, Jouppi et al 2021
- “2.5-dimensional Distributed Model Training”, Wang et al 2021
- “Maximizing 3-D Parallelism in Distributed Training for Huge Neural Networks”, Bian et al 2021
- “A Full-stack Accelerator Search Technique for Vision Applications”, Zhang et al 2021
- “ChinAI #141: The PanGu Origin Story: Notes from an Informative Zhihu Thread on PanGu”, Ding 2021
- “GSPMD: General and Scalable Parallelization for ML Computation Graphs”, Xu et al 2021
- “PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models With Auto-parallel Computation”, Zeng et al 2021
- “ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning”, Rajbhandari et al 2021
- “How to Train BERT With an Academic Budget”, Izsak et al 2021
- “Podracer Architectures for Scalable Reinforcement Learning”, Hessel et al 2021
- “An Efficient 2D Method for Training Super-Large Deep Learning Models”, Xu et al 2021
- “High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models (DLRMs)”, Mudigere et al 2021
- “Efficient Large-Scale Language Model Training on GPU Clusters”, Narayanan et al 2021
- “Large Batch Simulation for Deep Reinforcement Learning”, Shacklett et al 2021
- “Warehouse-Scale Video Acceleration (Argos): Co-design and Deployment in the Wild”, Ranganathan et al 2021
- “TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models”, Li et al 2021
- “PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers”, He et al 2021
- “ZeRO-Offload: Democratizing Billion-Scale Model Training”, Ren et al 2021
- “The Design Process for Google’s Training Chips: TPUv2 and TPUv3”, Norrie et al 2021
- “Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment”, Launay et al 2020
- “Parallel Training of Deep Networks With Local Updates”, Laskin et al 2020
- “Exploring the Limits of Concurrency in ML Training on Google TPUs”, Kumar et al 2020
- “BytePS: A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters”, Jiang et al 2020
- “Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour”, Wongpanich et al 2020
- “Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?”, Domke et al 2020
- “Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures”, Launay et al 2020b
- “L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020
- “Interlocking Backpropagation: Improving Depthwise Model-parallelism”, Gomez et al 2020
- “DeepSpeed: Extreme-scale Model Training for Everyone”, DeepSpeed Team et al 2020
- “Measuring Hardware Overhang”, hippke 2020
- “The Node Is Nonsense: There Are Better Ways to Measure Progress Than the Old Moore’s Law Metric”, Moore 2020
- “Are We in an AI Overhang?”, Jones 2020
- “HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks”, Garland & Gregg 2020
- “The Computational Limits of Deep Learning”, Thompson et al 2020
- “Data Movement Is All You Need: A Case Study on Optimizing Transformers”, Ivanov et al 2020
- “PyTorch Distributed: Experiences on Accelerating Data Parallel Training”, Li et al 2020
- “Japanese Supercomputer Is Crowned World’s Speediest: In the Race for the Most Powerful Computers, Fugaku, a Japanese Supercomputer, Recently Beat American and Chinese Machines”, Clark 2020
- “Sample Factory: Egocentric 3D Control from Pixels at 100,000 FPS With Asynchronous Reinforcement Learning”, Petrenko et al 2020
- “PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training”, Narayanan et al 2020
- “There’s Plenty of Room at the Top: What Will Drive Computer Performance After Moore’s Law?”, Leiserson et al 2020
- “A Domain-specific Supercomputer for Training Deep Neural Networks”, Jouppi et al 2020
- “Microsoft Announces New Supercomputer, Lays out Vision for Future AI Work”, Langston 2020
- “AI and Efficiency: We’re Releasing an Analysis Showing That Since 2012 the Amount of Compute Needed to Train a Neural Net to the Same Performance on ImageNet Classification Has Been Decreasing by a Factor of 2 Every 16 Months”, Hernandez & Brown 2020
- “Computation in the Human Cerebral Cortex Uses Less Than 0.2 Watts yet This Great Expense Is Optimal When considering Communication Costs”, Levy & Calvert 2020
- “Startup Tenstorrent Shows AI Is Changing Computing and vice Versa: Tenstorrent Is One of the Rush of AI Chip Makers Founded in 2016 and Finally Showing Product. The New Wave of Chips Represent a Substantial Departure from How Traditional Computer Chips Work, but Also Point to Ways That Neural Network Design May Change in the Years to Come”, Ray 2020
- “AI Chips: What They Are and Why They Matter—An AI Chips Reference”, Khan & Mann 2020
- “Pipelined Backpropagation at Scale: Training Large Models without Batches”, Kosson et al 2020
- “2019 Recent Trends in GPU Price per FLOPS”, Bergal 2020
- “Ultrafast Machine Vision With 2D Material Neural Network Image Sensors”, Mennel et al 2020
- “Towards Spike-based Machine Intelligence With Neuromorphic Computing”, Roy et al 2019
- “Checkmate: Breaking the Memory Wall With Optimal Tensor Rematerialization”, Jain et al 2019
- “Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos”, Lin et al 2019
- “Energy and Policy Considerations for Deep Learning in NLP”, Strubell et al 2019
- “Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes”, You et al 2019
- “GAP: Generalizable Approximate Graph Partitioning Framework”, Nazi et al 2019
- “An Empirical Model of Large-Batch Training”, McCandlish et al 2018
- “Bayesian Layers: A Module for Neural Network Uncertainty”, Tran et al 2018
- “GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism”, Huang et al 2018
- “Measuring the Effects of Data Parallelism on Neural Network Training”, Shallue et al 2018
- “Mesh-TensorFlow: Deep Learning for Supercomputers”, Shazeer et al 2018
- “There Is Plenty of Time at the Bottom: the Economics, Risk and Ethics of Time Compression”, Sandberg 2018
- “Highly Scalable Deep Learning Training System With Mixed-Precision: Training ImageNet in 4 Minutes”, Jia et al 2018
- “AI and Compute”, Amodei et al 2018
- “Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions”, Vasilache et al 2018
- “Loihi: A Neuromorphic Manycore Processor With On-Chip Learning”, Davies et al 2018
- “Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, Lin et al 2017
- “Mixed Precision Training”, Micikevicius et al 2017
- “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima”, Keskar et al 2016
- “Training Deep Nets With Sublinear Memory Cost”, Chen et al 2016
- “GeePS: Scalable Deep Learning on Distributed GPUs With a GPU-specialized Parameter Server”, Cui et al 2016
- “Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing”, Esser et al 2016
- “Communication-Efficient Learning of Deep Networks from Decentralized Data”, McMahan et al 2016
- “Persistent RNNs: Stashing Recurrent Weights On-Chip”, Diamos et al 2016
- “The Brain As a Universal Learning Machine”, Cannell 2015
- “Scaling Distributed Machine Learning With the Parameter Server”, Li et al 2014
- “Multi-column Deep Neural Network for Traffic Sign Classification”, Cireşan et al 2012b
- “Slowing Moore’s Law: How It Could Happen”, Gwern 2012
- “Multi-column Deep Neural Networks for Image Classification”, Cireşan et al 2012
- “Implications of Historical Trends in the Electrical Efficiency of Computing”, Koomey et al 2011
- “HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent”, Niu et al 2011
- “DanNet: Flexible, High Performance Convolutional Neural Networks for Image Classification”, Ciresan et al 2011
- “Goodbye 2010”, Legg 2010
- “The Cat Is out of the Bag: Cortical Simulations With 10⁹ Neurons, 10¹³ Synapses”, Ananthanarayanan et al 2009
- “Bandwidth Optimal All-reduce Algorithms for Clusters of Workstations”, Patarasuk & Yuan 2009
- “Whole Brain Emulation: A Roadmap”
- “Moore’s Law and the Technology S-Curve”, Bowden 2004
- “DARPA and the Quest for Machine Intelligence, 1983–1993”, Roland & Shiman 2002
- “Ultimate Physical Limits to Computation”, Lloyd 1999
- “Matrioshka Brains”
- “When Will Computer Hardware Match the Human Brain?”, Moravec 1998
- “Superhumanism: According to Hans Moravec § AI Scaling”, Platt 1995
- “A Sociological Study of the Official History of the Perceptrons Controversy [1993]”, Olazaran 1993
- “Intelligence As an Emergent Behavior; Or, The Songs of Eden”, Hillis 1988
- “TensorFlow Research Cloud (TRC): Accelerate Your Cutting-edge Machine Learning Research With Free Cloud TPUs”, TRC 2023
- “48:44—Tesla Vision · 1:13:12—Planning and Control · 1:24:35—Manual Labeling · 1:28:11—Auto Labeling · 1:35:15—Simulation · 1:42:10—Hardware Integration · 1:45:40—Dojo”
Sort By Magic
Annotations are sorted by machine learning into inferred ‘tags’, providing an alternative way to browse: instead of date order, one can browse in topic order. The sorted list has been automatically clustered into multiple sections and auto-labeled for easier browsing.
Beginning with the newest annotation, the sort uses the embedding of each annotation to build a chain of nearest-neighbor annotations, creating a progression of topics.
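As an illustration of that ordering, here is a minimal sketch of a greedy nearest-neighbor sort over annotation embeddings. This is a toy example, not the site’s actual implementation: the function name `sort_by_magic`, the cosine-similarity metric, and the random test data are all assumptions, and the clustering/auto-labeling step is omitted.

```python
import numpy as np

def sort_by_magic(embeddings: np.ndarray) -> list[int]:
    """Greedy nearest-neighbor ordering: start from row 0 (assumed to be the
    newest annotation) and repeatedly hop to the most similar unvisited row,
    producing a reading order that drifts gradually from topic to topic."""
    # Normalize rows so that dot products equal cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order = [0]
    remaining = set(range(1, len(unit)))
    while remaining:
        current = unit[order[-1]]
        # Pick the unvisited annotation most similar to the current one.
        nearest = max(remaining, key=lambda i: float(unit[i] @ current))
        order.append(nearest)
        remaining.remove(nearest)
    return order

# Toy usage: 5 random 8-dimensional "annotation embeddings".
rng = np.random.default_rng(0)
print(sort_by_magic(rng.normal(size=(5, 8))))
```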
- gpuperformance
- deeplearning
- compute-hardware
Wikipedia
Miscellaneous
- /doc/ai/scaling/mixture-of-experts/2021-04-12-jensenhuang-gtc2021keynote-eAn_oiZwUXA.en.vtt.txt
- /doc/ai/scaling/hardware/2021-ren-zerooffload-cpugpudataflow.png
- /doc/ai/scaling/hardware/2021-jouppi-table1-keycharacteristicsoftpus.png
- /doc/ai/scaling/hardware/2020-08-05-hippke-measuringhardwareoverhang-chessscaling19902020.png
- /doc/ai/scaling/hardware/2020-kumar-figure11-tpumultipodspeedups.png
- /doc/ai/scaling/hardware/1998-moravec-figure3-peakcomputeuseinai19501998.jpg
- /doc/ai/scaling/hardware/1998-moravec-figure2-evolutionofcomputerpowercost19001998.jpg
- /doc/ai/scaling/hardware/1998-moravec-figure2-evolutionofcomputerpowercost19001998.csv
- https://ai.facebook.com/blog/meta-training-inference-accelerator-AI-MTIA/
- https://blog.research.google/2022/09/tensorstore-for-high-performance.html
- https://blogs.nvidia.com/blog/2021/04/12/cpu-grace-cscs-alps/
- https://blogs.nvidia.com/blog/2022/09/08/hopper-mlperf-inference/
- https://chipsandcheese.com/2023/07/02/nvidias-h100-funny-l2-and-tons-of-bandwidth/
- https://cloud.google.com/blog/topics/systems/the-evolution-of-googles-jupiter-data-center-network
- https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/
- https://evabehrens.substack.com/p/the-agi-race-between-the-us-and-china
- https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/
- https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/#how-do-the-big-clouds-compare
- https://medium.com/@adi.fu7/ai-accelerators-part-iv-the-very-rich-landscape-17481be80917
- https://openai.com/blog/techniques-for-training-large-neural-networks/
- https://openai.com/research/scaling-kubernetes-to-7500-nodes
- https://siliconangle.com/2021/05/27/perlmutter-said-worlds-fastest-ai-supercomputer-comes-online/
- https://spectrum.ieee.org/computing/hardware/the-future-of-deep-learning-is-photonic
- https://top500.org/news/fugaku-holds-top-spot-exascale-remains-elusive/
- https://twitter.com/ptrschmdtnlsn/status/1669590814329036803
- https://twitter.com/transitive_bs/status/1628118163874516992
- https://venturebeat.com/2020/11/17/cerebras-wafer-size-chip-is-10000-times-faster-than-a-gpu/
- https://www.anandtech.com/show/17327/nvidia-hopper-gpu-architecture-and-h100-accelerator-announced
- https://www.astralcodexten.com/p/biological-anchors-a-trick-that-might
- https://www.chinatalk.media/p/new-chip-export-controls-explained
- https://www.chinatalk.media/p/new-sexport-controls-semianalysis
- https://www.ft.com/content/f76534bf-b501-4cbf-9a46-80be9feb670c
- https://www.governance.ai/post/compute-funds-and-pre-trained-models
- https://www.graphcore.ai/posts/the-next-big-thing-introducing-ipu-pod128-and-ipu-pod256
- https://www.graphcore.ai/posts/the-wow-factor-graphcore-systems-get-huge-power-and-efficiency-boost
- https://www.hpcwire.com/2020/11/02/aws-ultraclusters-with-new-p4-a100-instances/
- https://www.lesswrong.com/posts/QWuegBA9kGBv3xBFy/the-colliding-exponentials-of-ai
- https://www.lesswrong.com/posts/YKfNZAmiLdepDngwi/gpt-175bee
- https://www.lesswrong.com/posts/aNAFrGbzXddQBMDqh/moore-s-law-ai-and-the-pace-of-progress
- https://www.lesswrong.com/posts/gPmGTND8Kroxgpgsn/how-fast-can-we-perform-a-forward-pass
- https://www.lesswrong.com/posts/xwBuoE9p8GE7RAuhd/brain-efficiency-much-more-than-you-wanted-to-know
- https://www.newyorker.com/tech/annals-of-technology/the-worlds-largest-computer-chip
- https://www.nextplatform.com/2021/02/11/the-billion-dollar-ai-problem-that-just-keeps-scaling/
- https://www.nytimes.com/2022/10/13/us/politics/biden-china-technology-semiconductors.html
- https://www.nytimes.com/2023/07/12/magazine/semiconductor-chips-us-china.html
- https://www.top500.org/news/ornls-frontier-first-to-break-the-exaflop-ceiling/
Link Bibliography
- https://www.wired.com/story/openai-buy-ai-chips-startup-sam-altman/: “OpenAI Agreed to Buy $51 Million of AI Chips From a Startup Backed by CEO Sam Altman”, Paresh Dave
- https://www.ft.com/content/2a636cee-b0d2-45c2-a815-11ca32371763: “Saudi-China Collaboration Raises Concerns about Access to AI Chips: Fears Grow at Gulf Kingdom’s Top University That Ties to Chinese Researchers Risk Upsetting US Government”, Simeon Kerr, Samer Al-Atrush, Qianer Liu, Madhumita Murgia
- https://www.nytimes.com/2023/07/16/opinion/biden-china-ai-chips-trade.html: “Biden Is Beating China on Chips. It May Not Be Enough.”, Dan Wang
- https://archive.ph/c5jTk: “Deep Mind’s Chief on AI’s Dangers—and the UK’s £900 Million Supercomputer: Demis Hassabis Says We Shouldn’t Let AI Fall into the Wrong Hands and the Government’s Plan to Build a Supercomputer for AI Is Likely to Be out of Date Before It Has Even Started”, Mark Sellman
- https://www.wsj.com/articles/the-ai-boom-runs-on-chips-but-it-cant-get-enough-9f76f554: “The AI Boom Runs on Chips, but It Can’t Get Enough: ‘It’s like Toilet Paper during the Pandemic.’ Startups, Investors Scrounge for Computational Firepower”, Deepa Seetharaman, Tom Dotan
- https://arxiv.org/abs/2305.06946: “Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing”, David Mallasén, Alberto A. Del Barrio, Manuel Prieto-Matias
- https://twitter.com/davidtayar5/status/1627690520456691712: “Context on the NVIDIA ChatGPT Opportunity—and Ramifications of Large Language Model Enthusiasm”, Morgan Stanley
- https://blogs.microsoft.com/blog/2023/01/23/microsoftandopenaiextendpartnership/: “Microsoft and OpenAI Extend Partnership”, Microsoft
- https://arxiv.org/abs/2211.05102#google: “Efficiently Scaling Transformer Inference”, Pope et al 2022
- https://arxiv.org/abs/2206.03382#microsoft: “Tutel: Adaptive Mixture-of-Experts at Scale”, Hwang et al 2022
- https://arxiv.org/abs/2206.01861#microsoft: “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He
- https://arxiv.org/abs/2205.14135: “FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
- https://arxiv.org/abs/2204.00595: “Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, Dao et al 2022
- https://arxiv.org/abs/2203.02094#microsoft: “LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models”, Javaheripi et al 2022
- https://arxiv.org/abs/2202.06009#microsoft: “Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam”, Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He
- https://ai.facebook.com/blog/ai-rsc: “Introducing the AI Research SuperCluster—Meta’s Cutting-edge AI Supercomputer for AI Research”, Kevin Lee, Shubho Sengupta
- https://semiengineering.com/is-programmable-overhead-worth-the-cost/: “Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, Brian Bailey
- https://arxiv.org/abs/2106.10207: “Distributed Deep Learning in Open Collaborations”, Diskin et al 2021
- 2021-jouppi.pdf: “Ten Lessons From Three Generations Shaped Google’s TPUv4i”, Jouppi et al 2021
- https://chinai.substack.com/p/chinai-141-the-pangu-origin-story: “ChinAI #141: The PanGu Origin Story: Notes from an Informative Zhihu Thread on PanGu”, Jeffrey Ding
- https://arxiv.org/abs/2104.06272#deepmind: “Podracer Architectures for Scalable Reinforcement Learning”, Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt
- https://arxiv.org/abs/2102.07988: “TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models”, Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica
- https://arxiv.org/abs/2102.03161: “PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers”, Chaoyang He, Shen Li, Mahdi Soltanolkotabi, Salman Avestimehr
- https://arxiv.org/abs/2012.06373#lighton: “Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment”, Launay et al 2020
- 2020-jiang.pdf: “BytePS: A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters”, Yimin Jiang, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, Chuanxiong Guo
- https://arxiv.org/abs/2011.00071#google: “Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour”, Arissa Wongpanich, Hieu Pham, James Demmel, Mingxing Tan, Quoc Le, Yang You, Sameer Kumar
- 2020-launay-2.pdf: “Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures”, Julien Launay, Iacopo Poli, François Boniface, Florent Krzakala
- https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/: “DeepSpeed: Extreme-scale Model Training for Everyone”, DeepSpeed Team, Rangan Majumder, Junhua Wang
- https://news.microsoft.com/source/features/ai/openai-azure-supercomputer/: “Microsoft Announces New Supercomputer, Lays out Vision for Future AI Work”, Jennifer Langston
- https://www.zdnet.com/article/startup-tenstorrent-and-competitors-show-how-computing-is-changing-ai-and-vice-versa/: “Startup Tenstorrent Shows AI Is Changing Computing and vice Versa: Tenstorrent Is One of the Rush of AI Chip Makers Founded in 2016 and Finally Showing Product. The New Wave of Chips Represent a Substantial Departure from How Traditional Computer Chips Work, but Also Point to Ways That Neural Network Design May Change in the Years to Come”, Tiernan Ray
- 2020-khan.pdf: “AI Chips: What They Are and Why They Matter—An AI Chips Reference”, Saif M. Khan, Alexander Mann
- https://arxiv.org/abs/1904.00962#google: “Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes”, You et al 2019
- https://arxiv.org/abs/1811.02084#google: “Mesh-TensorFlow: Deep Learning for Supercomputers”, Shazeer et al 2018
- https://openai.com/research/ai-and-compute: “AI and Compute”, Dario Amodei, Danny Hernandez, Girish Sastry, Jack Clark, Greg Brockman, Ilya Sutskever
- https://arxiv.org/abs/1712.01887: “Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally
- slowing-moores-law: “Slowing Moore’s Law: How It Could Happen”, Gwern
- https://arxiv.org/abs/1102.0183#schmidhuber: “DanNet: Flexible, High Performance Convolutional Neural Networks for Image Classification”, Dan Claudiu Ciresan, Ueli Meier, Jonathan Masci, Luca Maria Gambardella, Jürgen Schmidhuber
- https://www.vetta.org/2010/12/goodbye-2010/: “Goodbye 2010”, Shane Legg
- 2009-patarasuk.pdf: “Bandwidth Optimal All-reduce Algorithms for Clusters of Workstations”, Pitch Patarasuk, Xin Yuan
- https://www.wired.com/1995/10/moravec/#scaling: “Superhumanism: According to Hans Moravec § AI Scaling”, Charles Platt
- 1993-olazaran.pdf: “A Sociological Study of the Official History of the Perceptrons Controversy [1993]”, Mikel Olazaran
- https://sites.research.google/trc/: “TensorFlow Research Cloud (TRC): Accelerate Your Cutting-edge Machine Learning Research With Free Cloud TPUs”, TRC