“‘AI Hardware’ Tag”, 2019-09-03:
Bibliography for tag ai/scaling/hardware, most recent first: 1 related tag, 224 annotations, & 77 links (parent).
- See Also
- Gwern
- Links
- “Getting AI Datacenters in the UK: Why the UK Needs to Create Special Compute Zones; and How to Do It”, et al 2024
- “The Future of Compute: Nvidia’s Crown Is Slipping”, 2024
- “Jake Sullivan: The American Who Waged a Tech War on China”
- “Nvidia’s AI Chips Are Cheaper to Rent in China Than US: Supply of Processors Helps Chinese Start-Ups Advance Artificial Intelligence Technology despite Washington’s Restrictions”, 2024
- “Benchmarking the Performance of Large Language Models on the Cerebras Wafer Scale Engine”, et al 2024
- “Chips or Not, Chinese AI Pushes Ahead: A Host of Chinese AI Startups Are Attempting to Write More Efficient Code for Large Language Models”, 2024
- “Can AI Scaling Continue Through 2030?”, et al 2024
- “UK Government Shelves £1.3bn UK Tech and AI Plans”
- “OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training”, et al 2024
- “Huawei Faces Production Challenges With 20% Yield Rate for AI Chip”, 2024
- “RAM Is Practically Endless Now”, fxtentacles 2024
- “Huawei ‘Unable to Secure 3.5 Nanometer Chips’”, 2024
- “China Is Losing the Chip War: Xi Jinping Picked a Fight over Semiconductor Technology—One He Can’t Win”, 2024
- “Scalable Matmul-Free Language Modeling”, et al 2024
- “Elon Musk Ordered Nvidia to Ship Thousands of AI Chips Reserved for Tesla to Twitter/xAI”, 2024
- “Earnings Call: Tesla Discusses Q1 2024 Challenges and AI Expansion”, 2024
- “Microsoft, OpenAI Plan $100 Billion Data-Center Project, Media Report Says”, 2024
- “AI and Memory Wall”, et al 2024
- “Singapore’s Temasek in Discussions to Invest in OpenAI: State-Backed Group in Talks With ChatGPT Maker’s Chief Sam Altman Who Is Seeking Funding to Build Chips Business”, 2024
- “China’s Military and Government Acquire Nvidia Chips despite US Ban”, 2024
- “Generative AI Beyond LLMs: System Implications of Multi-Modal Generation”, et al 2023
- “Real-Time AI & The Future of AI Hardware”, 2023
- “OpenAI Agreed to Buy $51 Million of AI Chips From a Startup Backed by CEO Sam Altman”, 2023
- “How Jensen Huang’s Nvidia Is Powering the AI Revolution: The Company’s CEO Bet It All on a New Kind of Chip. Now That Nvidia Is One of the Biggest Companies in the World, What Will He Do Next?”, 2023
- “Microsoft Swallows OpenAI’s Core Team § Compute Is King”, 2023
- “Altman Sought Billions For Chip Venture Before OpenAI Ouster: Altman Was Fundraising in the Middle East for New Chip Venture; The Project, Code-Named Tigris, Is Intended to Rival Nvidia”, 2023
- “DiLoCo: Distributed Low-Communication Training of Language Models”, et al 2023
- “LSS Transformer: Ultra-Long Sequence Distributed Transformer”, et al 2023
- “ChipNeMo: Domain-Adapted LLMs for Chip Design”, et al 2023
- wagieeacc @ “2023-10-17”
- “Saudi-China Collaboration Raises Concerns about Access to AI Chips: Fears Grow at Gulf Kingdom’s Top University That Ties to Chinese Researchers Risk Upsetting US Government”, et al 2023
- “Efficient Video and Audio Processing With Loihi 2”, et al 2023
- “Biden Is Beating China on Chips. It May Not Be Enough.”, 2023
- “Deep Mind’s Chief on AI’s Dangers—And the UK’s £900 Million Supercomputer: Demis Hassabis Says We Shouldn’t Let AI Fall into the Wrong Hands and the Government’s Plan to Build a Supercomputer for AI Is Likely to Be out of Date Before It Has Even Started”, 2023
- “Inflection AI Announces $1.3 Billion of Funding Led by Current Investors, Microsoft, and NVIDIA”, Inflection AI 2023
- “U.S. Considers New Curbs on AI Chip Exports to China: Restrictions Come amid Concerns That China Could Use AI Chips from Nvidia and Others for Weapon Development and Hacking”, et al 2023
- “Unleashing True Utility Computing With Quicksand”, et al 2023
- “The AI Boom Runs on Chips, but It Can’t Get Enough: ‘It’s like Toilet Paper during the Pandemic.’ Startups, Investors Scrounge for Computational Firepower”, 2023
- “Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing”, et al 2023
- davidtayar5 @ “2023-02-10”
- “SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient”, et al 2023
- “Microsoft and OpenAI Extend Partnership”, 2023
- “A 64-Core Mixed-Signal In-Memory Compute Chip Based on Phase-Change Memory for Deep Neural Network Inference”, et al 2022
- “Efficiently Scaling Transformer Inference”, et al 2022
- “Reserve Capacity of NVIDIA HGX H100s on CoreWeave Now: Available at Scale in Q1 2023 Starting at $2.23/hr”, CoreWeave 2022
- “Petals: Collaborative Inference and Fine-Tuning of Large Models”, et al 2022
- “Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training”, et al 2022
- “Is Integer Arithmetic Enough for Deep Learning Training?”, et al 2022
- “Efficient NLP Inference at the Edge via Elastic Pipelining”, et al 2022
- “Training Transformers Together”, et al 2022
- “Tutel: Adaptive Mixture-Of-Experts at Scale”, et al 2022
- “8-Bit Numerical Formats for Deep Neural Networks”, et al 2022
- “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, et al 2022
- “FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, et al 2022
- “A Low-Latency Communication Design for Brain Simulations”, 2022
- “Reducing Activation Recomputation in Large Transformer Models”, et al 2022
- “What Language Model to Train If You Have One Million GPU Hours?”, et al 2022
- “Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, et al 2022
- “Pathways: Asynchronous Distributed Dataflow for ML”, et al 2022
- “LiteTransformerSearch: Training-Free Neural Architecture Search for Efficient Language Models”, et al 2022
- “Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads”, et al 2022
- “Maximizing Communication Efficiency for Large-Scale Training via 0/1 Adam”, et al 2022
- “Introducing the AI Research SuperCluster—Meta’s Cutting-Edge AI Supercomputer for AI Research”, 2022
- “Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, 2022
- “Spiking Neural Networks and Their Applications: A Review”, et al 2022
- “On the Working Memory of Humans and Great Apes: Strikingly Similar or Remarkably Different?”, et al 2021
- “Sustainable AI: Environmental Implications, Challenges and Opportunities”, et al 2021
- “China Has Already Reached Exascale—On Two Separate Systems”, 2021
- “The Efficiency Misnomer”, et al 2021
- “Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning”, et al 2021
- “WarpDrive: Extremely Fast End-To-End Deep Multi-Agent Reinforcement Learning on a GPU”, et al 2021
- “Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning”, et al 2021
- “PatrickStar: Parallel Training of Pre-Trained Models via Chunk-Based Memory Management”, et al 2021
- “Demonstration of Decentralized, Physics-Driven Learning”, et al 2021
- “Chimera: Efficiently Training Large-Scale Neural Networks With Bidirectional Pipelines”, 2021
- “First-Generation Inference Accelerator Deployment at Facebook”, et al 2021
- “Single-Chip Photonic Deep Neural Network for Instantaneous Image Classification”, et al 2021
- “Distributed Deep Learning in Open Collaborations”, et al 2021
- “Ten Lessons From Three Generations Shaped Google’s TPUv4i”, et al 2021
- “Maximizing 3-D Parallelism in Distributed Training for Huge Neural Networks”, et al 2021
- “2.5-Dimensional Distributed Model Training”, et al 2021
- “A Full-Stack Accelerator Search Technique for Vision Applications”, et al 2021
- “ChinAI #141: The PanGu Origin Story: Notes from an Informative Zhihu Thread on PanGu”, 2021
- “GSPMD: General and Scalable Parallelization for ML Computation Graphs”, et al 2021
- “PanGu-α: Large-Scale Autoregressive Pretrained Chinese Language Models With Auto-Parallel Computation”, et al 2021
- “ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning”, et al 2021
- “How to Train BERT With an Academic Budget”, et al 2021
- “Podracer Architectures for Scalable Reinforcement Learning”, et al 2021
- “High-Performance, Distributed Training of Large-Scale Deep Learning Recommendation Models (DLRMs)”, et al 2021
- “An Efficient 2D Method for Training Super-Large Deep Learning Models”, et al 2021
- “Efficient Large-Scale Language Model Training on GPU Clusters”, et al 2021
- “Large Batch Simulation for Deep Reinforcement Learning”, et al 2021
- “Warehouse-Scale Video Acceleration (Argos): Co-Design and Deployment in the Wild”, et al 2021
- “TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models”, et al 2021
- “PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers”, et al 2021
- “ZeRO-Offload: Democratizing Billion-Scale Model Training”, et al 2021
- “The Design Process for Google’s Training Chips: TPUv2 and TPUv3”, et al 2021
- “Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment”, et al 2020
- “Parallel Training of Deep Networks With Local Updates”, et al 2020
- “Exploring the Limits of Concurrency in ML Training on Google TPUs”, et al 2020
- “BytePS: A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters”, et al 2020
- “Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour”, et al 2020
- “Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?”, et al 2020
- “Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures”, et al 2020b
- “L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, et al 2020
- “Interlocking Backpropagation: Improving Depthwise Model-Parallelism”, et al 2020
- “DeepSpeed: Extreme-Scale Model Training for Everyone”, et al 2020
- “Measuring Hardware Overhang”, hippke 2020
- “The Node Is Nonsense: There Are Better Ways to Measure Progress Than the Old Moore’s Law Metric”, 2020
- “Are We in an AI Overhang?”, 2020
- “HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks”, 2020
- “The Computational Limits of Deep Learning”, et al 2020
- “Data Movement Is All You Need: A Case Study on Optimizing Transformers”, et al 2020
- “PyTorch Distributed: Experiences on Accelerating Data Parallel Training”, et al 2020
- “Japanese Supercomputer Is Crowned World’s Speediest: In the Race for the Most Powerful Computers, Fugaku, a Japanese Supercomputer, Recently Beat American and Chinese Machines”, 2020
- “Sample Factory: Egocentric 3D Control from Pixels at 100,000 FPS With Asynchronous Reinforcement Learning”, et al 2020
- “PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training”, et al 2020
- “There’s Plenty of Room at the Top: What Will Drive Computer Performance After Moore’s Law?”, et al 2020
- “A Domain-Specific Supercomputer for Training Deep Neural Networks”, et al 2020
- “Microsoft Announces New Supercomputer, Lays out Vision for Future AI Work”, 2020
- “AI and Efficiency: We’re Releasing an Analysis Showing That Since 2012 the Amount of Compute Needed to Train a Neural Net to the Same Performance on ImageNet Classification Has Been Decreasing by a Factor of 2 Every 16 Months”, 2020
- “Computation in the Human Cerebral Cortex Uses Less Than 0.2 Watts yet This Great Expense Is Optimal When Considering Communication Costs”, 2020
- “Startup Tenstorrent Shows AI Is Changing Computing and vice Versa: Tenstorrent Is One of the Rush of AI Chip Makers Founded in 2016 and Finally Showing Product. The New Wave of Chips Represent a Substantial Departure from How Traditional Computer Chips Work, but Also Point to Ways That Neural Network Design May Change in the Years to Come”, 2020
- “AI Chips: What They Are and Why They Matter—An AI Chips Reference”, 2020
- “2019 Recent Trends in GPU Price per FLOPS”, 2020
- “Pipelined Backpropagation at Scale: Training Large Models without Batches”, et al 2020
- “Ultrafast Machine Vision With 2D Material Neural Network Image Sensors”, et al 2020
- “Towards Spike-Based Machine Intelligence With Neuromorphic Computing”, et al 2019
- “Checkmate: Breaking the Memory Wall With Optimal Tensor Rematerialization”, et al 2019
- “Training Kinetics in 15 Minutes: Large-Scale Distributed Training on Videos”, et al 2019
- “Energy and Policy Considerations for Deep Learning in NLP”, et al 2019
- “Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes”, et al 2019
- “GAP: Generalizable Approximate Graph Partitioning Framework”, et al 2019
- “An Empirical Model of Large-Batch Training”, et al 2018
- “Bayesian Layers: A Module for Neural Network Uncertainty”, et al 2018
- “GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism”, et al 2018
- “Measuring the Effects of Data Parallelism on Neural Network Training”, et al 2018
- “Mesh-TensorFlow: Deep Learning for Supercomputers”, et al 2018
- “There Is Plenty of Time at the Bottom: the Economics, Risk and Ethics of Time Compression”, 2018
- “Highly Scalable Deep Learning Training System With Mixed-Precision: Training ImageNet in 4 Minutes”, et al 2018
- “AI and Compute”, et al 2018
- “Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions”, et al 2018
- “Loihi: A Neuromorphic Manycore Processor With On-Chip Learning”, et al 2018
- “Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, et al 2017
- “Mixed Precision Training”, et al 2017
- “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima”, et al 2016
- “Training Deep Nets With Sublinear Memory Cost”, et al 2016
- “GeePS: Scalable Deep Learning on Distributed GPUs With a GPU-Specialized Parameter Server”, et al 2016
- “Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing”, et al 2016
- “Communication-Efficient Learning of Deep Networks from Decentralized Data”, et al 2016
- “Persistent RNNs: Stashing Recurrent Weights On-Chip”, et al 2016
- “The Brain As a Universal Learning Machine”, 2015
- “Scaling Distributed Machine Learning With the Parameter Server”, et al 2014
- “Multi-Column Deep Neural Network for Traffic Sign Classification”, Cireşan et al 2012b
- “Multi-Column Deep Neural Networks for Image Classification”, Cireşan et al 2012
- “Building High-Level Features Using Large Scale Unsupervised Learning”, et al 2011
- “Implications of Historical Trends in the Electrical Efficiency of Computing”, et al 2011
- “HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent”, et al 2011
- “DanNet: Flexible, High Performance Convolutional Neural Networks for Image Classification”, et al 2011
- “Goodbye 2010”, 2010
- “Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition”, et al 2010
- “The Cat Is out of the Bag: Cortical Simulations With 10⁹ Neurons, 10¹³ Synapses”, et al 2009
- “Large-Scale Deep Unsupervised Learning Using Graphics Processors”, et al 2009
- “Bandwidth Optimal All-Reduce Algorithms for Clusters of Workstations”, 2009
- “Whole Brain Emulation: A Roadmap”
- “Moore’s Law and the Technology S-Curve”, 2004
- “DARPA and the Quest for Machine Intelligence, 1983–1993”, 2002
- “Ultimate Physical Limits to Computation”, 1999
- “Matrioshka Brains”
- “When Will Computer Hardware Match the Human Brain?”, 1998
- “Superhumanism: According to Hans Moravec § AI Scaling”, 1995
- “A Sociological Study of the Official History of the Perceptrons Controversy [1993]”, 1993
- “Intelligence As an Emergent Behavior; Or, The Songs of Eden”, 1988
- “The Role Of RAW POWER In INTELLIGENCE”, 1976
- “Brain Performance in FLOPS”
- “Google Demonstrates Leading Performance in Latest MLPerf Benchmarks”
- “H100 GPUs Set Standard for Gen AI in Debut MLPerf Benchmark”
- “Introducing Cerebras Inference: AI at Instant Speed”, 2024
- “Llama-3.1-405B Now Runs at 969 Tokens/s on Cerebras Inference”
- “NVIDIA Hopper Architecture In-Depth”
- “Trends in GPU Price-Performance”
- “NVIDIA/Megatron-LM: Ongoing Research Training Transformer Models at Scale”
- “12 Hours Later, Groq Deploys Llama-3-Instruct (8 & 70B)”
- “The Technology Behind BLOOM Training”
- “From Bare Metal to a 70B Model: Infrastructure Set-Up and Scripts”
- “AI Accelerators, Part IV: The Very Rich Landscape”, 2024
- “NVIDIA Announces DGX H100 Systems – World’s Most Advanced Enterprise AI Infrastructure”
- “NVIDIA Launches UK’s Most Powerful Supercomputer, for Research in AI and Healthcare”
- “Perlmutter, Said to Be the World’s Fastest AI Supercomputer, Comes Online”
- “TensorFlow Research Cloud (TRC): Accelerate Your Cutting-Edge Machine Learning Research With Free Cloud TPUs”, TRC 2024
- “Cerebras’ Tech Trains “Brain-Scale” AIs”
- “Fugaku Holds Top Spot, Exascale Remains Elusive”
- “342 Transistors for Every Person In the World: Cerebras 2nd Gen Wafer Scale Engine Teased”
- “Jim Keller Becomes CTO at Tenstorrent: “The Most Promising Architecture Out There””
- “NVIDIA Unveils Grace: A High-Performance Arm Server CPU For Use In Big AI Systems”
- “Cerebras Unveils Wafer Scale Engine Two (WSE2): 2.6 Trillion Transistors, 100% Yield”
- “AMD Announces Instinct MI200 Accelerator Family: Taking Servers to Exascale and Beyond”
- “NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder”
- “Biological Anchors: A Trick That Might Or Might Not Work”
- “Scaling Up and Out: Training Massive Models on Cerebras Systems Using Weight Streaming”
- “Fermi Estimate of Future Training Runs”
- “Carl Shulman #2: AI Takeover, Bio & Cyber Attacks, Detecting Deception, & Humanity’s Far Future”
- “Etched Is Making the Biggest Bet in AI”
- “The Emerging Age of AI Diplomacy: To Compete With China, the United States Must Walk a Tightrope in the Gulf”
- “The Resilience Myth: Fatal Flaws in the Push to Secure Chip Supply Chains”
- “Compute Funds and Pre-Trained Models”
- “The Next Big Thing: Introducing IPU-POD128 and IPU-POD256”
- “The WoW Factor: Graphcore Systems Get Huge Power and Efficiency Boost”
- “AWS Enables 4,000-GPU UltraClusters With New P4 A100 Instances”
- “Estimating Training Compute of Deep Learning Models”
- “The Colliding Exponentials of AI”
- “Moore’s Law, AI, and the pace of Progress”
- “How Fast Can We Perform a Forward Pass?”
- “‘AI and Compute’ Trend Isn’t Predictive of What Is Happening”
- “Brain Efficiency: Much More Than You Wanted to Know”
- “DeepSpeed: Accelerating Large-Scale Model Inference and Training via System Optimizations and Compression”
- “ZeRO-Infinity and DeepSpeed: Unlocking Unprecedented Model Scale for Deep Learning Training”
- “The World’s Largest Computer Chip”
- “The Billion Dollar AI Problem That Just Keeps Scaling”
- “TSMC Confirms 3nm Tech for 2022, Could Enable Epic 80 Billion Transistor GPUs”
- “ORNL’s Frontier First to Break the Exaflop Ceiling”
- “How to Accelerate Innovation With AI at Scale”
- “48:44—Tesla Vision · 1:13:12—Planning and Control · 1:24:35—Manual Labeling · 1:28:11—Auto Labeling · 1:35:15—Simulation · 1:42:10—Hardware Integration · 1:45:40—Dojo”
- lepikhin
- Wikipedia
- Miscellaneous
- Bibliography