Computer Optimization: Your Computer Is Faster Than You Think
Why a US AI "Manhattan Project" could backfire: notes from conversations in China
Getting AI Datacenters in the UK: Why the UK Needs to Create Special Compute Zones; and How to Do It
Nvidia’s AI chips are cheaper to rent in China than US: Supply of processors helps Chinese start-ups advance artificial intelligence technology despite Washington’s restrictions
Benchmarking the Performance of Large Language Models on the Cerebras Wafer Scale Engine
Chips or Not, Chinese AI Pushes Ahead: A host of Chinese AI startups are attempting to write more efficient code for large language models
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
Huawei Faces Production Challenges with 20% Yield Rate for AI Chip
China Is Losing the Chip War: Xi Jinping picked a fight over semiconductor technology—one he can’t win
Elon Musk ordered Nvidia to ship thousands of AI chips reserved for Tesla to Twitter/xAI
Earnings call: Tesla Discusses Q1 2024 Challenges and AI Expansion
Microsoft, OpenAI plan $100 billion data-center project, media report says
Singapore’s Temasek in discussions to invest in OpenAI: State-backed group in talks with ChatGPT maker’s chief Sam Altman who is seeking funding to build chips business
China’s military and government acquire Nvidia chips despite US ban
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
OpenAI Agreed to Buy $51 Million of AI Chips From a Startup Backed by CEO Sam Altman
How Jensen Huang’s Nvidia Is Powering the AI Revolution: The company’s CEO bet it all on a new kind of chip. Now that Nvidia is one of the biggest companies in the world, what will he do next?
Altman Sought Billions For Chip Venture Before OpenAI Ouster: Altman was fundraising in the Middle East for new chip venture; The project, code-named Tigris, is intended to rival Nvidia
DiLoCo: Distributed Low-Communication Training of Language Models
LSS Transformer: Ultra-Long Sequence Distributed Transformer
Saudi-China collaboration raises concerns about access to AI chips: Fears grow at Gulf kingdom’s top university that ties to Chinese researchers risk upsetting US government
DeepMind’s chief on AI’s dangers—and the UK’s £900 million supercomputer: Demis Hassabis says we shouldn’t let AI fall into the wrong hands and the government’s plan to build a supercomputer for AI is likely to be out of date before it has even started
Inflection AI announces $1.3 billion of funding led by current investors, Microsoft, and NVIDIA
U.S. Considers New Curbs on AI Chip Exports to China: Restrictions come amid concerns that China could use AI chips from Nvidia and others for weapon development and hacking
The AI Boom Runs on Chips, but It Can’t Get Enough: ‘It’s like toilet paper during the pandemic.’ Startups, investors scrounge for computational firepower
Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing
Context on the NVIDIA ChatGPT opportunity—and ramifications of large language model enthusiasm
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference
Reserve Capacity of NVIDIA HGX H100s on CoreWeave Now: Available at Scale in Q1 2023 Starting at $2.23/hr
Petals: Collaborative Inference and Fine-tuning of Large Models
Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training
Efficient NLP Inference at the Edge via Elastic Pipelining
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Reducing Activation Recomputation in Large Transformer Models
What Language Model to Train if You Have One Million GPU Hours?
Monarch: Expressive Structured Matrices for Efficient and Accurate Training
LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models
Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Introducing the AI Research SuperCluster—Meta’s cutting-edge AI supercomputer for AI research
Is Programmable Overhead Worth The Cost? How much do we pay for a system to be programmable? It depends upon who you ask
On the Working Memory of Humans and Great Apes: Strikingly Similar or Remarkably Different?
Sustainable AI: Environmental Implications, Challenges and Opportunities
China Has Already Reached Exascale—On Two Separate Systems
Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning
WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU
Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
First-Generation Inference Accelerator Deployment at Facebook
Single-chip photonic deep neural network for instantaneous image classification
Maximizing 3-D Parallelism in Distributed Training for Huge Neural Networks
A Full-stack Accelerator Search Technique for Vision Applications
ChinAI #141: The PanGu Origin Story: Notes from an informative Zhihu Thread on PanGu
GSPMD: General and Scalable Parallelization for ML Computation Graphs
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Podracer architectures for scalable Reinforcement Learning
High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models (DLRMs)
An Efficient 2D Method for Training Super-Large Deep Learning Models
Efficient Large-Scale Language Model Training on GPU Clusters
Warehouse-Scale Video Acceleration (Argos): Co-design and Deployment in the Wild
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers
The Design Process for Google’s Training Chips: TPUv2 and TPUv3
Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment
Exploring the limits of Concurrency in ML Training on Google TPUs
BytePS: A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters
Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures
L2L: Training Large Neural Networks with Constant Memory using a New Execution Algorithm
Interlocking Backpropagation: Improving depthwise model-parallelism
The Node Is Nonsense: There are better ways to measure progress than the old Moore’s law metric
HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks
Data Movement Is All You Need: A Case Study on Optimizing Transformers
PyTorch Distributed: Experiences on Accelerating Data Parallel Training
Japanese Supercomputer Is Crowned World’s Speediest: In the race for the most powerful computers, Fugaku, a Japanese supercomputer, recently beat American and Chinese machines
Sample Factory: Egocentric 3D Control from Pixels at 100,000 FPS with Asynchronous Reinforcement Learning
PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training
There’s plenty of room at the Top: What will drive computer performance after Moore’s law?
A domain-specific supercomputer for training deep neural networks
Microsoft announces new supercomputer, lays out vision for future AI work
AI and Efficiency: We’re releasing an analysis showing that since 2012 the amount of compute needed to train a neural net to the same performance on ImageNet classification has been decreasing by a factor of 2 every 16 months
Computation in the human cerebral cortex uses less than 0.2 watts yet this great expense is optimal when considering communication costs
Startup Tenstorrent shows AI is changing computing and vice versa: Tenstorrent is one of the rush of AI chip makers founded in 2016 and finally showing product. The new wave of chips represents a substantial departure from how traditional computer chips work, but also points to ways that neural network design may change in the years to come
AI Chips: What They Are and Why They Matter—An AI Chips Reference
Pipelined Backpropagation at Scale: Training Large Models without Batches
Ultrafast Machine Vision With 2D Material Neural Network Image Sensors
Towards spike-based machine intelligence with neuromorphic computing
Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization
Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
GAP: Generalizable Approximate Graph Partitioning Framework
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Measuring the Effects of Data Parallelism on Neural Network Training
There is plenty of time at the bottom: the economics, risk and ethics of time compression
Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in 4 Minutes
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
Loihi: A Neuromorphic Manycore Processor with On-Chip Learning
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server
Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing
Communication-Efficient Learning of Deep Networks from Decentralized Data
Scaling Distributed Machine Learning with the Parameter Server
Multi-column deep neural network for traffic sign classification
Multi-column Deep Neural Networks for Image Classification
Building high-level features using large scale unsupervised learning
Implications of Historical Trends in the Electrical Efficiency of Computing
HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
DanNet: Flexible, High Performance Convolutional Neural Networks for Image Classification
Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition
The cat is out of the bag: cortical simulations with 10⁹ neurons, 10¹³ synapses
Large-scale deep unsupervised learning using graphics processors
Bandwidth optimal all-reduce algorithms for clusters of workstations
A Sociological Study of the Official History of the Perceptrons Controversy [1993]
Intelligence As an Emergent Behavior; Or, The Songs of Eden
Google Demonstrates Leading Performance in Latest MLPerf Benchmarks
H100 GPUs Set Standard for Gen AI in Debut MLPerf Benchmark
Llama-3.1-405B Now Runs at 969 Tokens/s on Cerebras Inference
NVIDIA/Megatron-LM: Ongoing Research Training Transformer Models at Scale
From Bare Metal to a 70B Model: Infrastructure Set-Up and Scripts
NVIDIA Announces DGX H100 Systems – World’s Most Advanced Enterprise AI Infrastructure
NVIDIA Launches UK’s Most Powerful Supercomputer, for Research in AI and Healthcare
Perlmutter, Said to Be the World's Fastest AI Supercomputer, Comes Online
TensorFlow Research Cloud (TRC): Accelerate your cutting-edge machine learning research with free Cloud TPUs
342 Transistors for Every Person In the World: Cerebras 2nd Gen Wafer Scale Engine Teased
Jim Keller Becomes CTO at Tenstorrent: "The Most Promising Architecture Out There"
NVIDIA Unveils Grace: A High-Performance Arm Server CPU For Use In Big AI Systems
Cerebras Unveils Wafer Scale Engine Two (WSE2): 2.6 Trillion Transistors, 100% Yield
AMD Announces Instinct MI200 Accelerator Family: Taking Servers to Exascale and Beyond
NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder
Scaling Up and Out: Training Massive Models on Cerebras Systems Using Weight Streaming
Carl Shulman #2: AI Takeover, Bio & Cyber Attacks, Detecting Deception, & Humanity's Far Future
The Emerging Age of AI Diplomacy: To Compete With China, the United States Must Walk a Tightrope in the Gulf
The Resilience Myth: Fatal Flaws in the Push to Secure Chip Supply Chains
The WoW Factor: Graphcore Systems Get Huge Power and Efficiency Boost
AWS Enables 4,000-GPU UltraClusters With New P4 A100 Instances
"AI and Compute" Trend Isn't Predictive of What Is Happening
DeepSpeed: Accelerating Large-Scale Model Inference and Training via System Optimizations and Compression
ZeRO-Infinity and DeepSpeed: Unlocking Unprecedented Model Scale for Deep Learning Training
TSMC Confirms 3nm Tech for 2022, Could Enable Epic 80 Billion Transistor GPUs
48:44—Tesla Vision · 1:13:12—Planning and Control · 1:24:35—Manual Labeling · 1:28:11—Auto Labeling · 1:35:15—Simulation · 1:42:10—Hardware Integration · 1:45:40—Dojo
We Ran MoE (2048E, 60L) With Bfloat16 Activations With a Total of 1 Trillion Model Weights. Although Trainable With Manual Diagnostics, With the Deep 1-Trillion Model We Encountered Several Trainability Issues With Numerical Stability. Will Follow Up.
https://www.reuters.com/technology/artificial-intelligence/openai-builds-first-chip-with-broadcom-tsmc-scales-back-foundry-ambition-2024-10-29/
2021-04-12-jensenhuang-gtc2021keynote-eAn_oiZwUXA.en.vtt.txt
2020-08-05-hippke-measuringhardwareoverhang-chessscaling19902020.png
1998-moravec-figure2-evolutionofcomputerpowercost19001998.csv
1998-moravec-figure2-evolutionofcomputerpowercost19001998.jpg
https://ai.facebook.com/blog/meta-training-inference-accelerator-AI-MTIA/
https://ai.meta.com/blog/next-generation-meta-training-inference-accelerator-AI-MTIA/
https://apps.fz-juelich.de/jsc/hps/juwels/configuration.html#hardware-configuration-of-the-system-name-booster-module
https://blogs.nvidia.com/blog/2021/04/12/cpu-grace-cscs-alps/
https://blogs.nvidia.com/blog/2022/09/08/hopper-mlperf-inference/
https://carnegieendowment.org/2022/11/22/after-chips-act-limits-of-reshoring-and-next-steps-for-u.s.-semiconductor-policy-pub-88439
https://caseyhandmer.wordpress.com/2024/03/12/how-to-feed-the-ais/
https://chipsandcheese.com/2023/07/02/nvidias-h100-funny-l2-and-tons-of-bandwidth/
https://cloud.google.com/blog/products/compute/the-worlds-largest-distributed-llm-training-job-on-tpu-v5e
https://cloud.google.com/blog/topics/systems/the-evolution-of-googles-jupiter-data-center-network
https://cset.georgetown.edu/wp-content/uploads/AI-and-Compute-How-Much-Longer-Can-Computing-Power-Drive-Artificial-Intelligence-Progress.pdf
https://evabehrens.substack.com/p/the-agi-race-between-the-us-and-china
https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/
https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/#how-do-the-big-clouds-compare
https://newsletter.pragmaticengineer.com/p/scaling-chatgpt#%C2%A7five-scaling-challenges
https://openai.com/blog/techniques-for-training-large-neural-networks/
https://openai.com/research/scaling-kubernetes-to-7500-nodes
https://research.google/blog/tensorstore-for-high-performance-scalable-array-storage/
https://spectrum.ieee.org/computing/hardware/the-future-of-deep-learning-is-photonic
https://thechipletter.substack.com/p/googles-first-tpu-architecture
https://venturebeat.com/2020/11/17/cerebras-wafer-size-chip-is-10000-times-faster-than-a-gpu/
https://warontherocks.com/2024/04/how-washington-can-save-its-semiconductor-controls-on-china/
https://www.abortretry.fail/p/the-rise-and-fall-of-silicon-graphics
https://www.bloomberg.com/news/articles/2022-10-10/china-chip-stocks-drop-as-biden-tightens-rules-on-us-tech-access
https://www.businesswire.com/news/home/20241015910376/en/Crusoe-Blue-Owl-Capital-and-Primary-Digital-Infrastructure-Enter-3.4-billion-Joint-Venture-for-AI-Data-Center-Development
https://www.cerebras.net/blog/introducing-gigagpt-gpt-3-sized-models-in-565-lines-of-code
https://www.cerebras.net/press-release/cerebras-announces-third-generation-wafer-scale-engine
https://www.chinatalk.media/p/new-chip-export-controls-explained
https://www.chinatalk.media/p/new-sexport-controls-semianalysis
https://www.ft.com/content/25337df3-5b98-4dd1-b7a9-035dcc130d6a
https://www.fujitsu.com/global/about/resources/news/press-releases/2024/0510-01.html
https://www.lesswrong.com/posts/KsKfvLx7nFBZnWtEu/no-human-brains-are-not-much-more-efficient-than-computers
https://www.lesswrong.com/posts/YKfNZAmiLdepDngwi/gpt-175bee
https://www.lesswrong.com/posts/cB2Rtnp7DBTpDy3ii/memory-bandwidth-constraints-imply-economies-of-scale-in-ai
https://www.nytimes.com/2022/10/13/us/politics/biden-china-technology-semiconductors.html
https://www.nytimes.com/2023/07/12/magazine/semiconductor-chips-us-china.html
https://www.reddit.com/r/MachineLearning/comments/1dlsogx/d_academic_ml_labs_how_many_gpus/
https://www.reuters.com/technology/coreweave-raises-23-billion-debt-collateralized-by-nvidia-chips-2023-08-03/
https://www.reuters.com/technology/inside-metas-scramble-catch-up-ai-2023-04-25/
https://www.theinformation.com/articles/microsoft-and-openai-plot-100-billion-stargate-ai-supercomputer
https://www.yitay.net/blog/training-great-llms-entirely-from-ground-zero-in-the-wilderness