“DeepSpeed: Extreme-Scale Model Training for Everyone”, DeepSpeed Team, Rangan Majumder, Junhua Wang, 2020-09-10:

Today, we are happy to share our new advancements that not only push deep learning training to the extreme, but also democratize it for more people—from data scientists training on massive supercomputers to those training on low-end clusters or even on a single GPU.

More specifically, DeepSpeed adds four new system technologies that further the AI at Scale initiative to innovate across Microsoft’s AI products and platforms. These technologies offer extreme compute, memory, and communication efficiency, and they power model training with billions to trillions of parameters. They also support extremely long input sequences and run on a wide range of hardware: a single GPU, high-end clusters with thousands of GPUs, or low-end clusters with very slow Ethernet networks.