- https://github.com/microsoft/DeepSpeed
- https://www.microsoft.com/en-us/research/project/ai-at-scale/
- ZeRO-Offload: Democratizing Billion-Scale Model Training
- Attention Is All You Need
- 1-bit Adam: Communication Efficient Large-Scale Training with Adam’s Convergence Speed
-