https://github.com/microsoft/DeepSpeed
https://www.microsoft.com/en-us/research/project/ai-at-scale/
ZeRO-Offload: Democratizing Billion-Scale Model Training
Attention Is All You Need
1-bit Adam: Communication Efficient Large-Scale Training with Adam’s Convergence Speed