Bibliography (5):

  1. https://github.com/microsoft/DeepSpeed

  2. https://www.microsoft.com/en-us/research/project/ai-at-scale/

  3. ZeRO-Offload: Democratizing Billion-Scale Model Training (https://arxiv.org/abs/2101.06840)

  4. Attention Is All You Need (https://arxiv.org/abs/1706.03762)

  5. 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed (https://arxiv.org/abs/2102.02888)