Bibliography (3):
https://dl.acm.org/doi/pdf/10.1145/307400.307435
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
How AI Training Scales