Bibliography (3):

MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Turing-NLG: A 17-billion-parameter language model by Microsoft