Bibliography (3):
Machine Learning Scaling
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
The Pile: An 800GB Dataset of Diverse Text for Language Modeling