https://x.com/sangmichaelxie/status/1660909587070095360
Distributionally Robust Language Modeling
https://pile.eleuther.ai/
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
https://arxiv.org/pdf/2305.10429.pdf#page=22&org=google