- https://x.com/sangmichaelxie/status/1660909587070095360
- Distributionally Robust Language Modeling
- https://pile.eleuther.ai/
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
- The Pile: An 800GB Dataset of Diverse Text for Language Modeling
- https://arxiv.org/pdf/2305.10429.pdf#page=22&org=google