Bibliography (11):

  1. https://x.com/sangmichaelxie/status/1660909587070095360

  2. Distributionally Robust Language Modeling

  3. https://pile.eleuther.ai/

  4. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

  5. The Pile: An 800GB Dataset of Diverse Text for Language Modeling

  6. https://arxiv.org/pdf/2305.10429.pdf#page=22&org=google