Bibliography (8):

  1. https://x.com/ZackAnkner/status/1797595682439901565

  2. The Pile: An 800GB Dataset of Diverse Text for Language Modeling (arXiv:2101.00027)

  3. https://arxiv.org/pdf/2405.20541#page=8

  4. https://openwebtext2.readthedocs.io/en/latest/