“Table 2.2: Datasets used to train GPT-3. ‘Weight in training mix’ refers to the fraction of examples during training that are drawn from a given dataset, which we intentionally do not make proportional to the size of the dataset. As a result, when we train for 300 billion tokens, some datasets are seen up to 3.4 times during training while other datasets are seen less than once.” (backlinks)
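
The arithmetic behind the caption can be sketched roughly as follows: the number of times a dataset is seen is its mix weight times the total training tokens, divided by the dataset's size in tokens. This is a minimal illustration, not the authors' code. The dataset names, sizes, and weights are hypothetical placeholders rather than the values in Table 2.2; only the 300-billion-token budget comes from the caption. It also assumes that the fraction of examples drawn from a dataset approximates the fraction of tokens drawn from it.

```python
# Sketch: how many times each dataset is seen under a fixed sampling mix.
# All dataset names, sizes, and weights below are HYPOTHETICAL placeholders,
# not the values from Table 2.2. Only the 300B-token budget is from the caption.
# Assumption: "fraction of examples" is treated as "fraction of tokens".

TOTAL_TRAINING_TOKENS = 300e9

# name -> (dataset size in tokens, weight in training mix)
HYPOTHETICAL_MIX = {
    "large_web_crawl": (400e9, 0.60),
    "medium_books": (60e9, 0.36),
    "small_curated": (4e9, 0.04),
}


def epochs_seen(size_tokens, weight, total_tokens=TOTAL_TRAINING_TOKENS):
    """Expected number of passes over a dataset of `size_tokens` tokens."""
    return weight * total_tokens / size_tokens


def epochs_by_dataset(mix, total_tokens=TOTAL_TRAINING_TOKENS):
    """Map each dataset name to its expected number of passes."""
    total_weight = sum(weight for _, weight in mix.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"mix weights sum to {total_weight}, expected 1.0")
    return {
        name: epochs_seen(size, weight, total_tokens)
        for name, (size, weight) in mix.items()
    }


if __name__ == "__main__":
    for name, epochs in epochs_by_dataset(HYPOTHETICAL_MIX).items():
        print(f"{name}: {epochs:.2f} epochs")
```

Because the weights are not proportional to size, the small hypothetical dataset is repeated several times while the large one is not seen in full even once, which is the effect the caption describes.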