Bibliography (12):

  1. RHO-LOSS: Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

  2. NoTrainNoGain code repository: https://github.com/JeanKaddour/NoTrainNoGain

  3. Accelerating Deep Learning by Focusing on the Biggest Losers

  4. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  5. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  6. The Pile: An 800GB Dataset of Diverse Text for Language Modeling

  7. No Train No Gain: Revisiting Efficient Training Algorithms for Transformer-based Language Models (arXiv:2307.06440), page 6: https://arxiv.org/pdf/2307.06440.pdf#page=6

  8. No Train No Gain: Revisiting Efficient Training Algorithms for Transformer-based Language Models (arXiv:2307.06440), page 25: https://arxiv.org/pdf/2307.06440.pdf#page=25