Bibliography (16):

  1. RHO-LOSS: Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

  2. Beyond neural scaling laws: beating power law scaling via data pruning

  3. https://openwebtext2.com/

  4. Measuring Mathematical Problem Solving With the MATH Dataset

  5. https://arxiv.org/pdf/2404.07965#page=4&org=microsoft

  6. https://arxiv.org/pdf/2404.07965#page=3&org=microsoft

  7. https://arxiv.org/pdf/2404.07965#page=20&org=microsoft

  8. https://arxiv.org/pdf/2404.07965#page=19&org=microsoft

  9. Top-K Training of GANs: Improving GAN Performance by Throwing Away Bad Samples

  10. MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

  11. https://arxiv.org/pdf/2404.07965#page=26&org=microsoft

  12. Deep Double Descent: Where Bigger Models and More Data Hurt

  13. https://arxiv.org/pdf/2404.07965#page=9&org=microsoft