- RHO-LOSS: Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt (selection sketch below)
- Beyond neural scaling laws: beating power law scaling via data pruning
- https://openwebtext2.com/
- Measuring Mathematical Problem Solving With the MATH Dataset
- https://arxiv.org/pdf/2404.07965#page=4&org=microsoft
- https://arxiv.org/pdf/2404.07965#page=3&org=microsoft
- https://arxiv.org/pdf/2404.07965#page=20&org=microsoft
- https://arxiv.org/pdf/2404.07965#page=19&org=microsoft
- Top-K Training of GANs: Improving GAN Performance by Throwing Away Bad Samples (loss sketch below)
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
- https://arxiv.org/pdf/2404.07965#page=26&org=microsoft
- Deep Double Descent: the double descent phenomenon occurs in CNNs, ResNets, and transformers; performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time
- https://arxiv.org/pdf/2404.07965#page=9&org=microsoft
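The RHO-LOSS entry above selects training points by their reducible holdout loss: current training loss minus the loss of a small model trained only on a holdout set. Below is a minimal sketch of that selection rule, assuming a classification setup; the names `rho_loss_select` and `holdout_model` are illustrative, and `holdout_model` stands in for the paper's separately trained "irreducible loss" model.

```python
import torch
import torch.nn.functional as F

def rho_loss_select(model, holdout_model, xb, yb, k):
    """Score candidates by reducible holdout loss and keep the top-k.

    RHO-LOSS(x, y) = L_current(y | x) - L_irreducible(y | x):
    a high current loss means the point is not yet learnt, while a
    low loss under the holdout-trained model means it is learnable
    and worth learning (i.e. not label noise or an outlier).
    """
    with torch.no_grad():
        current_loss = F.cross_entropy(model(xb), yb, reduction="none")
        irreducible_loss = F.cross_entropy(holdout_model(xb), yb, reduction="none")
    scores = current_loss - irreducible_loss
    top = scores.topk(k).indices
    return xb[top], yb[top]
```

In use, each step draws a large candidate batch, calls `rho_loss_select` to keep the top-k points, and runs the gradient update only on that subset.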
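The Top-K GAN entry follows the same selection spirit: during the generator update, only the k generated samples the discriminator scores as most realistic contribute to the loss, and gradients from the worst samples are thrown away. A minimal sketch assuming the standard non-saturating generator loss; function and parameter names are illustrative.

```python
import torch
import torch.nn.functional as F

def topk_generator_loss(discriminator, generator, z, k):
    """Non-saturating generator loss computed only on the top-k fake
    samples, as ranked by the discriminator's realism score."""
    fakes = generator(z)
    logits = discriminator(fakes).squeeze(-1)  # higher = judged more real
    top_logits = logits.topk(k).values         # keep the k most convincing fakes
    return F.binary_cross_entropy_with_logits(
        top_logits, torch.ones_like(top_logits)
    )
```

In the paper, k starts at the full batch size and is annealed down over training, so early updates still use every sample.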