- See Also
- Gwern
- Links
- “Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models”, Ankner et al 2024
- “Rho-1: Not All Tokens Are What You Need”, Lin et al 2024
- “Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws”, Allen-Zhu & Li 2024
- “A Study in Dataset Pruning for Image Super-Resolution”, Moser et al 2024
- “How to Train Data-Efficient LLMs”, Sachdeva et al 2024
- “Autonomous Data Selection With Language Models for Mathematical Texts”, Zhang et al 2024
- “Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding”, Evans et al 2023
- “Does CLIP’s Generalization Performance Mainly Stem from High Train-Test Similarity?”, Mayilvahanan et al 2023
- “Data Filtering Networks”, Fang et al 2023
- “SlimPajama-DC: Understanding Data Combinations for LLM Training”, Shen et al 2023
- “Anchor Points: Benchmarking Models With Much Fewer Examples”, Vivek et al 2023
- “When Less Is More: Investigating Data Pruning for Pretraining LLMs at Scale”, Marion et al 2023
- “Beyond Scale: the Diversity Coefficient As a Data Quality Metric Demonstrates LLMs Are Pre-Trained on Formally Diverse Data”, Lee et al 2023
- “Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning”, Sorscher et al 2022
- “Unadversarial Examples: Designing Objects for Robust Vision”, Salman et al 2020
- “Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, Bahri et al 2020
- “Dataset Distillation”, Wang et al 2018
- “Machine Teaching for Bayesian Learners in the Exponential Family”, Zhu 2013
- “FineWeb: Decanting the Web for the Finest Text Data at Scale”
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Gwern
“Making Anime Faces With StyleGAN”, Gwern 2019
Links
“Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models”, Ankner et al 2024
“Rho-1: Not All Tokens Are What You Need”, Lin et al 2024
“Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws”, Allen-Zhu & Li 2024
“A Study in Dataset Pruning for Image Super-Resolution”, Moser et al 2024
“How to Train Data-Efficient LLMs”, Sachdeva et al 2024
“Autonomous Data Selection With Language Models for Mathematical Texts”, Zhang et al 2024
“Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding”, Evans et al 2023
“Does CLIP’s Generalization Performance Mainly Stem from High Train-Test Similarity?”, Mayilvahanan et al 2023
“Data Filtering Networks”, Fang et al 2023
“SlimPajama-DC: Understanding Data Combinations for LLM Training”, Shen et al 2023
“Anchor Points: Benchmarking Models With Much Fewer Examples”, Vivek et al 2023
“When Less Is More: Investigating Data Pruning for Pretraining LLMs at Scale”, Marion et al 2023
“Beyond Scale: the Diversity Coefficient As a Data Quality Metric Demonstrates LLMs Are Pre-Trained on Formally Diverse Data”, Lee et al 2023
“Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning”, Sorscher et al 2022
“Unadversarial Examples: Designing Objects for Robust Vision”, Salman et al 2020
“Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, Bahri et al 2020
“Dataset Distillation”, Wang et al 2018
“Machine Teaching for Bayesian Learners in the Exponential Family”, Zhu 2013
“FineWeb: Decanting the Web for the Finest Text Data at Scale”
Wikipedia
- Coreset
Miscellaneous
- https://aclanthology.org/2023.findings-emnlp.18/
Link Bibliography
- https://arxiv.org/abs/2405.20541: “Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models”
- https://arxiv.org/abs/2404.07965#microsoft: “Rho-1: Not All Tokens Are What You Need”
- https://arxiv.org/abs/2402.07625: “Autonomous Data Selection With Language Models for Mathematical Texts”
- https://arxiv.org/abs/2312.05328#deepmind: “Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding”
- https://arxiv.org/abs/2309.17425#apple: “Data Filtering Networks”
- https://arxiv.org/abs/2309.10818#cerebras: “SlimPajama-DC: Understanding Data Combinations for LLM Training”
- https://arxiv.org/abs/2206.14486: “Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning”