“‘Data Pruning’ Tag”,2021-03-20
Bibliography for tag reinforcement-learning/exploration/active-learning/data-pruning, most recent first: 24 annotations & 7 links.
- Links
- “Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review”, et al 2024
- “Improving Pretraining Data Using Perplexity Correlations”, et al 2024
- “DataComp-LM: In Search of the next Generation of Training Sets for Language Models”, et al 2024
- “Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models”, et al 2024
- “Rho-1: Not All Tokens Are What You Need”, et al 2024
- “Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws”, Allen-2024
- “A Study in Dataset Pruning for Image Super-Resolution”, et al 2024
- “How to Train Data-Efficient LLMs”, et al 2024
- “Autonomous Data Selection With Language Models for Mathematical Texts”, et al 2024
- “Rephrasing the Web (WRAP): A Recipe for Compute and Data-Efficient Language Modeling”, et al 2024
- “Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding”, et al 2023
- “Does CLIP’s Generalization Performance Mainly Stem from High Train-Test Similarity?”, et al 2023
- “Data Filtering Networks”, et al 2023
- “SlimPajama-DC: Understanding Data Combinations for LLM Training”, et al 2023
- “Anchor Points: Benchmarking Models With Much Fewer Examples”, et al 2023
- “When Less Is More: Investigating Data Pruning for Pretraining LLMs at Scale”, et al 2023
- “Beyond Scale: the Diversity Coefficient As a Data Quality Metric Demonstrates LLMs Are Pre-Trained on Formally Diverse Data”, et al 2023
- “Data Selection for Language Models via Importance Resampling”, et al 2023
- “Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning”, et al 2022
- “Unadversarial Examples: Designing Objects for Robust Vision”, et al 2020
- “Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, et al 2020
- “Dataset Distillation”, et al 2018
- “Machine Teaching for Bayesian Learners in the Exponential Family”, 2013
- “FineWeb: Decanting the Web for the Finest Text Data at Scale”