- See Also
- Links
- “Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning”, Xia et al 2023
- “A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model”, Sun et al 2023
- “Fast As CHITA: Neural Network Pruning With Combinatorial Optimization”, Benbaki et al 2023
- “Pruning Compact ConvNets for Efficient Inference”, Ghosh et al 2023
- “Lottery Tickets on a Data Diet: Finding Initializations With Sparse Trainable Networks”, Paul et al 2022
- “PPCD-GAN: Progressive Pruning and Class-Aware Distillation for Large-Scale Conditional GANs Compression”, Vo et al 2022
- “Data-Efficient Structured Pruning via Submodular Optimization”, Halabi et al 2022
- “The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks”, Yu et al 2022
- “Sparsity Winning Twice: Better Robust Generalization from More Efficient Training”, Chen et al 2022
- “Fortuitous Forgetting in Connectionist Networks”, Zhou et al 2022
- “How Many Degrees of Freedom Do We Need to Train Deep Networks: a Loss Landscape Perspective”, Larsen et al 2021
- “Prune Once for All: Sparse Pre-Trained Language Models”, Zafrir et al 2021
- “DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models”, Chen et al 2021
- “HALP: Hardware-Aware Latency Pruning”, Shen et al 2021
- “On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis”, Lai et al 2021
- “Block Pruning For Faster Transformers”, Lagunas et al 2021
- “Scaling Laws for Deep Learning”, Rosenfeld 2021
- “A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness”, Diffenderfer et al 2021
- “Chasing Sparsity in Vision Transformers: An End-to-End Exploration”, Chen et al 2021
- “On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning”, Vischer et al 2021
- “Sifting out the Features by Pruning: Are Convolutional Networks the Winning Lottery Ticket of Fully Connected Ones?”, Pellegrini & Biroli 2021
- “Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch”, Zhou et al 2021
- “Postnatal Connectomic Development of Inhibition in Mouse Barrel Cortex”, Gour et al 2021
- “ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution”, Song et al 2021
- “A Primer in BERTology: What We Know about How BERT Works”, Rogers et al 2020
- “Bort: Optimal Subarchitecture Extraction For BERT”, Wynter & Perry 2020
- “Pruning Neural Networks at Initialization: Why Are We Missing the Mark?”, Frankle et al 2020
- “Logarithmic Pruning Is All You Need”, Orseau et al 2020
- “On the Predictability of Pruning Across Scales”, Rosenfeld et al 2020
- “Progressive Skeletonization: Trimming More Fat from a Network at Initialization”, Jorge et al 2020
- “Pruning Neural Networks without Any Data by Iteratively Conserving Synaptic Flow”, Tanaka et al 2020
- “Movement Pruning: Adaptive Sparsity by Fine-Tuning”, Sanh et al 2020
- “Bayesian Bits: Unifying Quantization and Pruning”, Baalen et al 2020
- “Lite Transformer With Long-Short Range Attention”, Wu et al 2020
- “On the Effect of Dropping Layers of Pre-trained Transformer Models”, Sajjad et al 2020
- “Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, Li et al 2020
- “Sparse Networks from Scratch: Faster Training without Losing Performance”, Dettmers & Zettlemoyer 2019
- “Playing the Lottery With Rewards and Multiple Languages: Lottery Tickets in RL and NLP”, Yu et al 2019
- “SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers”, Fedorov et al 2019
- “Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned”, Voita et al 2019
- “Stabilizing the Lottery Ticket Hypothesis”, Frankle et al 2019
- “The State of Sparsity in Deep Neural Networks”, Gale et al 2019
- “Differential Contribution of Cortical Thickness, Surface Area, and Gyrification to Fluid and Crystallized Intelligence”, Tadayon et al 2019
- “A Closer Look at Structured Pruning for Neural Network Compression”, Crowley et al 2018
- “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks”, Frankle & Carbin 2018
- “Efficient Neural Audio Synthesis”, Kalchbrenner et al 2018
- “Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks”, Mittal et al 2018
- “Learning to Prune Filters in Convolutional Neural Networks”, Huang et al 2018
- “Faster Gaze Prediction With Dense Networks and Fisher Pruning”, Theis et al 2018
- “Automated Pruning for Deep Neural Network Compression”, Manessi et al 2017
- “Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method”, Sun et al 2017
- “NeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm”, Dai et al 2017
- “To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression”, Zhu & Gupta 2017
- “Bayesian Sparsification of Recurrent Neural Networks”, Lobacheva et al 2017
- “Structured Bayesian Pruning via Log-Normal Multiplicative Noise”, Neklyudov et al 2017
- “Exploring Sparsity in Recurrent Neural Networks”, Narang et al 2017
- “Variational Dropout Sparsifies Deep Neural Networks”, Molchanov et al 2017
- “Iterative Magnitude Pruning: Learning Both Weights and Connections for Efficient Neural Networks”, Han et al 2015
- “Flat Minima”, Hochreiter & Schmidhuber 1997
- “Optimal Brain Surgeon and General Network Pruning”, Hassibi et al 1993
- “Fault Tolerance of Pruned Multilayer Networks”, Segee & Carter 1991
- “Using Relevance to Reduce Network Size Automatically”, Mozer & Smolensky 1989
- “Optimal Brain Damage”, LeCun et al 1989
- Sort By Magic
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Links
“Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning”, Xia et al 2023
“A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model”, Sun et al 2023
“Fast As CHITA: Neural Network Pruning With Combinatorial Optimization”, Benbaki et al 2023
“Pruning Compact ConvNets for Efficient Inference”, Ghosh et al 2023
“Lottery Tickets on a Data Diet: Finding Initializations With Sparse Trainable Networks”, Paul et al 2022
“PPCD-GAN: Progressive Pruning and Class-Aware Distillation for Large-Scale Conditional GANs Compression”, Vo et al 2022
“Data-Efficient Structured Pruning via Submodular Optimization”, Halabi et al 2022
“The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks”, Yu et al 2022
“Sparsity Winning Twice: Better Robust Generalization from More Efficient Training”, Chen et al 2022
“Fortuitous Forgetting in Connectionist Networks”, Zhou et al 2022
“How Many Degrees of Freedom Do We Need to Train Deep Networks: a Loss Landscape Perspective”, Larsen et al 2021
“Prune Once for All: Sparse Pre-Trained Language Models”, Zafrir et al 2021
“DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models”, Chen et al 2021
“HALP: Hardware-Aware Latency Pruning”, Shen et al 2021
“On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis”, Lai et al 2021
“Block Pruning For Faster Transformers”, Lagunas et al 2021
“Scaling Laws for Deep Learning”, Rosenfeld 2021
“A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness”, Diffenderfer et al 2021
“Chasing Sparsity in Vision Transformers: An End-to-End Exploration”, Chen et al 2021
“On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning”, Vischer et al 2021
“Sifting out the Features by Pruning: Are Convolutional Networks the Winning Lottery Ticket of Fully Connected Ones?”, Pellegrini & Biroli 2021
“Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch”, Zhou et al 2021
“Postnatal Connectomic Development of Inhibition in Mouse Barrel Cortex”, Gour et al 2021
“ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution”, Song et al 2021
“A Primer in BERTology: What We Know about How BERT Works”, Rogers et al 2020
“Bort: Optimal Subarchitecture Extraction For BERT”, Wynter & Perry 2020
“Pruning Neural Networks at Initialization: Why Are We Missing the Mark?”, Frankle et al 2020
“Logarithmic Pruning Is All You Need”, Orseau et al 2020
“On the Predictability of Pruning Across Scales”, Rosenfeld et al 2020
“Progressive Skeletonization: Trimming More Fat from a Network at Initialization”, Jorge et al 2020
“Pruning Neural Networks without Any Data by Iteratively Conserving Synaptic Flow”, Tanaka et al 2020
“Movement Pruning: Adaptive Sparsity by Fine-Tuning”, Sanh et al 2020
“Bayesian Bits: Unifying Quantization and Pruning”, Baalen et al 2020
“Lite Transformer With Long-Short Range Attention”, Wu et al 2020
“On the Effect of Dropping Layers of Pre-trained Transformer Models”, Sajjad et al 2020
“Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, Li et al 2020
“Sparse Networks from Scratch: Faster Training without Losing Performance”, Dettmers & Zettlemoyer 2019
“Playing the Lottery With Rewards and Multiple Languages: Lottery Tickets in RL and NLP”, Yu et al 2019
“SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers”, Fedorov et al 2019
“Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned”, Voita et al 2019
“Stabilizing the Lottery Ticket Hypothesis”, Frankle et al 2019
“The State of Sparsity in Deep Neural Networks”, Gale et al 2019
“Differential Contribution of Cortical Thickness, Surface Area, and Gyrification to Fluid and Crystallized Intelligence”, Tadayon et al 2019
“A Closer Look at Structured Pruning for Neural Network Compression”, Crowley et al 2018
“The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks”, Frankle & Carbin 2018
“Efficient Neural Audio Synthesis”, Kalchbrenner et al 2018
“Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks”, Mittal et al 2018
“Learning to Prune Filters in Convolutional Neural Networks”, Huang et al 2018
“Faster Gaze Prediction With Dense Networks and Fisher Pruning”, Theis et al 2018
“Automated Pruning for Deep Neural Network Compression”, Manessi et al 2017
“Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method”, Sun et al 2017
“NeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm”, Dai et al 2017
“To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression”, Zhu & Gupta 2017
“Bayesian Sparsification of Recurrent Neural Networks”, Lobacheva et al 2017
“Structured Bayesian Pruning via Log-Normal Multiplicative Noise”, Neklyudov et al 2017
“Exploring Sparsity in Recurrent Neural Networks”, Narang et al 2017
“Variational Dropout Sparsifies Deep Neural Networks”, Molchanov et al 2017
“Iterative Magnitude Pruning: Learning Both Weights and Connections for Efficient Neural Networks”, Han et al 2015
“Flat Minima”, Hochreiter & Schmidhuber 1997
“Optimal Brain Surgeon and General Network Pruning”, Hassibi et al 1993
“Fault Tolerance of Pruned Multilayer Networks”, Segee & Carter 1991
“Using Relevance to Reduce Network Size Automatically”, Mozer & Smolensky 1989
“Optimal Brain Damage”, LeCun et al 1989
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to build a chain of nearest-neighbor annotations, creating a progression of topics rather than a simple date ordering; a minimal code sketch of this ordering is given after the tag list below.
- subnetwork
- sparsity
- netpruning
- neuralpruning
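A minimal sketch of that greedy embedding-based ordering, assuming the annotations have already been embedded as vectors (e.g. by any text-embedding model); the function name, NumPy representation, and greedy nearest-neighbor heuristic are illustrative assumptions, not the actual site code:

```python
import numpy as np

def sort_by_similarity(embeddings: np.ndarray, start: int = 0) -> list[int]:
    """Greedy nearest-neighbor ordering of annotation embeddings.

    Start from one annotation (e.g. the newest) and repeatedly append the
    most similar not-yet-visited annotation, so the resulting list
    progresses by topic rather than by date.
    """
    # Normalize rows so dot products are cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order = [start]
    remaining = set(range(len(normed))) - {start}
    while remaining:
        current = normed[order[-1]]
        # Pick the unvisited annotation closest to the last one chosen.
        nxt = max(remaining, key=lambda i: float(normed[i] @ current))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

The clustering into labeled sections (the tags above) would then be a separate step, e.g. k-means over the same embeddings.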
Wikipedia
Miscellaneous
- /doc/ai/nn/sparsity/pruning/2020-rosenfeld-equation1-functionalformofdlscalingpruninglaw.png
- /doc/ai/nn/sparsity/pruning/2020-rogers-table1-bertcompression.png
- https://cprimozic.net/blog/reverse-engineering-a-small-neural-network/
- https://magazine.sebastianraschka.com/p/practical-tips-for-finetuning-llms
- https://twitter.com/RamaswmySridhar/status/1621870497070981121
Link Bibliography
- https://arxiv.org/abs/2310.06694: “Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning”, Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen
- https://arxiv.org/abs/2202.09844: “Sparsity Winning Twice: Better Robust Generalization from More Efficient Training”, Tianlong Chen, Zhenyu Zhang, Pengjun Wang, Santosh Balachandra, Haoyu Ma, Zehao Wang, Zhangyang Wang
- https://arxiv.org/abs/2111.05754: “Prune Once for All: Sparse Pre-Trained Language Models”, Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
- https://arxiv.org/abs/2111.00160: “DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models”, Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Chen, Zhangyang Wang, Ahmed Hassan Awadallah
- https://arxiv.org/abs/2108.07686: “Scaling Laws for Deep Learning”, Jonathan S. Rosenfeld
- https://arxiv.org/abs/2106.04533: “Chasing Sparsity in Vision Transformers: An End-to-End Exploration”, Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang
- https://arxiv.org/abs/2006.10621: “On the Predictability of Pruning Across Scales”, Jonathan S. Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit
- https://arxiv.org/abs/2004.03844: “On the Effect of Dropping Layers of Pre-trained Transformer Models”, Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
- https://arxiv.org/abs/1903.01611: “Stabilizing the Lottery Ticket Hypothesis”, Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
- https://arxiv.org/abs/1902.09574: “The State of Sparsity in Deep Neural Networks”, Trevor Gale, Erich Elsen, Sara Hooker
- https://arxiv.org/abs/1810.04622: “A Closer Look at Structured Pruning for Neural Network Compression”, Elliot J. Crowley, Jack Turner, Amos Storkey, Michael O’Boyle
- https://arxiv.org/abs/1801.10447: “Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks”, Deepak Mittal, Shweta Bhardwaj, Mitesh M. Khapra, Balaraman Ravindran
- 1993-hassibi.pdf: “Optimal Brain Surgeon and General Network Pruning”, Babak Hassibi, David G. Stork, Gregory J. Wolff