 See Also

Links
 “Sheared LLaMA: Accelerating Language Model Pretraining via Structured Pruning”, Xia et al 2023
 “A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model”, Sun et al 2023
 “Fast As CHITA: Neural Network Pruning With Combinatorial Optimization”, Benbaki et al 2023
 “Pruning Compact ConvNets for Efficient Inference”, Ghosh et al 2023
 “Lottery Tickets on a Data Diet: Finding Initializations With Sparse Trainable Networks”, Paul et al 2022
 “PPCD-GAN: Progressive Pruning and Class-Aware Distillation for Large-Scale Conditional GANs Compression”, Vo et al 2022
 “Data-Efficient Structured Pruning via Submodular Optimization”, Halabi et al 2022
 “The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks”, Yu et al 2022
 “Sparsity Winning Twice: Better Robust Generalization from More Efficient Training”, Chen et al 2022
 “Fortuitous Forgetting in Connectionist Networks”, Zhou et al 2022
 “How Many Degrees of Freedom Do We Need to Train Deep Networks: a Loss Landscape Perspective”, Larsen et al 2021
 “Prune Once for All: Sparse Pre-Trained Language Models”, Zafrir et al 2021
 “DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models”, Chen et al 2021
 “HALP: Hardware-Aware Latency Pruning”, Shen et al 2021
 “On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis”, Lai et al 2021
 “Block Pruning For Faster Transformers”, Lagunas et al 2021
 “Scaling Laws for Deep Learning”, Rosenfeld 2021
 “A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness”, Diffenderfer et al 2021
 “Chasing Sparsity in Vision Transformers: An End-to-End Exploration”, Chen et al 2021
 “On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning”, Vischer et al 2021
 “Sifting out the Features by Pruning: Are Convolutional Networks the Winning Lottery Ticket of Fully Connected Ones?”, Pellegrini & Biroli 2021
 “Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch”, Zhou et al 2021
 “Postnatal Connectomic Development of Inhibition in Mouse Barrel Cortex”, Gour et al 2021
 “ES-ENAS: Black-box Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution”, Song et al 2021
 “A Primer in BERTology: What We Know about How BERT Works”, Rogers et al 2020
 “Bort: Optimal Subarchitecture Extraction For BERT”, Wynter & Perry 2020
 “Pruning Neural Networks at Initialization: Why Are We Missing the Mark?”, Frankle et al 2020
 “Logarithmic Pruning Is All You Need”, Orseau et al 2020
 “On the Predictability of Pruning Across Scales”, Rosenfeld et al 2020
 “Progressive Skeletonization: Trimming More Fat from a Network at Initialization”, Jorge et al 2020
 “Pruning Neural Networks without Any Data by Iteratively Conserving Synaptic Flow”, Tanaka et al 2020
 “Movement Pruning: Adaptive Sparsity by Fine-Tuning”, Sanh et al 2020
 “Bayesian Bits: Unifying Quantization and Pruning”, Baalen et al 2020
 “Lite Transformer With Long-Short Range Attention”, Wu et al 2020
 “On the Effect of Dropping Layers of Pretrained Transformer Models”, Sajjad et al 2020
 “Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, Li et al 2020
 “Sparse Networks from Scratch: Faster Training without Losing Performance”, Dettmers & Zettlemoyer 2019
 “Playing the Lottery With Rewards and Multiple Languages: Lottery Tickets in RL and NLP”, Yu et al 2019
 “SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers”, Fedorov et al 2019
 “Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned”, Voita et al 2019
 “Stabilizing the Lottery Ticket Hypothesis”, Frankle et al 2019
 “The State of Sparsity in Deep Neural Networks”, Gale et al 2019
 “Differential Contribution of Cortical Thickness, Surface Area, and Gyrification to Fluid and Crystallized Intelligence”, Tadayon et al 2019
 “A Closer Look at Structured Pruning for Neural Network Compression”, Crowley et al 2018
 “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks”, Frankle & Carbin 2018
 “Efficient Neural Audio Synthesis”, Kalchbrenner et al 2018
 “Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks”, Mittal et al 2018
 “Learning to Prune Filters in Convolutional Neural Networks”, Huang et al 2018
 “Faster Gaze Prediction With Dense Networks and Fisher Pruning”, Theis et al 2018
 “Automated Pruning for Deep Neural Network Compression”, Manessi et al 2017
 “Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method”, Sun et al 2017
 “NeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm”, Dai et al 2017
 “To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression”, Zhu & Gupta 2017
 “Bayesian Sparsification of Recurrent Neural Networks”, Lobacheva et al 2017
 “Structured Bayesian Pruning via Log-Normal Multiplicative Noise”, Neklyudov et al 2017
 “Exploring Sparsity in Recurrent Neural Networks”, Narang et al 2017
 “Variational Dropout Sparsifies Deep Neural Networks”, Molchanov et al 2017
 “Iterative Magnitude Pruning: Learning Both Weights and Connections for Efficient Neural Networks”, Han et al 2015
 “Flat Minima”, Hochreiter & Schmidhuber 1997
 “Optimal Brain Surgeon and General Network Pruning”, Hassibi et al 1993
 “Fault Tolerance of Pruned Multilayer Networks”, Segee & Carter 1991
 “Using Relevance to Reduce Network Size Automatically”, Mozer & Smolensky 1989
 “Optimal Brain Damage”, LeCun et al 1989
 Sort By Magic
 Wikipedia
 Miscellaneous
 Link Bibliography
Sort By Magic
Annotations sorted by machine learning into inferred ‘tags’. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The ‘sorted’ list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
subnetwork
sparsity
net-pruning
neural-pruning
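The greedy nearest-neighbor ordering described above can be sketched as follows. This is a minimal illustration, not the site’s actual implementation; the embeddings and the `magic_sort` helper are hypothetical toy stand-ins, and similarity is measured with plain cosine similarity:

```python
# Sketch of "sort by magic": starting from the newest annotation, greedily
# walk to the most-similar remaining embedding, producing a topic progression.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def magic_sort(embeddings):
    """Greedy nearest-neighbor ordering, beginning with the first (newest) item."""
    remaining = list(range(len(embeddings)))
    order = [remaining.pop(0)]  # start from the newest annotation
    while remaining:
        last = embeddings[order[-1]]
        nxt = max(remaining, key=lambda i: cosine(last, embeddings[i]))
        remaining.remove(nxt)
        order.append(nxt)
    return order

# Toy 2-D "embeddings": items 0 and 2 are similar; item 1 is the odd one out.
vecs = [(1.0, 0.1), (0.0, 1.0), (0.9, 0.2)]
print(magic_sort(vecs))  # similar items end up adjacent: [0, 2, 1]
```

A real pipeline would use high-dimensional text embeddings and would additionally cluster the resulting chain into labeled sections, but the adjacency-by-similarity idea is the same.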
Wikipedia
Miscellaneous

/doc/ai/nn/sparsity/pruning/2020rosenfeldequation1functionalformofdlscalingpruninglaw.png

/doc/ai/nn/sparsity/pruning/2020rogerstable1bertcompression.png

https://cprimozic.net/blog/reverseengineeringasmallneuralnetwork/

https://magazine.sebastianraschka.com/p/practicaltipsforfinetuningllms

https://twitter.com/RamaswmySridhar/status/1621870497070981121
Link Bibliography

https://arxiv.org/abs/2310.06694
: “Sheared LLaMA: Accelerating Language Model Pretraining via Structured Pruning”, Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen 
https://arxiv.org/abs/2202.09844
: “Sparsity Winning Twice: Better Robust Generalization from More Efficient Training”, Tianlong Chen, Zhenyu Zhang, Pengjun Wang, Santosh Balachandra, Haoyu Ma, Zehao Wang, Zhangyang Wang 
https://arxiv.org/abs/2111.05754
: “Prune Once for All: Sparse Pre-Trained Language Models”, Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
https://arxiv.org/abs/2111.00160
: “DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models”, Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Chen, Zhangyang Wang, Ahmed Hassan Awadallah
https://arxiv.org/abs/2108.07686
: “Scaling Laws for Deep Learning”, Jonathan S. Rosenfeld 
https://arxiv.org/abs/2106.04533
: “Chasing Sparsity in Vision Transformers: An End-to-End Exploration”, Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang
https://arxiv.org/abs/2006.10621
: “On the Predictability of Pruning Across Scales”, Jonathan S. Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit 
https://arxiv.org/abs/2004.03844
: “On the Effect of Dropping Layers of Pretrained Transformer Models”, Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov 
https://arxiv.org/abs/1903.01611
: “Stabilizing the Lottery Ticket Hypothesis”, Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin 
https://arxiv.org/abs/1902.09574
: “The State of Sparsity in Deep Neural Networks”, Trevor Gale, Erich Elsen, Sara Hooker 
https://arxiv.org/abs/1810.04622
: “A Closer Look at Structured Pruning for Neural Network Compression”, Elliot J. Crowley, Jack Turner, Amos Storkey, Michael O’Boyle 
https://arxiv.org/abs/1801.10447
: “Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks”, Deepak Mittal, Shweta Bhardwaj, Mitesh M. Khapra, Balaraman Ravindran 
1993hassibi.pdf
: “Optimal Brain Surgeon and General Network Pruning”, Babak Hassibi, David G. Stork, Gregory J. Wolff