2:4 Sparse Llama: Smaller Models for Efficient GPU Inference
When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
Streamlining Redundant Layers to Compress Large Language Models
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Weight subcloning: direct initialization of transformers using larger pretrained ones
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model
Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale
Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks
Heavy-tailed neuronal connectivity arises from Hebbian self-organization
PPCD-GAN: Progressive Pruning and Class-Aware Distillation for Large-Scale Conditional GANs Compression
The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks
Data-Efficient Structured Pruning via Submodular Optimization
Sparsity Winning Twice: Better Robust Generalization from More Efficient Training
How many degrees of freedom do we need to train deep networks: a loss landscape perspective
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models
On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis
A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning
Sifting out the features by pruning: Are convolutional networks the winning lottery ticket of fully connected ones?
Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch
Postnatal connectomic development of inhibition in mouse barrel cortex
ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution
Progressively Stacking 2.0: A Multi-stage Layerwise Training Method for BERT Training Speedup
Pruning Neural Networks at Initialization: Why are We Missing the Mark?
Progressive Skeletonization: Trimming more fat from a network at initialization
Pruning neural networks without any data by iteratively conserving synaptic flow
On the Effect of Dropping Layers of Pre-trained Transformer Models
Train-by-Reconnect: Decoupling Locations of Weights from their Values (LaPerm)
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Sparse Networks from Scratch: Faster Training without Losing Performance
Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP
SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
Differential Contribution of Cortical Thickness, Surface Area, and Gyrification to Fluid and Crystallized Intelligence
A Closer Look at Structured Pruning for Neural Network Compression
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks
Learning to Prune Filters in Convolutional Neural Networks
Faster gaze prediction with dense networks and Fisher pruning
Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method
NeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm
To prune, or not to prune: exploring the efficacy of pruning for model compression
Structured Bayesian Pruning via Log-Normal Multiplicative Noise
Iterative Magnitude Pruning: Learning both Weights and Connections for Efficient Neural Networks
2024-chang-figure3-lotteryticketsemergeearlyintrainingandthengetupweighted.jpg
2020-rosenfeld-equation1-functionalformofdlscalingpruninglaw.png
2020-rosenfeld-figure1-relationshipbetweenpruningsparsificationandclassificationerrorincifar10cnnresnets.jpg
2020-rosenfeld-figure2-extrapolatedvsactualrelationshipbetweenpruningsparsificationandclassificationerrorincifar10cnnresnets.png
2020-rosenfeld-figure8-sweepingwidthparametercountofcifar10resnettofindoptimallylargemodelforbestpossibleprunedmodel.jpg
https://cprimozic.net/blog/reverse-engineering-a-small-neural-network/
https://magazine.sebastianraschka.com/p/practical-tips-for-finetuning-llms
Jonathan Frankle—Chief Neural Network Scientist at Databricks
/doc/ai/nn/sparsity/pruning/1993-hassibi.pdf