 See Also

Links
 “Any Deep ReLU Network Is Shallow”, Villani & Schoots 2023
 “JaxPruner: A Concise Library for Sparsity Research”, Lee et al 2023
 “Reusing Deep Neural Network Models through Model Reengineering”, Qi et al 2023
 “MUXPLMs: Pretraining Language Models With Data Multiplexing”, Murahari et al 2023
 “DataMUX: Data Multiplexing for Neural Networks”, Murahari et al 2023
 “Noise Transforms Feed-Forward Networks into Sparse Coding Networks”, Anonymous 2022
 “Exploring Low Rank Training of Deep Neural Networks”, Kamalakara et al 2022
 “Monolith: Real Time Recommendation System With Collisionless Embedding Table”, Liu et al 2022
 “More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 Using Sparsity (SLaK)”, Liu et al 2022
 “Building Machine Translation Systems for the Next Thousand Languages”, Bapna et al 2022
 “Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, Dao et al 2022
 “NeuPL: Neural Population Learning”, Liu et al 2022
 “Datamodels: Predicting Predictions from Training Data”, Ilyas et al 2022
 “Spiking Neural Networks and Their Applications: A Review”, Yamazaki et al 2022
 “Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters”, Lian et al 2021
 “EvilModel: Hiding Malware Inside of Neural Network Models”, Wang et al 2021
 “LoRA: Low-Rank Adaptation of Large Language Models”, Hu et al 2021
 “Clusterability in Neural Networks”, Filan et al 2021
 “Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks”, Hoefler et al 2021
 “Scaling down Deep Learning”, Greydanus 2020
 “Extreme Model Compression for On-device Natural Language Understanding”, Sathyendra et al 2020
 “Neural Arithmetic Units”, Madsen & Johansen 2020
 “Learning to Seek: Autonomous Source Seeking With Deep Reinforcement Learning Onboard a Nano Drone Microcontroller”, Duisterhof et al 2019
 “Weight Agnostic Neural Networks”, Gaier & Ha 2019
 “StyleNAS: An Empirical Study of Neural Architecture Search to Uncover Surprisingly Fast End-to-End Universal Style Transfer Networks”, An et al 2019
 “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, Tan & Le 2019
 “Superposition of Many Models into One”, Cheung et al 2019
 “Playing Atari With Six Neurons”, Cuccu et al 2018
 “Measuring the Intrinsic Dimension of Objective Landscapes”, Li et al 2018
 “SqueezeNext: Hardware-Aware Neural Network Design”, Gholami et al 2018
 “Wide Compression: Tensor Ring Nets”, Wang et al 2018
 “Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing”, Rosenfeld & Tsotsos 2018
 “Fix Your Classifier: the Marginal Value of Training the Last Weight Layer”, Hoffer et al 2018
 “Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition”, Ye et al 2017
 “3D Semantic Segmentation With Submanifold Sparse Convolutional Networks”, Graham et al 2017
 “xUnit: Learning a Spatial Activation Function for Efficient Image Restoration”, Kligvasser et al 2017
 “Natural Language Processing With Small Feed-Forward Networks”, Botha et al 2017
 “ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices”, Zhang et al 2017
 “Submanifold Sparse Convolutional Networks”, Graham & Maaten 2017
 “Shake-Shake Regularization of 3-branch Residual Networks”, Gastaldi 2017
 “Deep Residual Learning for Image Recognition”, He et al 2015
 “Tensorizing Neural Networks”, Novikov et al 2015
 “Eight Pairs of Descending Visual Neurons in the Dragonfly Give Wing Motor Centers Accurate Population Vector of Prey Direction”, Gonzalez-Bellido et al 2013
 “The Cat Is out of the Bag: Cortical Simulations With 10^{9} Neurons, 10^{13} Synapses”, Ananthanarayanan et al 2009
 “On the Computational Power of Threshold Circuits With Sparse Activity”, Uchizawa et al 2006
 “Networks of Spiking Neurons: The Third Generation of Neural Network Models”, Maass 1997
 “Delivering Real-time AI in the Palm of Your Hand”
 Sort By Magic
 Wikipedia
 Miscellaneous
 Link Bibliography
Neural nets are extremely ‘overparameterized’ in the sense that they have orders of magnitude more parameters than necessary to solve the problems they are trained on, as demonstrated both by the regular improvements in training smaller/faster but still performant networks and by directly creating smaller neural nets with similar or identical performance on those problems. The major techniques are: deleting parameters (pruning), reducing the precision of the numeric encoding (quantization), and training a smaller network from scratch using the original large network in some fashion (distillation).
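The first two techniques can be sketched in a few lines. Below is a toy pure-Python illustration (hypothetical weight values, a flat list standing in for real weight tensors; function names are my own):

```python
def prune_by_magnitude(weights, keep_fraction):
    """Magnitude pruning: zero all but the largest-|w| fraction of weights."""
    k = max(1, int(len(weights) * keep_fraction))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize_uniform(weights, n_bits):
    """Uniform quantization: snap each weight to one of 2^n_bits evenly spaced levels."""
    lo, hi = min(weights), max(weights)
    levels = (1 << n_bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((w - lo) / scale) * scale + lo for w in weights]

w = [0.91, -0.05, 0.33, -0.72, 0.01, 0.58]
print(prune_by_magnitude(w, 0.5))  # → [0.91, 0.0, 0.0, -0.72, 0.0, 0.58]
print(quantize_uniform(w, 2))      # 4 levels spanning [-0.72, 0.91]
```

Real implementations prune per-tensor or globally, interleave pruning rounds with retraining, and quantize with learned or per-channel scales; this merely shows the core arithmetic.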
Mysteriously, these smaller networks typically cannot be trained from scratch; performance gains can be obtained without the original data; models can be trained to imitate themselves in self-distillation; despite this indicating overfitting ought to be a major concern, they generalize well; and many of these smaller networks are in some sense already present in the original neural network. This is frequently taken to indicate some sort of blessing of scale: large NNs have smoother loss landscapes, which simple optimizers can successfully traverse to good optima no matter how hard the problem, as compared to smaller networks which may wind up ‘trapped’ at a bad place with no free parameters to let them slip around obstacles and find some way to improve (much less the loss landscape of equivalently powerful but extremely brittle encodings such as Brainf—k or assembler programs). As well as their great theoretical interest—How can we train these small models directly? What does this tell us about how NNs work?—such smaller NNs are critical to practical real-world deployment to servers & smartphones at scale, and to the design of accelerator hardware supporting reduced-precision operations; they are also an interesting case of capability growth for AI risk: as soon as any NN exists which can achieve performance goal X, it is likely that a much more efficient NN (potentially orders of magnitude smaller or faster) can be created to achieve X thereafter. (These are merely one way that your software can be much faster.)
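The distillation idea mentioned above can be illustrated with a minimal sketch of the Hinton-style soft-target objective (toy logits, pure Python, no framework; all names here are hypothetical):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, optionally softened by a temperature."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy of the student against the teacher's temperature-softened
    distribution: the core soft-target term of knowledge distillation."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))

teacher = [2.0, 0.5, -1.0]
# The loss is minimized when the student reproduces the teacher's distribution:
print(distillation_loss(teacher, teacher))
print(distillation_loss([-1.0, 0.5, 2.0], teacher))  # mismatched student: higher loss
```

In practice this term is combined with an ordinary hard-label loss and minimized by gradient descent over the student's weights; the high temperature exposes the teacher's ‘dark knowledge’ about relative probabilities of wrong classes.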
Below are some examples of NNs being compressed in size or FLOPs by anywhere from 50% to ~17,000% (an incomplete bibliography: merely papers I have noted during my reading).
See Also
Links
“Any Deep ReLU Network Is Shallow”, Villani & Schoots 2023
“JaxPruner: A Concise Library for Sparsity Research”, Lee et al 2023
“Reusing Deep Neural Network Models through Model Reengineering”, Qi et al 2023
“MUXPLMs: Pretraining Language Models With Data Multiplexing”, Murahari et al 2023
“DataMUX: Data Multiplexing for Neural Networks”, Murahari et al 2023
“Noise Transforms Feed-Forward Networks into Sparse Coding Networks”, Anonymous 2022
“Exploring Low Rank Training of Deep Neural Networks”, Kamalakara et al 2022
“Monolith: Real Time Recommendation System With Collisionless Embedding Table”, Liu et al 2022
“More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 Using Sparsity (SLaK)”, Liu et al 2022
“Building Machine Translation Systems for the Next Thousand Languages”, Bapna et al 2022
“Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, Dao et al 2022
“NeuPL: Neural Population Learning”, Liu et al 2022
“Datamodels: Predicting Predictions from Training Data”, Ilyas et al 2022
“Spiking Neural Networks and Their Applications: A Review”, Yamazaki et al 2022
“Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters”, Lian et al 2021
“EvilModel: Hiding Malware Inside of Neural Network Models”, Wang et al 2021
“LoRA: Low-Rank Adaptation of Large Language Models”, Hu et al 2021
“Clusterability in Neural Networks”, Filan et al 2021
“Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks”, Hoefler et al 2021
“Scaling down Deep Learning”, Greydanus 2020
“Extreme Model Compression for On-device Natural Language Understanding”, Sathyendra et al 2020
“Neural Arithmetic Units”, Madsen & Johansen 2020
“Learning to Seek: Autonomous Source Seeking With Deep Reinforcement Learning Onboard a Nano Drone Microcontroller”, Duisterhof et al 2019
“Weight Agnostic Neural Networks”, Gaier & Ha 2019
“StyleNAS: An Empirical Study of Neural Architecture Search to Uncover Surprisingly Fast End-to-End Universal Style Transfer Networks”, An et al 2019
“EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, Tan & Le 2019
“Superposition of Many Models into One”, Cheung et al 2019
“Playing Atari With Six Neurons”, Cuccu et al 2018
“Measuring the Intrinsic Dimension of Objective Landscapes”, Li et al 2018
“SqueezeNext: Hardware-Aware Neural Network Design”, Gholami et al 2018
“Wide Compression: Tensor Ring Nets”, Wang et al 2018
“Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing”, Rosenfeld & Tsotsos 2018
“Fix Your Classifier: the Marginal Value of Training the Last Weight Layer”, Hoffer et al 2018
“Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition”, Ye et al 2017
“3D Semantic Segmentation With Submanifold Sparse Convolutional Networks”, Graham et al 2017
“xUnit: Learning a Spatial Activation Function for Efficient Image Restoration”, Kligvasser et al 2017
“Natural Language Processing With Small Feed-Forward Networks”, Botha et al 2017
“ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices”, Zhang et al 2017
“Submanifold Sparse Convolutional Networks”, Graham & Maaten 2017
“Shake-Shake Regularization of 3-branch Residual Networks”, Gastaldi 2017
“Deep Residual Learning for Image Recognition”, He et al 2015
“Tensorizing Neural Networks”, Novikov et al 2015
“Eight Pairs of Descending Visual Neurons in the Dragonfly Give Wing Motor Centers Accurate Population Vector of Prey Direction”, Gonzalez-Bellido et al 2013
“The Cat Is out of the Bag: Cortical Simulations With 10^{9} Neurons, 10^{13} Synapses”, Ananthanarayanan et al 2009
“On the Computational Power of Threshold Circuits With Sparse Activity”, Uchizawa et al 2006
“Networks of Spiking Neurons: The Third Generation of Neural Network Models”, Maass 1997
“Delivering Real-time AI in the Palm of Your Hand”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
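The nearest-neighbor progression described above can be sketched as a greedy chain over embedding vectors. A toy pure-Python illustration (the function name and the 2-D embeddings are hypothetical; the real system works over high-dimensional text embeddings):

```python
def nearest_neighbor_order(embeddings):
    """Greedy nearest-neighbor chaining: start from the first item
    (e.g. the newest annotation) and repeatedly hop to the closest
    unvisited embedding, yielding a progression of related topics."""
    def dist2(a, b):
        # Squared Euclidean distance; monotone in distance, so fine for argmin.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    order = [0]
    remaining = list(range(1, len(embeddings)))
    while remaining:
        last = embeddings[order[-1]]
        nxt = min(remaining, key=lambda i: dist2(embeddings[i], last))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Two clusters: items 0 & 2 near the origin, items 1 & 3 near (5, 5).
print(nearest_neighbor_order([[0, 0], [5, 5], [1, 0], [4, 5]]))  # → [0, 2, 3, 1]
```

Greedy chaining keeps each item adjacent to a near neighbor but does not globally minimize tour length; for browsing order that trade-off is usually acceptable.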
sparsity-compression
semantic
compression
Wikipedia
Miscellaneous

https://ai.facebook.com/blog/ahighlyefficientrealtimetexttospeechsystemdeployedoncpus/
https://blog.research.google/2018/05/customondevicemlmodels.html
https://blog.research.google/2019/03/anallneuralondevicespeech.html
https://blog.research.google/2021/10/grammarcorrectionasyoutypeonpixel.html
https://blog.research.google/2021/12/trainingmachinelearningmodelsmore.html
https://blog.research.google/2022/03/autogeneratedsummariesingoogledocs.html
https://blog.research.google/2022/08/efficientsequencemodelingforon.html
https://blog.roblox.com/2020/05/scaledbertserve1billiondailyrequestscpus/
https://blog.tensorflow.org/2020/03/higheraccuracyonvisionmodelswithefficientnetlite.html
https://cprimozic.net/blog/growingsparsecomputationalgraphswithrnns/
https://neuralmagic.com/blog/bertlargepruneoncefordistilbertinferenceperformance/
https://tech.piccollage.com/distillationofclipmodelandotherexperimentsf8394b7321ce
https://www.quantamagazine.org/sparseneuralnetworkspointphysiciststousefuldata20230608/
Link Bibliography

https://arxiv.org/abs/2302.12441: “MUXPLMs: Pretraining Language Models With Data Multiplexing”, Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan
https://arxiv.org/abs/2207.03620: “More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 Using Sparsity (SLaK)”, Liu et al 2022
https://arxiv.org/abs/2205.03983#google: “Building Machine Translation Systems for the Next Thousand Languages”, Bapna et al 2022
https://arxiv.org/abs/2204.00595: “Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, Dao et al 2022
https://arxiv.org/abs/2202.07415#deepmind: “NeuPL: Neural Population Learning”, Siqi Liu, Luke Marris, Daniel Hennes, Josh Merel, Nicolas Heess, Thore Graepel
https://arxiv.org/abs/2106.09685: “LoRA: Low-Rank Adaptation of Large Language Models”, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
https://greydanus.github.io/2020/12/01/scalingdown/: “Scaling down Deep Learning”, Sam Greydanus
https://arxiv.org/abs/1905.11946#google: “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, Mingxing Tan, Quoc V. Le