- See Also
- Links
- “Convolutional Differentiable Logic Gate Networks”, Petersen et al 2024
- “LoRA vs Full Fine-Tuning: An Illusion of Equivalence”, Shuttleworth et al 2024
- “On the Complexity of Neural Computation in Superposition”, Adler & Shavit 2024
- “GSoC 2024: Differentiable Logic for Interactive Systems and Generative Music”
- “CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models”, Lee et al 2024
- “Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?”, Jin et al 2024
- “ReFT: Representation Finetuning for Language Models”, Wu et al 2024
- “Mechanistic Design and Scaling of Hybrid Architectures”, Poli et al 2024
- “LTE: Training Neural Networks from Scratch With Parallel Low-Rank Adapters”, Huh et al 2024
- “Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet”
- “Exponentially Faster Language Modeling”, Belcak & Wattenhofer 2023
- “DiLoCo: Distributed Low-Communication Training of Language Models”, Douillard et al 2023
- “Language Models Are Super Mario (DARE): Absorbing Abilities from Homologous Models As a Free Lunch”, Yu et al 2023
- “ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-Like Language Models”, Luo et al 2023
- “The Impact of Depth and Width on Transformer Language Model Generalization”, Petty et al 2023
- “Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time”, Liu et al 2023
- “Fast Feedforward Networks”, Belcak & Wattenhofer 2023
- “Any Deep ReLU Network Is Shallow”, Villani & Schoots 2023
- “JaxPruner: A Concise Library for Sparsity Research”, Lee et al 2023
- “Reusing Deep Neural Network Models through Model Re-Engineering”, Qi et al 2023
- “MUX-PLMs: Pre-Training Language Models With Data Multiplexing”, Murahari et al 2023
- “DataMUX: Data Multiplexing for Neural Networks”, Murahari et al 2023
- “Deep Differentiable Logic Gate Networks”, Petersen et al 2022
- “The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers”, Li et al 2022
- “Noise Transforms Feed-Forward Networks into Sparse Coding Networks”, Anonymous 2022
- “Exploring Low Rank Training of Deep Neural Networks”, Kamalakara et al 2022
- “Monolith: Real Time Recommendation System With Collisionless Embedding Table”, Liu et al 2022
- “More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 Using Sparsity (SLaK)”, Liu et al 2022
- “Building Machine Translation Systems for the Next Thousand Languages”, Bapna et al 2022
- “Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, Dao et al 2022
- “Efficient Language Modeling With Sparse All-MLP”, Yu et al 2022
- “NeuPL: Neural Population Learning”, Liu et al 2022
- “Datamodels: Predicting Predictions from Training Data”, Ilyas et al 2022
- “Spiking Neural Networks and Their Applications: A Review”, Yamazaki et al 2022
- “Persia: An Open, Hybrid System Scaling Deep Learning-Based Recommenders up to 100 Trillion Parameters”, Lian et al 2021
- “EvilModel: Hiding Malware Inside of Neural Network Models”, Wang et al 2021
- “LoRA: Low-Rank Adaptation of Large Language Models”, Hu et al 2021
- “On the Distribution, Sparsity, and Inference-Time Quantization of Attention Values in Transformers”, Ji et al 2021
- “The Neural Basis of Intelligence in Fine-Grained Cortical Topographies”, Feilong et al 2021
- “Clusterability in Neural Networks”, Filan et al 2021
- “Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks”, Hoefler et al 2021
- “Scaling down Deep Learning”, Greydanus 2020
- “Extreme Model Compression for On-Device Natural Language Understanding”, Sathyendra et al 2020
- “Training Independent Subnetworks for Robust Prediction”, Havasi et al 2020
- “EventProp: Event-Based Backpropagation Can Compute Exact Gradients for Spiking Neural Networks”, Wunderlich & Pehle 2020
- “On Linear Identifiability of Learned Representations”, Roeder et al 2020
- “Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited”, Maddox et al 2020
- “Bayesian Deep Learning and a Probabilistic Perspective of Generalization”, Wilson & Izmailov 2020
- “Neural Arithmetic Units”, Madsen & Johansen 2020
- “Linear Mode Connectivity and the Lottery Ticket Hypothesis”, Frankle et al 2019
- “Learning to Seek: Autonomous Source Seeking With Deep Reinforcement Learning Onboard a Nano Drone Microcontroller”, Duisterhof et al 2019
- “Does Learning Require Memorization? A Short Tale about a Long Tail”, Feldman 2019
- “Weight Agnostic Neural Networks”, Gaier & Ha 2019
- “StyleNAS: An Empirical Study of Neural Architecture Search to Uncover Surprisingly Fast End-To-End Universal Style Transfer Networks”, An et al 2019
- “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, Tan & Le 2019
- “Superposition of Many Models into One”, Cheung et al 2019
- “Playing Atari With Six Neurons”, Cuccu et al 2018
- “Measuring the Intrinsic Dimension of Objective Landscapes”, Li et al 2018
- “SqueezeNext: Hardware-Aware Neural Network Design”, Gholami et al 2018
- “Wide Compression: Tensor Ring Nets”, Wang et al 2018
- “Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing”, Rosenfeld & Tsotsos 2018
- “Fix Your Classifier: the Marginal Value of Training the Last Weight Layer”, Hoffer et al 2018
- “Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition”, Ye et al 2017
- “3D Semantic Segmentation With Submanifold Sparse Convolutional Networks”, Graham et al 2017
- “XUnit: Learning a Spatial Activation Function for Efficient Image Restoration”, Kligvasser et al 2017
- “Natural Language Processing With Small Feed-Forward Networks”, Botha et al 2017
- “ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices”, Zhang et al 2017
- “Submanifold Sparse Convolutional Networks”, Graham & Maaten 2017
- “Shake-Shake Regularization of 3-Branch Residual Networks”, Gastaldi 2017
- “Using the Output Embedding to Improve Language Models”, Press & Wolf 2016
- “Deep Residual Learning for Image Recognition”, He et al 2015
- “Tensorizing Neural Networks”, Novikov et al 2015
- “Eight Pairs of Descending Visual Neurons in the Dragonfly Give Wing Motor Centers Accurate Population Vector of Prey Direction”, Gonzalez-Bellido et al 2013
- “The Cat Is out of the Bag: Cortical Simulations With 10⁹ Neurons, 10¹³ Synapses”, Ananthanarayanan et al 2009
- “On the Computational Power of Threshold Circuits With Sparse Activity”, Uchizawa et al 2006
- “Networks of Spiking Neurons: The Third Generation of Neural Network Models”, Maass 1997
- “Characteristics of Sparsely Encoded Associative Memory”, Amari 1989
- “Kronecker Decomposition for GPT Compression” (arXiv:2110.08152)
- “Higher Accuracy on Vision Models With EfficientNet-Lite”
- “Something Weird Is Happening With LLMs and Chess”, Dynomight 2024
- “Delivering Real-Time AI in the Palm of Your Hand”
- “Sparsity-Aware Deep Learning Inference Runtime for CPUs”
- “Neuralmagic/sparseml: Libraries for Applying Sparsification Recipes to Neural Networks With a Few Lines of Code, Enabling Faster and Smaller Models”
- “An Estimation of the Absolute Number of Axons Indicates That Human Cortical Areas Are Sparsely Connected”
- “Creating a 17 KB Style Transfer Model With Layer Pruning and Quantization”, Toole 2024
- “BERT-Large: Prune Once for DistilBERT Inference Performance”
- “Circuits in Superposition: Compressing Many Small Neural Networks into One”
- “Measuring the Intrinsic Dimension of Objective Landscapes [Video]”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography
See Also
Links
“Convolutional Differentiable Logic Gate Networks”, Petersen et al 2024
“LoRA vs Full Fine-Tuning: An Illusion of Equivalence”, Shuttleworth et al 2024
“On the Complexity of Neural Computation in Superposition”, Adler & Shavit 2024
“GSoC 2024: Differentiable Logic for Interactive Systems and Generative Music”
“CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models”, Lee et al 2024
“Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?”, Jin et al 2024
“ReFT: Representation Finetuning for Language Models”, Wu et al 2024
“Mechanistic Design and Scaling of Hybrid Architectures”, Poli et al 2024
“LTE: Training Neural Networks from Scratch With Parallel Low-Rank Adapters”, Huh et al 2024
“Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet”
“Exponentially Faster Language Modeling”, Belcak & Wattenhofer 2023
“DiLoCo: Distributed Low-Communication Training of Language Models”, Douillard et al 2023
“Language Models Are Super Mario (DARE): Absorbing Abilities from Homologous Models As a Free Lunch”, Yu et al 2023
“ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-Like Language Models”, Luo et al 2023
“The Impact of Depth and Width on Transformer Language Model Generalization”, Petty et al 2023
“Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time”, Liu et al 2023
“Fast Feedforward Networks”, Belcak & Wattenhofer 2023
“Any Deep ReLU Network Is Shallow”, Villani & Schoots 2023
“JaxPruner: A Concise Library for Sparsity Research”, Lee et al 2023
“Reusing Deep Neural Network Models through Model Re-Engineering”, Qi et al 2023
“MUX-PLMs: Pre-Training Language Models With Data Multiplexing”, Murahari et al 2023
“DataMUX: Data Multiplexing for Neural Networks”, Murahari et al 2023
“Deep Differentiable Logic Gate Networks”, Petersen et al 2022
“The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers”, Li et al 2022
“Noise Transforms Feed-Forward Networks into Sparse Coding Networks”, Anonymous 2022
“Exploring Low Rank Training of Deep Neural Networks”, Kamalakara et al 2022
“Monolith: Real Time Recommendation System With Collisionless Embedding Table”, Liu et al 2022
“More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 Using Sparsity (SLaK)”, Liu et al 2022
“Building Machine Translation Systems for the Next Thousand Languages”, Bapna et al 2022
“Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, Dao et al 2022
“Efficient Language Modeling With Sparse All-MLP”, Yu et al 2022
“NeuPL: Neural Population Learning”, Liu et al 2022
“Datamodels: Predicting Predictions from Training Data”, Ilyas et al 2022
“Spiking Neural Networks and Their Applications: A Review”, Yamazaki et al 2022
“Persia: An Open, Hybrid System Scaling Deep Learning-Based Recommenders up to 100 Trillion Parameters”, Lian et al 2021
“EvilModel: Hiding Malware Inside of Neural Network Models”, Wang et al 2021
“LoRA: Low-Rank Adaptation of Large Language Models”, Hu et al 2021
“On the Distribution, Sparsity, and Inference-Time Quantization of Attention Values in Transformers”, Ji et al 2021
“The Neural Basis of Intelligence in Fine-Grained Cortical Topographies”, Feilong et al 2021
“Clusterability in Neural Networks”, Filan et al 2021
“Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks”, Hoefler et al 2021
“Scaling down Deep Learning”, Greydanus 2020
“Extreme Model Compression for On-Device Natural Language Understanding”, Sathyendra et al 2020
“Training Independent Subnetworks for Robust Prediction”, Havasi et al 2020
“EventProp: Event-Based Backpropagation Can Compute Exact Gradients for Spiking Neural Networks”, Wunderlich & Pehle 2020
“On Linear Identifiability of Learned Representations”, Roeder et al 2020
“Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited”, Maddox et al 2020
“Bayesian Deep Learning and a Probabilistic Perspective of Generalization”, Wilson & Izmailov 2020
“Neural Arithmetic Units”, Madsen & Johansen 2020
“Linear Mode Connectivity and the Lottery Ticket Hypothesis”, Frankle et al 2019
“Learning to Seek: Autonomous Source Seeking With Deep Reinforcement Learning Onboard a Nano Drone Microcontroller”, Duisterhof et al 2019
“Does Learning Require Memorization? A Short Tale about a Long Tail”, Feldman 2019
“Weight Agnostic Neural Networks”, Gaier & Ha 2019
“StyleNAS: An Empirical Study of Neural Architecture Search to Uncover Surprisingly Fast End-To-End Universal Style Transfer Networks”, An et al 2019
“EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, Tan & Le 2019
“Superposition of Many Models into One”, Cheung et al 2019
“Playing Atari With Six Neurons”, Cuccu et al 2018
“Measuring the Intrinsic Dimension of Objective Landscapes”, Li et al 2018
“SqueezeNext: Hardware-Aware Neural Network Design”, Gholami et al 2018
“Wide Compression: Tensor Ring Nets”, Wang et al 2018
“Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing”, Rosenfeld & Tsotsos 2018
“Fix Your Classifier: the Marginal Value of Training the Last Weight Layer”, Hoffer et al 2018
“Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition”, Ye et al 2017
“3D Semantic Segmentation With Submanifold Sparse Convolutional Networks”, Graham et al 2017
“XUnit: Learning a Spatial Activation Function for Efficient Image Restoration”, Kligvasser et al 2017
“Natural Language Processing With Small Feed-Forward Networks”, Botha et al 2017
“ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices”, Zhang et al 2017
“Submanifold Sparse Convolutional Networks”, Graham & Maaten 2017
“Shake-Shake Regularization of 3-Branch Residual Networks”, Gastaldi 2017
“Using the Output Embedding to Improve Language Models”, Press & Wolf 2016
“Deep Residual Learning for Image Recognition”, He et al 2015
“Tensorizing Neural Networks”, Novikov et al 2015
“Eight Pairs of Descending Visual Neurons in the Dragonfly Give Wing Motor Centers Accurate Population Vector of Prey Direction”, Gonzalez-Bellido et al 2013
“The Cat Is out of the Bag: Cortical Simulations With 10⁹ Neurons, 10¹³ Synapses”, Ananthanarayanan et al 2009
“On the Computational Power of Threshold Circuits With Sparse Activity”, Uchizawa et al 2006
“Networks of Spiking Neurons: The Third Generation of Neural Network Models”, Maass 1997
“Characteristics of Sparsely Encoded Associative Memory”, Amari 1989
“Kronecker Decomposition for GPT Compression” (arXiv:2110.08152)
“Higher Accuracy on Vision Models With EfficientNet-Lite”
“Something Weird Is Happening With LLMs and Chess”, Dynomight 2024
“Delivering Real-Time AI in the Palm of Your Hand”
“Sparsity-Aware Deep Learning Inference Runtime for CPUs”
“Neuralmagic/sparseml: Libraries for Applying Sparsification Recipes to Neural Networks With a Few Lines of Code, Enabling Faster and Smaller Models”
“An Estimation of the Absolute Number of Axons Indicates That Human Cortical Areas Are Sparsely Connected”
“Creating a 17 KB Style Transfer Model With Layer Pruning and Quantization”, Toole 2024
“BERT-Large: Prune Once for DistilBERT Inference Performance”
“Circuits in Superposition: Compressing Many Small Neural Networks into One”
“Measuring the Intrinsic Dimension of Objective Landscapes [Video]”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses each annotation's embedding to find its nearest-neighbor annotations, creating a progression of topics.
- malware-hiding
- depth-width
- low-rank-methods
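The directory does not publish the clustering code itself; the following is only a minimal illustrative sketch in Python of the greedy nearest-neighbor ordering described above. It assumes annotation embeddings are already available as vectors (e.g. from a text-embedding model); the function name `sort_by_similarity` and the random stand-in data are hypothetical.

```python
# Minimal sketch (not the site's actual code): order annotations so that each
# one is the nearest unvisited neighbor of its predecessor, starting from the
# newest annotation, producing a "progression of topics".
import numpy as np

def sort_by_similarity(embeddings: np.ndarray, start: int = 0) -> list[int]:
    """Greedy nearest-neighbor ordering of (n_items, dim) embedding vectors.

    `start` is the index of the newest annotation; returns a permutation of
    item indices in reading order.
    """
    # Normalize rows so dot products are cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order = [start]
    remaining = set(range(len(unit))) - {start}
    while remaining:
        last = unit[order[-1]]
        # Pick the unvisited annotation most similar to the one just placed.
        nxt = max(remaining, key=lambda i: float(unit[i] @ last))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Example with random stand-in embeddings for 5 annotations:
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 384))
print(sort_by_similarity(emb))  # e.g. [0, 3, 1, 4, 2]
```

Tag labels such as those listed above would come from a separate clustering-and-labeling step (not shown here), applied on top of the same embeddings.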
Wikipedia
Miscellaneous
- /doc/ai/nn/sparsity/2018-cheng.pdf
- /doc/ai/nn/cnn/2017-rawat.pdf
- https://ai.facebook.com/blog/a-highly-efficient-real-time-text-to-speech-system-deployed-on-cpus/
- https://blog.roblox.com/2020/05/scaled-bert-serve-1-billion-daily-requests-cpus/
- https://cprimozic.net/blog/growing-sparse-computational-graphs-with-rnns/
- https://research.google/blog/an-all-neural-on-device-speech-recognizer/
- https://research.google/blog/auto-generated-summaries-in-google-docs/
- https://research.google/blog/custom-on-device-ml-models-with-learn2compress/
- https://research.google/blog/efficient-sequence-modeling-for-on-device-ml/
- https://research.google/blog/grammar-correction-as-you-type-on-pixel-6/
- https://tech.pic-collage.com/distillation-of-clip-model-and-other-experiments-f8394b7321ce
- https://www.quantamagazine.org/sparse-neural-networks-point-physicists-to-useful-data-20230608/
- https://www.reddit.com/r/LocalLLaMA/comments/18luk10/wait_llama_and_falcon_are_also_moe/
Bibliography
- https://arxiv.org/abs/2403.17844: “Mechanistic Design and Scaling of Hybrid Architectures”
- https://arxiv.org/abs/2311.10770: “Exponentially Faster Language Modeling”
- https://arxiv.org/abs/2310.17157: “Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time”
- https://arxiv.org/abs/2308.14711: “Fast Feedforward Networks”
- https://arxiv.org/abs/2302.12441: “MUX-PLMs: Pre-Training Language Models With Data Multiplexing”
- https://arxiv.org/abs/2210.06313#google: “The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers”
- https://arxiv.org/abs/2207.03620: “More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 Using Sparsity (SLaK)”
- https://arxiv.org/abs/2205.03983#google: “Building Machine Translation Systems for the Next Thousand Languages”
- https://arxiv.org/abs/2204.00595: “Monarch: Expressive Structured Matrices for Efficient and Accurate Training”
- https://arxiv.org/abs/2203.06850: “Efficient Language Modeling With Sparse All-MLP”
- https://arxiv.org/abs/2202.07415#deepmind: “NeuPL: Neural Population Learning”
- https://arxiv.org/abs/2106.09685#microsoft: “LoRA: Low-Rank Adaptation of Large Language Models”
- https://greydanus.github.io/2020/12/01/scaling-down/: “Scaling down Deep Learning”
- https://arxiv.org/abs/1905.11946#google: “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”
- https://arxiv.org/abs/1803.10615: “SqueezeNext: Hardware-Aware Neural Network Design”