‘NN sparsity’ directory

Links

“Tina: Tiny Reasoning Models via LoRA”, Wang et al 2025

“Convolutional Differentiable Logic Gate Networks”, Petersen et al 2024

“LoRA vs Full Fine-Tuning: An Illusion of Equivalence”, Shuttleworth et al 2024

“On the Complexity of Neural Computation in Superposition”, Adler & Shavit 2024

“GSoC 2024: Differentiable Logic for Interactive Systems and Generative Music”

“High-Performance Deep Spiking Neural Networks With 0.3 Spikes per Neuron”, Stanojevic et al 2024

“LoRA Learns Less and Forgets Less”, Biderman et al 2024

“CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models”, Lee et al 2024

“Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?”, Jin et al 2024

“ReFT: Representation Finetuning for Language Models”, Wu et al 2024

“Mechanistic Design and Scaling of Hybrid Architectures”, Poli et al 2024

“LTE: Training Neural Networks from Scratch With Parallel Low-Rank Adapters”, Huh et al 2024

“Scaling Laws for Fine-Grained Mixture of Experts”, Krajewski et al 2024

“Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet”

“Exponentially Faster Language Modeling”, Belcak & Wattenhofer 2023

“DiLoCo: Distributed Low-Communication Training of Language Models”, Douillard et al 2023

“Language Models Are Super Mario (DARE): Absorbing Abilities from Homologous Models As a Free Lunch”, Yu et al 2023

“ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-Like Language Models”, Luo et al 2023

“An Exact Mapping from ReLU Networks to Spiking Neural Networks”, Stanojevic et al 2023

“The Impact of Depth and Width on Transformer Language Model Generalization”, Petty et al 2023

“Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time”, Liu et al 2023

“Fast Feedforward Networks”, Belcak & Wattenhofer 2023

“Any Deep ReLU Network Is Shallow”, Villani & Schoots 2023

“Cerebras Architecture Deep Dive: First Look Inside the HW/SW Co-Design for Deep Learning [Updated]”, Lie 2023

“JaxPruner: A Concise Library for Sparsity Research”, Lee et al 2023

“Reusing Deep Neural Network Models through Model Re-Engineering”, Qi et al 2023

“Accelerating Large GPT Training With Sparse Pre-Training and Dense Fine-Tuning”, Thangarasa 2023

“MUX-PLMs: Pre-Training Language Models With Data Multiplexing”, Murahari et al 2023

“DataMUX: Data Multiplexing for Neural Networks”, Murahari et al 2023

“Deep Differentiable Logic Gate Networks”, Petersen et al 2022

“The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers”, Li et al 2022

“Neural Net Sparsity”, Gwern 2022

“Noise Transforms Feed-Forward Networks into Sparse Coding Networks”, Anonymous 2022

“Exploring Low Rank Training of Deep Neural Networks”, Kamalakara et al 2022

“Monolith: Real Time Recommendation System With Collisionless Embedding Table”, Liu et al 2022

“More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 Using Sparsity (SLaK)”, Liu et al 2022

“Building Machine Translation Systems for the Next Thousand Languages”, Bapna et al 2022

“Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, Dao et al 2022

“Efficient Language Modeling With Sparse All-MLP”, Yu et al 2022

“NeuPL: Neural Population Learning”, Liu et al 2022

“Datamodels: Predicting Predictions from Training Data”, Ilyas et al 2022

“Spiking Neural Networks and Their Applications: A Review”, Yamazaki et al 2022

“Persia: An Open, Hybrid System Scaling Deep Learning-Based Recommenders up to 100 Trillion Parameters”, Lian et al 2021

“EvilModel: Hiding Malware Inside of Neural Network Models”, Wang et al 2021

“LoRA: Low-Rank Adaptation of Large Language Models”, Hu et al 2021

“On the Distribution, Sparsity, and Inference-Time Quantization of Attention Values in Transformers”, Ji et al 2021

“The Neural Basis of Intelligence in Fine-Grained Cortical Topographies”, Feilong et al 2021

“Clusterability in Neural Networks”, Filan et al 2021

“Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks”, Hoefler et al 2021

“Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning”, Aghajanyan et al 2020

“Scaling down Deep Learning”, Greydanus 2020

“Extreme Model Compression for On-Device Natural Language Understanding”, Sathyendra et al 2020

“Training Independent Subnetworks for Robust Prediction”, Havasi et al 2020

“EventProp: Event-Based Backpropagation Can Compute Exact Gradients for Spiking Neural Networks”, Wunderlich & Pehle 2020

“On Linear Identifiability of Learned Representations”, Roeder et al 2020

“Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited”, Maddox et al 2020

“Bayesian Deep Learning and a Probabilistic Perspective of Generalization”, Wilson & Izmailov 2020

“Neural Arithmetic Units”, Madsen & Johansen 2020

“Linear Mode Connectivity and the Lottery Ticket Hypothesis”, Frankle et al 2019

“Learning to Seek: Autonomous Source Seeking With Deep Reinforcement Learning Onboard a Nano Drone Microcontroller”, Duisterhof et al 2019

“Does Learning Require Memorization? A Short Tale about a Long Tail”, Feldman 2019

“Weight Agnostic Neural Networks”, Gaier & Ha 2019

“StyleNAS: An Empirical Study of Neural Architecture Search to Uncover Surprisingly Fast End-To-End Universal Style Transfer Networks”, An et al 2019

“EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, Tan & Le 2019

“Superposition of Many Models into One”, Cheung et al 2019

“Playing Atari With Six Neurons”, Cuccu et al 2018

“Measuring the Intrinsic Dimension of Objective Landscapes”, Li et al 2018

“SqueezeNext: Hardware-Aware Neural Network Design”, Gholami et al 2018

“Wide Compression: Tensor Ring Nets”, Wang et al 2018

“Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing”, Rosenfeld & Tsotsos 2018

“Fix Your Classifier: the Marginal Value of Training the Last Weight Layer”, Hoffer et al 2018

“Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition”, Ye et al 2017

“3D Semantic Segmentation With Submanifold Sparse Convolutional Networks”, Graham et al 2017

“xUnit: Learning a Spatial Activation Function for Efficient Image Restoration”, Kligvasser et al 2017

“Natural Language Processing With Small Feed-Forward Networks”, Botha et al 2017

“ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices”, Zhang et al 2017

“Submanifold Sparse Convolutional Networks”, Graham & Maaten 2017

“Shake-Shake Regularization of 3-Branch Residual Networks”, Gastaldi 2017

“Using the Output Embedding to Improve Language Models”, Press & Wolf 2016

“Deep Residual Learning for Image Recognition”, He et al 2015

“Tensorizing Neural Networks”, Novikov et al 2015

“Eight Pairs of Descending Visual Neurons in the Dragonfly Give Wing Motor Centers Accurate Population Vector of Prey Direction”, Gonzalez-Bellido et al 2013

“The Cat Is out of the Bag: Cortical Simulations With 10⁹ Neurons, 10¹³ Synapses”, Ananthanarayanan et al 2009

“On the Computational Power of Threshold Circuits With Sparse Activity”, Uchizawa et al 2006

“Networks of Spiking Neurons: The Third Generation of Neural Network Models”, Maass 1997

“Characteristics of Sparsely Encoded Associative Memory”, Amari 1989

“[2110.08152] Kronecker Decomposition for GPT Compression”

View PDF: /doc/www/arxiv.org/ae4a089397d3b8667469ba90ca313ead5a4bdcb0.pdf

“Higher Accuracy on Vision Models With EfficientNet-Lite”

View HTML: /doc/www/blog.tensorflow.org/5190b62fb9f2d53675a2f934d01f87ef413057a8.html

“Something Weird Is Happening With LLMs and Chess”, Dynomight 2025

“Delivering Real-Time AI in the Palm of Your Hand”

View HTML: /doc/www/engineering.fb.com/65910fdbbc7e7f5970d2ecf96c18a0eb77eab3cf.html

“Sparsity-Aware Deep Learning Inference Runtime for CPUs”

“neuralmagic/sparseml: Libraries for Applying Sparsification Recipes to Neural Networks With a Few Lines of Code, Enabling Faster and Smaller Models”

“An Estimation of the Absolute Number of Axons Indicates That Human Cortical Areas Are Sparsely Connected”

“Creating a 17 KB Style Transfer Model With Layer Pruning and Quantization”, Toole 2025

“BERT-Large: Prune Once for DistilBERT Inference Performance”

View HTML: /doc/www/neuralmagic.com/4e89fd35918a0a8e03c1d63ee7c5af3e1d76e968.html

“Circuits in Superposition: Compressing Many Small Neural Networks into One”

View HTML: /doc/www/www.greaterwrong.com/56cb7ccd134aaa922ba1f32126ca7c67fc25fb15.html#Read_in_interference

“Measuring the Intrinsic Dimension of Objective Landscapes [Video]”

https://www.youtube.com/watch?v=uSZWeRADTFI#uber

Sort By Magic

Annotations sorted by machine learning into ⁠inferred 'tags'⁠. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
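
The ordering step described above can be sketched as a greedy nearest-neighbor chain. This is only an illustration under stated assumptions: the page does not say which similarity measure or embedding model is used, so cosine similarity, the function names `cosine_similarity` and `sort_by_similarity`, and the toy 2-D vectors are all hypothetical, and the clustering and auto-labeling steps are left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sort_by_similarity(embeddings, start=0):
    """Return item indices ordered as a greedy nearest-neighbor chain.

    Begins at `start` (assumed here to be the newest annotation) and
    repeatedly hops to the not-yet-visited item most similar to the
    item visited last.
    """
    order = [start]
    remaining = set(range(len(embeddings))) - {start}
    while remaining:
        current = embeddings[order[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nearest = max(
            sorted(remaining),
            key=lambda i: cosine_similarity(current, embeddings[i]),
        )
        order.append(nearest)
        remaining.remove(nearest)
    return order


# Toy embeddings, listed newest first (made-up values for illustration).
toy_embeddings = [
    [1.0, 0.0],  # item 0: newest annotation
    [0.0, 1.0],  # item 1
    [0.9, 0.1],  # item 2: close to item 0
    [0.1, 0.9],  # item 3: close to item 1
]
order = sort_by_similarity(toy_embeddings)
print(order)
```

Each hop only looks at the previous item, so the resulting list drifts from topic to topic rather than forming globally optimal clusters; that matches the "progression of topics" the description mentions, but a production version could differ in how it measures similarity or breaks ties.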

`neural-scaling model-reasoning population-embedding deep-reinforcement language-mapping`

`data-multiplexing`

`sparsity-optimization neural-hardware efficient-architecture spiking-neurons model-compression adaptive-learning`


Bibliography

https://arxiv.org/abs/2403.17844: “Mechanistic Design and Scaling of Hybrid Architectures”, Michael Poli, Armin W. Thomas, Eric Nguyen …, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli
https://arxiv.org/abs/2311.10770: “Exponentially Faster Language Modeling”, Peter Belcak, Roger Wattenhofer
https://www.sciencedirect.com/science/article/pii/S0893608023005051: “An Exact Mapping from ReLU Networks to Spiking Neural Networks”, Ana Stanojevic, Stanisław Woźniak, Guillaume Bellec …, Giovanni Cherubini, Angeliki Pantazi, Wulfram Gerstner
https://arxiv.org/abs/2310.17157: “Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time”, Zichang Liu, Jue Wang, Tri Dao …, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Ré, Beidi Chen
https://arxiv.org/abs/2308.14711: “Fast Feedforward Networks”, Peter Belcak, Roger Wattenhofer
https://arxiv.org/abs/2302.12441: “MUX-PLMs: Pre-Training Language Models With Data Multiplexing”, Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez …, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan
https://arxiv.org/abs/2210.06313#google: “The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers”, Zonglin Li, Chong You, Srinadh Bhojanapalli …, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar
https://arxiv.org/abs/2207.03620: “More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 Using Sparsity (SLaK)”, Shiwei Liu, Tianlong Chen, Xiaohan Chen …, Xuxi Chen, Qiao Xiao, Boqian Wu, Mykola Pechenizkiy, Decebal Mocanu, Zhangyang Wang
https://arxiv.org/abs/2205.03983#google: “Building Machine Translation Systems for the Next Thousand Languages”, Ankur Bapna, Isaac Caswell, Julia Kreutzer …, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod, Jason Riesa, Yuan Cao, Mia Xu Chen, Klaus Macherey, Maxim Krikun, Pidong Wang, Alexander Gutkin, Apurva Shah, Yanping Huang, Zhifeng Chen, Yonghui Wu, Macduff Hughes
https://arxiv.org/abs/2204.00595: “Monarch: Expressive Structured Matrices for Efficient and Accurate Training”, Tri Dao, Beidi Chen, Nimit Sohoni …, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré
https://arxiv.org/abs/2203.06850: “Efficient Language Modeling With Sparse All-MLP”, Ping Yu, Mikel Artetxe, Myle Ott …, Sam Shleifer, Hongyu Gong, Ves Stoyanov, Xian Li
https://arxiv.org/abs/2202.07415#deepmind: “NeuPL: Neural Population Learning”, Siqi Liu, Luke Marris, Daniel Hennes …, Josh Merel, Nicolas Heess, Thore Graepel
https://arxiv.org/abs/2106.09685#microsoft: “LoRA: Low-Rank Adaptation of Large Language Models”, Edward J. Hu, Yelong Shen, Phillip Wallis …, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
https://arxiv.org/abs/1905.11946#google: “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, Mingxing Tan, Quoc V. Le
https://arxiv.org/abs/1803.10615: “SqueezeNext: Hardware-Aware Neural Network Design”, Amir Gholami, Kiseok Kwon, Bichen Wu …, Zizheng Tai, Xiangyu Yue, Peter Jin, Sicheng Zhao, Kurt Keutzer
https://arxiv.org/abs/1512.03385#microsoft: “Deep Residual Learning for Image Recognition”, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
