Bibliography:

  1. Neural Net Sparsity

  2. ‘neural net’ tag

  3. ‘knowledge distillation’ tag

  4. ‘reduced-precision NNs’ tag

  5. ‘NN pruning’ tag

  6. ‘sparse Transformers’ tag

  7. ‘MoE NN’ tag

  8. ‘Portia spider’ tag

  9. ‘compression’ tag

  10. Convolutional Differentiable Logic Gate Networks

  11. LoRA vs Full Fine-tuning: An Illusion of Equivalence

  12. On the Complexity of Neural Computation in Superposition

  13. GSoC 2024: Differentiable Logic for Interactive Systems and Generative Music

  14. CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

  15. Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

  16. ReFT: Representation Finetuning for Language Models

  17. Mechanistic Design and Scaling of Hybrid Architectures

  18. LTE: Training Neural Networks from Scratch with Parallel Low-Rank Adapters

  19. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

  20. Exponentially Faster Language Modeling

  21. DiLoCo: Distributed Low-Communication Training of Language Models

  22. Language Models are Super Mario (DARE): Absorbing Abilities from Homologous Models as a Free Lunch

  23. ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models

  24. The Impact of Depth and Width on Transformer Language Model Generalization

  25. Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

  26. Fast Feedforward Networks

  27. Any Deep ReLU Network is Shallow

  28. JaxPruner: A concise library for sparsity research

  29. Reusing Deep Neural Network Models through Model Re-engineering

  30. MUX-PLMs: Pre-training Language Models with Data Multiplexing

  31. DataMUX: Data Multiplexing for Neural Networks

  32. Deep Differentiable Logic Gate Networks

  33. The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers

  34. Noise Transforms Feed-Forward Networks into Sparse Coding Networks

  35. Exploring Low Rank Training of Deep Neural Networks

  36. Monolith: Real Time Recommendation System With Collisionless Embedding Table

  37. More ConvNets in the 2020s: Scaling up Kernels Beyond 51×51 using Sparsity (SLaK)

  38. Building Machine Translation Systems for the Next Thousand Languages

  39. Monarch: Expressive Structured Matrices for Efficient and Accurate Training

  40. Efficient Language Modeling with Sparse All-MLP

  41. NeuPL: Neural Population Learning

  42. Datamodels: Predicting Predictions from Training Data

  43. Spiking Neural Networks and Their Applications: A Review

  44. Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters

  45. EvilModel: Hiding Malware Inside of Neural Network Models

  46. LoRA: Low-Rank Adaptation of Large Language Models

  47. On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers

  48. The neural basis of intelligence in fine-grained cortical topographies

  49. Clusterability in Neural Networks

  50. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

  51. Scaling down Deep Learning

  52. Extreme Model Compression for On-device Natural Language Understanding

  53. Training independent subnetworks for robust prediction

  54. EventProp: Event-Based Backpropagation can compute Exact Gradients for Spiking Neural Networks

  55. On Linear Identifiability of Learned Representations

  56. Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited

  57. Bayesian Deep Learning and a Probabilistic Perspective of Generalization

  58. Neural Arithmetic Units

  59. Linear Mode Connectivity and the Lottery Ticket Hypothesis

  60. Learning to Seek: Autonomous Source Seeking with Deep Reinforcement Learning Onboard a Nano Drone Microcontroller

  61. Does Learning Require Memorization? A Short Tale about a Long Tail

  62. Weight Agnostic Neural Networks

  63. StyleNAS: An Empirical Study of Neural Architecture Search to Uncover Surprisingly Fast End-to-End Universal Style Transfer Networks

  64. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

  65. Superposition of many models into one

  66. Playing Atari with Six Neurons

  67. Measuring the Intrinsic Dimension of Objective Landscapes

  68. SqueezeNext: Hardware-Aware Neural Network Design

  69. Wide Compression: Tensor Ring Nets

  70. Intriguing Properties of Randomly Weighted Networks: Generalizing while Learning Next to Nothing

  71. Fix your classifier: the marginal value of training the last weight layer

  72. Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition

  73. 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks

  74. xUnit: Learning a Spatial Activation Function for Efficient Image Restoration

  75. Natural Language Processing with Small Feed-Forward Networks

  76. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

  77. Submanifold Sparse Convolutional Networks

  78. Shake-Shake regularization of 3-branch residual networks

  79. Using the Output Embedding to Improve Language Models

  80. Deep Residual Learning for Image Recognition

  81. Tensorizing Neural Networks

  82. Eight pairs of descending visual neurons in the dragonfly give wing motor centers accurate population vector of prey direction

  83. The cat is out of the bag: cortical simulations with 10⁹ neurons, 10¹³ synapses

  84. On the Computational Power of Threshold Circuits with Sparse Activity

  85. Networks of spiking neurons: The third generation of neural network models

  86. Characteristics of sparsely encoded associative memory

  87. Kronecker Decomposition for GPT Compression

  88. ae4a089397d3b8667469ba90ca313ead5a4bdcb0.pdf

  89. Higher Accuracy on Vision Models With EfficientNet-Lite

  90. 5190b62fb9f2d53675a2f934d01f87ef413057a8.html

  91. Something Weird Is Happening With LLMs and Chess

  92. Delivering Real-Time AI in the Palm of Your Hand

  93. 65910fdbbc7e7f5970d2ecf96c18a0eb77eab3cf.html

  94. Sparsity-Aware Deep Learning Inference Runtime for CPUs

  95. Neuralmagic/sparseml: Libraries for Applying Sparsification Recipes to Neural Networks With a Few Lines of Code, Enabling Faster and Smaller Models

  96. An Estimation of the Absolute Number of Axons Indicates That Human Cortical Areas Are Sparsely Connected

  97. Creating a 17 KB Style Transfer Model With Layer Pruning and Quantization

  98. BERT-Large: Prune Once for DistilBERT Inference Performance

  99. 4e89fd35918a0a8e03c1d63ee7c5af3e1d76e968.html

  100. Circuits in Superposition: Compressing Many Small Neural Networks into One

  101. 56cb7ccd134aaa922ba1f32126ca7c67fc25fb15.html#Read_in_interference

  102. Measuring the Intrinsic Dimension of Objective Landscapes [Video]

  103. design#future-tag-features

  104. 2022-bapna-figure2-googletranslateneuralmachinetranslationscalingbylanguagecorpussize.jpg

  105. 2018-cheng.pdf

  106. 2017-rawat.pdf

  107. http://manikvarma.org/pubs/kusupati18.pdf

  108. 0c4fe9e4e42e662052589c972c597cd55db28216.pdf

  109. https://ai.facebook.com/blog/a-highly-efficient-real-time-text-to-speech-system-deployed-on-cpus/

  110. https://blog.floydhub.com/knowledge-distillation/

  111. fce0500f7d40307993d6a8118acaae98bcd302dd.html

  112. https://blog.roblox.com/2020/05/scaled-bert-serve-1-billion-daily-requests-cpus/

  113. b5c0281e5c48bf3feee0e605ebb3e844050b1bc9.html

  114. https://cprimozic.net/blog/growing-sparse-computational-graphs-with-rnns/

  115. c9d32a6c96999e0be1f657d4316c77ec606c7677.html

  116. https://old.reddit.com/r/slatestarcodex/comments/1201v68/10word_quote_a_short_and_simple_failure_mode_of/jdjsx43/

  117. https://openai.com/pricing#fine-tuning-models

  118. https://research.google/blog/an-all-neural-on-device-speech-recognizer/

  119. https://research.google/blog/auto-generated-summaries-in-google-docs/

  120. https://research.google/blog/custom-on-device-ml-models-with-learn2compress/

  121. https://research.google/blog/efficient-sequence-modeling-for-on-device-ml/

  122. https://research.google/blog/grammar-correction-as-you-type-on-pixel-6/

  123. https://research.google/blog/training-machine-learning-models-more-efficiently-with-dataset-distillation/

  124. https://tech.pic-collage.com/distillation-of-clip-model-and-other-experiments-f8394b7321ce

  125. feb67cfe48e6266bfd331586a8e75b88972f5b9c.html

  126. https://www.lesswrong.com/posts/7fxusXdkMNmAhkAfc/finding-sparse-linear-connections-between-features-in-llms

  127. https://www.lesswrong.com/posts/qmQFHCgCyEEjuy5a7/lora-fine-tuning-efficiently-undoes-safety-training-from

  128. https://www.quantamagazine.org/sparse-neural-networks-point-physicists-to-useful-data-20230608/

  129. 6f5ad29694d194976fd23467d00a4bbabcbca622.html

  130. https://www.reddit.com/r/LocalLLaMA/comments/18luk10/wait_llama_and_falcon_are_also_moe/

  131. https://x.com/CFGeek/status/1826749739502895618
