- See Also
- Links
- “SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, Dettmers et al 2023
- “Binary and Ternary Natural Language Generation”, Liu et al 2023
- “AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration”, Lin et al 2023
- “Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing”, Mallasén et al 2023
- “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023
- “SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Zhu et al 2023
- “BMT: Binarized Neural Machine Translation”, Zhang et al 2023
- “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022
- “Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production”, Kim et al 2022
- “Efficiently Scaling Transformer Inference”, Pope et al 2022
- “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Frantar et al 2022
- “Fast DistilBERT on CPUs”, Shen et al 2022
- “GLM-130B: An Open Bilingual Pre-trained Model”, Zeng et al 2022
- “FP8 Formats for Deep Learning”, Micikevicius et al 2022
- “LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale”, Dettmers et al 2022
- “Is Integer Arithmetic Enough for Deep Learning Training?”, Ghaffari et al 2022
- “On-Device Training Under 256KB Memory”, Lin et al 2022
- “How to Train Accurate BNNs for Embedded Systems?”, Putter & Corporaal 2022
- “Director: Deep Hierarchical Planning from Pixels”, Hafner et al 2022
- “8-bit Numerical Formats for Deep Neural Networks”, Noune et al 2022
- “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
- “XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient”, Wu et al 2022
- “Matryoshka Representations for Adaptive Deployment”, Kusupati et al 2022
- “PLAID: An Efficient Engine for Late Interaction Retrieval”, Santhanam et al 2022
- “Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam”, Lu et al 2022
- “Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, Bailey 2022
- “Boosted Dense Retriever”, Lewis et al 2021
- “FQ-ViT: Fully Quantized Vision Transformer without Retraining”, Lin et al 2021
- “𝜇NCA: Texture Generation With Ultra-Compact Neural Cellular Automata”, Mordvintsev & Niklasson 2021
- “Prune Once for All: Sparse Pre-Trained Language Models”, Zafrir et al 2021
- “8-bit Optimizers via Block-wise Quantization”, Dettmers et al 2021
- “Understanding and Overcoming the Challenges of Efficient Transformer Quantization”, Bondarenko et al 2021
- “A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness”, Diffenderfer et al 2021
- “Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better”, Menghani 2021
- “Ten Lessons From Three Generations Shaped Google’s TPUv4i”, Jouppi et al 2021
- “High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models (DLRMs)”, Mudigere et al 2021
- “Deep Residual Learning in Spiking Neural Networks”, Fang et al 2021
- “1-bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”, Tang et al 2021
- “ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution”, Song et al 2021
- “Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, Fedus et al 2021
- “A Primer in BERTology: What We Know about How BERT Works”, Rogers et al 2020
- “L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020
- “RegDeepDanbooru: Yet Another Deep Danbooru Project”, zyddnys 2020
- “TernaryBERT: Distillation-aware Ultra-low Bit BERT”, Zhang et al 2020
- “HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks”, Garland & Gregg 2020
- “Bayesian Bits: Unifying Quantization and Pruning”, Baalen et al 2020
- “General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference”, Du et al 2020
- “Lite Transformer With Long-Short Range Attention”, Wu et al 2020
- “Training With Quantization Noise for Extreme Model Compression”, Fan et al 2020
- “Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, Li et al 2020
- “SWAT: Sparse Weight Activation Training”, Raihan & Aamodt 2020
- “QUARL: Quantized Reinforcement Learning (ActorQ)”, Lam et al 2019
- “SCaNN: Accelerating Large-Scale Inference With Anisotropic Vector Quantization”, Guo et al 2019
- “And the Bit Goes Down: Revisiting the Quantization of Neural Networks”, Stock et al 2019
- “Surrogate Gradient Learning in Spiking Neural Networks”, Neftci et al 2019
- “Rethinking Floating Point for Deep Learning”, Johnson 2018
- “Learning Recurrent Binary/Ternary Weights”, Ardakani et al 2018
- “Rethinking Numerical Representations for Deep Neural Networks”, Hill et al 2018
- “Highly Scalable Deep Learning Training System With Mixed-Precision: Training ImageNet in 4 Minutes”, Jia et al 2018
- “Quantization Mimic: Towards Very Tiny CNN for Object Detection”, Wei et al 2018
- “Training Imagenet in 3 Hours for $25; and CIFAR-10 for $0.26”, Howard 2018
- “High-Accuracy Low-Precision Training”, Sa et al 2018
- “Training Wide Residual Networks for Deployment Using a Single Bit for Each Weight”, McDonnell 2018
- “Universal Deep Neural Network Compression”, Choi et al 2018
- “Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, Lin et al 2017
- “Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions”, Wu et al 2017
- “Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method”, Sun et al 2017
- “Compressing Word Embeddings via Deep Compositional Code Learning”, Shu & Nakayama 2017
- “Learning Discrete Weights Using the Local Reparameterization Trick”, Shayer et al 2017
- “TensorQuant—A Simulation Toolbox for Deep Neural Network Quantization”, Loroch et al 2017
- “Mixed Precision Training”, Micikevicius et al 2017
- “Beating Floating Point at Its Own Game: Posit Arithmetic”, Gustafson & Yonemoto 2017
- “Bolt: Accelerated Data Mining With Fast Vector Compression”, Blalock & Guttag 2017
- “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, Wu et al 2016
- “Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing”, Esser et al 2016
- “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”, Rastegari et al 2016
- “BinaryConnect: Training Deep Neural Networks With Binary Weights during Propagations”, Courbariaux et al 2015
- Sort By Magic
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Links
“SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, Dettmers et al 2023
“Binary and Ternary Natural Language Generation”, Liu et al 2023
“AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration”, Lin et al 2023
“Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing”, Mallasén et al 2023
“Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023
“SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Zhu et al 2023
“BMT: Binarized Neural Machine Translation”, Zhang et al 2023
“SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022
“Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production”, Kim et al 2022
“Efficiently Scaling Transformer Inference”, Pope et al 2022
“GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Frantar et al 2022
“Fast DistilBERT on CPUs”, Shen et al 2022
“GLM-130B: An Open Bilingual Pre-trained Model”, Zeng et al 2022
“FP8 Formats for Deep Learning”, Micikevicius et al 2022
“LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale”, Dettmers et al 2022
“Is Integer Arithmetic Enough for Deep Learning Training?”, Ghaffari et al 2022
“On-Device Training Under 256KB Memory”, Lin et al 2022
“How to Train Accurate BNNs for Embedded Systems?”, Putter & Corporaal 2022
“Director: Deep Hierarchical Planning from Pixels”, Hafner et al 2022
“8-bit Numerical Formats for Deep Neural Networks”, Noune et al 2022
“ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
“XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient”, Wu et al 2022
“Matryoshka Representations for Adaptive Deployment”, Kusupati et al 2022
“PLAID: An Efficient Engine for Late Interaction Retrieval”, Santhanam et al 2022
“Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam”, Lu et al 2022
“Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, Bailey 2022
“Boosted Dense Retriever”, Lewis et al 2021
“FQ-ViT: Fully Quantized Vision Transformer without Retraining”, Lin et al 2021
“𝜇NCA: Texture Generation With Ultra-Compact Neural Cellular Automata”, Mordvintsev & Niklasson 2021
“Prune Once for All: Sparse Pre-Trained Language Models”, Zafrir et al 2021
“8-bit Optimizers via Block-wise Quantization”, Dettmers et al 2021
“Understanding and Overcoming the Challenges of Efficient Transformer Quantization”, Bondarenko et al 2021
“A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness”, Diffenderfer et al 2021
“Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better”, Menghani 2021
“Ten Lessons From Three Generations Shaped Google’s TPUv4i”, Jouppi et al 2021
“High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models (DLRMs)”, Mudigere et al 2021
“Deep Residual Learning in Spiking Neural Networks”, Fang et al 2021
“1-bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”, Tang et al 2021
“ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution”, Song et al 2021
“Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, Fedus et al 2021
“A Primer in BERTology: What We Know about How BERT Works”, Rogers et al 2020
“L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020
“RegDeepDanbooru: Yet Another Deep Danbooru Project”, zyddnys 2020
“TernaryBERT: Distillation-aware Ultra-low Bit BERT”, Zhang et al 2020
“HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks”, Garland & Gregg 2020
“Bayesian Bits: Unifying Quantization and Pruning”, Baalen et al 2020
“General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference”, Du et al 2020
“Lite Transformer With Long-Short Range Attention”, Wu et al 2020
“Training With Quantization Noise for Extreme Model Compression”, Fan et al 2020
“Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, Li et al 2020
“SWAT: Sparse Weight Activation Training”, Raihan & Aamodt 2020
“QUARL: Quantized Reinforcement Learning (ActorQ)”, Lam et al 2019
“SCaNN: Accelerating Large-Scale Inference With Anisotropic Vector Quantization”, Guo et al 2019
“And the Bit Goes Down: Revisiting the Quantization of Neural Networks”, Stock et al 2019
“Surrogate Gradient Learning in Spiking Neural Networks”, Neftci et al 2019
“Rethinking Floating Point for Deep Learning”, Johnson 2018
“Learning Recurrent Binary/Ternary Weights”, Ardakani et al 2018
“Rethinking Numerical Representations for Deep Neural Networks”, Hill et al 2018
“Highly Scalable Deep Learning Training System With Mixed-Precision: Training ImageNet in 4 Minutes”, Jia et al 2018
“Quantization Mimic: Towards Very Tiny CNN for Object Detection”, Wei et al 2018
“Training Imagenet in 3 Hours for $25; and CIFAR-10 for $0.26”, Howard 2018
“High-Accuracy Low-Precision Training”, Sa et al 2018
“Training Wide Residual Networks for Deployment Using a Single Bit for Each Weight”, McDonnell 2018
“Universal Deep Neural Network Compression”, Choi et al 2018
“Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, Lin et al 2017
“Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions”, Wu et al 2017
“Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method”, Sun et al 2017
“Compressing Word Embeddings via Deep Compositional Code Learning”, Shu & Nakayama 2017
“Learning Discrete Weights Using the Local Reparameterization Trick”, Shayer et al 2017
“TensorQuant—A Simulation Toolbox for Deep Neural Network Quantization”, Loroch et al 2017
“Mixed Precision Training”, Micikevicius et al 2017
“Beating Floating Point at Its Own Game: Posit Arithmetic”, Gustafson & Yonemoto 2017
“Bolt: Accelerated Data Mining With Fast Vector Compression”, Blalock & Guttag 2017
“Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, Wu et al 2016
“Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing”, Esser et al 2016
“XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”, Rastegari et al 2016
“BinaryConnect: Training Deep Neural Networks With Binary Weights during Propagations”, Courbariaux et al 2015
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses each annotation’s embedding to build a chain of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
hardware-optimization
compression
quantization
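The ordering step can be sketched in a few lines. The following is a minimal illustration under the assumption that each annotation has already been reduced to an embedding vector and a date; the function name and details are hypothetical, not the site’s actual implementation. The clustering & auto-labeling into sections such as ‘quantization’ above would be a separate step (eg. k-means over the same embeddings) and is omitted here.

```python
# Minimal sketch of a greedy nearest-neighbor ordering over annotation embeddings
# (hypothetical code, not the site's actual implementation).
import numpy as np

def magic_sort(embeddings: np.ndarray, dates: list) -> list:
    """Order annotations so adjacent entries are topically similar:
    start at the newest annotation, then repeatedly hop to the most
    similar unvisited one (cosine similarity over embeddings)."""
    # Normalize rows so a plain dot product equals cosine similarity.
    vecs = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    unvisited = set(range(len(vecs)))
    current = max(unvisited, key=lambda i: dates[i])  # begin with the newest annotation
    order = [current]
    unvisited.remove(current)
    while unvisited:
        # Hop to the unvisited annotation closest to the current one in embedding space.
        current = max(unvisited, key=lambda i: float(vecs[current] @ vecs[i]))
        order.append(current)
        unvisited.remove(current)
    return order

# Example: magic_sort(np.random.randn(4, 8), ["2019-01-01", "2023-06-01", "2021-03-01", "2022-07-01"])
```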
Wikipedia
Miscellaneous
- /doc/ai/nn/sparsity/low-precision/2021-fedus-figure1-switchmoetransformerscaling.png
- https://blog.research.google/2022/09/quantization-for-fast-and.html
- https://github.com/vitoplantamura/OnnxStream/tree/846da873570a737b49154e8f835704264864b0fe
- https://observablehq.com/@rreusser/half-precision-floating-point-visualized
- https://twitter.com/thecharlieblake/status/1581913495670755328
- https://www.reddit.com/r/mlscaling/comments/146rgq2/chatgpt_is_running_quantized/
Link Bibliography
- https://arxiv.org/abs/2305.06946: “Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing”, David Mallasén, Alberto A. Del Barrio, Manuel Prieto-Matias
- https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-int-3-and: “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org
- https://arxiv.org/abs/2302.13939: “SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Rui-Jie Zhu, Qihang Zhao, Jason K. Eshraghian
- https://arxiv.org/abs/2302.04907#google: “BMT: Binarized Neural Machine Translation”, Yichi Zhang, Ankush Garg, Yuan Cao, Łukasz Lew, Behrooz Ghorbani, Zhiru Zhang, Orhan Firat
- https://arxiv.org/abs/2211.10438: “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Guangxuan Xiao, Ji Lin, Mickael Seznec, Julien Demouth, Song Han
- https://arxiv.org/abs/2211.05102#google: “Efficiently Scaling Transformer Inference”
- https://arxiv.org/abs/2210.17323: “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh
- https://arxiv.org/abs/2210.02414#baai: “GLM-130B: An Open Bilingual Pre-trained Model”
- https://arxiv.org/abs/2206.15472: “On-Device Training Under 256KB Memory”, Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han
- https://arxiv.org/abs/2206.04114#google: “Director: Deep Hierarchical Planning from Pixels”, Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
- https://arxiv.org/abs/2206.01861#microsoft: “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He
- https://arxiv.org/abs/2206.01859#microsoft: “XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient”, Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He
- https://arxiv.org/abs/2205.13147: “Matryoshka Representations for Adaptive Deployment”
- https://arxiv.org/abs/2202.06009#microsoft: “Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam”, Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He
- https://semiengineering.com/is-programmable-overhead-worth-the-cost/: “Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, Brian Bailey
- https://arxiv.org/abs/2111.13824: “FQ-ViT: Fully Quantized Vision Transformer without Retraining”, Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, Shuchang Zhou
- https://arxiv.org/abs/2111.05754: “Prune Once for All: Sparse Pre-Trained Language Models”, Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
- https://arxiv.org/abs/2110.02861: “8-bit Optimizers via Block-wise Quantization”, Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer
- https://arxiv.org/abs/2109.12948: “Understanding and Overcoming the Challenges of Efficient Transformer Quantization”, Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
- 2021-jouppi.pdf: “Ten Lessons From Three Generations Shaped Google’s TPUv4i”
- https://arxiv.org/abs/2102.04159: “Deep Residual Learning in Spiking Neural Networks”, Wei Fang, Zhaofei Yu, Yanqi Chen, Tiejun Huang, Timothée Masquelier, Yonghong Tian
- https://arxiv.org/abs/2102.02888#microsoft: “1-bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”
- https://arxiv.org/abs/2101.03961#google: “Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, William Fedus, Barret Zoph, Noam Shazeer
- https://arxiv.org/abs/2004.07320#facebook: “Training With Quantization Noise for Extreme Model Compression”, Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin
- https://arxiv.org/abs/2001.01969: “SWAT: Sparse Weight Activation Training”, Md Aamir Raihan, Tor M. Aamodt
- https://arxiv.org/abs/1910.01055#google: “QUARL: Quantized Reinforcement Learning (ActorQ)”
- https://www.fast.ai/2018/04/30/dawnbench-fastai/: “Training Imagenet in 3 Hours for $25; and CIFAR-10 for $0.26”, Jeremy Howard
- https://arxiv.org/abs/1802.08530: “Training Wide Residual Networks for Deployment Using a Single Bit for Each Weight”, Mark D. McDonnell
- https://arxiv.org/abs/1712.01887: “Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally
- https://arxiv.org/abs/1711.08141: “Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions”
- https://arxiv.org/abs/1603.05279: “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”, Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi