Links
“Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, Nolano.org 2023
“Int-4 LLaMa is not enough—Int-3 and beyond: More compression, easier to build apps on LLMs that run locally”, 2023-03-13 (similar; bibliography)
“BMT: Binarized Neural Machine Translation”, Zhang et al 2023
“BMT: Binarized Neural Machine Translation”, 2023-02-09 (similar; bibliography)
“SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022
“SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, 2022-11-18 (similar; bibliography)
“Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production”, Et Al 2022
“Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production”, 2022-11-18 (similar)
“Efficiently Scaling Transformer Inference”, Pope et al 2022
“Efficiently Scaling Transformer Inference”, 2022-11-09 (similar; bibliography)
“GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Frantar et al 2022
“GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, 2022-10-31 (backlinks; similar; bibliography)
“Fast DistilBERT on CPUs”, Et Al 2022
“Fast DistilBERT on CPUs”, 2022-10-27 (similar)
“GLM-130B: An Open Bilingual Pre-trained Model”, Zeng et al 2022
“GLM-130B: An Open Bilingual Pre-trained Model”, 2022-10-05 (similar; bibliography)
“FP8 Formats for Deep Learning”, Micikevicius et al 2022
“FP8 Formats for Deep Learning”, 2022-09-12 (similar)
“Is Integer Arithmetic Enough for Deep Learning Training?”, Et Al 2022
“Is Integer Arithmetic Enough for Deep Learning Training?”, 2022-07-18 (similar)
“On-Device Training Under 256KB Memory”, Lin et al 2022
“On-Device Training Under 256KB Memory”, 2022-06-30 (similar; bibliography)
“How to Train Accurate BNNs for Embedded Systems?”, 2022
“How to train accurate BNNs for embedded systems?”, 2022-06-24 (similar)
“Director: Deep Hierarchical Planning from Pixels”, Hafner et al 2022
“Director: Deep Hierarchical Planning from Pixels”, 2022-06-08 (similar; bibliography)
“8-bit Numerical Formats for Deep Neural Networks”, Et Al 2022
“8-bit Numerical Formats for Deep Neural Networks”, 2022-06-06 (similar)
“ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
“ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, 2022-06-04 (similar; bibliography)
“XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient”, Wu et al 2022
“XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient”, 2022-06-04 (similar; bibliography)
“Matryoshka Representations for Adaptive Deployment”, Kusupati et al 2022
“Matryoshka Representations for Adaptive Deployment”, 2022-05-26 (similar; bibliography)
“Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam”, Lu et al 2022
“Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam”, 2022-02-12 (similar; bibliography)
“Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, Bailey 2022
“Is Programmable Overhead Worth The Cost? How much do we pay for a system to be programmable? It depends upon who you ask”, 2022-01-13 (backlinks; similar; bibliography)
“Boosted Dense Retriever”, Et Al 2021
“Boosted Dense Retriever”, 2021-12-14 (similar)
“FQ-ViT: Fully Quantized Vision Transformer without Retraining”, Lin et al 2021
“FQ-ViT: Fully Quantized Vision Transformer without Retraining”, 2021-11-27 (similar; bibliography)
“𝜇NCA: Texture Generation With Ultra-Compact Neural Cellular Automata”, 2021
“𝜇NCA: Texture Generation with Ultra-Compact Neural Cellular Automata”, 2021-11-26 (backlinks; similar)
“Prune Once for All: Sparse Pre-Trained Language Models”, Zafrir et al 2021
“Prune Once for All: Sparse Pre-Trained Language Models”, 2021-11-10 (similar; bibliography)
“8-bit Optimizers via Block-wise Quantization”, Dettmers et al 2021
“8-bit Optimizers via Block-wise Quantization”, 2021-10-06 (similar; bibliography)
“Understanding and Overcoming the Challenges of Efficient Transformer Quantization”, Bondarenko et al 2021
“Understanding and Overcoming the Challenges of Efficient Transformer Quantization”, 2021-09-27 (similar; bibliography)
“A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness”, Et Al 2021
“A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness”, 2021-06-16 (similar)
“Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better”, 2021
“Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better”, 2021-06-16 (similar)
“Ten Lessons From Three Generations Shaped Google’s TPUv4i”, Jouppi et al 2021
“Ten Lessons From Three Generations Shaped Google’s TPUv4i”, 2021-06-14 (similar; bibliography)
“High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models (DLRMs)”, Et Al 2021
“High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models (DLRMs)”, 2021-04-12 (similar)
“1-bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”, Tang et al 2021
“1-bit Adam: Communication Efficient Large-Scale Training with Adam’s Convergence Speed”, 2021-02-04 (similar; bibliography)
“ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution”, Et Al 2021
“ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution”, 2021-01-19 (similar)
“Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, Fedus et al 2021
“Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity”, 2021-01-11 (similar; bibliography)
“A Primer in BERTology: What We Know about How BERT Works”, Rogers et al 2020
“A Primer in BERTology: What we know about how BERT works”, 2020-11-09 (similar)
“L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Et Al 2020
“L2L: Training Large Neural Networks with Constant Memory using a New Execution Algorithm”, 2020-10-16 (similar)
“RegDeepDanbooru: Yet Another Deep Danbooru Project”, Zyddnys 2020
“RegDeepDanbooru: Yet another Deep Danbooru project”, 2020-10-11 (backlinks)
“TernaryBERT: Distillation-aware Ultra-low Bit BERT”, Zhang et al 2020
“TernaryBERT: Distillation-aware Ultra-low Bit BERT”, 2020-09-27 (similar)
“HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks”, 2020
“HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks”, 2020-07-11 (similar)
“Bayesian Bits: Unifying Quantization and Pruning”, Et Al 2020
“Bayesian Bits: Unifying Quantization and Pruning”, 2020-05-14 (similar)
“General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference”, Et Al 2020
“General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference”, 2020-04-29 (similar)
“Lite Transformer With Long-Short Range Attention”, Wu et al 2020
“Lite Transformer with Long-Short Range Attention”, 2020-04-24 (backlinks; similar)
“Training With Quantization Noise for Extreme Model Compression”, Fan et al 2020
“Training with Quantization Noise for Extreme Model Compression”, 2020-04-15 (similar; bibliography)
“Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, Li et al 2020
“Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, 2020-02-26 (backlinks; similar)
“SWAT: Sparse Weight Activation Training”, Raihan & Aamodt 2020
“SWAT: Sparse Weight Activation Training”, 2020-01-07 (similar; bibliography)
“QUARL: Quantized Reinforcement Learning (ActorQ)”, Et Al 2019
“QUARL: Quantized Reinforcement Learning (ActorQ)”, 2019-10-02 (similar; bibliography)
“SCaNN: Accelerating Large-Scale Inference With Anisotropic Vector Quantization”, Guo et al 2019
“SCaNN: Accelerating Large-Scale Inference with Anisotropic Vector Quantization”, 2019-08-27 (similar)
“And the Bit Goes Down: Revisiting the Quantization of Neural Networks”, Stock et al 2019
“And the Bit Goes Down: Revisiting the Quantization of Neural Networks”, 2019-07-12 (similar)
“Rethinking Floating Point for Deep Learning”, 2018
“Rethinking floating point for deep learning”, 2018-11-01 (similar)
“Learning Recurrent Binary/Ternary Weights”, Et Al 2018
“Learning Recurrent Binary/Ternary Weights”, 2018-09-28 (similar)
“Rethinking Numerical Representations for Deep Neural Networks”, Et Al 2018
“Rethinking Numerical Representations for Deep Neural Networks”, 2018-08-07 (similar)
“Highly Scalable Deep Learning Training System With Mixed-Precision: Training ImageNet in 4 Minutes”, Et Al 2018
“Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in 4 Minutes”, 2018-07-30 (similar)
“Quantization Mimic: Towards Very Tiny CNN for Object Detection”, Et Al 2018
“Quantization Mimic: Towards Very Tiny CNN for Object Detection”, 2018-05-06 (similar)
“Training Imagenet in 3 Hours for $25; and CIFAR-10 for $0.26”, Howard 2018
“Training Imagenet in 3 hours for $25; and CIFAR-10 for $0.26”, 2018-04-30 (backlinks; similar; bibliography)
“High-Accuracy Low-Precision Training”, Et Al 2018
“High-Accuracy Low-Precision Training”, 2018-03-09 (similar)
“Training Wide Residual Networks for Deployment Using a Single Bit for Each Weight”, McDonnell 2018
“Training wide residual networks for deployment using a single bit for each weight”, 2018-02-23 (similar; bibliography)
“Universal Deep Neural Network Compression”, Et Al 2018
“Universal Deep Neural Network Compression”, 2018-02-07 (similar)
“Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, Lin et al 2017
“Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, 2017-12-05 (similar; bibliography)
“Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions”, Wu et al 2017
“Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions”, 2017-11-22 (similar; bibliography)
“Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method”, Et Al 2017
“Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method”, 2017-11-17 (similar)
“Compressing Word Embeddings via Deep Compositional Code Learning”, 2017
“Compressing Word Embeddings via Deep Compositional Code Learning”, 2017-11-03 (similar)
“Learning Discrete Weights Using the Local Reparameterization Trick”, Et Al 2017
“Learning Discrete Weights Using the Local Reparameterization Trick”, 2017-10-21 (similar)
“TensorQuant—A Simulation Toolbox for Deep Neural Network Quantization”, Et Al 2017
“TensorQuant—A Simulation Toolbox for Deep Neural Network Quantization”, 2017-10-13 (backlinks; similar)
“Mixed Precision Training”, Micikevicius et al 2017
“Mixed Precision Training”, 2017-10-10 (similar)
“Bolt: Accelerated Data Mining With Fast Vector Compression”, Blalock & Guttag 2017
“Bolt: Accelerated Data Mining with Fast Vector Compression”, 2017-06-30 (similar)
“Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, Wu et al 2016
“Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, 2016-09-26 (similar)
“Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing”, Esser et al 2016
“Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing”, 2016-03-28 (similar)
Link Bibliography
https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-int-3-and: “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org
https://arxiv.org/abs/2302.04907#google: “BMT: Binarized Neural Machine Translation”, Yichi Zhang, Ankush Garg, Yuan Cao, Łukasz Lew, Behrooz Ghorbani, Zhiru Zhang, Orhan Firat
https://arxiv.org/abs/2211.10438: “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Guangxuan Xiao, Ji Lin, Mickael Seznec, Julien Demouth, Song Han
https://arxiv.org/abs/2211.05102#google: “Efficiently Scaling Transformer Inference”
https://arxiv.org/abs/2210.17323: “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh
https://arxiv.org/abs/2210.02414#baai: “GLM-130B: An Open Bilingual Pre-trained Model”
https://arxiv.org/abs/2206.15472: “On-Device Training Under 256KB Memory”, Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han
https://arxiv.org/abs/2206.04114#google: “Director: Deep Hierarchical Planning from Pixels”, Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
https://arxiv.org/abs/2206.01861#microsoft: “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He
https://arxiv.org/abs/2206.01859#microsoft: “XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient”, Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He
https://arxiv.org/abs/2205.13147: “Matryoshka Representations for Adaptive Deployment”
https://arxiv.org/abs/2202.06009#microsoft: “Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam”, Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He
https://semiengineering.com/is-programmable-overhead-worth-the-cost/: “Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, Brian Bailey
https://arxiv.org/abs/2111.13824: “FQ-ViT: Fully Quantized Vision Transformer without Retraining”, Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, Shuchang Zhou
https://arxiv.org/abs/2111.05754: “Prune Once for All: Sparse Pre-Trained Language Models”, Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
https://arxiv.org/abs/2110.02861: “8-bit Optimizers via Block-wise Quantization”, Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer
https://arxiv.org/abs/2109.12948: “Understanding and Overcoming the Challenges of Efficient Transformer Quantization”, Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
2021-jouppi.pdf: “Ten Lessons From Three Generations Shaped Google’s TPUv4i”
https://arxiv.org/abs/2102.02888#microsoft: “1-bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”
https://arxiv.org/abs/2101.03961#google: “Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, William Fedus, Barret Zoph, Noam Shazeer
https://arxiv.org/abs/2004.07320#facebook: “Training With Quantization Noise for Extreme Model Compression”, Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin
https://arxiv.org/abs/2001.01969: “SWAT: Sparse Weight Activation Training”, Md Aamir Raihan, Tor M. Aamodt
https://arxiv.org/abs/1910.01055#google: “QUARL: Quantized Reinforcement Learning (ActorQ)”
https://www.fast.ai/2018/04/30/dawnbench-fastai/: “Training Imagenet in 3 Hours for $25; and CIFAR-10 for $0.26”, Jeremy Howard
https://arxiv.org/abs/1802.08530: “Training Wide Residual Networks for Deployment Using a Single Bit for Each Weight”, Mark D. McDonnell
https://arxiv.org/abs/1712.01887: “Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally
https://arxiv.org/abs/1711.08141: “Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions”