“‘Reduced-Precision NNs’ Tag”, 2019-12-27 (backlinks):
Bibliography for tag ai/nn/sparsity/low-precision, most recent first: 1 related tag, 106 annotations, & 24 links (parent).
- See Also
- Links
- “Model Equality Testing: Which Model Is This API Serving?”, et al 2024
- “A Visual Guide to Quantization”, 2024
- “OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training”, et al 2024
- “Probing the Decision Boundaries of In-Context Learning in Large Language Models”, et al 2024
- “Nemotron-4 340B Technical Report”, et al 2024
- “Scalable Matmul-Free Language Modeling”, et al 2024
- “Neural Networks (MNIST Inference) on the ‘3¢’ Microcontroller”, cpldcpu 2024
- “How Good Are Low-Bit Quantized LLaMA-3 Models? An Empirical Study”, et al 2024
- “Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws”, Allen-Zhu & Li 2024
- “LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models”, et al 2024
- “Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression”, et al 2024
- “The Era of 1-Bit LLMs: All Large Language Models Are in 1.58 Bits”, et al 2024
- “FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design”, et al 2024
- “Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws”, 2023
- “TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones”, et al 2023
- “LLM-FP4: 4-Bit Floating-Point Quantized Transformers”, et al 2023
- “Training Transformers With 4-Bit Integers”, et al 2023
- “SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, et al 2023
- “Binary and Ternary Natural Language Generation”, et al 2023
- “AWQ: Activation-Aware Weight Quantization for LLM Compression and Acceleration”, et al 2023
- “Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing”, et al 2023
- “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023
- “SpikeGPT: Generative Pre-Trained Language Model With Spiking Neural Networks”, et al 2023
- “BMT: Binarized Neural Machine Translation”, et al 2023
- “Self-Compressing Neural Networks”, 2023
- “Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production”, et al 2022
- “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, et al 2022
- “Efficiently Scaling Transformer Inference”, et al 2022
- “GPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers”, et al 2022
- “Fast DistilBERT on CPUs”, et al 2022
- “Broken Neural Scaling Laws”, et al 2022
- “GLM-130B: An Open Bilingual Pre-Trained Model”, et al 2022
- “FP8 Formats for Deep Learning”, et al 2022
- “LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale”, et al 2022
- “Is Integer Arithmetic Enough for Deep Learning Training?”, et al 2022
- “On-Device Training Under 256KB Memory”, et al 2022
- “How to Train Accurate BNNs for Embedded Systems?”, 2022
- “Director: Deep Hierarchical Planning from Pixels”, et al 2022
- “8-Bit Numerical Formats for Deep Neural Networks”, et al 2022
- “XTC: Extreme Compression for Pre-Trained Transformers Made Simple and Efficient”, et al 2022
- “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, et al 2022
- “Matryoshka Representations for Adaptive Deployment”, et al 2022
- “PLAID: An Efficient Engine for Late Interaction Retrieval”, et al 2022
- “Maximizing Communication Efficiency for Large-Scale Training via 0/1 Adam”, et al 2022
- “Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, 2022
- “Boosted Dense Retriever”, et al 2021
- “FQ-ViT: Fully Quantized Vision Transformer without Retraining”, et al 2021
- “𝜇NCA: Texture Generation With Ultra-Compact Neural Cellular Automata”, 2021
- “Prune Once for All: Sparse Pre-Trained Language Models”, et al 2021
- “8-Bit Optimizers via Block-Wise Quantization”, et al 2021
- “Understanding and Overcoming the Challenges of Efficient Transformer Quantization”, et al 2021
- “Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better”, 2021
- “A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness”, et al 2021
- “Ten Lessons From Three Generations Shaped Google’s TPUv4i”, et al 2021
- “High-Performance, Distributed Training of Large-Scale Deep Learning Recommendation Models (DLRMs)”, et al 2021
- “Deep Residual Learning in Spiking Neural Networks”, et al 2021
- “1-Bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”, et al 2021
- “ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution”, et al 2021
- “Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, et al 2021
- “A Primer in BERTology: What We Know about How BERT Works”, et al 2020
- “L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, et al 2020
- “RegDeepDanbooru: Yet Another Deep Danbooru Project”, zyddnys 2020
- “TernaryBERT: Distillation-Aware Ultra-Low Bit BERT”, et al 2020
- “HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks”, 2020
- “Bayesian Bits: Unifying Quantization and Pruning”, et al 2020
- “General Purpose Text Embeddings from Pre-Trained Language Models for Scalable Inference”, et al 2020
- “Lite Transformer With Long-Short Range Attention”, et al 2020
- “Training With Quantization Noise for Extreme Model Compression”, et al 2020
- “Moniqua: Modulo Quantized Communication in Decentralized SGD”, 2020
- “Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, et al 2020
- “SWAT: Sparse Weight Activation Training”, 2020
- “QUARL: Quantized Reinforcement Learning (ActorQ)”, et al 2019
- “SCaNN: Accelerating Large-Scale Inference With Anisotropic Vector Quantization”, et al 2019
- “And the Bit Goes Down: Revisiting the Quantization of Neural Networks”, et al 2019
- “Surrogate Gradient Learning in Spiking Neural Networks”, et al 2019
- “Rethinking Floating Point for Deep Learning”, 2018
- “Learning Recurrent Binary/Ternary Weights”, et al 2018
- “Rethinking Numerical Representations for Deep Neural Networks”, et al 2018
- “Highly Scalable Deep Learning Training System With Mixed-Precision: Training ImageNet in 4 Minutes”, et al 2018
- “Quantization Mimic: Towards Very Tiny CNN for Object Detection”, et al 2018
- “Training Imagenet in 3 Hours for $25; and CIFAR-10 for $0.26”, 2018
- “High-Accuracy Low-Precision Training”, et al 2018
- “Training Wide Residual Networks for Deployment Using a Single Bit for Each Weight”, 2018
- “Universal Deep Neural Network Compression”, et al 2018
- “Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, et al 2017
- “Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions”, et al 2017
- “Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method”, et al 2017
- “Compressing Word Embeddings via Deep Compositional Code Learning”, 2017
- “Learning Discrete Weights Using the Local Reparameterization Trick”, et al 2017
- “TensorQuant—A Simulation Toolbox for Deep Neural Network Quantization”, et al 2017
- “Mixed Precision Training”, et al 2017
- “BitNet: Bit-Regularized Deep Neural Networks”, et al 2017
- “Beating Floating Point at Its Own Game: Posit Arithmetic”, 2017
- “Bolt: Accelerated Data Mining With Fast Vector Compression”, 2017
- “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, et al 2016
- “Ternary Neural Networks for Resource-Efficient AI Applications”, et al 2016
- “Deep Neural Networks Are Robust to Weight Binarization and Other Non-Linear Distortions”, et al 2016
- “Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing”, et al 2016
- “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”, et al 2016
- “Binarized Neural Networks: Training Deep Neural Networks With Weights and Activations Constrained to +1 or −1”, et al 2016
- “BinaryConnect: Training Deep Neural Networks With Binary Weights during Propagations”, et al 2015
- “Efficient Supervised Learning in Networks With Binary Synapses”, et al 2007
- “A Self-Optimizing, Non-Symmetrical Neural Net for Content Addressable Memory and Pattern Recognition”, 1986
- “Binary Vector Embeddings Are so Cool”
- “Building a Vector Database in 2GB for 36 Million Wikipedia Passages”
- “FlashAttention-3: Fast and Accurate Attention With Asynchrony and Low-Precision”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography