- See Also
- Links
- “SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, Dettmers et al 2023
- “Binary and Ternary Natural Language Generation”, Liu et al 2023
- “AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration”, Lin et al 2023
- “Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing”, Mallasén et al 2023
- “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023
- “SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Zhu et al 2023
- “BMT: Binarized Neural Machine Translation”, Zhang et al 2023
- “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022
- “Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production”, Kim et al 2022
- “Efficiently Scaling Transformer Inference”, Pope et al 2022
- “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Frantar et al 2022
- “Fast DistilBERT on CPUs”, Shen et al 2022
- “GLM-130B: An Open Bilingual Pre-trained Model”, Zeng et al 2022
- “FP8 Formats for Deep Learning”, Micikevicius et al 2022
- “LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale”, Dettmers et al 2022
- “Is Integer Arithmetic Enough for Deep Learning Training?”, Ghaffari et al 2022
- “On-Device Training Under 256KB Memory”, Lin et al 2022
- “How to Train Accurate BNNs for Embedded Systems?”, Putter & Corporaal 2022
- “Director: Deep Hierarchical Planning from Pixels”, Hafner et al 2022
- “8-bit Numerical Formats for Deep Neural Networks”, Noune et al 2022
- “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
- “XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient”, Wu et al 2022
- “Matryoshka Representations for Adaptive Deployment”, Kusupati et al 2022
- “PLAID: An Efficient Engine for Late Interaction Retrieval”, Santhanam et al 2022
- “Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam”, Lu et al 2022
- “Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, Bailey 2022
- “Boosted Dense Retriever”, Lewis et al 2021
- “FQ-ViT: Fully Quantized Vision Transformer without Retraining”, Lin et al 2021
- “𝜇NCA: Texture Generation With Ultra-Compact Neural Cellular Automata”, Mordvintsev & Niklasson 2021
- “Prune Once for All: Sparse Pre-Trained Language Models”, Zafrir et al 2021
- “8-bit Optimizers via Block-wise Quantization”, Dettmers et al 2021
- “Understanding and Overcoming the Challenges of Efficient Transformer Quantization”, Bondarenko et al 2021
- “A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness”, Diffenderfer et al 2021
- “Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better”, Menghani 2021
- “Ten Lessons From Three Generations Shaped Google’s TPUv4i”, Jouppi et al 2021
- “High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models (DLRMs)”, Mudigere et al 2021
- “Deep Residual Learning in Spiking Neural Networks”, Fang et al 2021
- “1-bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”, Tang et al 2021
- “ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution”, Song et al 2021
- “Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, Fedus et al 2021
- “A Primer in BERTology: What We Know about How BERT Works”, Rogers et al 2020
- “L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020
- “RegDeepDanbooru: Yet Another Deep Danbooru Project”, zyddnys 2020
- “TernaryBERT: Distillation-aware Ultra-low Bit BERT”, Zhang et al 2020
- “HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks”, Garland & Gregg 2020
- “Bayesian Bits: Unifying Quantization and Pruning”, Baalen et al 2020
- “General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference”, Du et al 2020
- “Lite Transformer With Long-Short Range Attention”, Wu et al 2020
- “Training With Quantization Noise for Extreme Model Compression”, Fan et al 2020
- “Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, Li et al 2020
- “SWAT: Sparse Weight Activation Training”, Raihan & Aamodt 2020
- “QUARL: Quantized Reinforcement Learning (ActorQ)”, Lam et al 2019
- “SCaNN: Accelerating Large-Scale Inference With Anisotropic Vector Quantization”, Guo et al 2019
- “And the Bit Goes Down: Revisiting the Quantization of Neural Networks”, Stock et al 2019
- “Surrogate Gradient Learning in Spiking Neural Networks”, Neftci et al 2019
- “Rethinking Floating Point for Deep Learning”, Johnson 2018
- “Learning Recurrent Binary/Ternary Weights”, Ardakani et al 2018
- “Rethinking Numerical Representations for Deep Neural Networks”, Hill et al 2018
- “Highly Scalable Deep Learning Training System With Mixed-Precision: Training ImageNet in 4 Minutes”, Jia et al 2018
- “Quantization Mimic: Towards Very Tiny CNN for Object Detection”, Wei et al 2018
- “Training Imagenet in 3 Hours for $25; and CIFAR-10 for $0.26”, Howard 2018
- “High-Accuracy Low-Precision Training”, Sa et al 2018
- “Training Wide Residual Networks for Deployment Using a Single Bit for Each Weight”, McDonnell 2018
- “Universal Deep Neural Network Compression”, Choi et al 2018
- “Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, Lin et al 2017
- “Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions”, Wu et al 2017
- “Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method”, Sun et al 2017
- “Compressing Word Embeddings via Deep Compositional Code Learning”, Shu & Nakayama 2017
- “Learning Discrete Weights Using the Local Reparameterization Trick”, Shayer et al 2017
- “TensorQuant—A Simulation Toolbox for Deep Neural Network Quantization”, Loroch et al 2017
- “Mixed Precision Training”, Micikevicius et al 2017
- “Beating Floating Point at Its Own Game: Posit Arithmetic”, Gustafson & Yonemoto 2017
- “Bolt: Accelerated Data Mining With Fast Vector Compression”, Blalock & Guttag 2017
- “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, Wu et al 2016
- “Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing”, Esser et al 2016
- “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”, Rastegari et al 2016
- “BinaryConnect: Training Deep Neural Networks With Binary Weights during Propagations”, Courbariaux et al 2015
- Sort By Magic
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Links
“SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, Dettmers et al 2023
“Binary and Ternary Natural Language Generation”, Liu et al 2023
“AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration”, Lin et al 2023
“Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing”, Mallasén et al 2023
“Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023
“SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Zhu et al 2023
“BMT: Binarized Neural Machine Translation”, Zhang et al 2023
“SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022
“Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production”, Kim et al 2022
“Efficiently Scaling Transformer Inference”, Pope et al 2022
“GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Frantar et al 2022
“Fast DistilBERT on CPUs”, Shen et al 2022
“GLM-130B: An Open Bilingual Pre-trained Model”, Zeng et al 2022
“FP8 Formats for Deep Learning”, Micikevicius et al 2022
“LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale”, Dettmers et al 2022
“Is Integer Arithmetic Enough for Deep Learning Training?”, Ghaffari et al 2022
“On-Device Training Under 256KB Memory”, Lin et al 2022
“How to Train Accurate BNNs for Embedded Systems?”, Putter & Corporaal 2022
“Director: Deep Hierarchical Planning from Pixels”, Hafner et al 2022
“8-bit Numerical Formats for Deep Neural Networks”, Noune et al 2022
“ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
“XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient”, Wu et al 2022
“Matryoshka Representations for Adaptive Deployment”, Kusupati et al 2022
“PLAID: An Efficient Engine for Late Interaction Retrieval”, Santhanam et al 2022
“Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam”, Lu et al 2022
“Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, Bailey 2022
“Boosted Dense Retriever”, Lewis et al 2021
“FQ-ViT: Fully Quantized Vision Transformer without Retraining”, Lin et al 2021
“𝜇NCA: Texture Generation With Ultra-Compact Neural Cellular Automata”, Mordvintsev & Niklasson 2021
“Prune Once for All: Sparse Pre-Trained Language Models”, Zafrir et al 2021
“8-bit Optimizers via Block-wise Quantization”, Dettmers et al 2021
“Understanding and Overcoming the Challenges of Efficient Transformer Quantization”, Bondarenko et al 2021
“A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness”, Diffenderfer et al 2021
“Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better”, Menghani 2021
“Ten Lessons From Three Generations Shaped Google’s TPUv4i”, Jouppi et al 2021
“High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models (DLRMs)”, Mudigere et al 2021
“Deep Residual Learning in Spiking Neural Networks”, Fang et al 2021
“1-bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”, Tang et al 2021
“ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution”, Song et al 2021
“Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, Fedus et al 2021
“A Primer in BERTology: What We Know about How BERT Works”, Rogers et al 2020
“L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020
“RegDeepDanbooru: Yet Another Deep Danbooru Project”, zyddnys 2020
“TernaryBERT: Distillation-aware Ultra-low Bit BERT”, Zhang et al 2020
“HOBFLOPS CNNs: Hardware Optimized Bitslice-Parallel Floating-Point Operations for Convolutional Neural Networks”, Garland & Gregg 2020
“Bayesian Bits: Unifying Quantization and Pruning”, Baalen et al 2020
“General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference”, Du et al 2020
“Lite Transformer With Long-Short Range Attention”, Wu et al 2020
“Training With Quantization Noise for Extreme Model Compression”, Fan et al 2020
“Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, Li et al 2020
“SWAT: Sparse Weight Activation Training”, Raihan & Aamodt 2020
“QUARL: Quantized Reinforcement Learning (ActorQ)”, Lam et al 2019
“SCaNN: Accelerating Large-Scale Inference With Anisotropic Vector Quantization”, Guo et al 2019
“And the Bit Goes Down: Revisiting the Quantization of Neural Networks”, Stock et al 2019
“Surrogate Gradient Learning in Spiking Neural Networks”, Neftci et al 2019
“Rethinking Floating Point for Deep Learning”, Johnson 2018
“Learning Recurrent Binary/Ternary Weights”, Ardakani et al 2018
“Rethinking Numerical Representations for Deep Neural Networks”, Hill et al 2018
“Highly Scalable Deep Learning Training System With Mixed-Precision: Training ImageNet in 4 Minutes”, Jia et al 2018
“Quantization Mimic: Towards Very Tiny CNN for Object Detection”, Wei et al 2018
“Training Imagenet in 3 Hours for $25; and CIFAR-10 for $0.26”, Howard 2018
“High-Accuracy Low-Precision Training”, Sa et al 2018
“Training Wide Residual Networks for Deployment Using a Single Bit for Each Weight”, McDonnell 2018
“Universal Deep Neural Network Compression”, Choi et al 2018
“Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, Lin et al 2017
“Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions”, Wu et al 2017
“Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method”, Sun et al 2017
“Compressing Word Embeddings via Deep Compositional Code Learning”, Shu & Nakayama 2017
“Learning Discrete Weights Using the Local Reparameterization Trick”, Shayer et al 2017
“TensorQuant—A Simulation Toolbox for Deep Neural Network Quantization”, Loroch et al 2017
“Mixed Precision Training”, Micikevicius et al 2017
“Beating Floating Point at Its Own Game: Posit Arithmetic”, Gustafson & Yonemoto 2017
“Bolt: Accelerated Data Mining With Fast Vector Compression”, Blalock & Guttag 2017
“Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, Wu et al 2016
“Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing”, Esser et al 2016
“XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”, Rastegari et al 2016
“BinaryConnect: Training Deep Neural Networks With Binary Weights during Propagations”, Courbariaux et al 2015
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses each annotation’s embedding to build a chain of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
hardware-optimization
compression
quantization
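The ordering step can be sketched in a few lines. The following is a minimal illustration under the assumption that each annotation has already been reduced to an embedding vector and a date; the function name and details are hypothetical, not the site’s actual implementation. The clustering & auto-labeling into sections such as ‘quantization’ above would be a separate step (eg. k-means over the same embeddings) and is omitted here.

```python
# Minimal sketch of a greedy nearest-neighbor ordering over annotation embeddings
# (hypothetical code, not the site's actual implementation).
import numpy as np

def magic_sort(embeddings: np.ndarray, dates: list) -> list:
    """Order annotations so adjacent entries are topically similar:
    start at the newest annotation, then repeatedly hop to the most
    similar unvisited one (cosine similarity over embeddings)."""
    # Normalize rows so a plain dot product equals cosine similarity.
    vecs = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    unvisited = set(range(len(vecs)))
    current = max(unvisited, key=lambda i: dates[i])  # begin with the newest annotation
    order = [current]
    unvisited.remove(current)
    while unvisited:
        # Hop to the unvisited annotation closest to the current one in embedding space.
        current = max(unvisited, key=lambda i: float(vecs[current] @ vecs[i]))
        order.append(current)
        unvisited.remove(current)
    return order

# Example: magic_sort(np.random.randn(4, 8), ["2019-01-01", "2023-06-01", "2021-03-01", "2022-07-01"])
```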
Wikipedia
Miscellaneous
- /doc/ai/nn/sparsity/low-precision/2021-fedus-figure1-switchmoetransformerscaling.png
- https://blog.research.google/2022/09/quantization-for-fast-and.html
- https://github.com/vitoplantamura/OnnxStream/tree/846da873570a737b49154e8f835704264864b0fe
- https://observablehq.com/@rreusser/half-precision-floating-point-visualized
- https://twitter.com/thecharlieblake/status/1581913495670755328
- https://www.reddit.com/r/mlscaling/comments/146rgq2/chatgpt_is_running_quantized/
Link Bibliography
- https://arxiv.org/abs/2305.06946: “Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing”, David Mallasén, Alberto A. Del Barrio, Manuel Prieto-Matias
- https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-int-3-and: “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org
- https://arxiv.org/abs/2302.13939: “SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Rui-Jie Zhu, Qihang Zhao, Jason K. Eshraghian
- https://arxiv.org/abs/2302.04907#google: “BMT: Binarized Neural Machine Translation”, Yichi Zhang, Ankush Garg, Yuan Cao, Łukasz Lew, Behrooz Ghorbani, Zhiru Zhang, Orhan Firat
- https://arxiv.org/abs/2211.10438: “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Guangxuan Xiao, Ji Lin, Mickael Seznec, Julien Demouth, Song Han
- https://arxiv.org/abs/2211.05102#google: “Efficiently Scaling Transformer Inference”
- https://arxiv.org/abs/2210.17323: “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh
- https://arxiv.org/abs/2210.02414#baai: “GLM-130B: An Open Bilingual Pre-trained Model”
- https://arxiv.org/abs/2206.15472: “On-Device Training Under 256KB Memory”, Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han
- https://arxiv.org/abs/2206.04114#google: “Director: Deep Hierarchical Planning from Pixels”, Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
- https://arxiv.org/abs/2206.01861#microsoft: “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He
- https://arxiv.org/abs/2206.01859#microsoft: “XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient”, Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He
- https://arxiv.org/abs/2205.13147: “Matryoshka Representations for Adaptive Deployment”
- https://arxiv.org/abs/2202.06009#microsoft: “Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam”, Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He
- https://semiengineering.com/is-programmable-overhead-worth-the-cost/: “Is Programmable Overhead Worth The Cost? How Much Do We Pay for a System to Be Programmable? It Depends upon Who You Ask”, Brian Bailey
- https://arxiv.org/abs/2111.13824: “FQ-ViT: Fully Quantized Vision Transformer without Retraining”, Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, Shuchang Zhou
- https://arxiv.org/abs/2111.05754: “Prune Once for All: Sparse Pre-Trained Language Models”, Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
- https://arxiv.org/abs/2110.02861: “8-bit Optimizers via Block-wise Quantization”, Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer
- https://arxiv.org/abs/2109.12948: “Understanding and Overcoming the Challenges of Efficient Transformer Quantization”, Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
- 2021-jouppi.pdf: “Ten Lessons From Three Generations Shaped Google’s TPUv4i”
- https://arxiv.org/abs/2102.04159: “Deep Residual Learning in Spiking Neural Networks”, Wei Fang, Zhaofei Yu, Yanqi Chen, Tiejun Huang, Timothée Masquelier, Yonghong Tian
- https://arxiv.org/abs/2102.02888#microsoft: “1-bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”
- https://arxiv.org/abs/2101.03961#google: “Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, William Fedus, Barret Zoph, Noam Shazeer
- https://arxiv.org/abs/2004.07320#facebook: “Training With Quantization Noise for Extreme Model Compression”, Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin
- https://arxiv.org/abs/2001.01969: “SWAT: Sparse Weight Activation Training”, Md Aamir Raihan, Tor M. Aamodt
- https://arxiv.org/abs/1910.01055#google: “QUARL: Quantized Reinforcement Learning (ActorQ)”
- https://www.fast.ai/2018/04/30/dawnbench-fastai/: “Training Imagenet in 3 Hours for $25; and CIFAR-10 for $0.26”, Jeremy Howard
- https://arxiv.org/abs/1802.08530: “Training Wide Residual Networks for Deployment Using a Single Bit for Each Weight”, Mark D. McDonnell
- https://arxiv.org/abs/1712.01887: “Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training”, Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally
- https://arxiv.org/abs/1711.08141: “Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions”
- https://arxiv.org/abs/1603.05279: “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks”, Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi