“2:4 Sparse Llama: Smaller Models for Efficient GPU Inference”, Eldar Kurtić, Alexandre Marques, Mark Kurtz, Dan Alistarh, Shubhra Pandit, 2024-11-25 (reduced-precision NNs, NN pruning, GPT)