“2:4 Sparse Llama: Smaller Models for Efficient GPU Inference”, Eldar Kurtić, Alexandre Marques, Mark Kurtz, Dan Alistarh, Shubhra Pandit, 2024-11-25