Emp, R, T, OA"Scaling Laws for Autoregressive Generative Modeling", Henighan et al 2020 (arxiv.org)
submitted by gwernTK - announcement
Emp, Theory, R, T, OA"Scaling Laws for Transfer", Hernandez et al 2021 ("We find that pre-training effectively multiplies the fine-tuning dataset size") (arxiv.org)
submitted by gwernTK - announcement
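The quoted finding refers to the paper's "effective data transferred": the amount of extra fine-tuning data that pre-training is worth, which the abstract describes as a power law in parameter count and fine-tuning dataset size. A minimal Python sketch of that form, assuming D_T = k * D_F^alpha * N^beta with placeholder constants (the names and defaults below are illustrative, not the paper's fitted values):

```python
def effective_data_transferred(d_finetune: float, n_params: float,
                               k: float = 1.0, alpha: float = 0.18,
                               beta: float = 0.38) -> float:
    """Estimate the extra fine-tuning data (same units as d_finetune) that
    pre-training is equivalent to, under the power-law fit
    D_T = k * D_F**alpha * N**beta. Constants here are placeholders."""
    return k * d_finetune**alpha * n_params**beta

# Hypothetical example: a 1B-parameter model fine-tuned on 1M characters.
d_f = 1e6
n = 1e9
d_t = effective_data_transferred(d_f, n)
print(f"effective multiplier on the fine-tuning set: {(d_f + d_t) / d_f:.2f}x")
```

Because D_T grows sublinearly in D_F, the implied multiplier is largest in the low-data regime, which is where the paper reports the fit holding best.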
Theory, R, C, G"Explaining Neural Scaling Laws", Bahri et al 2021 (arxiv.org)
submitted by gwernTK
Emp, R, T"MSA Transformer", Rao et al 2021 (tied attention on 4.3TB of protein data) (biorxiv.org)
submitted by gwernTK
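The "tied attention" in the parenthetical is the MSA Transformer's tied row attention: attention logits are summed across all sequences (rows) of the multiple sequence alignment, so every row shares a single attention map over columns. A minimal single-head NumPy sketch, assuming square-root normalization over the number of rows; shapes and simplifications are mine, not the paper's code:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tied_row_attention(Q, K, V):
    """Q, K, V: (R, C, d) -- R MSA rows, C columns, d head dimension.
    Logits are summed over rows (scaled by 1/sqrt(R*d)), so one (C, C)
    attention map is shared by every row of the MSA."""
    R, C, d = Q.shape
    logits = np.einsum('rid,rjd->rij', Q, K) / np.sqrt(R * d)  # (R, C, C)
    tied = logits.sum(axis=0)                                  # (C, C), shared
    A = softmax(tied, axis=-1)
    return np.einsum('ij,rjd->rid', A, V)                      # (R, C, d)
```

Tying the row-attention map is what makes attention over deep alignments tractable: memory for the attention map no longer scales with the number of sequences in the MSA.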
Emp, R, C, G: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision (arxiv.org)
submitted by disc_pl
Emp, R, T, DM"Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers", Hendricks et al 2021 (arxiv.org)
submitted by gwernTK
Hardware, R, MS"1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed", Tang et al 2021 (arxiv.org)
submitted by gwernTK
Hardware, R, T"PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers", He et al 2021 (arxiv.org)
submitted by gwernTK
MD, T, MS: DeBERTa-0.9b/1.5b checkpoints released (SuperGLUE score: 89.9%) (github.com)
submitted by gwernTK
Emp, R, T, DM"Pitfalls of Static Language Modelling", Lazaridou et al 2021 (on the need for online learning) (arxiv.org)
submitted by gwernTK
R, T, G"Towards End-to-End In-Image Neural Machine Translation", Mansimov et al 2020 (arxiv.org)
submitted by gwernTK
Forecast, Econ"Why the OECD wants to calculate the AI compute needs of national governments" (venturebeat.com)
submitted by gwernTK
Emp, R, T, FB"Muppet: Massive Multi-task Representations with Pre-Finetuning", Aghajanyan et al 2021 (arxiv.org)
submitted by gwernTK