‘AI scaling’ directory

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, the sort uses each annotation's embedding to chain together nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
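The nearest-neighbor ordering can be sketched as a greedy chain: start from the newest annotation, then repeatedly hop to the most similar annotation not yet listed. This is a minimal sketch, not the site's actual implementation. It assumes cosine similarity between precomputed embedding vectors, assumes the newest annotation sits at index 0, and omits the clustering and auto-labeling steps. The function names `cosine_similarity` and `greedy_topic_order` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_topic_order(embeddings: list[list[float]], start: int = 0) -> list[int]:
    """Hypothetical greedy nearest-neighbor chain over annotation embeddings.

    Assumption: `start` is the index of the newest annotation (index 0 by default).
    Returns a permutation of range(len(embeddings)).
    """
    n = len(embeddings)
    if n == 0:
        return []
    order = [start]
    unvisited = set(range(n)) - {start}
    current = start
    while unvisited:
        # Choose the unvisited annotation most similar to the current one;
        # iterating in sorted order makes max() break ties by lowest index.
        nxt = max(
            sorted(unvisited),
            key=lambda j: cosine_similarity(embeddings[current], embeddings[j]),
        )
        order.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return order
```

For example, with toy 2-D embeddings `[[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]`, starting at index 0 yields the order `[0, 2, 3, 1]`. A greedy chain is cheap (quadratic in the number of annotations), but the last few items can be poorly matched, since they are picked from whatever remains.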

Miscellaneous

Bibliography

https://arxiv.org/abs/2503.17074: “Emuru: Zero-Shot Styled Text Image Generation, but Make It Autoregressive”, Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli …, Alessio Tonioni, Rita Cucchiara
https://arxiv.org/abs/2502.09992: “LLaDA: Large Language Diffusion Models”, Shen Nie, Fengqi Zhu, Zebin You …, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li
https://arxiv.org/abs/2501.09038#deepmind: “Do Generative Video Models Learn Physical Principles from Watching Videos?”, Saman Motamed, Laura Culp, Kevin Swersky …, Priyank Jaini, Robert Geirhos
https://arxiv.org/abs/2410.18514: “Scaling up Masked Diffusion Models on Text”, Shen Nie, Fengqi Zhu, Chao Du …, Tianyu Pang, Qian Liu, Guangtao Zeng, Min Lin, Chongxuan Li
https://research.google/blog/taking-medical-imaging-embeddings-3d/: “CT Foundation: Taking Medical Imaging Embeddings 3D”, Atilla Kiraly, Madeleine Traverse
https://arxiv.org/abs/2407.04108: “Future Events As Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs”, Sara Price, Arjun Panickssery, Samuel R. Bowman, Asa Cooper Stickland
https://arxiv.org/abs/2406.13121#google: “Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?”, Jinhyuk Lee, Anthony Chen, Zhuyun Dai …, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu
https://arxiv.org/abs/2406.11233: “Probing the Decision Boundaries of In-Context Learning in Large Language Models”, Siyan Zhao, Tung Nguyen, Aditya Grover
https://www.biorxiv.org/content/10.1101/2024.06.06.597716.full: “Training Compute-Optimal Protein Language Models”, Xingyi Cheng, Bo Chen, Pan Li …, Jing Gong, Jie Tang, Le Song
https://arxiv.org/abs/2405.14930: “AstroPT: Scaling Large Observation Models for Astronomy”, Michael J. Smith, Ryan J. Roberts, Eirini Angeloudi, Marc Huertas-Company
https://arxiv.org/abs/2405.00332#scale: “GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Hugh Zhang, Jeff Da, Dean Lee …, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue
https://lab42.global/community-interview-jack-cole/: “Test-Time Augmentation to Solve ARC”, Jack Cole
https://arxiv.org/abs/2404.09937: “Compression Represents Intelligence Linearly”, Yuzhen Huang, Jinghan Zhang, Zifei Shan, Junxian He
https://arxiv.org/abs/2404.06664: “CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack Of) Multicultural Knowledge”, Yu Ying Chiu, Liwei Jiang, Maria Antoniak …, Chan Young Park, Shuyue Stella Li, Mehar Bhatia, Sahithya Ravi, Yulia Tsvetkov, Vered Shwartz, Yejin Choi
https://arxiv.org/abs/2404.02905#bytedance: “Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction”, Keyu Tian, Yi Jiang, Zehuan Yuan …, Bingyue Peng, Liwei Wang
https://arxiv.org/abs/2403.18802#deepmind: “Long-Form Factuality in Large Language Models”, Jerry Wei, Chengrun Yang, Xinying Song …, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le
https://arxiv.org/abs/2403.17844: “Mechanistic Design and Scaling of Hybrid Architectures”, Michael Poli, Armin W. Thomas, Eric Nguyen …, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli
https://www.wired.com/story/eight-google-employees-invented-modern-ai-transformers-paper/: “8 Google Employees Invented Modern AI. Here’s the Inside Story: They Met by Chance, Got Hooked on an Idea, and Wrote the Transformers Paper—The Most Consequential Tech Breakthrough in Recent History”, Steven Levy
https://inflection.ai/inflection-2-5: “Inflection-2.5: Meet the World’s Best Personal AI”, Inflection
https://arxiv.org/abs/2402.17152#facebook: “Actions Speak Louder Than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)”, Jiaqi Zhai, Lucy Liao, Xing Liu …, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, Yu Shi
https://arxiv.org/abs/2402.16671: “StructLM: Towards Building Generalist Models for Structured Knowledge Grounding”, Alex Zhuang, Ge Zhang, Tianyu Zheng …, Xinrun Du, Junjie Wang, Weiming Ren, Stephen W. Huang, Jie Fu, Xiang Yue, Wenhu Chen
https://arxiv.org/abs/2312.15770#alibaba: “TF-T2V: A Recipe for Scaling up Text-To-Video Generation With Text-Free Videos”, Xiang Wang, Shiwei Zhang, Hangjie Yuan …, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang
https://arxiv.org/abs/2312.04927: “Zoology: Measuring and Improving Recall in Efficient Language Models”, Simran Arora, Sabri Eyuboglu, Aman Timalsina …, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré
https://arxiv.org/abs/2312.03876: “Scaling Transformer Neural Networks for Skillful and Reliable Medium-Range Weather Forecasting”, Tung Nguyen, Rohan Shah, Hritik Bansal …, Troy Arcomano, Sandeep Madireddy, Romit Maulik, Veerabhadra Kotamarthi, Ian Foster, Aditya Grover
https://arxiv.org/abs/2312.00752: “Mamba: Linear-Time Sequence Modeling With Selective State Spaces”, Albert Gu, Tri Dao
https://arxiv.org/abs/2311.15599#tencent: “UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition”, Xiaohan Ding, Yiyuan Zhang, Yixiao Ge …, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan
https://arxiv.org/abs/2311.04145#alibaba: “I2VGen-XL: High-Quality Image-To-Video Synthesis via Cascaded Diffusion Models”, Shiwei Zhang, Jiayu Wang, Yingya Zhang …, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, Jingren Zhou
https://arxiv.org/abs/2310.16764#deepmind: “ConvNets Match Vision Transformers at Scale”, Samuel L. Smith, Andrew Brock, Leonard Berrada, Soham De
https://arxiv.org/abs/2310.09199#google: “PaLI-3 Vision Language Models: Smaller, Faster, Stronger”, Xi Chen, Xiao Wang, Lucas Beyer …, Alexander Kolesnikov, Jialin Wu, Paul Voigtlaender, Basil Mustafa, Sebastian Goodman, Ibrahim Alabdulmohsin, Piotr Padlewski, Daniel Salz, Xi Xiong, Daniel Vlasic, Filip Pavetic, Keran Rong, Tianli Yu, Daniel Keysers, Xiaohua Zhai, Radu Soricut
https://arxiv.org/abs/2310.06213: “GeoLLM: Extracting Geospatial Knowledge from Large Language Models”, Rohin Manvi, Samar Khanna, Gengchen Mai …, Marshall Burke, David Lobell, Stefano Ermon
https://arxiv.org/abs/2310.06694: “Sheared LLaMA: Accelerating Language Model Pre-Training via Structured Pruning”, Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen
https://arxiv.org/abs/2310.03214#google: “FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”, Tu Vu, Mohit Iyyer, Xuezhi Wang …, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc V. Le, Thang Luong
https://arxiv.org/abs/2310.02980: “Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors”, Ido Amos, Jonathan Berant, Ankit Gupta
https://arxiv.org/abs/2309.00667: “Taken out of Context: On Measuring Situational Awareness in LLMs”, Lukas Berglund, Asa Cooper Stickland, Mikita Balesni …, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans
https://arxiv.org/abs/2308.11596#facebook: “SeamlessM4T: Massively Multilingual & Multimodal Machine Translation”, Seamless Communication, Loïc Barrault, Yu-An Chung …, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang
https://arxiv.org/abs/2308.03958#deepmind: “Simple Synthetic Data Reduces Sycophancy in Large Language Models”, Jerry Wei, Da Huang, Yifeng Lu …, Denny Zhou, Quoc V. Le
https://arxiv.org/abs/2307.05300#microsoft: “Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, Zhenhailong Wang, Shaoguang Mao, Wenshan Wu …, Tao Ge, Furu Wei, Heng Ji
https://openai.com/index/introducing-superalignment/: “Introducing Superalignment”, Jan Leike, Ilya Sutskever
https://www.youtube.com/watch?v=lfXxzAVtdpU&t=1763s: “Gödel, Escher, Bach Author Douglas Hofstadter on the State of AI Today § What about AI Terrifies You?”, Douglas Hofstadter, Amy Jo Kim
https://arxiv.org/abs/2306.13575: “Scaling MLPs: A Tale of Inductive Bias”, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann
https://arxiv.org/abs/2306.15448: “Understanding Social Reasoning in Language Models With Language Models”, Kanishk Gandhi, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman
https://arxiv.org/abs/2305.15717: “The False Promise of Imitating Proprietary LLMs”, Arnav Gudibande, Eric Wallace, Charlie Snell …, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song
https://arxiv.org/abs/2305.11863: “Scaling Laws for Language Encoding Models in fMRI”, Richard Antonello, Aditya Vaidya, Alexander G. Huth
https://www.cnbc.com/2023/05/16/googles-palm-2-uses-nearly-five-times-more-text-data-than-predecessor.html: “Google’s Newest AI Model Uses Nearly 5× More Text Data for Training Than Its Predecessor”, Jennifer Elias
https://arxiv.org/abs/2305.07759#microsoft: “TinyStories: How Small Can Language Models Be and Still Speak Coherent English?”, Ronen Eldan, Yuanzhi Li
https://arxiv.org/abs/2305.05665#facebook: “ImageBind: One Embedding Space To Bind Them All”, Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu …, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
https://www.ft.com/content/f4f73815-6fc2-4016-bd97-4bace459e95e: “Google’s DeepMind-Brain Merger: Tech Giant Regroups for AI Battle”, Madhumita Murgia
https://arxiv.org/abs/2304.07193#facebook: “DINOv2: Learning Robust Visual Features without Supervision”, Maxime Oquab, Timothée Darcet, Théo Moutakanni …, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski
https://arxiv.org/abs/2303.15343#google: “Sigmoid Loss for Language Image Pre-Training”, Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer
https://arxiv.org/abs/2304.02015#alibaba: “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Zheng Yuan, Hongyi Yuan, Chuanqi Tan …, Wei Wang, Songfang Huang
https://jameswphillips.substack.com/p/securing-liberal-democratic-control: “Securing Liberal Democratic Control of AGI through UK Leadership”, James W. Phillips
https://arxiv.org/abs/2303.05511#adobe: “GigaGAN: Scaling up GANs for Text-To-Image Synthesis”, Minguk Kang, Jun-Yan Zhu, Richard Zhang …, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park
https://arxiv.org/abs/2302.05442#google: “Scaling Vision Transformers to 22 Billion Parameters”, Mostafa Dehghani, Josip Djolonga, Basil Mustafa …, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335945: “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, John Nay
https://arxiv.org/abs/2301.09515#nvidia: “StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-To-Image Synthesis”, Axel Sauer, Tero Karras, Samuli Laine …, Andreas Geiger, Timo Aila
https://arxiv.org/abs/2301.07088#bytedance: “MUG: Vision Learners Meet Web Image-Text Pairs”, Bingchen Zhao, Quan Cui, Hao Wu …, Osamu Yoshie, Cheng Yang
https://arxiv.org/abs/2301.04408: “GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of AI CPA Capabilities”, Jillian Bommarito, Michael Bommarito, Daniel Martin Katz, Jessica Katz
https://arxiv.org/abs/2301.03728#facebook: “Scaling Laws for Generative Mixed-Modal Language Models”, Armen Aghajanyan, Lili Yu, Alexis Conneau …, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer
https://arxiv.org/abs/2301.02111#microsoft: “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Chengyi Wang, Sanyuan Chen, Yu Wu …, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei
https://arxiv.org/abs/2212.14402: “GPT-3 Takes the Bar Exam”, Michael Bommarito II, Daniel Martin Katz
https://arxiv.org/abs/2212.09741: “One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”, Hongjin Su, Weijia Shi, Jungo Kasai …, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah Smith, Luke Zettlemoyer, Tao Yu
https://arxiv.org/abs/2212.07143: “Reproducible Scaling Laws for Contrastive Language-Image Learning”, Mehdi Cherti, Romain Beaumont, Ross Wightman …, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, Jenia Jitsev
https://arxiv.org/abs/2212.04979#google: “VideoCoCa: Video-Text Modeling With Zero-Shot Transfer from Contrastive Captioners”, Shen Yan, Tao Zhu, Zirui Wang …, Yuan Cao, Mi Zhang, Soham Ghosh, Yonghui Wu, Jiahui Yu
https://arxiv.org/abs/2212.05051: “VindLU: A Recipe for Effective Video-And-Language Pretraining”, Feng Cheng, Xizi Wang, Jie Lei …, David Crandall, Mohit Bansal, Gedas Bertasius
https://arxiv.org/abs/2212.04356#openai: “Whisper: Robust Speech Recognition via Large-Scale Weak Supervision”, Alec Radford, Jong Wook Kim, Tao Xu …, Greg Brockman, Christine McLeavey, Ilya Sutskever
https://arxiv.org/abs/2211.09085#facebook: “Galactica: A Large Language Model for Science”, Ross Taylor, Marcin Kardas, Guillem Cucurull …, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, Robert Stojnic
https://arxiv.org/abs/2211.08411: “Large Language Models Struggle to Learn Long-Tail Knowledge”, Nikhil Kandpal, Haikang Deng, Adam Roberts …, Eric Wallace, Colin Raffel
https://arxiv.org/abs/2211.07636#baai: “EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, Yuxin Fang, Wen Wang, Binhui Xie …, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao
https://arxiv.org/abs/2211.00241: “Adversarial Policies Beat Superhuman Go AIs”, Tony T. Wang, Adam Gleave, Tom Tseng …, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell
https://www.youtube.com/watch?v=Q-TJFyUoenc&t=2444s: “Increments Podcast: #45—4 Central Fallacies of AI Research (With Melanie Mitchell)”, Melanie Mitchell, Benny Chugg
https://arxiv.org/abs/2210.16859: “A Solvable Model of Neural Scaling Laws”, Alexander Maloney, Daniel A. Roberts, James Sully
https://arxiv.org/abs/2210.13673#nvidia: “Evaluating Parameter Efficient Learning for Generation”, Peng Xu, Mostofa Patwary, Shrimai Prabhumoye …, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro
https://arxiv.org/abs/2210.11416#google: “FLAN: Scaling Instruction-Finetuned Language Models”, Hyung Won Chung, Le Hou, Shayne Longpre …, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei
https://arxiv.org/abs/2210.10341#microsoft: “BioGPT: Generative Pre-Trained Transformer for Biomedical Text Generation and Mining”, Renqian Luo, Liai Sun, Yingce Xia …, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu
https://arxiv.org/abs/2210.06423#microsoft: “Foundation Transformers”, Hongyu Wang, Shuming Ma, Shaohan Huang …, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei
https://arxiv.org/abs/2210.03350#allen: “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Ofir Press, Muru Zhang, Sewon Min …, Ludwig Schmidt, Noah A. Smith, Mike Lewis
https://arxiv.org/abs/2210.02414#baai: “GLM-130B: An Open Bilingual Pre-Trained Model”, Aohan Zeng, Xiao Liu, Zhengxiao Du …, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang
https://arxiv.org/abs/2210.02441: “Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Simran Arora, Avanika Narayan, Mayee F. Chen …, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré
https://arxiv.org/abs/2208.05516: “Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP”, Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman …, Sewoong Oh, Ludwig Schmidt
https://arxiv.org/abs/2207.06991: “PIXEL: Language Modeling With Pixels”, Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello …, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott
https://arxiv.org/abs/2207.05221#anthropic: “Language Models (Mostly) Know What They Know”, Saurav Kadavath, Tom Conerly, Amanda Askell …, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield Dodds, Nova DasSarma, Eli Tran-Johnson, Scott Johnston, Sheer El-Showk, Andy L. Jones, Nelson Elhage, Tristan Hume, Anna Chen, Yuntao Bai, Samuel R. Bowman, Stanislav Fort, Deep Ganguli, Danny Hernandez, Josh Jacobson, Jackson Kernion, Shauna Kravec, Liane Lovitt, Kamal Ndousse, Catherine Olsson, Sam Ringer, Dario Amodei, Tom B. Brown, Jack Clark, Nicholas Joseph, Ben Mann, Sam McCandlish, Chris Olah, Jared Kaplan
https://arxiv.org/abs/2206.15472: “On-Device Training Under 256KB Memory”, Ji Lin, Ligeng Zhu, Wei-Ming Chen …, Wei-Chen Wang, Chuang Gan, Song Han
https://arxiv.org/abs/2206.04658#nvidia: “BigVGAN: A Universal Neural Vocoder With Large-Scale Training”, Sang-gil Lee, Wei Ping, Boris Ginsburg …, Bryan Catanzaro, Sungroh Yoon
https://arxiv.org/abs/2206.01685: “Toward a Realistic Model of Speech Processing in the Brain With Self-Supervised Learning”, Juliette Millet, Charlotte Caucheteux, Pierre Orhan …, Yves Boubenec, Alexandre Gramfort, Ewan Dunbar, Christophe Pallier, Jean-Remi King
https://arxiv.org/abs/2205.14204#google: “M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, Xinyang Geng, Hao Liu, Lisa Lee …, Dale Schuurmans, Sergey Levine, Pieter Abbeel
https://arxiv.org/abs/2205.10625#google: “Least-To-Most Prompting Enables Complex Reasoning in Large Language Models”, Denny Zhou, Nathanael Schärli, Le Hou …, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Olivier Bousquet, Quoc V. Le, Ed Chi
https://arxiv.org/abs/2205.09073#google: “Dialog Inpainting: Turning Documents into Dialogues”, Zhuyun Dai, Arun Tejasvi Chaganty, Vincent Zhao …, Aida Amini, Qazi Mamunur Rashid, Mike Green, Kelvin Guu
https://arxiv.org/abs/2205.05131#google: “UL2: Unifying Language Learning Paradigms”, Yi Tay, Mostafa Dehghani, Vinh Q. Tran …, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
https://arxiv.org/abs/2205.03983#google: “Building Machine Translation Systems for the Next Thousand Languages”, Ankur Bapna, Isaac Caswell, Julia Kreutzer …, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod, Jason Riesa, Yuan Cao, Mia Xu Chen, Klaus Macherey, Maxim Krikun, Pidong Wang, Alexander Gutkin, Apurva Shah, Yanping Huang, Zhifeng Chen, Yonghui Wu, Macduff Hughes
https://arxiv.org/abs/2205.04596#google: “When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet”, Vijay Vasudevan, Benjamin Caine, Raphael Gontijo-Lopes …, Sara Fridovich-Keil, Rebecca Roelofs
https://arxiv.org/abs/2205.01917#google: “CoCa: Contrastive Captioners Are Image-Text Foundation Models”, Jiahui Yu, Zirui Wang, Vijay Vasudevan …, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu
https://arxiv.org/abs/2205.01397: “Data Determines Distributional Robustness in Contrastive Language Image Pre-Training (CLIP)”, Alex Fang, Gabriel Ilharco, Mitchell Wortsman …, Yuhao Wan, Vaishaal Shankar, Achal Dave, Ludwig Schmidt
https://arxiv.org/abs/2204.14198#deepmind: “Flamingo: a Visual Language Model for Few-Shot Learning”, Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc …, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan
https://arxiv.org/abs/2204.10149: “WebFace260M: A Benchmark for Million-Scale Deep Face Recognition”, Zheng Zhu, Guan Huang, Jiankang Deng …, Yun Ye, Junjie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Dalong Du, Jiwen Lu, Jie Zhou
https://www.lesswrong.com/posts/SbAgRYo8tkHwhd9Qx/deepmind-the-podcast-excerpts-on-agi: “DeepMind: The Podcast—Excerpts on AGI”, William Kiely
https://arxiv.org/abs/2203.15556#deepmind: “Chinchilla: Training Compute-Optimal Large Language Models”, Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch …, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
https://arxiv.org/abs/2203.11171#google: “Self-Consistency Improves Chain-Of-Thought Reasoning in Language Models”, Xuezhi Wang, Jason Wei, Dale Schuurmans …, Quoc V. Le, Ed Chi, Denny Zhou
https://arxiv.org/abs/2203.03466#microsoft: “Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer”, Greg Yang, Edward J. Hu, Igor Babuschkin …, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao
https://arxiv.org/abs/2203.00854: “FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours”, Shenggan Cheng, Ruidong Wu, Zhongming Yu …, Binrui Li, Xiwen Zhang, Jian Peng, Yang You
https://arxiv.org/abs/2202.12211#google: “Self-Distilled StyleGAN: Towards Generation from Internet Photos”, Ron Mokady, Michal Yarom, Omer Tov …, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani, Inbar Mosseri
https://www.nature.com/articles/s42003-022-03036-1: “Brains and Algorithms Partially Converge in Natural Language Processing”, Charlotte Caucheteux, Jean-Rémi King
https://arxiv.org/abs/2202.06767#huawei: “Wukong: 100 Million Large-Scale Chinese Cross-Modal Pre-Training Dataset and A Foundation Framework”, Jiaxi Gu, Xiaojun Meng, Guansong Lu …, Lu Hou, Minzhe Niu, Hang Xu, Xiaodan Liang, Wei Zhang, Xin Jiang, Chunjing Xu
https://arxiv.org/abs/2202.03052#alibaba: “OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-To-Sequence Learning Framework”, Peng Wang, An Yang, Rui Men …, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang
https://arxiv.org/abs/2202.02317#allen: “Webly Supervised Concept Expansion for General Purpose Vision Models”, Amita Kamath, Christopher Clark, Tanmay Gupta …, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi
https://arxiv.org/abs/2202.00273: “StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets”, Axel Sauer, Katja Schwarz, Andreas Geiger
https://arxiv.org/abs/2201.11990#microsoftnvidia: “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Shaden Smith, Mostofa Patwary, Brandon Norick …, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro
https://arxiv.org/abs/2201.11473#microsoft: “Reasoning Like Program Executors”, Xinyu Pi, Qian Liu, Bei Chen …, Morteza Ziyadi, Zeqi Lin, Yan Gao, Qiang Fu, Jian-Guang Lou, Weizhu Chen
https://arxiv.org/abs/2201.10005#openai: “Text and Code Embeddings by Contrastive Pre-Training”, Arvind Neelakantan, Tao Xu, Raul Puri …, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, Lilian Weng
https://arxiv.org/abs/2201.08371#facebook: “SWAG: Revisiting Weakly Supervised Pre-Training of Visual Perception Models”, Mannat Singh, Laura Gustafson, Aaron Adcock …, Vinicius de Freitas Reis, Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr Dollár, Laurens van der Maaten
https://arxiv.org/abs/2201.07520#facebook: “CM3: A Causal Masked Multimodal Model of the Internet”, Armen Aghajanyan, Bernie Huang, Candace Ross …, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer
https://arxiv.org/abs/2201.06910: “ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization”, Hanwei Xu, Yujun Chen, Yulun Du …, Nan Shao, Yanggang Wang, Haiyu Li, Zhilin Yang
https://arxiv.org/abs/2201.03545#facebook: “ConvNeXt: A ConvNet for the 2020s”, Zhuang Liu, Hanzi Mao, Chao-Yuan Wu …, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
https://royalsocietypublishing.org/doi/10.1098/rstb.2020.0529: “The Evolution of Quantitative Sensitivity”, Margaret A. H. Bryer, Sarah E. Koopman, Jessica F. Cantlon …, Steven T. Piantadosi, Evan L. MacLean, Joseph M. Baker, Michael J. Beran, Sarah M. Jones, Kerry E. Jordan, Salif Mahamane, Andreas Nieder, Bonnie M. Perdue, Friederike Range, Jeffrey R. Stevens, Masaki Tomonaga, Dorottya J. Ujfalussy, Jennifer Vonk
https://arxiv.org/abs/2112.05253: “MAGMA—Multimodal Augmentation of Generative Models through Adapter-Based Finetuning”, Constantin Eichenberg, Sidney Black, Samuel Weinbach …, Letitia Parcalabescu, Anette Frank
https://arxiv.org/abs/2112.04426#deepmind: “Improving Language Models by Retrieving from Trillions of Tokens”, Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann …, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre
https://arxiv.org/abs/2111.12233#microsoft: “LEMON: Scaling Up Vision-Language Pre-Training for Image Captioning”, Xiaowei Hu, Zhe Gan, Jianfeng Wang …, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang
https://arxiv.org/abs/2111.12763#google: “Sparse Is Enough in Scaling Transformers”, Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin …, Łukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva
https://arxiv.org/abs/2111.11904#microsoft: “Can Pre-Trained Language Models Be Used to Resolve Textual and Semantic Merge Conflicts?”, Jialu Zhang, Todd Mytkowicz, Mike Kaufman …, Ruzica Piskac, Shuvendu K. Lahiri
https://arxiv.org/abs/2111.11133: “L-Verse: Bidirectional Generation Between Image and Text”, Taehoon Kim, Gwangmo Song, Sihaeng Lee …, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae
https://arxiv.org/abs/2111.11432#microsoft: “Florence: A New Foundation Model for Computer Vision”, Lu Yuan, Dongdong Chen, Yi-Ling Chen …, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, Jianfeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang
https://arxiv.org/abs/2111.10050#google: “BASIC: Combined Scaling for Open-Vocabulary Image Classification ”⁠, Hieu Pham, Zihang Dai⁠, Golnaz Ghiasi …, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu⁠, Mingxing Tan, Quoc V. Le⁠
link-bibliography⁠
https://arxiv.org/abs/2111.08267: “Solving Probability and Statistics Problems by Program Synthesis ”⁠, Leonard Tang, Elizabeth Ke, Nikhil Singh …, Nakul Verma⁠, Iddo Drori
link-bibliography⁠
https://arxiv.org/abs/2111.06377#facebook: “MAE: Masked Autoencoders Are Scalable Vision Learners ”⁠, Kaiming He⁠, Xinlei Chen, Saining Xie …, Yanghao Li, Piotr Dollár, Ross Girshick⁠
link-bibliography⁠
https://arxiv.org/abs/2111.02114#laion: “LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs ”⁠, Christoph Schuhmann, Richard Vencu, Romain Beaumont …, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, Aran Komatsuzaki
link-bibliography⁠
https://arxiv.org/abs/2110.14168#openai: “Training Verifiers to Solve Math Word Problems ”⁠, Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian …, ⁠Jacob Hilton, Reiichiro Nakano, Christopher Hesse, ⁠John Schulman
link-bibliography⁠
https://arxiv.org/abs/2110.11526#deepmind: “Wide Neural Networks Forget Less Catastrophically ”⁠, Seyed Iman Mirzadeh, Arslan Chaudhry, Dong Yin …, Huiyi Hu, ⁠Razvan Pascanu⁠, Dilan Gorur, Mehrdad Farajtabar
link-bibliography⁠
https://arxiv.org/abs/2110.02095#google: “Exploring the Limits of Large Scale Pre-Training ”⁠, Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi
link-bibliography⁠
https://arxiv.org/abs/2109.10686#google: “Scale Efficiently: Insights from Pre-Training and Fine-Tuning Transformers ”⁠, ⁠Yi Tay, Mostafa Dehghani, Jinfeng Rao …, William Fedus⁠, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani⁠, Donald Metzler
link-bibliography⁠
https://arxiv.org/abs/2109.07958: “TruthfulQA: Measuring How Models Mimic Human Falsehoods ”⁠, Stephanie Lin⁠, ⁠Jacob Hilton, ⁠Owain Evans
link-bibliography⁠
https://arxiv.org/abs/2109.02593#allen: “General-Purpose Question-Answering With Macaw ”⁠, Oyvind Tafjord, Peter Clark
link-bibliography⁠
https://arxiv.org/abs/2108.13002#microsoft: “A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP ”⁠, Yucheng Zhao, Guangting Wang, Chuanxin Tang …, Chong Luo, Wenjun Zeng⁠, Zheng-Jun Zha
link-bibliography⁠
https://arxiv.org/abs/2108.08810#google: “Do Vision Transformers See Like Convolutional Neural Networks? ”⁠, Maithra Raghu, Thomas Unterthiner, Simon Kornblith …, Chiyuan Zhang, Alexey Dosovitskiy
link-bibliography⁠
https://arxiv.org/abs/2108.07686: “Scaling Laws for Deep Learning ”⁠, Jonathan S. Rosenfeld⁠
link-bibliography⁠
https://arxiv.org/abs/2107.02137#baidu: “ERNIE 3.0: Large-Scale Knowledge Enhanced Pre-Training for Language Understanding and Generation ”⁠, ⁠Yu Sun, Shuohuan Wang, Shikun Feng …, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu, Weibao Gong, Jianzhong Liang, Zhizhou Shang, Peng Sun⁠, Wei Liu, Xuan Ouyang, Dianhai Yu, Hao Tian, Hua Wu, Haifeng Wang
link-bibliography⁠
https://arxiv.org/abs/2107.01294#allen: “Scarecrow: A Framework for Scrutinizing Machine Text ”⁠, Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski …, ⁠Noah A. Smith, Yejin Choi⁠
link-bibliography⁠
https://arxiv.org/abs/2106.07411: “Partial Success in Closing the Gap between Human and Machine Vision ”⁠, Robert Geirhos⁠, Kantharaju Narayanappa, Benjamin Mitzkus …, Tizian Thieringer, Matthias Bethge⁠, Felix A. Wichmann, Wieland Brendel
link-bibliography⁠
https://arxiv.org/abs/2106.04803#google: “CoAtNet: Marrying Convolution and Attention for All Data Sizes ”⁠, Zihang Dai⁠, Hanxiao Liu, Quoc V. Le⁠, Mingxing Tan
link-bibliography⁠
https://arxiv.org/abs/2106.04560#google: “Scaling Vision Transformers ”⁠, Xiaohua Zhai⁠, Alexander Kolesnikov, ⁠Neil Houlsby, Lucas Beyer⁠
link-bibliography⁠
https://arxiv.org/abs/2106.03004#google: “Exploring the Limits of Out-Of-Distribution Detection ”⁠, Stanislav Fort, Jie Ren, Balaji Lakshminarayanan
link-bibliography⁠
https://arxiv.org/abs/2106.00116: “Effect of Pre-Training Scale on Intra/Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images ”⁠, Mehdi Cherti, Jenia Jitsev
link-bibliography⁠
https://arxiv.org/abs/2105.12806: “A Universal Law of Robustness via Isoperimetry ”⁠, Sébastien Bubeck⁠, Mark Sellke
link-bibliography⁠
https://m.koreaherald.com/view.php?ud=20210525000824#naver: “Naver Unveils First ‘Hyperscale’ AI Platform ”, Kang Jae-eun
link-bibliography⁠
https://arxiv.org/abs/2105.11084#facebook: “Unsupervised Speech Recognition ”⁠, Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli
link-bibliography⁠
https://venturebeat.com/ai/google-details-new-ai-accelerator-chips/: “Google Details New AI Accelerator Chips ”⁠, Kyle Wiggers
link-bibliography⁠
https://arxiv.org/abs/2105.01601#google: “MLP-Mixer: An All-MLP Architecture for Vision ”⁠, Ilya Tolstikhin, ⁠Neil Houlsby, Alexander Kolesnikov …, Lucas Beyer⁠, Xiaohua Zhai⁠, Thomas Unterthiner, Jessica Yung, Daniel Keysers, Jakob Uszkoreit⁠, Mario Lucic, Alexey Dosovitskiy
link-bibliography⁠
https://arxiv.org/abs/2105.00572#facebook: “XLM-R XL: Larger-Scale Transformers for Multilingual Masked Language Modeling ”⁠, Naman Goyal, Jingfei Du, Myle Ott …, Giri Anantharaman, Alexis Conneau
link-bibliography⁠
https://arxiv.org/abs/2104.14294#facebook: “DINO: Emerging Properties in Self-Supervised Vision Transformers ”⁠, Mathilde Caron, Hugo Touvron, Ishan Misra …, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin⁠
link-bibliography⁠
abstract: ⁠“Machine Learning Scaling ”⁠, ⁠Gwern⁠
link-bibliography⁠
https://arxiv.org/abs/2104.02133#google: “SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network ”⁠, ⁠William Chan, Daniel Park, Chris Lee …, Yu Zhang, Quoc V. Le⁠, Mohammad Norouzi⁠
link-bibliography⁠
https://arxiv.org/abs/2103.14586#google: “Understanding Robustness of Transformers for Image Classification ”⁠, Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner …, Daliang Li, Thomas Unterthiner, Andreas Veit
link-bibliography⁠
https://arxiv.org/abs/2103.13009#allen: “UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark ”⁠, Nicholas Lourie, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi⁠
link-bibliography⁠
https://arxiv.org/abs/2103.10957#deepmind: “Efficient Visual Pretraining With Contrastive Detection ”⁠, Olivier J. Hénaff, Skanda Koppula, Jean-Baptiste Alayrac …, Aaron van den Oord, Oriol Vinyals⁠, João Carreira
link-bibliography⁠
https://arxiv.org/abs/2103.07579#google: “Revisiting ResNets: Improved Training and Scaling Strategies ”⁠, Irwan Bello, William Fedus⁠, Xianzhi Du …, Ekin D. Cubuk, Aravind Srinivas⁠, Tsung-Yi Lin, Jonathon Shlens, ⁠Barret Zoph
link-bibliography⁠
https://ai.meta.com/blog/learning-from-videos-to-understand-the-world/: “Learning from Videos to Understand the World ”⁠, Geoffrey Zweig, Polina Kuznetsova⁠, Michael Auli, Francois Fagan
link-bibliography⁠
https://arxiv.org/abs/2103.06561: “WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training ”⁠, Yuqi Huo, Manli Zhang, Guangzhen Liu …, Haoyu Lu, Yizhao Gao, Guoxing Yang, Jingyuan Wen, Heng Zhang, Baogui Xu, Weihao Zheng, Zongzheng Xi, Yueqian Yang, Anwen Hu, Jinming Zhao, Ruichen Li, Yida Zhao, Liang Zhang, Yuqing Song, Xin Hong, Wanqing Cui, Danyang Hou, Yingyan Li, Junyi Li⁠, Peiyu Liu, Zheng Gong, Chuhao Jin, Yuchong Sun, Shizhe Chen, Zhiwu Lu⁠, Zhicheng Dou, Qin Jin, Yanyan Lan, Wayne Xin Zhao, Ruihua Song, Ji-Rong Wen
link-bibliography⁠
https://arxiv.org/abs/2103.01988#facebook: “SEER: Self-Supervised Pretraining of Visual Features in the Wild ”⁠, Priya Goyal, Mathilde Caron, Benjamin Lefaudeux …, Min Xu⁠, Pengchao Wang, Vivek Pai, Mannat Singh, Vitaliy Liptchinsky, Ishan Misra, Armand Joulin⁠, Piotr Bojanowski
link-bibliography⁠
https://arxiv.org/abs/2102.09672#openai: “Improved Denoising Diffusion Probabilistic Models ”⁠, Alex Nichol, ⁠Prafulla Dhariwal
link-bibliography⁠
https://arxiv.org/abs/2102.05918#google: “ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision ”⁠, Chao Jia, Yinfei Yang, Ye Xia⁠ …, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le⁠, Yunhsuan Sung, Zhen Li, Tom Duerig
link-bibliography⁠
https://arxiv.org/abs/2102.06171#deepmind: “NFNet: High-Performance Large-Scale Image Recognition Without Normalization ”⁠, Andrew Brock⁠, Soham De, Samuel L. Smith⁠, Karen Simonyan⁠
link-bibliography⁠
https://arxiv.org/abs/2102.02888#microsoft: “1-Bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed ”⁠, Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan …, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He
link-bibliography⁠
https://arxiv.org/abs/2102.01951#scaling&org=deepmind: “Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling ”⁠, Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya …, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d’Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom
link-bibliography⁠
https://arxiv.org/abs/2003.10580#google: “Meta Pseudo Labels ”⁠, Hieu Pham, Zihang Dai⁠, Qizhe Xie …, Minh-Thang Luong, Quoc V. Le⁠
link-bibliography⁠
https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf: “CLIP: Learning Transferable Visual Models From Natural Language Supervision ”⁠, Alec Radford⁠, ⁠Jong Wook Kim, Chris Hallacy …, Aditya A. Ramesh⁠, Gabriel Goh⁠, Sandhini Agarwal⁠, Girish Sastry, ⁠Amanda Askell, Pamela Mishkin⁠, ⁠Jack Clark⁠, Gretchen Krueger⁠, Ilya Sutskever⁠
link-bibliography⁠
https://www.alignmentforum.org/posts/k2SNji3jXaLGhBeYP/extrapolating-gpt-n-performance: “Extrapolating GPT-N Performance ”⁠, Lukas Finnveden
link-bibliography⁠
https://arxiv.org/abs/2012.00413: “CPM: A Large-Scale Generative Chinese Pre-Trained Language Model ”⁠, Zhengyan Zhang, Xu Han⁠, Hao Zhou⁠ …, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, ⁠Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang⁠, Juanzi Li, Xiaoyan Zhu, Maosong Sun
link-bibliography⁠
https://arxiv.org/abs/2011.10650#openai: “Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images ”⁠, ⁠Rewon Child
link-bibliography⁠
https://arxiv.org/abs/2010.14701#openai: “Scaling Laws for Autoregressive Generative Modeling ”⁠, Tom Henighan, Jared Kaplan, Mor Katz …, ⁠Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown⁠, ⁠Prafulla Dhariwal, Scott Gray⁠, Chris Hallacy, Benjamin Mann, Alec Radford⁠, Aditya A. Ramesh⁠, Nick Ryder, Daniel M. Ziegler, ⁠John Schulman, Dario Amodei⁠, Sam McCandlish⁠
link-bibliography⁠
https://arxiv.org/abs/2010.14571#google: “Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus ”⁠, Isaac Caswell, Theresa Breiner, Daan van Esch, Ankur Bapna
link-bibliography⁠
https://arxiv.org/abs/2010.10504#google: “Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition ”⁠, Yu Zhang, James Qin, Daniel S. Park …, Wei Han⁠, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le⁠, Yonghui Wu⁠
link-bibliography⁠
https://ai.meta.com/blog/introducing-many-to-many-multilingual-machine-translation/: “The First AI Model That Translates 100 Languages without Relying on English Data ”⁠, Angela Fan
link-bibliography⁠
https://arxiv.org/abs/2010.11929#google: “Vision Transformer: An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale ”⁠, Alexey Dosovitskiy, Lucas Beyer⁠, Alexander Kolesnikov …, Dirk Weissenborn, Xiaohua Zhai⁠, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit⁠, ⁠Neil Houlsby
link-bibliography⁠
https://www.openphilanthropy.org/research/new-report-on-how-much-computational-power-it-takes-to-match-the-human-brain/: “New Report on How Much Computational Power It Takes to Match the Human Brain ”⁠, Joseph Carlsmith
link-bibliography⁠
https://arxiv.org/abs/2009.03393#openai: “Generative Language Modeling for Automated Theorem Proving ”⁠, Stanislas Polu, Ilya Sutskever⁠
link-bibliography⁠
https://arxiv.org/abs/2008.09037: “Accuracy and Performance Comparison of Video Action Recognition Approaches ”⁠, Matthew Hutchinson⁠, Siddharth Samsi, William Arcand …, David Bestor, Bill Bergeron, Chansup Byun, Michael Houle, Matthew Hubbell, Michael Jones, Jeremy Kepner, Andrew Kirby, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout⁠, Antonio Rosa, Albert Reuther, Charles Yee, Vijay Gadepally
link-bibliography⁠
https://www.lesswrong.com/posts/Wnqua6eQkewL3bqsF/matt-botvinick-on-the-spontaneous-emergence-of-learning: “Matt Botvinick on the Spontaneous Emergence of Learning Algorithms ”⁠, Adam Scholl
link-bibliography⁠
https://arxiv.org/abs/2008.02217: “Hopfield Networks Is All You Need ”⁠, Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner …, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, Milena Pavlović, Geir Kjetil Sandve, Victor Greiff, David Kreil, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter⁠
link-bibliography⁠
https://arxiv.org/abs/2007.06225: “ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing ”⁠, Ahmed Elnaggar⁠, Michael Heinzinger, Christian Dallago …, Ghalia Rihawi, Yu Wang, Llion Jones⁠, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger⁠, Debsindhu Bhowmik, Burkhard Rost⁠
link-bibliography⁠
https://arxiv.org/abs/2007.03898#nvidia: “NVAE: A Deep Hierarchical Variational Autoencoder ”⁠, Arash Vahdat, Jan Kautz
link-bibliography⁠
https://arxiv.org/abs/2006.11477#facebook: “Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations ”⁠, Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli
link-bibliography⁠
https://arxiv.org/abs/2006.10621: “On the Predictability of Pruning Across Scales ”⁠, Jonathan S. Rosenfeld⁠, ⁠Jonathan Frankle, ⁠Michael Carbin⁠, Nir Shavit⁠
link-bibliography⁠
2020-chen-2.pdf#openai: “IGPT: Generative Pretraining from Pixels ”⁠, ⁠Mark Chen, Alec Radford⁠, ⁠Rewon Child …, Jeff Wu, Heewoo Jun, ⁠Prafulla Dhariwal, David Luan, Ilya Sutskever⁠
link-bibliography⁠
https://arxiv.org/abs/2006.09882#facebook: “SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments ”⁠, Mathilde Caron, Ishan Misra, Julien Mairal …, Priya Goyal, Piotr Bojanowski, Armand Joulin⁠
link-bibliography⁠
https://openai.com/index/image-gpt/: “Image GPT (IGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples ”⁠, ⁠Mark Chen, Alec Radford⁠, Ilya Sutskever⁠
link-bibliography⁠
https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/: “ZeRO-2 & DeepSpeed: Shattering Barriers of Deep Learning Speed & Scale ”⁠, DeepSpeed Team
link-bibliography⁠
https://openai.com/research/jukebox: “Jukebox: We’re Introducing Jukebox, a Neural Net That Generates Music, including Rudimentary Singing, As Raw Audio in a Variety of Genres and Artist Styles. We’re Releasing the Model Weights and Code, Along With a Tool to Explore the Generated Samples. ”⁠, ⁠Prafulla Dhariwal, Heewoo Jun, Christine Payne⁠ …, ⁠Jong Wook Kim, Alec Radford⁠, Ilya Sutskever⁠
link-bibliography⁠
https://ai.meta.com/blog/state-of-the-art-open-source-chatbot/: “Blender: A State-Of-The-Art Open Source Chatbot ”⁠, Stephen Roller, Jason Weston⁠, Emily Dinan
link-bibliography⁠
https://arxiv.org/abs/2004.08366#google: “DynamicEmbedding: Extending TensorFlow for Colossal-Scale Applications ”⁠, Yun Zeng, Siqi Zuo, Dongcai Shen
link-bibliography⁠
https://arxiv.org/abs/2004.07159#alibaba: “PALM: Pre-Training an Autoencoding & Autoregressive Language Model for Context-Conditioned Generation ”⁠, Bin Bi, Chenliang Li, Chen Wu …, Ming Yan⁠, Wei Wang, Songfang Huang, Fei Huang, Luo Si
link-bibliography⁠
https://www.technologyreview.com/2020/02/17/844721/ai-openai-moonshot-elon-musk-sam-altman-greg-brockman-messy-secretive-reality/: “The Messy, Secretive Reality behind OpenAI’s Bid to Save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the Inside Story of How Competitive Pressure Eroded That Idealism ”⁠, Karen Hao⁠
link-bibliography⁠
https://arxiv.org/abs/2002.05709#google: “A Simple Framework for Contrastive Learning of Visual Representations ”⁠, Ting Chen, Simon Kornblith, Mohammad Norouzi⁠, Geoffrey Hinton⁠
link-bibliography⁠
https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/: “Turing-NLG: A 17-Billion-Parameter Language Model by Microsoft ”⁠, Corby Rosset
link-bibliography⁠
https://research.google/blog/towards-a-conversational-agent-that-can-chat-aboutanything/: “Towards a Conversational Agent That Can Chat About…Anything ”⁠, Daniel Adiwardana, Thang Luong
link-bibliography⁠
https://arxiv.org/abs/2001.08361#openai: “Scaling Laws for Neural Language Models ”⁠, Jared Kaplan, Sam McCandlish⁠, Tom Henighan …, Tom B. Brown⁠, Benjamin Chess, ⁠Rewon Child, Scott Gray⁠, Alec Radford⁠, Jeffrey Wu⁠, Dario Amodei⁠
link-bibliography⁠
https://www.youtube.com/watch?v=kY2NHSKBi10: “The Importance of Deconstruction ”⁠, ⁠Kilian Q. Weinberger
link-bibliography⁠
https://openai.com/research/deep-double-descent: “Deep Double Descent: We Show That the Double Descent Phenomenon Occurs in CNNs, ResNets, and Transformers: Performance First Improves, Then Gets Worse, and Then Improves Again With Increasing Model Size, Data Size, or Training Time ”⁠, Preetum Nakkiran, ⁠Gal Kaplun, Yamini Bansal⁠ …, Tristan Yang⁠, Boaz Barak⁠, Ilya Sutskever⁠
link-bibliography⁠
https://arxiv.org/abs/1911.13299: “What’s Hidden in a Randomly Weighted Neural Network? ”⁠, Vivek Ramanujan, ⁠Mitchell Wortsman, Aniruddha Kembhavi …, Ali Farhadi⁠, Mohammad Rastegari
link-bibliography⁠
https://arxiv.org/abs/1911.05722#facebook: “Momentum Contrast for Unsupervised Visual Representation Learning ”⁠, Kaiming He⁠, Haoqi Fan, Yuxin Wu …, Saining Xie, Ross Girshick⁠
link-bibliography⁠
https://arxiv.org/abs/1911.04252#google: “Self-Training With Noisy Student Improves ImageNet Classification ”⁠, Qizhe Xie, Minh-Thang Luong, Eduard Hovy⁠, Quoc V. Le⁠
link-bibliography⁠
https://arxiv.org/abs/1911.02116#facebook: “Unsupervised Cross-Lingual Representation Learning at Scale ”⁠, Alexis Conneau, Kartikay Khandelwal, Naman Goyal …, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer⁠, Veselin Stoyanov⁠
link-bibliography⁠
https://arxiv.org/abs/1910.02054#microsoft: “ZeRO: Memory Optimizations Toward Training Trillion Parameter Models ”⁠, Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
link-bibliography⁠
https://arxiv.org/abs/1909.11740: “UNITER: UNiversal Image-TExt Representation Learning ”⁠, Yen-Chun Chen, Linjie Li, Licheng Yu …, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu
link-bibliography⁠
https://arxiv.org/abs/1909.05858#salesforce: “CTRL: A Conditional Transformer Language Model For Controllable Generation ”⁠, Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney …, ⁠Caiming Xiong, Richard Socher
link-bibliography⁠
https://nv-adlr.github.io/MegatronLM: “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism ”⁠, NVIDIA ADLR
link-bibliography⁠
https://arxiv.org/abs/1907.11692#facebook: “RoBERTa: A Robustly Optimized BERT Pretraining Approach ”⁠, Yinhan Liu, Myle Ott, Naman Goyal …, Jingfei Du, Mandar Joshi, Danqi Chen⁠, Omer Levy⁠, Mike Lewis⁠, Luke Zettlemoyer⁠, Veselin Stoyanov⁠
link-bibliography⁠
https://arxiv.org/abs/1907.07640: “Robustness Properties of Facebook’s ResNeXt WSL Models ”⁠, A. Emin Orhan
link-bibliography⁠
https://arxiv.org/abs/1907.02544: “Large Scale Adversarial Representation Learning ”⁠, Jeff Donahue, Karen Simonyan⁠
link-bibliography⁠
https://arxiv.org/abs/1906.06669: “One Epoch Is All You Need ”⁠, Aran Komatsuzaki
link-bibliography⁠
https://david-abel.github.io/notes/icml_2019.pdf: “ICML 2019 Notes ”⁠, David Abel
link-bibliography⁠
https://arxiv.org/abs/1905.11946#google: “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks ”⁠, Mingxing Tan, Quoc V. Le⁠
link-bibliography⁠
https://arxiv.org/abs/1905.10843: “Asymptotic Learning Curves of Kernel Methods: Empirical Data versus Teacher-Student Paradigm ”⁠, Stefano Spigler, Mario Geiger, Matthieu Wyart
link-bibliography⁠
https://arxiv.org/abs/1905.03197: “UniLM: Unified Language Model Pre-Training for Natural Language Understanding and Generation ”⁠, Li Dong⁠, Nan Yang, Wenhui Wang …, Furu Wei⁠, Xiaodong Liu, Yu Wang, ⁠Jianfeng Gao⁠, Ming Zhou, Hsiao-Wuen Hon⁠
link-bibliography⁠
https://arxiv.org/abs/1905.00546#facebook: “Billion-Scale Semi-Supervised Learning for Image Classification ”⁠, I. Zeki Yalniz, Hervé Jégou, Kan Chen …, Manohar Paluri, Dhruv Mahajan
link-bibliography⁠
http://www.incompleteideas.net/IncIdeas/BitterLesson.html: “The Bitter Lesson ”, Rich Sutton⁠
link-bibliography⁠
https://openai.com/index/better-language-models/: “Better Language Models and Their Implications ”⁠, Alec Radford⁠, Jeffrey Wu⁠, Dario Amodei⁠ …, Daniela Amodei⁠, ⁠Jack Clark⁠, ⁠Miles Brundage, Ilya Sutskever⁠
link-bibliography⁠
https://melaniemitchell.me/aibook/: “Artificial Intelligence: A Guide for Thinking Humans § Prologue: Terrified ”, Melanie Mitchell⁠
link-bibliography⁠
https://openai.com/research/how-ai-training-scales: “How AI Training Scales ”⁠, Sam McCandlish⁠, Jared Kaplan, Dario Amodei⁠
link-bibliography⁠
https://slatestarcodex.com/2018/11/26/is-science-slowing-down-2/: “Is Science Slowing Down? ”⁠, ⁠Scott Alexander⁠
link-bibliography⁠
https://arxiv.org/pdf/1809.11096#page=8&org=deepmind: “BigGAN: Large Scale GAN Training For High Fidelity Natural Image Synthesis § 5.2 Additional Evaluation On JFT-300M ”⁠, Andrew Brock⁠, Jeff Donahue, Karen Simonyan⁠
link-bibliography⁠
https://arxiv.org/abs/1808.01097: “CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images ”⁠, Sheng Guo, Weilin Huang, Haozhi Zhang …, Chenfan Zhuang, Dengke Dong, Matthew R. Scott, Dinglong Huang
link-bibliography⁠
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf#page=5: “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications ”⁠, Alec Radford⁠, Karthik Narasimhan, ⁠Tim Salimans⁠, Ilya Sutskever⁠
link-bibliography⁠
https://arxiv.org/abs/1805.00932#facebook: “Exploring the Limits of Weakly Supervised Pretraining ”⁠, Dhruv Mahajan, Ross Girshick⁠, Vignesh Ramanathan …, Kaiming He⁠, Manohar Paluri, Yixuan Li⁠, Ashwin Bharambe, ⁠Laurens van der Maaten
link-bibliography⁠
https://arxiv.org/abs/1801.06146: “ULMFiT: Universal Language Model Fine-Tuning for Text Classification ”⁠, Jeremy Howard, Sebastian Ruder
link-bibliography⁠
https://arxiv.org/abs/1706.06083: “Towards Deep Learning Models Resistant to Adversarial Attacks ”⁠, ⁠Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt⁠ …, Dimitris Tsipras, Adrian Vladu
link-bibliography⁠
https://arxiv.org/abs/1706.01427#deepmind: “A Simple Neural Network Module for Relational Reasoning ”⁠, Adam Santoro⁠, David Raposo, David G. T. Barrett …, Mateusz Malinowski⁠, ⁠Razvan Pascanu⁠, Peter Battaglia, Timothy Lillicrap⁠
link-bibliography⁠
https://arxiv.org/abs/1705.07750#deepmind: “Quo Vadis, Action Recognition? A New Model I3D and the Kinetics Dataset ”⁠, Joao Carreira, Andrew Zisserman⁠
link-bibliography⁠
https://arxiv.org/abs/1705.05640: “WebVision Challenge: Visual Learning and Understanding With Web Data ”⁠, Wen Li, Limin Wang, Wei Li⁠ …, Eirikur Agustsson, Jesse Berent, Abhinav Gupta, Rahul Sukthankar, Luc Van Gool
link-bibliography⁠
https://blogs.microsoft.com/ai/microsoft-researchers-win-imagenet-computer-vision-challenge/: “Microsoft Researchers Win ImageNet Computer Vision Challenge ”⁠, Allison Linn
link-bibliography⁠
https://arxiv.org/abs/1511.06789#google: “The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition ”⁠, Jonathan Krause⁠, Benjamin Sapp, Andrew Howard …, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei⁠
link-bibliography⁠
https://arxiv.org/abs/1511.02251#facebook: “Learning Visual Features from Large Weakly Supervised Data ”⁠, Armand Joulin⁠, ⁠Laurens van der Maaten, Allan Jabri, Nicolas Vasilache
link-bibliography⁠
https://openaccess.thecvf.com/content_cvpr_2015/papers/Xiao_Learning_From_Massive_2015_CVPR_paper.pdf#baidu: “Clothing-1M: Learning from Massive Noisy Labeled Data for Image Classification ”⁠, Tong Xiao, Tian Xia⁠, Yi Yang …, Chang Huang, Xiaogang Wang⁠
link-bibliography⁠
https://arxiv.org/abs/1402.1869: “On the Number of Linear Regions of Deep Neural Networks ”⁠, Guido Montúfar, ⁠Razvan Pascanu⁠, ⁠Kyunghyun Cho, Yoshua Bengio⁠
link-bibliography⁠
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1097_Paper.pdf: “N-Gram Counts and Language Models from the Common Crawl ”⁠, Christian Buck, Kenneth Heafield, Bas van Ooyen
link-bibliography⁠
https://aclanthology.org/P13-2121.pdf: “Scalable Modified Kneser-Ney Language Model Estimation ”⁠, Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H. Clark, Philipp Koehn⁠
link-bibliography⁠
2010-mikolov.pdf: “Recurrent Neural Network Based Language Model ”⁠, Tomas Mikolov⁠, Martin Karafiat⁠, Lukas Burget …, Jan Cernocky, Sanjeev Khudanpur
link-bibliography⁠
2010-hameed.pdf: “Understanding Sources of Inefficiency in General-Purpose Chips ”⁠, Rehan Hameed, Wajahat Qadeer, Megan Wachs …, Omid Azizi, Alex Solomatnikov, Benjamin C. Lee, Stephen Richardson, Christos Kozyrakis⁠, Mark Alan Horowitz⁠
link-bibliography⁠
https://dw2blog.com/2009/11/02/halloween-nightmare-scenario-early-2020s/: “Halloween Nightmare Scenario, Early 2020’s ”, David Wood
link-bibliography⁠
2009-koren.pdf: “Matrix Factorization Techniques for Recommender Systems ”⁠, Yehuda Koren, Robert Bell, Chris Volinsky
link-bibliography⁠
https://web.archive.org/web/20230718144747/https://frc.ri.cmu.edu/~hpm/project.archive/robot.papers/2004/Predictions.html: “Robot Predictions Evolution ”⁠, Hans Moravec⁠
link-bibliography⁠
2003-perlich.pdf: “Tree Induction vs. Logistic Regression: A Learning-Curve Analysis ”⁠, Claudia Perlich, Foster Provost⁠, Jeffrey S. Simonoff
link-bibliography⁠
http://infolab.stanford.edu/~backrub/google.html: “The Anatomy of a Large-Scale Hypertextual Web Search Engine ”⁠, Sergey Brin⁠, Lawrence Page⁠
link-bibliography⁠
https://paulfchristiano.com/: “Homepage of Paul F. Christiano ”⁠, Paul F. Christiano
link-bibliography⁠