“‘AI Scaling’ Tag”, 2019-08-31:
Bibliography for tag ai/scaling, most recent first: 21 related tags, 642 annotations, & 93 links (parent).
- See Also
- Gwern
- “What Do You Do After ‘Winning’ an AI Arms Race?”, 2024
- “What Do We Mean by ‘Diminishing Returns’ in Scaling?”, 2024
- “Research Ideas”, 2017
- “Absolute Unit NNs: Regression-Based MLPs for Everything”, 2023
- “GPT-3 Creative Fiction”, 2020
- “GANs Didn’t Fail, They Were Abandoned”, 2022
- “The Scaling Hypothesis”, 2020
- “ML Scaling Subreddit”, 2020
- “WBE and DRL: a Middle Way of Imitation Learning from the Human Brain”, 2018
- “Computer Optimization: Your Computer Is Faster Than You Think”, 2021
- “Fully-Connected Neural Nets”, 2021
- “Machine Learning Scaling”, 2021
- “Technology Forecasting: The Garden of Forking Paths”, 2014
- Links
- “Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?”, et al 2024
- “ABBYY’s Bitter Lesson: How Linguists Lost the Last Battle for NLP”, 2024
- “CT Foundation: Taking Medical Imaging Embeddings 3D”, 2024
- “Inference Scaling for Long-Context Retrieval Augmented Generation”, et al 2024
- “Strategic Insights from Simulation Gaming of AI Race Dynamics”, et al 2024
- “How Feature Learning Can Improve Neural Scaling Laws”, et al 2024
- “Dwarkesh Podcast Progress Update”, 2024
- “Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?”, et al 2024
- “Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process”, et al 2024
- “Future Events As Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs”, et al 2024
- “Resolving Discrepancies in Compute-Optimal Scaling of Language Models”, et al 2024
- “Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?”, et al 2024
- “Probing the Decision Boundaries of In-Context Learning in Large Language Models”, et al 2024
- “How Do Large Language Models Acquire Factual Knowledge During Pretraining?”, et al 2024
- “Explore the Limits of Omni-Modal Pretraining at Scale”, et al 2024
- “Self-Consuming Generative Models With Curated Data Provably Optimize Human Preferences”, et al 2024
- “Beyond Model Collapse: Scaling Up With Synthesized Data Requires Reinforcement”, et al 2024
- “Attention As a Hypernetwork”, et al 2024
- “Training Compute-Optimal Protein Language Models”, et al 2024
- “AI Will Become Mathematicians’ ‘Co-Pilot’: Fields Medalist Terence Tao Explains How Proof Checkers and AI Programs Are Dramatically Changing Mathematics”, 2024
- “The Scaling Law in Stellar Light Curves”, et al 2024
- “AstroPT: Scaling Large Observation Models for Astronomy”, et al 2024
- “XLSTM: Extended Long Short-Term Memory”, et al 2024
- “Position: Understanding LLMs Requires More Than Statistical Generalization”, et al 2024
- “GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, et al 2024
- “CatLIP: CLIP-Level Visual Recognition Accuracy With 2.7× Faster Pre-Training on Web-Scale Image-Text Data”, et al 2024
- “Test-Time Augmentation to Solve ARC”, 2024
- “Chinchilla Scaling: A Replication Attempt”, et al 2024
- “Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies”, et al 2024
- “Why Do Small Language Models Underperform? Studying Language Model Saturation via the Softmax Bottleneck”, et al 2024
- “Language Imbalance Can Boost Cross-Lingual Generalization”, et al 2024
- “CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack Of) Multicultural Knowledge”, et al 2024
- “Conformer-1: Robust ASR via Large-Scale Semi-Supervised Bootstrapping”, et al 2024
- “MiniCPM: Unveiling the Potential of Small Language Models With Scalable Training Strategies”, et al 2024
- “Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction”, et al 2024
- “Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data”, et al 2024
- “Long-Form Factuality in Large Language Models”, et al 2024
- “Mechanistic Design and Scaling of Hybrid Architectures”, et al 2024
- “8 Google Employees Invented Modern AI. Here’s the Inside Story: They Met by Chance, Got Hooked on an Idea, and Wrote the Transformers Paper—The Most Consequential Tech Breakthrough in Recent History”, 2024
- “Inflection-2.5: Meet the World’s Best Personal AI”, 2024
- “Actions Speak Louder Than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)”, et al 2024
- “When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method”, et al 2024
- “Investigating Continual Pretraining in Large Language Models: Insights and Implications”, et al 2024
- “The Era of 1-Bit LLMs: All Large Language Models Are in 1.58 Bits”, et al 2024
- “StructLM: Towards Building Generalist Models for Structured Knowledge Grounding”, et al 2024
- “How to Train Data-Efficient LLMs”, et al 2024
- “Weaver: Foundation Models for Creative Writing”, et al 2024
- “Arrows of Time for Large Language Models”, et al 2024
- “Can AI Assistants Know What They Don’t Know?”, et al 2024
- “I Am a Strange Dataset: Metalinguistic Tests for Language Models”, et al 2024
- “TF-T2V: A Recipe for Scaling up Text-To-Video Generation With Text-Free Videos”, et al 2023
- “Generative Multimodal Models Are In-Context Learners”, et al 2023
- “Zoology: Measuring and Improving Recall in Efficient Language Models”, et al 2023
- “Seamless: Multilingual Expressive and Streaming Speech Translation”, et al 2023
- “Scaling Transformer Neural Networks for Skillful and Reliable Medium-Range Weather Forecasting”, et al 2023
- “Instruction-Tuning Aligns LLMs to the Human Brain”, et al 2023
- “Mamba: Linear-Time Sequence Modeling With Selective State Spaces”, 2023
- “Sequential Modeling Enables Scalable Learning for Large Vision Models”, et al 2023
- “UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition”, et al 2023
- “In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search”, et al 2023
- “First Tragedy, Then Parse: History Repeats Itself in the New Era of Large Language Models”, et al 2023
- “I2VGen-XL: High-Quality Image-To-Video Synthesis via Cascaded Diffusion Models”, et al 2023
- “A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models”, et al 2023
- “Sam Altman Accepts the 2023 Hawking Fellowship Award § Is There Another Breakthrough That’s Needed to Reach AGI?”, 2023
- “ConvNets Match Vision Transformers at Scale”, et al 2023
- “PaLI-3 Vision Language Models: Smaller, Faster, Stronger”, et al 2023
- “GeoLLM: Extracting Geospatial Knowledge from Large Language Models”, et al 2023
- “Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition”, et al 2023
- “Sheared LLaMA: Accelerating Language Model Pre-Training via Structured Pruning”, et al 2023
- “FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”, et al 2023
- “Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors”, et al 2023
- “MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book”, et al 2023
- “Intriguing Properties of Generative Classifiers”, et al 2023
- “Taken out of Context: On Measuring Situational Awareness in LLMs”, et al 2023
- “SeamlessM4T: Massively Multilingual & Multimodal Machine Translation”, et al 2023
- “Simple Synthetic Data Reduces Sycophancy in Large Language Models”, et al 2023
- “LLaMA-2: Open Foundation and Fine-Tuned Chat Models”, et al 2023
- “Measuring Faithfulness in Chain-Of-Thought Reasoning”, et al 2023
- “Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration”, et al 2023
- “Introducing Superalignment”, 2023
- “Gödel, Escher, Bach Author Douglas Hofstadter on the State of AI Today § What about AI Terrifies You?”, 2023
- “Pretraining Task Diversity and the Emergence of Non-Bayesian In-Context Learning for Regression”, et al 2023
- “Beyond Scale: the Diversity Coefficient As a Data Quality Metric Demonstrates LLMs Are Pre-Trained on Formally Diverse Data”, et al 2023
- “Scaling MLPs: A Tale of Inductive Bias”, et al 2023
- “Understanding Social Reasoning in Language Models With Language Models”, et al 2023
- “Image Captioners Are Scalable Vision Learners Too”, et al 2023
- “PaLI-X: On Scaling up a Multilingual Vision and Language Model”, et al 2023
- “The False Promise of Imitating Proprietary LLMs”, et al 2023
- “Scaling Data-Constrained Language Models”, et al 2023
- “Scaling Laws for Language Encoding Models in FMRI”, et al 2023
- “LIMA: Less Is More for Alignment”, et al 2023
- “Google’s Newest AI Model Uses Nearly 5× More Text Data for Training Than Its Predecessor”, 2023
- “TorToise: Better Speech Synthesis through Scaling”, 2023
- “TinyStories: How Small Can Language Models Be and Still Speak Coherent English?”, 2023
- “ImageBind: One Embedding Space To Bind Them All”, et al 2023
- “Finding Neurons in a Haystack: Case Studies With Sparse Probing”, et al 2023
- “Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4”, et al 2023
- “Google’s DeepMind-Brain Merger: Tech Giant Regroups for AI Battle”, 2023
- “CLaMP: Contrastive Language-Music Pre-Training for Cross-Modal Symbolic Music Information Retrieval”, et al 2023
- “Emergent and Predictable Memorization in Large Language Models”, et al 2023
- “Power Law Trends in Speedrunning and Machine Learning”, 2023
- “Even The Politicians Thought the Open Letter Made No Sense In The Senate Hearing on AI: Today’s Hearing on AI Covered AI Regulation and Challenges, and the Infamous Open Letter, Which Nearly Everyone in the Room Thought Was Unwise”, 2023
- “DINOv2: Learning Robust Visual Features without Supervision”, et al 2023
- “Segment Anything”, et al 2023
- “Humans in Humans Out: On GPT Converging Toward Common Sense in Both Success and Failure”, Koralus & Wang-Maścianica 2023
- “Sigmoid Loss for Language Image Pre-Training”, et al 2023
- “How Well Do Large Language Models Perform in Arithmetic Tasks?”, et al 2023
- “GPT-4 Technical Report”, OpenAI 2023
- “Securing Liberal Democratic Control of AGI through UK Leadership”, 2023
- “GigaGAN: Scaling up GANs for Text-To-Image Synthesis”, et al 2023
- “Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, et al 2023
- “Why Didn’t DeepMind Build GPT-3?”, 2023
- “Scaling Vision Transformers to 22 Billion Parameters”, et al 2023
- “John Carmack’s ‘Different Path’ to Artificial General Intelligence”, 2023
- “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, 2023
- “ClimaX: A Foundation Model for Weather and Climate”, et al 2023
- “StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-To-Image Synthesis”, et al 2023
- “MUG: Vision Learners Meet Web Image-Text Pairs”, et al 2023
- “GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of AI CPA Capabilities”, et al 2023
- “Scaling Laws for Generative Mixed-Modal Language Models”, et al 2023
- “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, et al 2023
- “GPT-3 Takes the Bar Exam”, Bommarito II & Katz 2022
- “Cramming: Training a Language Model on a Single GPU in One Day”, 2022
- “Evolutionary-Scale Prediction of Atomic Level Protein Structure With a Language Model”, et al 2022
- “Discovering Language Model Behaviors With Model-Written Evaluations”, et al 2022
- “One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”, et al 2022
- “Reproducible Scaling Laws for Contrastive Language-Image Learning”, et al 2022
- “ERNIE-Code: Beyond English-Centric Cross-Lingual Pretraining for Programming Languages”, et al 2022
- “VideoCoCa: Video-Text Modeling With Zero-Shot Transfer from Contrastive Captioners”, et al 2022
- “VindLU: A Recipe for Effective Video-And-Language Pretraining”, et al 2022
- “Whisper: Robust Speech Recognition via Large-Scale Weak Supervision”, et al 2022
- “Scaling Language-Image Pre-Training via Masking”, et al 2022
- “MultiRay: Optimizing Efficiency for Large-Scale AI Models”, et al 2022
- “Galactica: A Large Language Model for Science”, et al 2022
- “Large Language Models Struggle to Learn Long-Tail Knowledge”, et al 2022
- “EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, et al 2022
- “MMDialog: A Large-Scale Multi-Turn Dialogue Dataset Towards Multi-Modal Open-Domain Conversation”, et al 2022
- “Adversarial Policies Beat Superhuman Go AIs”, et al 2022
- “Increments Podcast: #45—4 Central Fallacies of AI Research (with Melanie Mitchell)”, 2022
- “A Solvable Model of Neural Scaling Laws”, et al 2022
- “Will We Run out of Data? An Analysis of the Limits of Scaling Datasets in Machine Learning”, et al 2022
- “Evaluating Parameter Efficient Learning for Generation”, et al 2022
- “FLAN: Scaling Instruction-Finetuned Language Models”, et al 2022
- “BioGPT: Generative Pre-Trained Transformer for Biomedical Text Generation and Mining”, et al 2022
- “Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends”, et al 2022
- “Foundation Transformers”, et al 2022
- “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, et al 2022
- “GLM-130B: An Open Bilingual Pre-Trained Model”, et al 2022
- “Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, et al 2022
- “Do Current Multi-Task Optimization Methods in Deep Learning Even Help?”, et al 2022
- “Monolith: Real Time Recommendation System With Collisionless Embedding Table”, et al 2022
- “Machine Reading, Fast and Slow: When Do Models “Understand” Language?”, et al 2022
- “PaLI: A Jointly-Scaled Multilingual Language-Image Model”, et al 2022
- “Using Large Language Models to Simulate Multiple Humans”, et al 2022
- “Understanding Scaling Laws for Recommendation Models”, et al 2022
- “LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale”, et al 2022
- “Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP”, et al 2022
- “Efficient Training of Language Models to Fill in the Middle”, et al 2022
- “Why Do Tree-Based Models Still Outperform Deep Learning on Tabular Data?”, et al 2022
- “PIXEL: Language Modeling With Pixels”, et al 2022
- “High-Performing Neural Network Models of Visual Cortex Benefit from High Latent Dimensionality”, 2022
- “Exploring Length Generalization in Large Language Models”, et al 2022
- “Language Models (Mostly) Know What They Know”, et al 2022
- “On-Device Training Under 256KB Memory”, et al 2022
- “Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning”, et al 2022
- “ProGen2: Exploring the Boundaries of Protein Language Models”, et al 2022
- “RST: ReStructured Pre-Training”, 2022
- “Limitations of the NTK for Understanding Generalization in Deep Learning”, et al 2022
- “Modeling Transformative AI Risks (MTAIR) Project—Summary Report”, et al 2022
- “BigVGAN: A Universal Neural Vocoder With Large-Scale Training”, et al 2022
- “An Improved One Millisecond Mobile Backbone”, et al 2022
- “A Neural Corpus Indexer for Document Retrieval”, et al 2022
- “Toward a Realistic Model of Speech Processing in the Brain With Self-Supervised Learning”, et al 2022
- “Teaching Models to Express Their Uncertainty in Words”, et al 2022
- “Why Robust Generalization in Deep Learning Is Difficult: Perspective of Expressive Power”, et al 2022
- “M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, et al 2022
- “InstructDial: Improving Zero and Few-Shot Generalization in Dialogue through Instruction Tuning”, et al 2022
- “Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”, et al 2022
- “Least-To-Most Prompting Enables Complex Reasoning in Large Language Models”, et al 2022
- “Continual Pre-Training Mitigates Forgetting in Language and Vision”, et al 2022
- “Dialog Inpainting: Turning Documents into Dialogues”, et al 2022
- “Unifying Language Learning Paradigms”, et al 2022
- “Building Machine Translation Systems for the Next Thousand Languages”, et al 2022
- “When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet”, et al 2022
- “CoCa: Contrastive Captioners Are Image-Text Foundation Models”, et al 2022
- “Data Determines Distributional Robustness in Contrastive Language Image Pre-Training (CLIP)”, et al 2022
- “Continual Learning With Foundation Models: An Empirical Study of Latent Replay”, et al 2022
- “Flamingo: a Visual Language Model for Few-Shot Learning”, et al 2022
- “WebFace260M: A Benchmark for Million-Scale Deep Face Recognition”, et al 2022
- “What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?”, et al 2022
- “DeepMind: The Podcast—Excerpts on AGI”, 2022
- “Can Language Models Learn from Explanations in Context?”, et al 2022
- “Chinchilla: Training Compute-Optimal Large Language Models”, et al 2022
- “A Roadmap for Big Model”, et al 2022
- “A Conversational Paradigm for Program Synthesis”, et al 2022
- “Self-Consistency Improves Chain-Of-Thought Reasoning in Language Models”, et al 2022
- “Effect of Scale on Catastrophic Forgetting in Neural Networks”, et al 2022
- “Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer”, et al 2022
- “FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours”, et al 2022
- “Variational Autoencoders Without the Variation”, et al 2022
- “Performance Reserves in Brain-Imaging-Based Phenotype Prediction”, et al 2022
- “Self-Distilled StyleGAN: Towards Generation from Internet Photos”, et al 2022
- “UnifiedQA-V2: Stronger Generalization via Broader Cross-Format Training”, et al 2022
- “Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision”, et al 2022
- “Brains and Algorithms Partially Converge in Natural Language Processing”, 2022
- “Quantifying Memorization Across Neural Language Models”, et al 2022
- “Wukong: 100 Million Large-Scale Chinese Cross-Modal Pre-Training Dataset and A Foundation Framework”, et al 2022
- “OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-To-Sequence Learning Framework”, et al 2022
- “Data Scaling Laws in NMT: The Effect of Noise and Architecture”, et al 2022
- “Webly Supervised Concept Expansion for General Purpose Vision Models”, et al 2022
- “StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets”, et al 2022
- “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, et al 2022
- “Reasoning Like Program Executors”, et al 2022
- “Text and Code Embeddings by Contrastive Pre-Training”, et al 2022
- “LaMDA: Language Models for Dialog Applications”, et al 2022
- “SWAG: Revisiting Weakly Supervised Pre-Training of Visual Perception Models”, et al 2022
- “CM3: A Causal Masked Multimodal Model of the Internet”, et al 2022
- “ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization”, et al 2022
- “A High-Dimensional Sphere Spilling out of a High-Dimensional Cube despite Exponentially Many Constraints”, 2022
- “ConvNeXt: A ConvNet for the 2020s”, et al 2022
- “The Defeat of the Winograd Schema Challenge”, et al 2022
- “Robust Self-Supervised Audio-Visual Speech Recognition”, et al 2022
- “AV-HuBERT: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction”, et al 2022
- “Self-Supervised Learning from 100 Million Medical Images”, et al 2022
- “The Evolution of Quantitative Sensitivity”, et al 2021
- “ERNIE 3.0 Titan: Exploring Larger-Scale Knowledge Enhanced Pre-Training for Language Understanding and Generation”, et al 2021
- “XGLM: Few-Shot Learning With Multilingual Language Models”, et al 2021
- “An Empirical Investigation of the Role of Pre-Training in Lifelong Learning”, et al 2021
- “Few-Shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”, et al 2021
- “Knowledge-Rich Self-Supervised Entity Linking”, et al 2021
- “You Only Need One Model for Open-Domain Question Answering”, et al 2021
- “EBERT: Epigenomic Language Models Powered by Cerebras”, et al 2021
- “MAGMA—Multimodal Augmentation of Generative Models through Adapter-Based Finetuning”, et al 2021
- “Improving Language Models by Retrieving from Trillions of Tokens”, et al 2021
- “MLP Architectures for Vision-And-Language Modeling: An Empirical Study”, et al 2021
- “LEMON: Scaling Up Vision-Language Pre-Training for Image Captioning”, et al 2021
- “Sparse Is Enough in Scaling Transformers”, et al 2021
- “Can Pre-Trained Language Models Be Used to Resolve Textual and Semantic Merge Conflicts?”, et al 2021
- “ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning”, et al 2021
- “L-Verse: Bidirectional Generation Between Image and Text”, et al 2021
- “RedCaps: Web-Curated Image-Text Data Created by the People, for the People”, et al 2021
- “Florence: A New Foundation Model for Computer Vision”, et al 2021
- “BASIC: Combined Scaling for Open-Vocabulary Image Classification”, et al 2021
- “Swin Transformer V2: Scaling Up Capacity and Resolution”, et al 2021
- “XLS-R: Self-Supervised Cross-Lingual Speech Representation Learning at Scale”, et al 2021
- “Solving Linear Algebra by Program Synthesis”, 2021
- “Covariate Shift in High-Dimensional Random Feature Regression”, et al 2021
- “Solving Probability and Statistics Problems by Program Synthesis”, et al 2021
- “Few-Shot Self-Rationalization With Natural Language Prompts”, et al 2021
- “INTERN: A New Learning Paradigm Towards General Vision”, et al 2021
- “Scaling Law for Recommendation Models: Towards General-Purpose User Representations”, et al 2021
- “MAE: Masked Autoencoders Are Scalable Vision Learners”, et al 2021
- “Persia: An Open, Hybrid System Scaling Deep Learning-Based Recommenders up to 100 Trillion Parameters”, et al 2021
- “Scaling ASR Improves Zero and Few Shot Learning”, et al 2021
- “Turing-Universal Learners With Optimal Scaling Laws”, 2021
- “LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs”, et al 2021
- “Training Verifiers to Solve Math Word Problems”, et al 2021
- “Wide Neural Networks Forget Less Catastrophically”, et al 2021
- “When in Doubt, Summon the Titans: Efficient Inference With Large Models”, et al 2021
- “The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail”, 2021
- “Symbolic Knowledge Distillation: from General Language Models to Commonsense Models”, et al 2021
- “LFPT5: A Unified Framework for Lifelong Few-Shot Language Learning Based on Prompt Tuning of T5”, 2021
- “Scaling Laws for the Few-Shot Adaptation of Pre-Trained Image Classifiers”, et al 2021
- “Unsupervised Neural Machine Translation With Generative Language Models Only”, et al 2021
- “Yuan 1.0: Large-Scale Pre-Trained Language Model in Zero-Shot and Few-Shot Learning”, et al 2021
- “Universal Paralinguistic Speech Representations Using Self-Supervised Conformers”, et al 2021
- “M6–10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining”, et al 2021
- “A Few More Examples May Be Worth Billions of Parameters”, et al 2021
- “Exploring the Limits of Large Scale Pre-Training”, et al 2021
- “Show Your Work: Scratchpads for Intermediate Computation With Language Models”, et al 2021
- “Mining for Strong Gravitational Lenses With Self-Supervised Learning”, et al 2021
- “Stochastic Training Is Not Necessary for Generalization”, et al 2021
- “Evaluating Machine Accuracy on ImageNet”, et al 2021
- “BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition”, et al 2021
- “Scale Efficiently: Insights from Pre-Training and Fine-Tuning Transformers”, et al 2021
- “Scaling Laws for Neural Machine Translation”, et al 2021
- “What Changes Can Large-Scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-Scale Korean Generative Pretrained Transformers”, et al 2021
- “A Recipe For Arbitrary Text Style Transfer With Large Language Models”, et al 2021
- “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, et al 2021
- “A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning”, et al 2021
- “General-Purpose Question-Answering With Macaw”, 2021
- “An Empirical Exploration in Quality Filtering of Text Data”, 2021
- “A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP”, et al 2021
- “Want To Reduce Labeling Cost? GPT-3 Can Help”, et al 2021
- “Data and Parameter Scaling Laws for Neural Machine Translation”, et al 2021
- “Do Vision Transformers See Like Convolutional Neural Networks?”, et al 2021
- “Modeling Protein Using Large-Scale Pretrain Language Model”, et al 2021
- “Scaling Laws for Deep Learning”, 2021
- “Billion-Scale Pretraining With Vision Transformers for Multi-Task Visual Representations”, et al 2021
- “Facebook AI WMT21 News Translation Task Submission”, et al 2021
- “EVA: An Open-Domain Chinese Dialogue System With Large-Scale Generative Pre-Training”, et al 2021
- “The History of Speech Recognition to the Year 2030”, 2021
- “A Field Guide to Federated Optimization”, et al 2021
- “HTLM: Hyper-Text Pre-Training and Prompting of Language Models”, et al 2021
- “Brain-Like Functional Specialization Emerges Spontaneously in Deep Neural Networks”, et al 2021
- “ERNIE 3.0: Large-Scale Knowledge Enhanced Pre-Training for Language Understanding and Generation”, et al 2021
- “Scarecrow: A Framework for Scrutinizing Machine Text”, et al 2021
- “The Dimpled Manifold Model of Adversarial Examples in Machine Learning”, et al 2021
- “Revisiting the Calibration of Modern Neural Networks”, et al 2021
- “Partial Success in Closing the Gap between Human and Machine Vision”, et al 2021
- “HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units”, et al 2021
- “Scaling Laws for Acoustic Models”, 2021
- “CoAtNet: Marrying Convolution and Attention for All Data Sizes”, et al 2021
- “Scaling Vision Transformers”, et al 2021
- “Exploring the Limits of Out-Of-Distribution Detection”, et al 2021
- “Effect of Pre-Training Scale on Intra/Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images”, 2021
- “A Universal Law of Robustness via Isoperimetry”, 2021
- “Naver Unveils First ‘Hyperscale’ AI Platform”, 2021
- “Unsupervised Speech Recognition”, et al 2021
- “One4all User Representation for Recommender Systems in E-Commerce”, et al 2021
- “RecPipe: Co-Designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance”, et al 2021
- “Google Details New AI Accelerator Chips”, 2021
- “MLP-Mixer: An All-MLP Architecture for Vision”, et al 2021
- “XLM-R XL: Larger-Scale Transformers for Multilingual Masked Language Modeling”, et al 2021
- “Scaling End-To-End Models for Large-Scale Multilingual ASR”, et al 2021
- “DINO: Emerging Properties in Self-Supervised Vision Transformers”, et al 2021
- “What Are Bayesian Neural Network Posteriors Really Like?”, et al 2021
- “[Ali Released PLUG: 27 Billion Parameters, the Largest Pre-Trained Language Model in the Chinese Community]”, 2021
- “The Power of Scale for Parameter-Efficient Prompt Tuning”, et al 2021
- “Revealing Persona Biases in Dialogue Systems”, et al 2021
- “CrossFit: A Few-Shot Learning Challenge for Cross-Task Generalization in NLP”, et al 2021
- “Probing Across Time: What Does RoBERTa Know and When?”, et al 2021
- “Memorization versus Generalization in Pre-Trained Language Models”, et al 2021
- “Large-Scale Self-Supervised and Semi-Supervised Learning for Speech Translation”, et al 2021
- “Scaling Laws for Language Transfer Learning”, 2021
- “Adapting Language Models for Zero-Shot Learning by Meta-Tuning on Dataset and Prompt Collections”, et al 2021
- “SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network”, et al 2021
- “Understanding Robustness of Transformers for Image Classification”, et al 2021
- “UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark”, et al 2021
- “Controllable Generation from Pre-Trained Language Models via Inverse Prompting”, et al 2021
- “The Shape of Learning Curves: a Review”, 2021
- “Efficient Visual Pretraining With Contrastive Detection”, et al 2021
- “Revisiting ResNets: Improved Training and Scaling Strategies”, et al 2021
- “Learning from Videos to Understand the World”, et al 2021
- “WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training”, et al 2021
- “Fast and Accurate Model Scaling”, et al 2021
- “Pretrained Transformers As Universal Computation Engines”, et al 2021
- “Greedy Hierarchical Variational Autoencoders (GHVAEs) for Large-Scale Video Prediction”, et al 2021
- “Measuring Mathematical Problem Solving With the MATH Dataset”, et al 2021
- “A Law of Robustness for Two-Layers Neural Networks”, et al 2021
- “SEER: Self-Supervised Pretraining of Visual Features in the Wild”, et al 2021
- “M6: A Chinese Multimodal Pretrainer”, et al 2021
- “Zero-Shot Text-To-Image Generation”, et al 2021
- “Improved Denoising Diffusion Probabilistic Models”, 2021
- “Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts”, et al 2021
- “A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes”, et al 2021
- “Explaining Neural Scaling Laws”, et al 2021
- “ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision”, et al 2021
- “NFNet: High-Performance Large-Scale Image Recognition Without Normalization”, et al 2021
- “Learning Curve Theory”, 2021
- “1-Bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”, et al 2021
- “Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling”, et al 2021
- “Scaling Laws for Transfer”, et al 2021
- “Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning”, et al 2021
- “Muppet: Massive Multi-Task Representations With Pre-Finetuning”, et al 2021
- “Language Processing in Brains and Deep Neural Networks: Computational Convergence and Its Limits”, 2021
- “Meta Pseudo Labels”, et al 2021
- “CLIP: Learning Transferable Visual Models From Natural Language Supervision”, et al 2021
- “VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation”, et al 2021
- “CDLM: Cross-Document Language Modeling”, et al 2021
- “VinVL: Revisiting Visual Representations in Vision-Language Models”, et al 2021
- “Parameter Count vs Training Dataset Size (1952–2021)”, 2021
- “Process for Adapting Language Models to Society (PALMS) With Values-Targeted Datasets”, 2021
- “Extrapolating GPT-N Performance”, 2020
- “Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences”, et al 2020
- “CPM: A Large-Scale Generative Chinese Pre-Trained Language Model”, et al 2020
- “Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images”, 2020
- “When Do You Need Billions of Words of Pretraining Data?”, et al 2020
- “Scaling Laws for Autoregressive Generative Modeling”, et al 2020
- “Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus”, et al 2020
- “MT5: A Massively Multilingual Pre-Trained Text-To-Text Transformer”, et al 2020
- “Beyond English-Centric Multilingual Machine Translation”, et al 2020
- “Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition”, et al 2020
- “Towards End-To-End In-Image Neural Machine Translation”, et al 2020
- “The First AI Model That Translates 100 Languages without Relying on English Data”, 2020
- “WinoGrande: An Adversarial Winograd Schema Challenge at Scale”, et al 2020
- “The Deep Bootstrap Framework: Good Online Learners Are Good Offline Generalizers”, et al 2020
- “Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)”, et al 2020
- “The Neural Architecture of Language: Integrative Reverse-Engineering Converges on a Model for Predictive Processing”, et al 2020
- “Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples”, et al 2020
- “Fast Stencil-Code Computation on a Wafer-Scale Processor”, et al 2020
- “Vision Transformer: An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale”, et al 2020
- “Small Data, Big Decisions: Model Selection in the Small-Data Regime”, et al 2020
- “New Report on How Much Computational Power It Takes to Match the Human Brain”, 2020
- “Generative Language Modeling for Automated Theorem Proving”, 2020
- “GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce”, et al 2020
- “Accuracy and Performance Comparison of Video Action Recognition Approaches”, et al 2020
- “Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, et al 2020
- “Matt Botvinick on the Spontaneous Emergence of Learning Algorithms”, 2020
- “Self-Supervised Learning through the Eyes of a Child”, et al 2020
- “On Robustness and Transferability of Convolutional Neural Networks”, et al 2020
- “Hopfield Networks Is All You Need”, et al 2020
- “ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing”, et al 2020
- “NVAE: A Deep Hierarchical Variational Autoencoder”, 2020
- “Measuring Robustness to Natural Distribution Shifts in Image Classification”, et al 2020
- “Is SGD a Bayesian Sampler? Well, Almost”, et al 2020
- “Unsupervised Cross-Lingual Representation Learning for Speech Recognition”, et al 2020
- “Logarithmic Pruning Is All You Need”, et al 2020
- “Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations”, et al 2020
- “Denoising Diffusion Probabilistic Models”, et al 2020
- “On the Predictability of Pruning Across Scales”, et al 2020
- “IGPT: Generative Pretraining from Pixels”, et al 2020
- “SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments”, et al 2020
- “SimCLRv2: Big Self-Supervised Models Are Strong Semi-Supervised Learners”, et al 2020
- “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, et al 2020
- “Are We Done With ImageNet?”, et al 2020
- “OpenAI API”, et al 2020
- “Object Segmentation Without Labels With Large-Scale Generative Models”, et al 2020
- “How Big Should My Language Model Be?”, 2020
- “GPT-3 Paper § Figure F.1: Four Uncurated Completions from a Context Suggesting the Model Compose a Poem in the Style of Wallace Stevens With the Title ‘Shadows on the Way’”, GPT-3 2020 (page 48)
- “Danny Hernandez on Forecasting and the Drivers of AI Progress”, et al 2020
- “Powered by AI: Advancing Product Understanding and Building New Shopping Experiences”, et al 2020
- “ZeRO-2 & DeepSpeed: Shattering Barriers of Deep Learning Speed & Scale”, 2020
- “Measuring the Algorithmic Efficiency of Neural Networks”, 2020
- “Pushing the Limit of Molecular Dynamics With ab Initio Accuracy to 100 Million Atoms With Machine Learning”, et al 2020
- “Jukebox: We’re Introducing Jukebox, a Neural Net That Generates Music, including Rudimentary Singing, As Raw Audio in a Variety of Genres and Artist Styles. We’re Releasing the Model Weights and Code, along With a Tool to Explore the Generated Samples.”, et al 2020
- “Blender: A State-Of-The-Art Open Source Chatbot”, et al 2020
- “A Review of Winograd Schema Challenge Datasets and Approaches”, et al 2020
- “Scaling Laws from the Data Manifold Dimension”, 2020
- “DynamicEmbedding: Extending TensorFlow for Colossal-Scale Applications”, et al 2020
- “PALM: Pre-Training an Autoencoding & Autoregressive Language Model for Context-Conditioned Generation”, et al 2020
- “Deep Learning Training in Facebook Data Centers: Design of Scale-Up and Scale-Out Systems”, et al 2020
- “TTTTTackling WinoGrande Schemas”, et al 2020
- “A Metric Learning Reality Check”, et al 2020
- “Zoom In: An Introduction to Circuits—By Studying the Connections between Neurons, We Can Find Meaningful Algorithms in the Weights of Neural Networks”, et al 2020
- “Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited”, et al 2020
- “Rethinking Bias-Variance Trade-Off for Generalization of Neural Networks”, et al 2020
- “Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, et al 2020
- “The Messy, Secretive Reality behind OpenAI’s Bid to save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the inside Story of How Competitive Pressure Eroded That Idealism”, 2020
- “The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence”, 2020
- “A Simple Framework for Contrastive Learning of Visual Representations”, et al 2020
- “How Much Knowledge Can You Pack Into the Parameters of a Language Model?”, et al 2020
- “Turing-NLG: A 17-Billion-Parameter Language Model by Microsoft”, 2020
- “Quasi-Equivalence of Width and Depth of Neural Networks”, et al 2020
- “Impact of ImageNet Model Selection on Domain Adaptation”, 2020
- “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, et al 2020
- “Towards a Conversational Agent That Can Chat About…Anything”, 2020
- “Towards a Human-Like Open-Domain Chatbot”, et al 2020
- “Scaling Laws for Neural Language Models”, et al 2020
- “Scaling Laws for Neural Language Models: Figure 15: Far beyond the Model Sizes We Study Empirically, We Find a Contradiction between Our Equations § Pg17”, 2020 (page 17)
- “The Importance of Deconstruction”, 2020
- “Big Transfer (BiT): General Visual Representation Learning”, et al 2019
- “12-In-1: Multi-Task Vision and Language Representation Learning”, et al 2019
- “Deep Double Descent: We Show That the Double Descent Phenomenon Occurs in CNNs, ResNets, and Transformers: Performance First Improves, Then Gets Worse, and Then Improves Again With Increasing Model Size, Data Size, or Training Time”, et al 2019
- “Deep Double Descent: Where Bigger Models and More Data Hurt”, et al 2019
- “What’s Hidden in a Randomly Weighted Neural Network?”, et al 2019
- “Understanding the Generalization of ‘Lottery Tickets’ in Neural Networks”, 2019
- “The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design”, 2019
- “Momentum Contrast for Unsupervised Visual Representation Learning”, et al 2019
- “SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning”, et al 2019
- “Self-Training With Noisy Student Improves ImageNet Classification”, et al 2019
- “CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB”, et al 2019
- “CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs”, El-Kishky et al 2019
- “XLM-R: State-Of-The-Art Cross-Lingual Understanding through Self-Supervision”, FAIR 2019
- “High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks”, et al 2019
- “Unsupervised Cross-Lingual Representation Learning at Scale”, et al 2019
- “T5: Exploring the Limits of Transfer Learning With a Unified Text-To-Text Transformer”, et al 2019
- “ZeRO: Memory Optimizations Toward Training Trillion Parameter Models”, et al 2019
- “Environmental Drivers of Systematicity and Generalization in a Situated Agent”, et al 2019
- “A Constructive Prediction of the Generalization Error Across Scales”, et al 2019
- “Large-Scale Pretraining for Neural Machine Translation With Tens of Billions of Sentence Pairs”, et al 2019
- “UNITER: UNiversal Image-TExt Representation Learning”, et al 2019
- “Exascale Deep Learning for Scientific Inverse Problems”, et al 2019
- “Simple, Scalable Adaptation for Neural Machine Translation”, et al 2019
- “CTRL: A Conditional Transformer Language Model For Controllable Generation”, et al 2019
- “Show Your Work: Improved Reporting of Experimental Results”, et al 2019
- “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, ADLR 2019
- “RoBERTa: A Robustly Optimized BERT Pretraining Approach”, et al 2019
- “Robustness Properties of Facebook’s ResNeXt WSL Models”, 2019
- “Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges”, et al 2019
- “Large Scale Adversarial Representation Learning”, 2019
- “One Epoch Is All You Need”, 2019
- “Does Learning Require Memorization? A Short Tale about a Long Tail”, 2019
- “Intriguing Properties of Adversarial Training at Scale”, 2019
- “Scaling Autoregressive Video Models”, et al 2019
- “A Mathematical Theory of Semantic Development in Deep Neural Networks”, et al 2019
- “Adversarially Robust Generalization Just Requires More Unlabeled Data”, et al 2019
- “ICML 2019 Notes”, 2019
- “Are Labels Required for Improving Adversarial Robustness?”, et al 2019
- “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, 2019
- “SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers”, et al 2019
- “Asymptotic Learning Curves of Kernel Methods: Empirical Data versus Teacher-Student Paradigm”, et al 2019
- “UniLM: Unified Language Model Pre-Training for Natural Language Understanding and Generation”, et al 2019
- “Adversarial Examples Are Not Bugs, They Are Features”, et al 2019
- “Billion-Scale Semi-Supervised Learning for Image Classification”, et al 2019
- “VideoBERT: A Joint Model for Video and Language Representation Learning”, et al 2019
- “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations”, 2019
- “Surprises in High-Dimensional Ridgeless Least Squares Interpolation”, et al 2019
- “The Bitter Lesson”, 2019
- “GPT-2 As Step Toward General Intelligence”, 2019
- “Deep Learning Hardware: Past, Present, & Future”, 2019
- “Language Models Are Unsupervised Multitask Learners”, et al 2019
- “Better Language Models and Their Implications”, et al 2019
- “Do ImageNet Classifiers Generalize to ImageNet?”, et al 2019
- “Cross-Lingual Language Model Pretraining”, 2019
- “Artificial Intelligence: A Guide for Thinking Humans § Prologue: Terrified”, 2019
- “High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks: Videos”, et al 2019
- “Reconciling Modern Machine Learning Practice and the Bias-Variance Trade-Off”, et al 2018
- “Nocaps: Novel Object Captioning at Scale”, et al 2018
- “On Lazy Training in Differentiable Programming”, et al 2018
- “How AI Training Scales”, et al 2018
- “Is Science Slowing Down?”, 2018
- “Large Scale GAN Training for High Fidelity Natural Image Synthesis”, et al 2018
- “BigGAN: Large Scale GAN Training For High Fidelity Natural Image Synthesis § 5.2 Additional Evaluation On JFT-300M”, et al 2018 (page 8)
- “Measurement Invariance Explains the Universal Law of Generalization for Psychological Perception”, 2018
- “CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images”, et al 2018
- “Large-Scale Visual Speech Recognition”, et al 2018
- “Troubling Trends in Machine Learning Scholarship”, 2018
- “Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations”, 2018
- “Neural Scene Representation and Rendering”, et al 2018
- “GPT-1: Improving Language Understanding With Unsupervised Learning”, OpenAI 2018
- “GPT-1: Improving Language Understanding by Generative Pre-Training”, et al 2018
- “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, et al 2018 (page 5)
- “Do CIFAR-10 Classifiers Generalize to CIFAR-10?”, et al 2018
- “Deep Learning Generalizes Because the Parameter-Function Map Is Biased towards Simple Functions”, Valle-Pérez et al 2018
- “Google DeepMind Founder and Leader in Artificial Intelligence Returns to Hamilton”, 2018
- “Exploring the Limits of Weakly Supervised Pretraining”, et al 2018
- “One Big Net For Everything”, 2018
- “Sensitivity and Generalization in Neural Networks: an Empirical Study”, et al 2018
- “ULMFiT: Universal Language Model Fine-Tuning for Text Classification”, 2018
- “GPipe: Easy Scaling With Micro-Batch Pipeline Parallelism § Pg4”, 2018 (page 4)
- “Deep Image Reconstruction from Human Brain Activity”, et al 2017
- “Deep Learning Scaling Is Predictable, Empirically”, et al 2017
- “Are GANs Created Equal? A Large-Scale Study”, et al 2017
- “Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN”, et al 2017
- “Rethinking Generalization Requires Revisiting Old Ideas: Statistical Mechanics Approaches and Complex Learning Behavior”, 2017
- “There’s No Fire Alarm for Artificial General Intelligence”, 2017
- “WebVision Database: Visual Learning and Understanding from Web Data”, et al 2017
- “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”, et al 2017
- “Towards Deep Learning Models Resistant to Adversarial Attacks”, et al 2017
- “Gradient Diversity: a Key Ingredient for Scalable Distributed Learning”, et al 2017
- “Learning to Learn from Noisy Web Videos”, et al 2017
- “Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour”, et al 2017
- “A Simple Neural Network Module for Relational Reasoning”, et al 2017
- “Deep Learning Is Robust to Massive Label Noise”, et al 2017
- “Quo Vadis, Action Recognition? A New Model I3D and the Kinetics Dataset”, 2017
- “WebVision Challenge: Visual Learning and Understanding With Web Data”, et al 2017
- “Geometry of Optimization and Implicit Regularization in Deep Learning”, et al 2017
- “On the Impossibility of Supersized Machines”, et al 2017
- “Parallel Multiscale Autoregressive Density Estimation”, et al 2017
- “Universal Representations: The Missing Link between Faces, Text, Planktons, and Cat Breeds”, 2017
- “Estimation of Gap Between Current Language Models and Human Performance”, et al 2017
- “Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles”, et al 2016
- “Understanding Deep Learning Requires Rethinking Generalization”, et al 2016
- “Why Does Deep and Cheap Learning Work so Well?”, et al 2016
- “The LAMBADA Dataset: Word Prediction Requiring a Broad Discourse Context”, et al 2016
- “Residual Networks Behave Like Ensembles of Relatively Shallow Networks”, et al 2016
- “Do Deep Convolutional Nets Really Need to Be Deep and Convolutional?”, et al 2016
- “PlaNet—Photo Geolocation With Convolutional Neural Networks”, et al 2016
- “Exploring the Limits of Language Modeling”, et al 2016
- “The Singularity: A Philosophical Analysis”, 2016
- “Microsoft Researchers Win ImageNet Computer Vision Challenge”, 2015
- “The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition”, et al 2015
- “Net2Net: Accelerating Learning via Knowledge Transfer”, et al 2015
- “Generative Concatenative Nets Jointly Learn to Write and Classify Reviews”, et al 2015
- “Learning Visual Features from Large Weakly Supervised Data”, et al 2015
- “LSUN: Construction of a Large-Scale Image Dataset Using Deep Learning With Humans in the Loop”, et al 2015
- “Clothing-1M: Learning from Massive Noisy Labeled Data for Image Classification”, et al 2015
- “The Unreasonable Effectiveness of Recurrent Neural Networks”, 2015
- “LSTM: A Search Space Odyssey”, et al 2015
- “YFCC100M: The New Data in Multimedia Research”, et al 2015
- “Machine Intelligence, Part 1”, 2015
- “Evolution of the Human Brain: From Matter to Mind”, 2015
- “In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning”, et al 2014
- “Jumping NLP Curves: A Review of Natural Language Processing Research [Review Article]”, 2014
- “Neural Networks, Manifolds, and Topology”, 2014
- “Computing’s Energy Problem (and What We Can Do about It)”, 2014b
- “N-Gram Counts and Language Models from the Common Crawl”, et al 2014
- “Evolution of the Human Brain: When Bigger Is Better”, 2014
- “One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling”, et al 2013
- “Algorithmic Progress in Six Domains”, 2013
- “Large-Scale Machine Learning Revisited [Slides]”, 2013
- “Intelligence Explosion Microeconomics”, 2013
- “Scalable Modified Kneser-Ney Language Model Estimation”, et al 2013
- “The Remarkable, yet Not Extraordinary, Human Brain As a Scaled-Up Primate Brain and Its Associated Cost”, Herculano-Houzel 2012
- “Advantages of Artificial Intelligences, Uploads, and Digital Minds”, 2012
- “Recurrent Neural Network Based Language Model”, et al 2010
- “Understanding Sources of Inefficiency in General-Purpose Chips”, et al 2010
- “The Teenies”, 2009
- “Tick, Tock, Tick, Tock… BING”, 2009
- “Halloween Nightmare Scenario, Early 2020’s”, 2009
- “The Unreasonable Effectiveness of Data”, et al 2009
- “Economics Of The Singularity: Stuffed into Skyscrapers by the Billion, Brainy Bugbots Will Be the Knowledge Workers of the Future”, 2008
- “Large Language Models in Machine Translation”, et al 2007
- “The Tradeoffs of Large-Scale Learning”, 2007
- “Cellular Scaling Rules for Primate Brains”, Herculano-Houzel et al 2007
- “Robot Predictions Evolution”, 2004
- “Tree Induction vs. Logistic Regression: A Learning-Curve Analysis”, et al 2003
- “Analytic and Algorithmic Solution of Random Satisfiability Problems”, et al 2002
- “A Bit of Progress in Language Modeling”, 2001
- “Scaling to Very Very Large Corpora for Natural Language Disambiguation”, 2001
- “On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes”, 2001
- “A Survey of Methods for Scaling Up Inductive Algorithms”, 1999
- “On The Effect of Data Set Size on Bias And Variance in Classification Learning”, 1999
- “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, 1998
- “The Effects of Training Set Size on Decision Tree Complexity”, 1997
- “Rigorous Learning Curve Bounds from Statistical Mechanics”, et al 1996
- “Scaling up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid”, 1996
- “Reflections After Refereeing Papers for NIPS”, 1995
- “Building a Large Annotated Corpus of English: The Penn Treebank”, et al 1993
- “Statistical Theory of Learning Curves under Entropic Loss Criterion”, 1993
- “Learning Curves: Asymptotic Values and Rate of Convergence”, et al 1993
- “Exhaustive Learning”, et al 1990
- “Computing With Connections”, 1987
- “Don’t Worry—It Can’t Happen”, 1940
- “Eric Michaud on Neural Quantum Interpretability”
- “Billion-Scale Semi-Supervised Learning for State-Of-The-Art Image and Video Classification”
- “No Physics? No Problem. AI Weather Forecasting Is Already Making Huge Strides.”
- “Report Describes Apple’s ‘Organizational Dysfunction’ and ‘Lack of Ambition’ in AI”
- “StyleGAN-2 512px Trained on Danbooru2019”
- “Blake Bordelon”, 2024
- “Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks”
- “Komodo 8: the Smartphone vs Desktop Challenge”
- “Trading Off Compute in Training and Inference § Pruning”
- “How Can We Make Robotics More like Generative Modeling?”
- “Inverse-Scaling/prize: A Prize for Finding Tasks That Cause Large Language Models to Show Inverse Scaling”
- “Scaling up StyleGAN-2”
- “Semi Supervised Learning”
- “Homepage of Paul F. Christiano”, 2024
- “Statistical Modeling: The Two Cultures”, 2024
- “Jared Kaplan”
- “Safe Superintelligence Inc.”
- “OpenAI Disbands Its Robotics Research Team”
- “The Uneasy Relationship between Deep Learning and (classical) Statistics”
- “Parameter Counts in Machine Learning”
- “Can LLMs Learn from a Single Example?”
- “Deciphering China’s AI Dream”
- “Appendix: More Is Different In Other Domains”
- “Understanding ‘Deep Double Descent’”
- “How Much Compute Was Used to Train DeepMind’s Generally Capable Agents?”
- “Why Neural Networks Generalise, and Why They Are (Kind Of) Bayesian”
- “What’s the Backward-Forward FLOP Ratio for Neural Networks?”
- “Optimality Is the Tiger, and Agents Are Its Teeth”
- “What Next? A Dozen Information-Technology Research Goals: 3. Turing’s Vision of Machine Intelligence”
- “Was Linguistic A.I. Created by Accident?”
- “Ilya Sutskever: Deep Learning | AI Podcast #94 With Lex Fridman”
- “A Universal Law of Robustness”
- “Greg Brockman: OpenAI and AGI”, 2024
- “Season 1 Ep. 22 OpenAI’s Ilya Sutskever: The Man Who Made AI Work”
- “A Law of Robustness and the Importance of Overparameterization in Deep Learning”
- “WELM”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography