- See Also
- Links
- “Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, Et Al 2023
- “Scaling Vision Transformers to 22 Billion Parameters”, Et Al 2023
- “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, 2023
- “ClimaX: A Foundation Model for Weather and Climate”, Et Al 2023
- “StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis”, Et Al 2023
- “MUG: Vision Learners Meet Web Image-Text Pairs”, Et Al 2023
- “GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities”, Et Al 2023
- “Scaling Laws for Generative Mixed-Modal Language Models”, Et Al 2023
- “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Et Al 2023
- “GPT-3 Takes the Bar Exam”, Bommarito II & Katz 2022
- “One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”, Et Al 2022
- “Reproducible Scaling Laws for Contrastive Language-image Learning”, Et Al 2022
- “ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages”, Et Al 2022
- “VindLU: A Recipe for Effective Video-and-Language Pretraining”, Et Al 2022
- “Video-Text Modeling With Zero-Shot Transfer from Contrastive Captioners”, Et Al 2022
- “Robust Speech Recognition via Large-Scale Weak Supervision”, Et Al 2022
- “Scaling Language-Image Pre-training via Masking”, Et Al 2022
- “MultiRay: Optimizing Efficiency for Large-scale AI Models”, Et Al 2022
- “Galactica: A Large Language Model for Science”, Et Al 2022
- “EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, Et Al 2022
- “MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation”, Et Al 2022
- “Will We Run out of Data? An Analysis of the Limits of Scaling Datasets in Machine Learning”, Et Al 2022
- “Evaluating Parameter Efficient Learning for Generation”, Et Al 2022
- “FLAN: Scaling Instruction-Finetuned Language Models”, Et Al 2022
- “Vision-Language Pre-training: Basics, Recent Advances, and Future Trends”, Et Al 2022
- “Foundation Transformers”, Et Al 2022
- “Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Et Al 2022
- “GLM-130B: An Open Bilingual Pre-trained Model”, Et Al 2022
- “Do Current Multi-Task Optimization Methods in Deep Learning Even Help?”, Et Al 2022
- “Monolith: Real Time Recommendation System With Collisionless Embedding Table”, Et Al 2022
- “Machine Reading, Fast and Slow: When Do Models ‘Understand’ Language?”, Et Al 2022
- “PaLI: A Jointly-Scaled Multilingual Language-Image Model”, Et Al 2022
- “Using Large Language Models to Simulate Multiple Humans”, Et Al 2022
- “Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP”, Et Al 2022
- “ESMfold: Language Models of Protein Sequences at the Scale of Evolution Enable Accurate Structure Prediction”, Et Al 2022
- “Why Do Tree-based Models Still Outperform Deep Learning on Tabular Data?”, Et Al 2022
- “PIXEL: Language Modelling With Pixels”, Et Al 2022
- “High-performing Neural Network Models of Visual Cortex Benefit from High Latent Dimensionality”, 2022
- “Language Models (Mostly) Know What They Know”, Et Al 2022
- “Exploring Length Generalization in Large Language Models”, Et Al 2022
- “On-Device Training Under 256KB Memory”, Et Al 2022
- “Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning”, Et Al 2022
- “ProGen2: Exploring the Boundaries of Protein Language Models”, Et Al 2022
- “RST: ReStructured Pre-training”, 2022
- “Limitations of the NTK for Understanding Generalization in Deep Learning”, Et Al 2022
- “Modeling Transformative AI Risks (MTAIR) Project—Summary Report”, Et Al 2022
- “BigVGAN: A Universal Neural Vocoder With Large-Scale Training”, Et Al 2022
- “An Improved One Millisecond Mobile Backbone”, Et Al 2022
- “A Neural Corpus Indexer for Document Retrieval”, Et Al 2022
- “Toward a Realistic Model of Speech Processing in the Brain With Self-supervised Learning”, Et Al 2022
- “Teaching Models to Express Their Uncertainty in Words”, Et Al 2022
- “M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, Et Al 2022
- “Why Robust Generalization in Deep Learning Is Difficult: Perspective of Expressive Power”, Et Al 2022
- “InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning”, Et Al 2022
- “Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”, Et Al 2022
- “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models”, Et Al 2022
- “Towards Understanding Grokking: An Effective Theory of Representation Learning”, Et Al 2022
- “Dialog Inpainting: Turning Documents into Dialogues”, Et Al 2022
- “Unifying Language Learning Paradigms”, Et Al 2022
- “When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet”, Et Al 2022
- “Building Machine Translation Systems for the Next Thousand Languages”, Et Al 2022
- “CoCa: Contrastive Captioners Are Image-Text Foundation Models”, Et Al 2022
- “Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)”, Et Al 2022
- “Continual Learning With Foundation Models: An Empirical Study of Latent Replay”, Et Al 2022
- “Flamingo: a Visual Language Model for Few-Shot Learning”, Et Al 2022
- “WebFace260M: A Benchmark for Million-Scale Deep Face Recognition”, Et Al 2022
- “What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?”, Et Al 2022
- “DeepMind: The Podcast—Excerpts on AGI”, 2022
- “Can Language Models Learn from Explanations in Context?”, Et Al 2022
- “Training Compute-Optimal Large Language Models”, Et Al 2022
- “A Roadmap for Big Model”, Et Al 2022
- “A Conversational Paradigm for Program Synthesis”, Et Al 2022
- “Self-Consistency Improves Chain of Thought Reasoning in Language Models”, Et Al 2022
- “Effect of Scale on Catastrophic Forgetting in Neural Networks”, Et Al 2022
- “Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer”, Et Al 2022
- “FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours”, Et Al 2022
- “Variational Autoencoders Without the Variation”, Et Al 2022
- “Performance Reserves in Brain-imaging-based Phenotype Prediction”, Et Al 2022
- “Self-Distilled StyleGAN: Towards Generation from Internet Photos”, Et Al 2022
- “Brains and Algorithms Partially Converge in Natural Language Processing”, 2022
- “Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision”, Et Al 2022
- “Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework”, Et Al 2022
- “Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework”, Et Al 2022
- “Webly Supervised Concept Expansion for General Purpose Vision Models”, Et Al 2022
- “Data Scaling Laws in NMT: The Effect of Noise and Architecture”, Et Al 2022
- “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Et Al 2022
- “Reasoning Like Program Executors”, Et Al 2022
- “Text and Code Embeddings by Contrastive Pre-Training”, Et Al 2022
- “SWAG: Revisiting Weakly Supervised Pre-Training of Visual Perception Models”, Et Al 2022
- “LaMDA: Language Models for Dialog Applications”, Et Al 2022
- “CM3: A Causal Masked Multimodal Model of the Internet”, Et Al 2022
- “ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization”, Et Al 2022
- “The Defeat of the Winograd Schema Challenge”, Et Al 2022
- “Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [paper]”, Et Al 2022
- “AV-HuBERT: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction”, Et Al 2022
- “Robust Self-Supervised Audio-Visual Speech Recognition”, Et Al 2022
- “Self-supervised Learning from 100 Million Medical Images”, Et Al 2022
- “The Evolution of Quantitative Sensitivity”, Et Al 2021
- “ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation”, Et Al 2021
- “XGLM: Few-shot Learning With Multilingual Language Models”, Et Al 2021
- “An Empirical Investigation of the Role of Pre-training in Lifelong Learning”, Et Al 2021
- “Knowledge-Rich Self-Supervised Entity Linking”, Et Al 2021
- “Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”, Et Al 2021
- “EBERT: Epigenomic Language Models Powered by Cerebras”, Et Al 2021
- “You Only Need One Model for Open-domain Question Answering”, Et Al 2021
- “MAGMA—Multimodal Augmentation of Generative Models through Adapter-based Finetuning”, Et Al 2021
- “MLP Architectures for Vision-and-Language Modeling: An Empirical Study”, Et Al 2021
- “Improving Language Models by Retrieving from Trillions of Tokens”, Et Al 2021
- “Sparse Is Enough in Scaling Transformers”, Et Al 2021
- “LEMON: Scaling Up Vision-Language Pre-training for Image Captioning”, Et Al 2021
- “Can Pre-trained Language Models Be Used to Resolve Textual and Semantic Merge Conflicts?”, Et Al 2021
- “Florence: A New Foundation Model for Computer Vision”, Et Al 2021
- “RedCaps: Web-curated Image-text Data Created by the People, for the People”, Et Al 2021
- “L-Verse: Bidirectional Generation Between Image and Text”, Et Al 2021
- “ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning”, Et Al 2021
- “BASIC: Combined Scaling for Open-Vocabulary Image Classification”, Et Al 2021
- “Swin Transformer V2: Scaling Up Capacity and Resolution”, Et Al 2021
- “XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale”, Et Al 2021
- “INTERN: A New Learning Paradigm Towards General Vision”, Et Al 2021
- “Few-Shot Self-Rationalization With Natural Language Prompts”, Et Al 2021
- “Solving Probability and Statistics Problems by Program Synthesis”, Et Al 2021
- “Covariate Shift in High-Dimensional Random Feature Regression”, Et Al 2021
- “Solving Linear Algebra by Program Synthesis”, 2021
- “MAE: Masked Autoencoders Are Scalable Vision Learners”, Et Al 2021
- “Scaling ASR Improves Zero and Few Shot Learning”, Et Al 2021
- “Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters”, Et Al 2021
- “LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs”, Et Al 2021
- “Training Verifiers to Solve Math Word Problems”, Et Al 2021
- “When in Doubt, Summon the Titans: Efficient Inference With Large Models”, Et Al 2021
- “The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail”, 2021
- “LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5”, 2021
- “Symbolic Knowledge Distillation: from General Language Models to Commonsense Models”, Et Al 2021
- “Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers”, Et Al 2021
- “Unsupervised Neural Machine Translation With Generative Language Models Only”, Et Al 2021
- “Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning”, Et Al 2021
- “Universal Paralinguistic Speech Representations Using Self-Supervised Conformers”, Et Al 2021
- “M6–10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining”, Et Al 2021
- “Show Your Work: Scratchpads for Intermediate Computation With Language Models”, Et Al 2021
- “Exploring the Limits of Large Scale Pre-training”, Et Al 2021
- “Learning through Atypical ‘Phase Transitions’ in Overparameterized Neural Networks”, Et Al 2021
- “Mining for Strong Gravitational Lenses With Self-supervised Learning”, Et Al 2021
- “Stochastic Training Is Not Necessary for Generalization”, Et Al 2021
- “Evaluating Machine Accuracy on ImageNet”, Et Al 2021
- “Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers”, Et Al 2021
- “Scaling Laws for Neural Machine Translation”, Et Al 2021
- “What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers”, Et Al 2021
- “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Et Al 2021
- “A Recipe For Arbitrary Text Style Transfer With Large Language Models”, Et Al 2021
- “General-Purpose Question-Answering With Macaw”, 2021
- “A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning”, Et Al 2021
- “An Empirical Exploration in Quality Filtering of Text Data”, 2021
- “Data and Parameter Scaling Laws for Neural Machine Translation”, Et Al 2021
- “Want To Reduce Labeling Cost? GPT-3 Can Help”, Et Al 2021
- “Scaling Laws for Deep Learning”, 2021
- “Modeling Protein Using Large-scale Pretrain Language Model”, Et Al 2021
- “Billion-Scale Pretraining With Vision Transformers for Multi-Task Visual Representations”, Et Al 2021
- “Facebook AI WMT21 News Translation Task Submission”, Et Al 2021
- “EVA: An Open-Domain Chinese Dialogue System With Large-Scale Generative Pre-Training”, Et Al 2021
- “HTLM: Hyper-Text Pre-Training and Prompting of Language Models”, Et Al 2021
- “A Field Guide to Federated Optimization”, Et Al 2021
- “Brain-like Functional Specialization Emerges Spontaneously in Deep Neural Networks”, Et Al 2021
- “ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation”, Et Al 2021
- “Scarecrow: A Framework for Scrutinizing Machine Text”, Et Al 2021
- “Revisiting the Calibration of Modern Neural Networks”, Et Al 2021
- “HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units”, Et Al 2021
- “Partial Success in Closing the Gap between Human and Machine Vision”, Et Al 2021
- “Scaling Laws for Acoustic Models”, 2021
- “Knowledge Distillation: A Good Teacher Is Patient and Consistent”, Et Al 2021
- “CoAtNet: Marrying Convolution and Attention for All Data Sizes”, Et Al 2021
- “Scaling Vision Transformers”, Et Al 2021
- “Exploring the Limits of Out-of-Distribution Detection”, Et Al 2021
- “Effect of Pre-Training Scale on Intra/Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images”, 2021
- “A Universal Law of Robustness via Isoperimetry”, 2021
- “Naver Unveils First ‘Hyperscale’ AI Platform”, 2021
- “Unsupervised Speech Recognition”, Et Al 2021
- “Google Details New AI Accelerator Chips”, 2021
- “RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance”, Et Al 2021
- “MLP-Mixer: An All-MLP Architecture for Vision”, Et Al 2021
- “XLM-R XL: Larger-Scale Transformers for Multilingual Masked Language Modeling”, Et Al 2021
- “Scaling End-to-End Models for Large-Scale Multilingual ASR”, Et Al 2021
- “What Are Bayesian Neural Network Posteriors Really Like?”, Et Al 2021
- “DINO: Emerging Properties in Self-Supervised Vision Transformers”, Et Al 2021
- “Machine Learning Scaling”, 2021
- “Fully-Connected Neural Nets”, 2021
- “Computer Optimization: Your Computer Is Faster Than You Think”, 2021
- “[Ali Released PLUG: 27 Billion Parameters, the Largest Pre-trained Language Model in the Chinese Community]”, 2021
- “CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP”, Et Al 2021
- “Revealing Persona Biases in Dialogue Systems”, Et Al 2021
- “The Power of Scale for Parameter-Efficient Prompt Tuning”, Et Al 2021
- “Probing Across Time: What Does RoBERTa Know and When?”, Et Al 2021
- “Large-Scale Self-Supervised and Semi-Supervised Learning for Speech Translation”, Et Al 2021
- “Scaling Laws for Language Transfer Learning”, 2021
- “SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network”, Et Al 2021
- “Understanding Robustness of Transformers for Image Classification”, Et Al 2021
- “UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark”, Et Al 2021
- “Efficient Visual Pretraining With Contrastive Detection”, Et Al 2021
- “The Shape of Learning Curves: a Review”, 2021
- “Controllable Generation from Pre-trained Language Models via Inverse Prompting”, Et Al 2021
- “Revisiting ResNets: Improved Training and Scaling Strategies”, Et Al 2021
- “Learning from Videos to Understand the World”, Et Al 2021
- “Fast and Accurate Model Scaling”, Et Al 2021
- “WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training”, Et Al 2021
- “Pretrained Transformers As Universal Computation Engines”, Et Al 2021
- “Greedy Hierarchical Variational Autoencoders (GHVAEs) for Large-Scale Video Prediction”, Et Al 2021
- “A Law of Robustness for Two-layers Neural Networks”, Et Al 2021
- “Measuring Mathematical Problem Solving With the MATH Dataset”, Et Al 2021
- “SEER: Self-supervised Pretraining of Visual Features in the Wild”, Et Al 2021
- “M6: A Chinese Multimodal Pretrainer”, Et Al 2021
- “Zero-Shot Text-to-Image Generation”, Et Al 2021
- “Improved Denoising Diffusion Probabilistic Models”, 2021
- “Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts”, Et Al 2021
- “A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes”, Et Al 2021
- “NFNet: High-Performance Large-Scale Image Recognition Without Normalization”, Et Al 2021
- “ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision”, Et Al 2021
- “Learning Curve Theory”, 2021
- “1-bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”, Et Al 2021
- “Scaling Laws for Transfer”, Et Al 2021
- “Muppet: Massive Multi-task Representations With Pre-Finetuning”, Et Al 2021
- “Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning”, Et Al 2021
- “Language Processing in Brains and Deep Neural Networks: Computational Convergence and Its Limits”, 2021
- “Meta Pseudo Labels”, Et Al 2021
- “VinVL: Revisiting Visual Representations in Vision-Language Models”, Et Al 2021
- “CDLM: Cross-Document Language Modeling”, Et Al 2021
- “VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation”, Et Al 2021
- “Process for Adapting Language Models to Society (PALMS) With Values-Targeted Datasets”, 2021
- “Extrapolating GPT-N Performance”, 2020
- “Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences”, Et Al 2020
- “CPM: A Large-scale Generative Chinese Pre-trained Language Model”, Et Al 2020
- “Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images”, 2020
- “When Do You Need Billions of Words of Pretraining Data?”, Et Al 2020
- “ML Scaling Subreddit”, 2020
- “Scaling Laws for Autoregressive Generative Modeling”, Et Al 2020
- “Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus”, Et Al 2020
- “MT5: A Massively Multilingual Pre-trained Text-to-text Transformer”, Et Al 2020
- “Beyond English-Centric Multilingual Machine Translation”, Et Al 2020
- “Towards End-to-End In-Image Neural Machine Translation”, Et Al 2020
- “Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition”, Et Al 2020
- “The First AI Model That Translates 100 Languages without Relying on English Data”, 2020
- “The Deep Bootstrap Framework: Good Online Learners Are Good Offline Generalizers”, Et Al 2020
- “WinoGrande: An Adversarial Winograd Schema Challenge at Scale”, Et Al 2020
- “Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)”, Et Al 2020
- “The Neural Architecture of Language: Integrative Reverse-engineering Converges on a Model for Predictive Processing”, Et Al 2020
- “Fast Stencil-Code Computation on a Wafer-Scale Processor”, Et Al 2020
- “Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples”, Et Al 2020
- “Vision Transformer: An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale”, Et Al 2020
- “Small Data, Big Decisions: Model Selection in the Small-Data Regime”, Et Al 2020
- “New Report on How Much Computational Power It Takes to Match the Human Brain”, 2020
- “Generative Language Modeling for Automated Theorem Proving”, 2020
- “GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce”, Et Al 2020
- “Accuracy and Performance Comparison of Video Action Recognition Approaches”, Et Al 2020
- “Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, Et Al 2020
- “Matt Botvinick on the Spontaneous Emergence of Learning Algorithms”, 2020
- “Self-supervised Learning through the Eyes of a Child”, Et Al 2020
- “Hopfield Networks Is All You Need”, Et Al 2020
- “On Robustness and Transferability of Convolutional Neural Networks”, Et Al 2020
- “ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing”, Et Al 2020
- “NVAE: A Deep Hierarchical Variational Autoencoder”, 2020
- “Measuring Robustness to Natural Distribution Shifts in Image Classification”, Et Al 2020
- “Is SGD a Bayesian Sampler? Well, Almost”, Et Al 2020
- “Unsupervised Cross-lingual Representation Learning for Speech Recognition”, Et Al 2020
- “Logarithmic Pruning Is All You Need”, Et Al 2020
- “Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations”, Et Al 2020
- “Denoising Diffusion Probabilistic Models”, Et Al 2020
- “GPT-3 Creative Fiction”, 2020
- “On the Predictability of Pruning Across Scales”, Et Al 2020
- “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, Et Al 2020
- “SimCLRv2: Big Self-Supervised Models Are Strong Semi-Supervised Learners”, Et Al 2020
- “SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments”, Et Al 2020
- “IGPT: Generative Pretraining from Pixels”, Et Al 2020
- “Are We Done With ImageNet?”, Et Al 2020
- “OpenAI API”, Et Al 2020
- “How Big Should My Language Model Be?”, 2020
- “Object Segmentation Without Labels With Large-Scale Generative Models”, Et Al 2020
- “GPT-3 Paper § Figure F.1: Four Uncurated Completions from a Context Suggesting the Model Compose a Poem in the Style of Wallace Stevens With the Title ‘Shadows on the Way’”, GPT-3 2020 (page 48)
- “The Scaling Hypothesis”, 2020
- “Danny Hernandez on Forecasting and the Drivers of AI Progress”, Et Al 2020
- “ZeRO-2 & DeepSpeed: Shattering Barriers of Deep Learning Speed & Scale”, 2020
- “Powered by AI: Advancing Product Understanding and Building New Shopping Experiences”, Et Al 2020
- “Measuring the Algorithmic Efficiency of Neural Networks”, 2020
- “Pushing the Limit of Molecular Dynamics With Ab Initio Accuracy to 100 Million Atoms With Machine Learning”, Et Al 2020
- “Jukebox: We’re Introducing Jukebox, a Neural Net That Generates Music, including Rudimentary Singing, As Raw Audio in a Variety of Genres and Artist Styles. We’re Releasing the Model Weights and Code, along With a Tool to Explore the Generated Samples.”, Et Al 2020
- “Blender: A State-of-the-art Open Source Chatbot”, Et Al 2020
- “A Review of Winograd Schema Challenge Datasets and Approaches”, Et Al 2020
- “Scaling Laws from the Data Manifold Dimension”, 2020
- “DynamicEmbedding: Extending TensorFlow for Colossal-Scale Applications”, Et Al 2020
- “PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation”, Et Al 2020
- “Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems”, Et Al 2020
- “A Metric Learning Reality Check”, Et Al 2020
- “TTTTTackling WinoGrande Schemas”, Et Al 2020
- “Zoom In: An Introduction to Circuits—By Studying the Connections between Neurons, We Can Find Meaningful Algorithms in the Weights of Neural Networks”, Et Al 2020
- “Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited”, Et Al 2020
- “Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, Et Al 2020
- “Rethinking Bias-Variance Trade-off for Generalization of Neural Networks”, Et Al 2020
- “The Messy, Secretive Reality behind OpenAI’s Bid to save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the inside Story of How Competitive Pressure Eroded That Idealism”, 2020
- “A Simple Framework for Contrastive Learning of Visual Representations”, Et Al 2020
- “Turing-NLG: A 17-billion-parameter Language Model by Microsoft”, 2020
- “How Much Knowledge Can You Pack Into the Parameters of a Language Model?”, Et Al 2020
- “Impact of ImageNet Model Selection on Domain Adaptation”, 2020
- “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, Et Al 2020
- “Towards a Conversational Agent That Can Chat About…Anything”, 2020
- “Towards a Human-like Open-Domain Chatbot”, Et Al 2020
- “Scaling Laws for Neural Language Models”, Et Al 2020
- “Big Transfer (BiT): General Visual Representation Learning”, Et Al 2019
- “Deep Double Descent: We Show That the Double Descent Phenomenon Occurs in CNNs, ResNets, and Transformers: Performance First Improves, Then Gets Worse, and Then Improves Again With Increasing Model Size, Data Size, or Training Time. This Effect Is Often Avoided through Careful Regularization. While This Behavior Appears to Be Fairly Universal, We Don’t yet Fully Understand Why It Happens, and View Further Study of This Phenomenon As an Important Research Direction.”, Et Al 2019
- “12-in-1: Multi-Task Vision and Language Representation Learning”, Et Al 2019
- “Deep Double Descent: Where Bigger Models and More Data Hurt”, Et Al 2019
- “Understanding the Generalization Of ‘Lottery Tickets’ In Neural Networks”, 2019
- “Momentum Contrast for Unsupervised Visual Representation Learning”, Et Al 2019
- “The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design”, 2019
- “Self-training With Noisy Student Improves ImageNet Classification”, Et Al 2019
- “CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs”, El-Kishky Et Al 2019
- “CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB”, Et Al 2019
- “XLM-R: State-of-the-art Cross-lingual Understanding through Self-supervision”, FAIR 2019
- “Unsupervised Cross-lingual Representation Learning at Scale”, Et Al 2019
- “High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks”, Et Al 2019
- “Grandmaster Level in StarCraft II Using Multi-agent Reinforcement Learning”, Et Al 2019
- “T5: Exploring the Limits of Transfer Learning With a Unified Text-to-Text Transformer”, Et Al 2019
- “ZeRO: Memory Optimizations Toward Training Trillion Parameter Models”, Et Al 2019
- “Environmental Drivers of Systematicity and Generalization in a Situated Agent”, Et Al 2019
- “A Constructive Prediction of the Generalization Error Across Scales”, Et Al 2019
- “Large-scale Pretraining for Neural Machine Translation With Tens of Billions of Sentence Pairs”, Et Al 2019
- “UNITER: UNiversal Image-TExt Representation Learning”, Et Al 2019
- “Exascale Deep Learning for Scientific Inverse Problems”, Et Al 2019
- “Simple, Scalable Adaptation for Neural Machine Translation”, Et Al 2019
- “CTRL: A Conditional Transformer Language Model For Controllable Generation”, Et Al 2019
- “Show Your Work: Improved Reporting of Experimental Results”, Et Al 2019
- “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, ADLR 2019
- “RoBERTa: A Robustly Optimized BERT Pretraining Approach”, Et Al 2019
- “Robustness Properties of Facebook’s ResNeXt WSL Models”, 2019
- “Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges”, Et Al 2019
- “Large Scale Adversarial Representation Learning”, 2019
- “One Epoch Is All You Need”, 2019
- “Does Learning Require Memorization? A Short Tale about a Long Tail”, 2019
- “Intriguing Properties of Adversarial Training at Scale”, 2019
- “Scaling Autoregressive Video Models”, Et Al 2019
- “A Mathematical Theory of Semantic Development in Deep Neural Networks”, Et Al 2019
- “Adversarially Robust Generalization Just Requires More Unlabeled Data”, Et Al 2019
- “ICML 2019 Notes”, 2019
- “SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers”, Et Al 2019
- “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, 2019
- “UniLM: Unified Language Model Pre-training for Natural Language Understanding and Generation”, Et Al 2019
- “Billion-scale Semi-supervised Learning for Image Classification”, Et Al 2019
- “VideoBERT: A Joint Model for Video and Language Representation Learning”, Et Al 2019
- “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations”, 2019
- “Surprises in High-Dimensional Ridgeless Least Squares Interpolation”, Et Al 2019
- “The Bitter Lesson”, 2019
- “Deep Learning Hardware: Past, Present, & Future”, 2019
- “Better Language Models and Their Implications”, Et Al 2019
- “Language Models Are Unsupervised Multitask Learners”, Et Al 2019
- “Do ImageNet Classifiers Generalize to ImageNet?”, Et Al 2019
- “Cross-lingual Language Model Pretraining”, 2019
- “High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks: Videos”, Et Al 2019
- “Reconciling Modern Machine Learning Practice and the Bias-variance Trade-off”, Et Al 2018
- “Nocaps: Novel Object Captioning at Scale”, Et Al 2018
- “How AI Training Scales”, Et Al 2018
- “Is Science Slowing Down?”, 2018
- “WBE and DRL: a Middle Way of Imitation Learning from the Human Brain”, 2018
- “BigGAN: Large Scale GAN Training For High Fidelity Natural Image Synthesis § 5.2 Additional Evaluation On JFT-300M”, Et Al 2018 (page 8)
- “Large Scale GAN Training for High Fidelity Natural Image Synthesis”, Et Al 2018
- “Measurement Invariance Explains the Universal Law of Generalization for Psychological Perception”, 2018
- “CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images”, Et Al 2018
- “Large-Scale Visual Speech Recognition”, Et Al 2018
- “Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations”, 2018
- “Neural Scene Representation and Rendering”, Et Al 2018
- “GPT-1: Improving Language Understanding With Unsupervised Learning”, OpenAI 2018
- “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Et Al 2018 (page 5)
- “GPT-1: Improving Language Understanding by Generative Pre-Training”, Et Al 2018
- “Do CIFAR-10 Classifiers Generalize to CIFAR-10?”, Et Al 2018
- “Deep Learning Generalizes Because the Parameter-function Map Is Biased towards Simple Functions”, Valle-Pérez Et Al 2018
- “Google DeepMind Founder and Leader in Artificial Intelligence Returns to Hamilton”, 2018
- “Exploring the Limits of Weakly Supervised Pretraining”, Et Al 2018
- “One Big Net For Everything”, 2018
- “Sensitivity and Generalization in Neural Networks: an Empirical Study”, Et Al 2018
- “Deep Image Reconstruction from Human Brain Activity”, Et Al 2017
- “Deep Learning Scaling Is Predictable, Empirically”, Et Al 2017
- “Are GANs Created Equal? A Large-Scale Study”, Et Al 2017
- “Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN”, Et Al 2017
- “Rethinking Generalization Requires Revisiting Old Ideas: Statistical Mechanics Approaches and Complex Learning Behavior”, 2017
- “There’s No Fire Alarm for Artificial General Intelligence”, 2017
- “WebVision Database: Visual Learning and Understanding from Web Data”, Et Al 2017
- “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”, Et Al 2017
- “Towards Deep Learning Models Resistant to Adversarial Attacks”, Et Al 2017
- “Learning to Learn from Noisy Web Videos”, Et Al 2017
- “A Simple Neural Network Module for Relational Reasoning”, Et Al 2017
- “Quo Vadis, Action Recognition? A New Model I3D and the Kinetics Dataset”, 2017
- “WebVision Challenge: Visual Learning and Understanding With Web Data”, Et Al 2017
- “Geometry of Optimization and Implicit Regularization in Deep Learning”, Et Al 2017
- “On the Impossibility of Supersized Machines”, Et Al 2017
- “Parallel Multiscale Autoregressive Density Estimation”, Et Al 2017
- “Universal Representations: The Missing Link between Faces, Text, Planktons, and Cat Breeds”, 2017
- “Estimation of Gap Between Current Language Models and Human Performance”, Et Al 2017
- “DeepMind Lab”, Et Al 2016
- “Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles”, Et Al 2016
- “Understanding Deep Learning Requires Rethinking Generalization”, Et Al 2016
- “Ra”, 2016
- “The LAMBADA Dataset: Word Prediction Requiring a Broad Discourse Context”, Et Al 2016
- “Do Deep Convolutional Nets Really Need to Be Deep and Convolutional?”, Et Al 2016
- “PlaNet—Photo Geolocation With Convolutional Neural Networks”, Et Al 2016
- “Exploring the Limits of Language Modeling”, Et Al 2016
- “Net2Net: Accelerating Learning via Knowledge Transfer”, Et Al 2015
- “Generative Concatenative Nets Jointly Learn to Write and Classify Reviews”, Et Al 2015
- “Learning Visual Features from Large Weakly Supervised Data”, Et Al 2015
- “The Brain As a Universal Learning Machine”, 2015
- “LSUN: Construction of a Large-scale Image Dataset Using Deep Learning With Humans in the Loop”, Et Al 2015
- “Clothing-1M: Learning from Massive Noisy Labeled Data for Image Classification”, Et Al 2015
- “The Unreasonable Effectiveness of Recurrent Neural Networks”, 2015
- “YFCC100M: The New Data in Multimedia Research”, Et Al 2015
- “Evolution of the Human Brain: From Matter to Mind”, 2015
- “In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning”, Et Al 2014
- “Technology Forecasting: The Garden of Forking Paths”, 2014
- “Jumping NLP Curves: A Review of Natural Language Processing Research [Review Article]”, 2014
- “Neural Networks, Manifolds, and Topology”, 2014
- “Computing’s Energy Problem (and What We Can Do about It)”, 2014
- “One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling”, Et Al 2013
- “Algorithmic Progress in Six Domains”, 2013
- “Large-Scale Machine Learning Revisited [slides]”, 2013
- “Intelligence Explosion Microeconomics”, 2013
- “The Remarkable, yet Not Extraordinary, Human Brain As a Scaled-up Primate Brain and Its Associated Cost”, Herculano-Houzel 2012
- “Advantages of Artificial Intelligences, Uploads, and Digital Minds”, 2012
- “How Complex Are Individual Differences?”, 2010
- “Understanding Sources of Inefficiency in General-purpose Chips”, Et Al 2010
- “The Teenies”, 2009
- “Tick, Tock, Tick, Tock… BING”, 2009
- “The Unreasonable Effectiveness of Data”, Et Al 2009
- “Economics Of The Singularity: Stuffed into Skyscrapers by the Billion, Brainy Bugbots Will Be the Knowledge Workers of the Future”, 2008
- “Large Language Models in Machine Translation”, Et Al 2007
- “Cellular Scaling Rules for Primate Brains”, Herculano-Houzel Et Al 2007
- “The Tradeoffs of Large-Scale Learning”, 2007
- “Robot Predictions Evolution”, 2004
- “Tree Induction vs. Logistic Regression: A Learning-Curve Analysis”, Et Al 2003
- “Scaling to Very Very Large Corpora for Natural Language Disambiguation”, 2001
- “On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes”, 2001
- “A Survey of Methods for Scaling Up Inductive Algorithms”, 1999
- “On The Effect of Data Set Size on Bias And Variance in Classification Learning”, 1999
- “The Effects of Training Set Size on Decision Tree Complexity”, 1997
- “Rigorous Learning Curve Bounds from Statistical Mechanics”, Et Al 1996
- “Scaling up the Accuracy of Naive-Bayes Classifiers: a Decision-tree Hybrid”, 1996
- “Reflections After Refereeing Papers for NIPS”, 1995
- “Building a Large Annotated Corpus of English: The Penn Treebank”, Et Al 1993
- “Statistical Theory of Learning Curves under Entropic Loss Criterion”, 1993
- “Learning Curves: Asymptotic Values and Rate of Convergence”, Et Al 1993
- “Exhaustive Learning”, Et Al 1990
- “Computing With Connections”, 1987
- “The Role Of RAW POWER In INTELLIGENCE”, 1976
- “Don’t Worry—It Can’t Happen”, 1940
- “Homepage of Paul F. Christiano”, 2023
- “Ilya Sutskever: Deep Learning | AI Podcast #94 With Lex Fridman”
- “A Universal Law of Robustness”
- “A Law of Robustness and the Importance of Overparameterization in Deep Learning”
- Wikipedia
- Miscellaneous
- Link Bibliography
Tagged links on machine learning scaling.
For the bibliography of ML scaling papers, which show smooth scaling of neural net performance with increasingly large parameters, data, & compute, organized by topic, see the “Machine Learning Scaling” notes. For the essay on the implications of NN scaling, see “The Scaling Hypothesis”.
See Also
Links
“Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, Et Al 2023
“Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1)”, 2023-02-27
“Scaling Vision Transformers to 22 Billion Parameters”, Et Al 2023
“Scaling Vision Transformers to 22 Billion Parameters”, 2023-02-10
“Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, 2023
“Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, 2023-01-25
“ClimaX: A Foundation Model for Weather and Climate”, Et Al 2023
“ClimaX: A foundation model for weather and climate”, 2023-01-24
“StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis”, Et Al 2023
“StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis”, 2023-01-23
“MUG: Vision Learners Meet Web Image-Text Pairs”, Et Al 2023
“MUG: Vision Learners Meet Web Image-Text Pairs”, 2023-01-17
“GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities”, Et Al 2023
“GPT-3 as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities”, 2023-01-11
“Scaling Laws for Generative Mixed-Modal Language Models”, Et Al 2023
“Scaling Laws for Generative Mixed-Modal Language Models”, 2023-01-10
“VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Et Al 2023
“VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers”, 2023-01-05
“GPT-3 Takes the Bar Exam”, Bommarito II & Katz 2022
“GPT-3 Takes the Bar Exam”, 2022-12-29
“One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”, Et Al 2022
“One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”, 2022-12-19
“Reproducible Scaling Laws for Contrastive Language-image Learning”, Et Al 2022
“Reproducible scaling laws for contrastive language-image learning”, 2022-12-14
“ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages”, Et Al 2022
“ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages”, 2022-12-13
“VindLU: A Recipe for Effective Video-and-Language Pretraining”, Et Al 2022
“VindLU: A Recipe for Effective Video-and-Language Pretraining”, 2022-12-09
“Video-Text Modeling With Zero-Shot Transfer from Contrastive Captioners”, Et Al 2022
“Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners”, 2022-12-09
“Robust Speech Recognition via Large-Scale Weak Supervision”, Et Al 2022
“Robust Speech Recognition via Large-Scale Weak Supervision”, 2022-12-06
“Scaling Language-Image Pre-training via Masking”, Et Al 2022
“Scaling Language-Image Pre-training via Masking”, 2022-12-01
“MultiRay: Optimizing Efficiency for Large-scale AI Models”, Et Al 2022
“MultiRay: Optimizing efficiency for large-scale AI models”, 2022-11-18
“Galactica: A Large Language Model for Science”, Et Al 2022
“Galactica: A Large Language Model for Science”, 2022-11-16
“EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, Et Al 2022
“EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, 2022-11-14
“MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation”, Et Al 2022
“MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation”, 2022-11-10
“Will We Run out of Data? An Analysis of the Limits of Scaling Datasets in Machine Learning”, Et Al 2022
“Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning”, 2022-10-26
“Evaluating Parameter Efficient Learning for Generation”, Et Al 2022
“Evaluating Parameter Efficient Learning for Generation”, 2022-10-25
“FLAN: Scaling Instruction-Finetuned Language Models”, Et Al 2022
“FLAN: Scaling Instruction-Finetuned Language Models”, 2022-10-20
“Vision-Language Pre-training: Basics, Recent Advances, and Future Trends”, Et Al 2022
“Vision-Language Pre-training: Basics, Recent Advances, and Future Trends”, 2022-10-17
“Foundation Transformers”, Et Al 2022
“Foundation Transformers”, 2022-10-12
“Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Et Al 2022
“Ask Me Anything (AMA): A simple strategy for prompting language models”, 2022-10-05
“GLM-130B: An Open Bilingual Pre-trained Model”, Et Al 2022
“GLM-130B: An Open Bilingual Pre-trained Model”, 2022-10-05
“Do Current Multi-Task Optimization Methods in Deep Learning Even Help?”, Et Al 2022
“Do Current Multi-Task Optimization Methods in Deep Learning Even Help?”, 2022-09-23
“Monolith: Real Time Recommendation System With Collisionless Embedding Table”, Et Al 2022
“Monolith: Real Time Recommendation System With Collisionless Embedding Table”, 2022-09-16
“Machine Reading, Fast and Slow: When Do Models ‘Understand’ Language?”, Et Al 2022
“Machine Reading, Fast and Slow: When Do Models ‘Understand’ Language?”, 2022-09-15
“PaLI: A Jointly-Scaled Multilingual Language-Image Model”, Et Al 2022
“PaLI: A Jointly-Scaled Multilingual Language-Image Model”, 2022-09-14
“Using Large Language Models to Simulate Multiple Humans”, Et Al 2022
“Using Large Language Models to Simulate Multiple Humans”, 2022-08-18
“Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP”, Et Al 2022
“Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP”, 2022-08-10
“ESMfold: Language Models of Protein Sequences at the Scale of Evolution Enable Accurate Structure Prediction”, Et Al 2022
“ESMfold: Language models of protein sequences at the scale of evolution enable accurate structure prediction”, 2022-07-21
“Why Do Tree-based Models Still Outperform Deep Learning on Tabular Data?”, Et Al 2022
“Why do tree-based models still outperform deep learning on tabular data?”, 2022-07-18
“PIXEL: Language Modelling With Pixels”, Et Al 2022
“PIXEL: Language Modelling with Pixels”, 2022-07-14
“High-performing Neural Network Models of Visual Cortex Benefit from High Latent Dimensionality”, 2022
“High-performing neural network models of visual cortex benefit from high latent dimensionality”, 2022-07-13
“Language Models (Mostly) Know What They Know”, Et Al 2022
“Language Models (Mostly) Know What They Know”, 2022-07-11
“Exploring Length Generalization in Large Language Models”, Et Al 2022
“Exploring Length Generalization in Large Language Models”, 2022-07-11
“On-Device Training Under 256KB Memory”, Et Al 2022
“On-Device Training Under 256KB Memory”, 2022-06-30
“Beyond Neural Scaling Laws: Beating Power Law Scaling via Data Pruning”, Et Al 2022
“Beyond neural scaling laws: beating power law scaling via data pruning”, 2022-06-29
“ProGen2: Exploring the Boundaries of Protein Language Models”, Et Al 2022
“ProGen2: Exploring the Boundaries of Protein Language Models”, 2022-06-27
“RST: ReStructured Pre-training”, 2022
“RST: reStructured Pre-training”, 2022-06-22
“Limitations of the NTK for Understanding Generalization in Deep Learning”, Et Al 2022
“Limitations of the NTK for Understanding Generalization in Deep Learning”, 2022-06-20
“Modeling Transformative AI Risks (MTAIR) Project—Summary Report”, Et Al 2022
“Modeling Transformative AI Risks (MTAIR) Project—Summary Report”, 2022-06-19
“BigVGAN: A Universal Neural Vocoder With Large-Scale Training”, Et Al 2022
“BigVGAN: A Universal Neural Vocoder with Large-Scale Training”, 2022-06-09
“An Improved One Millisecond Mobile Backbone”, Et Al 2022
“An Improved One millisecond Mobile Backbone”, 2022-06-08
“A Neural Corpus Indexer for Document Retrieval”, Et Al 2022
“A Neural Corpus Indexer for Document Retrieval”, 2022-06-06
“Toward a Realistic Model of Speech Processing in the Brain With Self-supervised Learning”, Et Al 2022
“Toward a realistic model of speech processing in the brain with self-supervised learning”, 2022-06-03
“Teaching Models to Express Their Uncertainty in Words”, Et Al 2022
“Teaching Models to Express Their Uncertainty in Words”, 2022-05-28
“M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, Et Al 2022
“M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, 2022-05-27
“Why Robust Generalization in Deep Learning Is Difficult: Perspective of Expressive Power”, Et Al 2022
“Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power”, 2022-05-27
“InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning”, Et Al 2022
“InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning”, 2022-05-25
“Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”, Et Al 2022
“Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”, 2022-05-22
“Least-to-Most Prompting Enables Complex Reasoning in Large Language Models”, Et Al 2022
“Least-to-Most Prompting Enables Complex Reasoning in Large Language Models”, 2022-05-21
“Towards Understanding Grokking: An Effective Theory of Representation Learning”, Et Al 2022
“Towards Understanding Grokking: An Effective Theory of Representation Learning”, 2022-05-20
“Dialog Inpainting: Turning Documents into Dialogues”, Et Al 2022
“Dialog Inpainting: Turning Documents into Dialogues”, 2022-05-18
“Unifying Language Learning Paradigms”, Et Al 2022
“Unifying Language Learning Paradigms”, 2022-05-10
“When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet”, Et Al 2022
“When does dough become a bagel? Analyzing the remaining mistakes on ImageNet”, 2022-05-09
“Building Machine Translation Systems for the Next Thousand Languages”, Et Al 2022
“Building Machine Translation Systems for the Next Thousand Languages”, 2022-05-09
“CoCa: Contrastive Captioners Are Image-Text Foundation Models”, Et Al 2022
“CoCa: Contrastive Captioners are Image-Text Foundation Models”, 2022-05-04
“Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)”, Et Al 2022
“Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)”, 2022-05-03
“Continual Learning With Foundation Models: An Empirical Study of Latent Replay”, Et Al 2022
“Continual Learning with Foundation Models: An Empirical Study of Latent Replay”, 2022-04-30
“Flamingo: a Visual Language Model for Few-Shot Learning”, Et Al 2022
“Flamingo: a Visual Language Model for Few-Shot Learning”, 2022-04-29
“WebFace260M: A Benchmark for Million-Scale Deep Face Recognition”, Et Al 2022
“WebFace260M: A Benchmark for Million-Scale Deep Face Recognition”, 2022-04-21
“What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?”, Et Al 2022
“What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?”, 2022-04-12
“DeepMind: The Podcast—Excerpts on AGI”, 2022
“DeepMind: The Podcast—Excerpts on AGI”, 2022-04-07
“Can Language Models Learn from Explanations in Context?”, Et Al 2022
“Can language models learn from explanations in context?”, 2022-04-05
“Training Compute-Optimal Large Language Models”, Et Al 2022
“Training Compute-Optimal Large Language Models”, 2022-03-29
“A Roadmap for Big Model”, Et Al 2022
“A Roadmap for Big Model”, 2022-03-26
“A Conversational Paradigm for Program Synthesis”, Et Al 2022
“A Conversational Paradigm for Program Synthesis”, 2022-03-25
“Self-Consistency Improves Chain of Thought Reasoning in Language Models”, Et Al 2022
“Self-Consistency Improves Chain of Thought Reasoning in Language Models”, 2022-03-21
“Effect of Scale on Catastrophic Forgetting in Neural Networks”, Et Al 2022
“Effect of scale on catastrophic forgetting in neural networks”, 2022-03-15
“Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer”, Et Al 2022
“Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer”, 2022-03-07
“FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours”, Et Al 2022
“FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours”, 2022-03-02
“Variational Autoencoders Without the Variation”, Et Al 2022
“Variational Autoencoders Without the Variation”, 2022-03-01
“Performance Reserves in Brain-imaging-based Phenotype Prediction”, Et Al 2022
“Performance reserves in brain-imaging-based phenotype prediction”, 2022-02-25
“Self-Distilled StyleGAN: Towards Generation from Internet Photos”, Et Al 2022
“Self-Distilled StyleGAN: Towards Generation from Internet Photos”, 2022-02-24
“Brains and Algorithms Partially Converge in Natural Language Processing”, 2022
“Brains and algorithms partially converge in natural language processing”, 2022-02-16
“Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision”, Et Al 2022
“Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision”, 2022-02-16
“Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework”, Et Al 2022
“Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework”, 2022-02-14
“Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework”, Et Al 2022
“Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework”, 2022-02-07
“Webly Supervised Concept Expansion for General Purpose Vision Models”, Et Al 2022
“Webly Supervised Concept Expansion for General Purpose Vision Models”, 2022-02-04
“Data Scaling Laws in NMT: The Effect of Noise and Architecture”, Et Al 2022
“Data Scaling Laws in NMT: The Effect of Noise and Architecture”, 2022-02-04
“Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Et Al 2022
“Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, 2022-01-28
“Reasoning Like Program Executors”, Et Al 2022
“Reasoning Like Program Executors”, 2022-01-27
“Text and Code Embeddings by Contrastive Pre-Training”, Et Al 2022
“Text and Code Embeddings by Contrastive Pre-Training”, 2022-01-24
“SWAG: Revisiting Weakly Supervised Pre-Training of Visual Perception Models”, Et Al 2022
“SWAG: Revisiting Weakly Supervised Pre-Training of Visual Perception Models”, 2022-01-20
“LaMDA: Language Models for Dialog Applications”, Et Al 2022
“LaMDA: Language Models for Dialog Applications”, 2022-01-20
“CM3: A Causal Masked Multimodal Model of the Internet”, Et Al 2022
“CM3: A Causal Masked Multimodal Model of the Internet”, 2022-01-19
“ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization”, Et Al 2022
“ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization”, 2022-01-18
“The Defeat of the Winograd Schema Challenge”, Et Al 2022
“The Defeat of the Winograd Schema Challenge”, 2022-01-07
“Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [paper]”, Et Al 2022
“Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [paper]”, 2022-01-06
“AV-HuBERT: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction”, Et Al 2022
“AV-HuBERT: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction”, 2022-01-05
“Robust Self-Supervised Audio-Visual Speech Recognition”, Et Al 2022
“Robust Self-Supervised Audio-Visual Speech Recognition”, 2022-01-05
“Self-supervised Learning from 100 Million Medical Images”, Et Al 2022
“Self-supervised Learning from 100 Million Medical Images”, 2022-01-04
“The Evolution of Quantitative Sensitivity”, Et Al 2021
“The evolution of quantitative sensitivity”, 2021-12-27
“ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation”, Et Al 2021
“ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation”, 2021-12-23
“XGLM: Few-shot Learning With Multilingual Language Models”, Et Al 2021
“XGLM: Few-shot Learning with Multilingual Language Models”, 2021-12-20
“An Empirical Investigation of the Role of Pre-training in Lifelong Learning”, Et Al 2021
“An Empirical Investigation of the Role of Pre-training in Lifelong Learning”, 2021-12-16
“Knowledge-Rich Self-Supervised Entity Linking”, Et Al 2021
“Knowledge-Rich Self-Supervised Entity Linking”, 2021-12-15
“Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”, Et Al 2021
“Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”, 2021-12-15
“EBERT: Epigenomic Language Models Powered by Cerebras”, Et Al 2021
“EBERT: Epigenomic language models powered by Cerebras”, 2021-12-14
“You Only Need One Model for Open-domain Question Answering”, Et Al 2021
“You Only Need One Model for Open-domain Question Answering”, 2021-12-14
“MAGMA—Multimodal Augmentation of Generative Models through Adapter-based Finetuning”, Et Al 2021
“MAGMA—Multimodal Augmentation of Generative Models through Adapter-based Finetuning”, 2021-12-09
“MLP Architectures for Vision-and-Language Modeling: An Empirical Study”, Et Al 2021
“MLP Architectures for Vision-and-Language Modeling: An Empirical Study”, 2021-12-08
“Improving Language Models by Retrieving from Trillions of Tokens”, Et Al 2021
“Improving language models by retrieving from trillions of tokens”, 2021-12-08 ( ; similar; bibliography)
“Sparse Is Enough in Scaling Transformers”, Et Al 2021
“Sparse is Enough in Scaling Transformers”, 2021-11-24 ( ; similar; bibliography)
“LEMON: Scaling Up Vision-Language Pre-training for Image Captioning”, Et Al 2021
“LEMON: Scaling Up Vision-Language Pre-training for Image Captioning”, 2021-11-24 ( ; similar; bibliography)
“Can Pre-trained Language Models Be Used to Resolve Textual and Semantic Merge Conflicts?”, Et Al 2021
“Can Pre-trained Language Models be Used to Resolve Textual and Semantic Merge Conflicts?”, 2021-11-23 ( ; similar; bibliography)
“Florence: A New Foundation Model for Computer Vision”, Et Al 2021
“Florence: A New Foundation Model for Computer Vision”, 2021-11-22 ( ; similar; bibliography)
“RedCaps: Web-curated Image-text Data Created by the People, for the People”, Et Al 2021
“RedCaps: web-curated image-text data created by the people, for the people”, 2021-11-22 (backlinks; similar)
- “L-Verse: Bidirectional Generation Between Image and Text”, Et Al 2021 (2021-11-22)
- “ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning”, Et Al 2021 (2021-11-22)
- “BASIC: Combined Scaling for Open-Vocabulary Image Classification”, Et Al 2021 (2021-11-19)
- “Swin Transformer V2: Scaling Up Capacity and Resolution”, Et Al 2021 (2021-11-18)
- “XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale”, Et Al 2021 (2021-11-17)
- “INTERN: A New Learning Paradigm Towards General Vision”, Et Al 2021 (2021-11-16)
- “Few-Shot Self-Rationalization With Natural Language Prompts”, Et Al 2021 (2021-11-16)
- “Solving Probability and Statistics Problems by Program Synthesis”, Et Al 2021 (2021-11-16)
- “Covariate Shift in High-Dimensional Random Feature Regression”, Et Al 2021 (2021-11-16)
- “Solving Linear Algebra by Program Synthesis”, 2021 (2021-11-16)
- “MAE: Masked Autoencoders Are Scalable Vision Learners”, Et Al 2021 (2021-11-11)
- “Scaling ASR Improves Zero and Few Shot Learning”, Et Al 2021 (2021-11-10)
- “Persia: An Open, Hybrid System Scaling Deep Learning-based Recommenders up to 100 Trillion Parameters”, Et Al 2021 (2021-11-10)
- “LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs”, Et Al 2021 (2021-11-03)
- “Training Verifiers to Solve Math Word Problems”, Et Al 2021 (2021-10-27)
- “When in Doubt, Summon the Titans: Efficient Inference With Large Models”, Et Al 2021 (2021-10-19)
- “The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail”, 2021 (2021-10-15)
- “LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5”, 2021 (2021-10-14)
- “Symbolic Knowledge Distillation: from General Language Models to Commonsense Models”, Et Al 2021 (2021-10-14)
- “Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers”, Et Al 2021 (2021-10-13)
- “Unsupervised Neural Machine Translation With Generative Language Models Only”, Et Al 2021 (2021-10-11)
- “Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning”, Et Al 2021 (2021-10-10)
- “Universal Paralinguistic Speech Representations Using Self-Supervised Conformers”, Et Al 2021 (2021-10-09)
- “M6–10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining”, Et Al 2021 (2021-10-08)
- “Show Your Work: Scratchpads for Intermediate Computation With Language Models”, Et Al 2021 (2021-10-05)
- “Exploring the Limits of Large Scale Pre-training”, Et Al 2021 (2021-10-05)
- “Learning through Atypical ‘Phase Transitions’ in Overparameterized Neural Networks”, Et Al 2021 (2021-10-01)
- “Mining for Strong Gravitational Lenses With Self-supervised Learning”, Et Al 2021 (2021-09-30)
- “Stochastic Training Is Not Necessary for Generalization”, Et Al 2021 (2021-09-29)
- “Evaluating Machine Accuracy on ImageNet”, Et Al 2021 (2021-09-28)
- “Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers”, Et Al 2021 (2021-09-22)
- “Scaling Laws for Neural Machine Translation”, Et Al 2021 (2021-09-16)
- “What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers”, Et Al 2021 (2021-09-10)
- “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Et Al 2021 (2021-09-08)
- “A Recipe For Arbitrary Text Style Transfer With Large Language Models”, Et Al 2021 (2021-09-08)
- “General-Purpose Question-Answering With Macaw”, 2021 (2021-09-06)
- “A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning”, Et Al 2021 (2021-09-06)
- “An Empirical Exploration in Quality Filtering of Text Data”, 2021 (2021-09-02)
- “Data and Parameter Scaling Laws for Neural Machine Translation”, Et Al 2021 (2021-08-30)
- “Want To Reduce Labeling Cost? GPT-3 Can Help”, Et Al 2021 (2021-08-30)
- “Scaling Laws for Deep Learning”, 2021 (2021-08-17)
- “Modeling Protein Using Large-scale Pretrain Language Model”, Et Al 2021 (2021-08-17)
- “Billion-Scale Pretraining With Vision Transformers for Multi-Task Visual Representations”, Et Al 2021 (2021-08-12)
- “Facebook AI WMT21 News Translation Task Submission”, Et Al 2021 (2021-08-06)
- “EVA: An Open-Domain Chinese Dialogue System With Large-Scale Generative Pre-Training”, Et Al 2021 (2021-08-03)
- “HTLM: Hyper-Text Pre-Training and Prompting of Language Models”, Et Al 2021 (2021-07-14)
- “A Field Guide to Federated Optimization”, Et Al 2021 (2021-07-14)
- “Brain-like Functional Specialization Emerges Spontaneously in Deep Neural Networks”, Et Al 2021 (2021-07-06)
- “ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation”, Et Al 2021 (2021-07-05)
- “Scarecrow: A Framework for Scrutinizing Machine Text”, Et Al 2021 (2021-07-02)
- “Revisiting the Calibration of Modern Neural Networks”, Et Al 2021 (2021-06-15)
- “HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units”, Et Al 2021 (2021-06-14)
- “Partial Success in Closing the Gap between Human and Machine Vision”, Et Al 2021 (2021-06-14)
- “Scaling Laws for Acoustic Models”, 2021 (2021-06-11)
- “Knowledge Distillation: A Good Teacher Is Patient and Consistent”, Et Al 2021 (2021-06-09)
- “CoAtNet: Marrying Convolution and Attention for All Data Sizes”, Et Al 2021 (2021-06-09)
- “Scaling Vision Transformers”, Et Al 2021 (2021-06-08)
- “Exploring the Limits of Out-of-Distribution Detection”, Et Al 2021 (2021-06-06)
- “Effect of Pre-Training Scale on Intra/Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images”, 2021 (2021-05-31)
- “A Universal Law of Robustness via Isoperimetry”, 2021 (2021-05-26)
- “Naver Unveils First ‘Hyperscale’ AI Platform”, 2021 (2021-05-25)
- “Unsupervised Speech Recognition”, Et Al 2021 (2021-05-24)
- “Google Details New AI Accelerator Chips”, 2021 (2021-05-18)
- “RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance”, Et Al 2021 (2021-05-18)
- “MLP-Mixer: An All-MLP Architecture for Vision”, Et Al 2021 (2021-05-04)
- “XLM-R XL: Larger-Scale Transformers for Multilingual Masked Language Modeling”, Et Al 2021 (2021-05-02)
- “Scaling End-to-End Models for Large-Scale Multilingual ASR”, Et Al 2021 (2021-04-30)
- “What Are Bayesian Neural Network Posteriors Really Like?”, Et Al 2021 (2021-04-29)
- “DINO: Emerging Properties in Self-Supervised Vision Transformers”, Et Al 2021 (2021-04-29)
- “Machine Learning Scaling”, 2021 (2021-04-24)
- “Fully-Connected Neural Nets”, 2021 (2021-04-24)
- “Computer Optimization: Your Computer Is Faster Than You Think”, 2021 (2021-04-24)
- “[Ali Released PLUG: 27 Billion Parameters, the Largest Pre-trained Language Model in the Chinese Community]”, 2021 (2021-04-19)
- “CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP”, Et Al 2021 (2021-04-18)
- “Revealing Persona Biases in Dialogue Systems”, Et Al 2021 (2021-04-18)
- “The Power of Scale for Parameter-Efficient Prompt Tuning”, Et Al 2021 (2021-04-18)
- “Probing Across Time: What Does RoBERTa Know and When?”, Et Al 2021 (2021-04-16)
- “Large-Scale Self-Supervised and Semi-Supervised Learning for Speech Translation”, Et Al 2021 (2021-04-14)
- “Scaling Laws for Language Transfer Learning”, 2021 (2021-04-11)
- “SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network”, Et Al 2021 (2021-04-05)
- “Understanding Robustness of Transformers for Image Classification”, Et Al 2021 (2021-03-26)
- “UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark”, Et Al 2021 (2021-03-24)
- “Efficient Visual Pretraining With Contrastive Detection”, Et Al 2021 (2021-03-19)
- “The Shape of Learning Curves: a Review”, 2021 (2021-03-19)
- “Controllable Generation from Pre-trained Language Models via Inverse Prompting”, Et Al 2021 (2021-03-19)
- “Revisiting ResNets: Improved Training and Scaling Strategies”, Et Al 2021 (2021-03-13)
- “Learning from Videos to Understand the World”, Et Al 2021 (2021-03-12)
- “Fast and Accurate Model Scaling”, Et Al 2021 (2021-03-11)
- “WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training”, Et Al 2021 (2021-03-11)
- “Pretrained Transformers As Universal Computation Engines”, Et Al 2021 (2021-03-09)
- “Greedy Hierarchical Variational Autoencoders (GHVAEs) for Large-Scale Video Prediction”, Et Al 2021 (2021-03-06)
- “A Law of Robustness for Two-layers Neural Networks”, Et Al 2021 (2021-03-05)
- “Measuring Mathematical Problem Solving With the MATH Dataset”, Et Al 2021 (2021-03-05)
- “SEER: Self-supervised Pretraining of Visual Features in the Wild”, Et Al 2021 (2021-03-02)
- “M6: A Chinese Multimodal Pretrainer”, Et Al 2021 (2021-03-01)
- “Zero-Shot Text-to-Image Generation”, Et Al 2021 (2021-02-24)
- “Improved Denoising Diffusion Probabilistic Models”, 2021 (2021-02-18)
- “Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts”, Et Al 2021 (2021-02-17)
- “A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes”, Et Al 2021 (2021-02-12)
- “NFNet: High-Performance Large-Scale Image Recognition Without Normalization”, Et Al 2021 (2021-02-11)
- “ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision”, Et Al 2021 (2021-02-11)
- “Learning Curve Theory”, 2021 (2021-02-08)
- “1-bit Adam: Communication Efficient Large-Scale Training With Adam’s Convergence Speed”, Et Al 2021 (2021-02-04)
- “Scaling Laws for Transfer”, Et Al 2021 (2021-02-02)
- “Muppet: Massive Multi-task Representations With Pre-Finetuning”, Et Al 2021 (2021-01-26)
- “Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning”, Et Al 2021 (2021-01-26)
- “Language Processing in Brains and Deep Neural Networks: Computational Convergence and Its Limits”, 2021 (2021-01-14)
- “Meta Pseudo Labels”, Et Al 2021 (2021-01-05)
- “VinVL: Revisiting Visual Representations in Vision-Language Models”, Et Al 2021 (2021-01-02)
- “CDLM: Cross-Document Language Modeling”, Et Al 2021 (2021-01-02)
- “VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation”, Et Al 2021 (2021-01-02)
- “Process for Adapting Language Models to Society (PALMS) With Values-Targeted Datasets”, 2021
- “Extrapolating GPT-N Performance”, 2020 (2020-12-18)
- “Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences”, Et Al 2020 (2020-12-15)
- “CPM: A Large-scale Generative Chinese Pre-trained Language Model”, Et Al 2020 (2020-12-01)
- “Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images”, 2020 (2020-11-20)
- “When Do You Need Billions of Words of Pretraining Data?”, Et Al 2020 (2020-11-09)
- “ML Scaling Subreddit”, 2020 (2020-10-30)
- “Scaling Laws for Autoregressive Generative Modeling”, Et Al 2020 (2020-10-28)
- “Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus”, Et Al 2020 (2020-10-27)
- “mT5: A Massively Multilingual Pre-trained Text-to-text Transformer”, Et Al 2020 (2020-10-22)
- “Beyond English-Centric Multilingual Machine Translation”, Et Al 2020 (2020-10-21)
- “Towards End-to-End In-Image Neural Machine Translation”, Et Al 2020 (2020-10-20)
- “Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition”, Et Al 2020 (2020-10-20)
- “The First AI Model That Translates 100 Languages without Relying on English Data”, 2020 (2020-10-19)
- “The Deep Bootstrap Framework: Good Online Learners Are Good Offline Generalizers”, Et Al 2020 (2020-10-16)
- “WinoGrande: An Adversarial Winograd Schema Challenge at Scale”, Et Al 2020 (2020-10-16)
- “Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)”, Et Al 2020 (2020-10-11)
- “The Neural Architecture of Language: Integrative Reverse-engineering Converges on a Model for Predictive Processing”, Et Al 2020 (2020-10-09)
- “Fast Stencil-Code Computation on a Wafer-Scale Processor”, Et Al 2020 (2020-10-07)
- “Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples”, Et Al 2020 (2020-10-07)
- “Vision Transformer: An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale”, Et Al 2020 (2020-09-28)
- “Small Data, Big Decisions: Model Selection in the Small-Data Regime”, Et Al 2020 (2020-09-26)
- “New Report on How Much Computational Power It Takes to Match the Human Brain”, 2020 (2020-09-11)
- “Generative Language Modeling for Automated Theorem Proving”, 2020 (2020-09-07)
- “GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce”, Et Al 2020 (2020-08-22)
- “Accuracy and Performance Comparison of Video Action Recognition Approaches”, Et Al 2020 (2020-08-20)
- “Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, Et Al 2020 (2020-08-17)
- “Matt Botvinick on the Spontaneous Emergence of Learning Algorithms”, 2020 (2020-08-12)
- “Self-supervised Learning through the Eyes of a Child”, Et Al 2020 (2020-07-31)
- “Hopfield Networks Is All You Need”, Et Al 2020 (2020-07-16)
- “On Robustness and Transferability of Convolutional Neural Networks”, Et Al 2020 (2020-07-16)
- “ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing”, Et Al 2020 (2020-07-13)
- “NVAE: A Deep Hierarchical Variational Autoencoder”, 2020 (2020-07-08)
- “Measuring Robustness to Natural Distribution Shifts in Image Classification”, Et Al 2020 (2020-07-01)
- “Is SGD a Bayesian Sampler? Well, Almost”, Et Al 2020 (2020-06-26)
- “Unsupervised Cross-lingual Representation Learning for Speech Recognition”, Et Al 2020 (2020-06-24)
- “Logarithmic Pruning Is All You Need”, Et Al 2020 (2020-06-22)
- “wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations”, Et Al 2020 (2020-06-20)
- “Denoising Diffusion Probabilistic Models”, Et Al 2020 (2020-06-19)
- “GPT-3 Creative Fiction”, 2020 (2020-06-19)
- “On the Predictability of Pruning Across Scales”, Et Al 2020 (2020-06-18)
- “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, Et Al 2020 (2020-06-17)
- “SimCLRv2: Big Self-Supervised Models Are Strong Semi-Supervised Learners”, Et Al 2020 (2020-06-17)
- “SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments”, Et Al 2020 (2020-06-17)
- “iGPT: Generative Pretraining from Pixels”, Et Al 2020 (2020-06-17)
- “Are We Done With ImageNet?”, Et Al 2020 (2020-06-12)
- “OpenAI API”, Et Al 2020 (2020-06-11)
- “How Big Should My Language Model Be?”, 2020 (2020-06-08)
- “Object Segmentation Without Labels With Large-Scale Generative Models”, Et Al 2020 (2020-06-08)
- “GPT-3 Paper § Figure F.1: Four Uncurated Completions from a Context Suggesting the Model Compose a Poem in the Style of Wallace Stevens With the Title ‘Shadows on the Way’”, GPT-3 2020 (page 48; 2020-05-28)
- “The Scaling Hypothesis”, 2020 (2020-05-28)
- “Danny Hernandez on Forecasting and the Drivers of AI Progress”, Et Al 2020 (2020-05-22)
- “ZeRO-2 & DeepSpeed: Shattering Barriers of Deep Learning Speed & Scale”, 2020 (2020-05-19)
- “Powered by AI: Advancing Product Understanding and Building New Shopping Experiences”, Et Al 2020 (2020-05-19)
- “Measuring the Algorithmic Efficiency of Neural Networks”, 2020 (2020-05-08)
- “Pushing the Limit of Molecular Dynamics With Ab Initio Accuracy to 100 Million Atoms With Machine Learning”, Et Al 2020 (2020-05-01)
- “Jukebox: We’re Introducing Jukebox, a Neural Net That Generates Music, including Rudimentary Singing, As Raw Audio in a Variety of Genres and Artist Styles. We’re Releasing the Model Weights and Code, along With a Tool to Explore the Generated Samples.”, Et Al 2020 (2020-04-30)
- “Blender: A State-of-the-art Open Source Chatbot”, Et Al 2020 (2020-04-29)
- “A Review of Winograd Schema Challenge Datasets and Approaches”, Et Al 2020 (2020-04-23)
- “DynamicEmbedding: Extending TensorFlow for Colossal-Scale Applications”, Et Al 2020 (2020-04-17)
- “PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation”, Et Al 2020 (2020-04-14)
- “Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems”, Et Al 2020 (2020-03-20)
- “A Metric Learning Reality Check”, Et Al 2020 (2020-03-18)
- “TTTTTackling WinoGrande Schemas”, Et Al 2020 (2020-03-18)
- “Zoom In: An Introduction to Circuits—By Studying the Connections between Neurons, We Can Find Meaningful Algorithms in the Weights of Neural Networks”, Et Al 2020 (2020-03-10)
- “Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited”, Et Al 2020 (2020-03-04)
- “Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers”, Et Al 2020 (2020-02-26)
- “Rethinking Bias-Variance Trade-off for Generalization of Neural Networks”, Et Al 2020 (2020-02-26)
- “The Messy, Secretive Reality behind OpenAI’s Bid to Save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the Inside Story of How Competitive Pressure Eroded That Idealism”, 2020 (2020-02-17)
- “A Simple Framework for Contrastive Learning of Visual Representations”, Et Al 2020 (2020-02-13)
- “Turing-NLG: A 17-billion-parameter Language Model by Microsoft”, 2020 (2020-02-10)
- “How Much Knowledge Can You Pack Into the Parameters of a Language Model?”, Et Al 2020 (2020-02-10)
- “Impact of ImageNet Model Selection on Domain Adaptation”, 2020 (2020-02-06)
- “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, Et Al 2020 (2020-02-05)
- “Towards a Conversational Agent That Can Chat About…Anything”, 2020 (2020-01-28)
- “Towards a Human-like Open-Domain Chatbot”, Et Al 2020 (2020-01-27)
- “Scaling Laws for Neural Language Models”, Et Al 2020 (2020-01-23)
- “Big Transfer (BiT): General Visual Representation Learning”, Et Al 2019 (2019-12-24)
- “Deep Double Descent: We Show That the Double Descent Phenomenon Occurs in CNNs, ResNets, and Transformers: Performance First Improves, Then Gets Worse, and Then Improves Again With Increasing Model Size, Data Size, or Training Time. This Effect Is Often Avoided through Careful Regularization. While This Behavior Appears to Be Fairly Universal, We Don’t yet Fully Understand Why It Happens, and View Further Study of This Phenomenon As an Important Research Direction.”, Et Al 2019 (2019-12-05)
- “12-in-1: Multi-Task Vision and Language Representation Learning”, Et Al 2019 (2019-12-05)
- “Deep Double Descent: Where Bigger Models and More Data Hurt”, Et Al 2019 (2019-12-04)
- “Understanding the Generalization of ‘Lottery Tickets’ in Neural Networks”, 2019 (2019-11-25)
- “Momentum Contrast for Unsupervised Visual Representation Learning”, Et Al 2019 (2019-11-13)
- “The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design”, 2019 (2019-11-13)
- “Self-training With Noisy Student Improves ImageNet Classification”, Et Al 2019 (2019-11-11)
- “CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs”, El-Kishky Et Al 2019 (2019-11-10)
- “CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB”, Et Al 2019 (2019-11-10)
- “XLM-R: State-of-the-art Cross-lingual Understanding through Self-supervision”, FAIR 2019 (2019-11-07)
- “Unsupervised Cross-lingual Representation Learning at Scale”, Et Al 2019 (2019-11-05)
- “High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks”, Et Al 2019 (2019-11-05)
- “Grandmaster Level in StarCraft II Using Multi-agent Reinforcement Learning”, Et Al 2019 (2019-10-30)
- “T5: Exploring the Limits of Transfer Learning With a Unified Text-to-Text Transformer”, Et Al 2019 (2019-10-23)
- “ZeRO: Memory Optimizations Toward Training Trillion Parameter Models”, Et Al 2019 (2019-10-04)
- “Environmental Drivers of Systematicity and Generalization in a Situated Agent”, Et Al 2019 (2019-10-01)
- “A Constructive Prediction of the Generalization Error Across Scales”, Et Al 2019 (2019-09-27)
- “Large-scale Pretraining for Neural Machine Translation With Tens of Billions of Sentence Pairs”, Et Al 2019 (2019-09-26)
- “UNITER: UNiversal Image-TExt Representation Learning”, Et Al 2019 (2019-09-25)
- “Exascale Deep Learning for Scientific Inverse Problems”, Et Al 2019 (2019-09-24)
- “Simple, Scalable Adaptation for Neural Machine Translation”, Et Al 2019 (2019-09-18)
- “CTRL: A Conditional Transformer Language Model For Controllable Generation”, Et Al 2019 (2019-09-11)
- “Show Your Work: Improved Reporting of Experimental Results”, Et Al 2019 (2019-09-06)
- “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, ADLR 2019 (2019-08-13)
- “RoBERTa: A Robustly Optimized BERT Pretraining Approach”, Et Al 2019 (2019-07-26)
- “Robustness Properties of Facebook’s ResNeXt WSL Models”, 2019 (2019-07-17)
- “Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges”, Et Al 2019 (2019-07-11)
- “Large Scale Adversarial Representation Learning”, 2019 (2019-07-04)
- “One Epoch Is All You Need”, 2019 (2019-06-19)
- “Does Learning Require Memorization? A Short Tale about a Long Tail”, 2019 (2019-06-12)
- “Intriguing Properties of Adversarial Training at Scale”, 2019 (2019-06-10)
- “Scaling Autoregressive Video Models”, Et Al 2019 (2019-06-06)
- “A Mathematical Theory of Semantic Development in Deep Neural Networks”, Et Al 2019 (2019-06-04)
- “Adversarially Robust Generalization Just Requires More Unlabeled Data”, Et Al 2019 (2019-06-03)
- “ICML 2019 Notes”, 2019 (2019-06)
- “SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers”, Et Al 2019 (2019-05-28)
- “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, 2019 (2019-05-28)
- “UniLM: Unified Language Model Pre-training for Natural Language Understanding and Generation”, Et Al 2019 (2019-05-08)
- “Billion-scale Semi-supervised Learning for Image Classification”, Et Al 2019 (2019-05-02)
- “VideoBERT: A Joint Model for Video and Language Representation Learning”, Et Al 2019 (2019-04-03)
- “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations”, 2019 (2019-03-28)
- “Surprises in High-Dimensional Ridgeless Least Squares Interpolation”, Et Al 2019 (2019-03-19)
- “The Bitter Lesson”, 2019 (2019-03-13)
- “Deep Learning Hardware: Past, Present, & Future”, 2019 (2019-02-18)
- “Better Language Models and Their Implications”, Et Al 2019 (2019-02-14)
- “Language Models Are Unsupervised Multitask Learners”, Et Al 2019 (2019-02-14)
- “Do ImageNet Classifiers Generalize to ImageNet?”, Et Al 2019 (2019-02-13)
- “Cross-lingual Language Model Pretraining”, 2019 (2019-01-22)
- “High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks: Videos”, Et Al 2019
- “Reconciling Modern Machine Learning Practice and the Bias-variance Trade-off”, Et Al 2018 (2018-12-28)
- “nocaps: Novel Object Captioning at Scale”, Et Al 2018 (2018-12-20)
- “How AI Training Scales”, Et Al 2018 (2018-12-14)
- “Is Science Slowing Down?”, 2018 (2018-11-26)
- “WBE and DRL: a Middle Way of Imitation Learning from the Human Brain”, 2018 (2018-10-20)
- “BigGAN: Large Scale GAN Training For High Fidelity Natural Image Synthesis § 5.2 Additional Evaluation On JFT-300M”, Et Al 2018 (page 8; 2018-09-28)
- “Large Scale GAN Training for High Fidelity Natural Image Synthesis”, Et Al 2018 (2018-09-28)
- “Measurement Invariance Explains the Universal Law of Generalization for Psychological Perception”, 2018 (2018-09-25)
- “CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images”, Et Al 2018 (2018-08-03)
- “Large-Scale Visual Speech Recognition”, Et Al 2018 (2018-07-13)
- “Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations”, 2018 (2018-07-04)
- “Neural Scene Representation and Rendering”, Et Al 2018 (2018-06-15)
- “GPT-1: Improving Language Understanding With Unsupervised Learning”, OpenAI 2018 (2018-06-11)
- “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Et Al 2018 (page 5; 2018-06-08)
- “GPT-1: Improving Language Understanding by Generative Pre-Training”, Et Al 2018 (2018-06-08)
- “Do CIFAR-10 Classifiers Generalize to CIFAR-10?”, Et Al 2018 (2018-06-01)
- “Deep Learning Generalizes Because the Parameter-function Map Is Biased towards Simple Functions”, Valle-Pérez Et Al 2018 (2018-05-22)
- “Google DeepMind Founder and Leader in Artificial Intelligence Returns to Hamilton”, 2018 (2018-05-07)
- “Exploring the Limits of Weakly Supervised Pretraining”, Et Al 2018 (2018-05-02)