‘meta-learning’ directory

Gwern

‘meta-learning’ directory

Gwern

“Free-Play Periods for RL Agents ”, Gwern 2023

Free-Play Periods for RL Agents

“WBE & DRL: a Middle Way of Imitation Learning on Brains ”, Gwern 2018

WBE & DRL: a Middle Way of imitation learning on brains⁠

Links

“It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization ”, Behrouz et al 2025

⁠It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization⁠

“RWKV-7 ‘Goose’ With Expressive Dynamic State Evolution ”, Peng et al 2025

⁠RWKV-7 ‘Goose’ with Expressive Dynamic State Evolution⁠

“Meta-Statistical Learning: Supervised Learning of Statistical Inference ”, Peyrard & Cho 2025

Meta-Statistical Learning: Supervised Learning of Statistical Inference⁠

“Competitive Programming With Large Reasoning Models ”, El-Kishky et al 2025

Competitive Programming with Large Reasoning Models⁠

“Introducing Deep Research: An Agent That Uses Reasoning to Synthesize Large Amounts of Online Information and Complete Multi-Step Research Tasks for You. Available to Pro Users Today, Plus and Team Next ”, OpenAI 2025

Introducing Deep Research: An agent that uses reasoning to synthesize large amounts of online information and complete multi-step research tasks for you. Available to Pro users today, Plus and Team next⁠

“Test-Time Regression: a Unifying Framework for Designing Sequence Models With Associative Memory ”, Wang et al 2025

⁠Test-time regression: a unifying framework for designing sequence models with associative memory⁠

“Where Does In-Context Learning Happen in Large Language Models? ”, Sia et al 2025

Where does In-context Learning Happen in Large Language Models?⁠

“Metadata Conditioning Accelerates Language Model Pre-Training ”, Gao et al 2025

Metadata Conditioning Accelerates Language Model Pre-training⁠

“Titans: Learning to Memorize at Test Time ”, Behrouz et al 2024

Titans: Learning to Memorize at Test Time⁠

“ICLR: In-Context Learning of Representations ”, Park et al 2024

ICLR: In-Context Learning of Representations⁠

“Improving Factuality With Explicit Working Memory ”, Chen et al 2024

Improving Factuality with Explicit Working Memory⁠

“Gated Delta Networks: Improving Mamba-2 With Delta Rule ”, Yang et al 2024

⁠Gated Delta Networks: Improving Mamba-2 with Delta Rule⁠

“Continuous Autoregressive Models With Noise Augmentation Avoid Error Accumulation ”, Pasini et al 2024

Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation⁠

“State-Space Models Can Learn In-Context by Gradient Descent ”, Sushma et al 2024

State-space models can learn in-context by gradient descent⁠

“Thinking LLMs: General Instruction Following With Thought Generation ”, Wu et al 2024

Thinking LLMs: General Instruction Following with Thought Generation⁠

“MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering ”, Chan et al 2024

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering⁠

“Contextual Document Embeddings ”, Morris & Rush 2024

Contextual Document Embeddings⁠

“Generating Diverse and Reliable Features for Few-Shot Learning ”, Xu 2024b

Generating Diverse and Reliable Features for Few-Shot Learning⁠

“Tamper-Resistant Safeguards for Open-Weight LLMs ”, Tamirisa et al 2024

⁠Tamper-Resistant Safeguards for Open-Weight LLMs⁠

“When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models ”, Chang et al 2024

When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models⁠

“Probing the Decision Boundaries of In-Context Learning in Large Language Models ”, Zhao et al 2024

Probing the Decision Boundaries of In-context Learning in Large Language Models⁠

“Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models ”, Denison et al 2024

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models⁠

“Discovering Preference Optimization Algorithms With and for Large Language Models ”, Lu et al 2024

Discovering Preference Optimization Algorithms with and for Large Language Models⁠

“State Soup: In-Context Skill Learning, Retrieval and Mixing ”, Pióro et al 2024

State Soup: In-Context Skill Learning, Retrieval and Mixing⁠

“Attention As a Hypernetwork ”, Schug et al 2024

Attention as a Hypernetwork⁠

“BERTs Are Generative In-Context Learners ”, Samuel 2024

BERTs are Generative In-Context Learners⁠

“To Believe or Not to Believe Your LLM ”, Yadkori et al 2024

To Believe or Not to Believe Your LLM⁠

“Learning to Grok: Emergence of In-Context Learning and Skill Composition in Modular Arithmetic Tasks ”, He et al 2024

Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks⁠

“Auto Evol-Instruct: Automatic Instruction Evolving for Large Language Models ”, Zeng et al 2024

Auto Evol-Instruct: Automatic Instruction Evolving for Large Language Models⁠

“A Theoretical Understanding of Self-Correction through In-Context Alignment ”, Wang et al 2024

A Theoretical Understanding of Self-Correction through In-context Alignment⁠

“MLPs Learn In-Context ”, Tong & Pehlevan 2024

MLPs Learn In-Context⁠

“LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language ”, Requeima et al 2024

⁠LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language⁠

“Zero-Shot Tokenizer Transfer ”, Minixhofer et al 2024

Zero-Shot Tokenizer Transfer⁠

“What Exactly Has TabPFN Learned to Do? ”, McCarter 2024

What exactly has TabPFN learned to do?

“Position: Understanding LLMs Requires More Than Statistical Generalization ”, Reizinger et al 2024

Position: Understanding LLMs Requires More Than Statistical Generalization⁠

“SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-Trained Models ”, Deng et al 2024

SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models⁠

“Many-Shot In-Context Learning ”, Agarwal et al 2024

Many-Shot In-Context Learning⁠

“Foundational Challenges in Assuring Alignment and Safety of Large Language Models ”, Anwar et al 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models⁠

“Revisiting the Equivalence of In-Context Learning and Gradient Descent: The Impact of Data Distribution ”, Mahdavi et al 2024

Revisiting the Equivalence of In-Context Learning and Gradient Descent: The Impact of Data Distribution⁠

“Best Practices and Lessons Learned on Synthetic Data for Language Models ”, Liu et al 2024

Best Practices and Lessons Learned on Synthetic Data for Language Models⁠

“From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples ”, Vacareanu et al 2024

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples⁠

“Mixture-Of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models ”, Raposo et al 2024

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models⁠

“Evolutionary Optimization of Model Merging Recipes ”, Akiba et al 2024

Evolutionary Optimization of Model Merging Recipes⁠

“How Well Can Transformers Emulate In-Context Newton’s Method? ”, Giannou et al 2024

How Well Can Transformers Emulate In-context Newton’s Method?⁠

“Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models ”, Rannen-Triki et al 2024

Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models⁠

“Neural Network Parameter Diffusion ”, Wang et al 2024

Neural Network Parameter Diffusion⁠

“The Matrix: A Bayesian Learning Model for LLMs ”, Dalal & Misra 2024

The Matrix: A Bayesian learning model for LLMs⁠

“Rephrasing the Web (WARP): A Recipe for Compute and Data-Efficient Language Modeling ”, Maini et al 2024

Rephrasing the Web (WARP): A Recipe for Compute and Data-Efficient Language Modeling⁠

“An Information-Theoretic Analysis of In-Context Learning ”, Jeon et al 2024

An Information-Theoretic Analysis of In-Context Learning⁠

“Deep De Finetti: Recovering Topic Distributions from Large Language Models ”, Zhang et al 2023

Deep de Finetti: Recovering Topic Distributions from Large Language Models⁠

“Generative Multimodal Models Are In-Context Learners ”, Sun et al 2023

Generative Multimodal Models are In-Context Learners⁠

“VILA: On Pre-Training for Visual Language Models ”, Lin et al 2023

VILA: On Pre-training for Visual Language Models⁠

“Evolving Reservoirs for Meta Reinforcement Learning ”, Léger et al 2023

Evolving Reservoirs for Meta Reinforcement Learning⁠

“The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning ”, Lin et al 2023

The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning⁠

“Learning Few-Shot Imitation As Cultural Transmission ”, Bhoopchand et al 2023

Learning few-shot imitation as cultural transmission⁠

“In-Context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ”, Liu et al 2023

In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering⁠

“Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves ”, Deng et al 2023

Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves⁠

“ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-Like Language Models ”, Luo et al 2023

ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models⁠

“Self-AIXI: Self-Predictive Universal AI ”, Catt et al 2023

Self-AIXI: Self-Predictive Universal AI⁠

“HyperFields: Towards Zero-Shot Generation of NeRFs from Text ”, Babu et al 2023

HyperFields: Towards Zero-Shot Generation of NeRFs from Text⁠

“Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study With Linear Models ”, Fu et al 2023

Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models⁠

“Eureka: Human-Level Reward Design via Coding Large Language Models ”, Ma et al 2023

Eureka: Human-Level Reward Design via Coding Large Language Models⁠

“How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? ”, Wu et al 2023

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?⁠

“Motif: Intrinsic Motivation from Artificial Intelligence Feedback ”, Klissarov et al 2023

Motif: Intrinsic Motivation from Artificial Intelligence Feedback⁠

“ExpeL: LLM Agents Are Experiential Learners ”, Zhao et al 2023

ExpeL: LLM Agents Are Experiential Learners⁠

“Diversifying AI: Towards Creative Chess With AlphaZero (AZ_db) ”, Zahavy et al 2023

Diversifying AI: Towards Creative Chess with AlphaZero (AZ_db)⁠

“RAVEN: In-Context Learning With Retrieval-Augmented Encoder-Decoder Language Models ”, Huang et al 2023

RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models⁠

“CausalLM Is Not Optimal for In-Context Learning ”, Ding et al 2023

CausalLM is not optimal for in-context learning⁠

“MetaDiff: Meta-Learning With Conditional Diffusion for Few-Shot Learning ”, Zhang & Yu 2023

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning⁠

“Self Expanding Neural Networks ”, Mitchell et al 2023

Self Expanding Neural Networks⁠

“Teaching Arithmetic to Small Transformers ”, Lee et al 2023

Teaching Arithmetic to Small Transformers⁠

“One Step of Gradient Descent Is Provably the Optimal In-Context Learner With One Layer of Linear Self-Attention ”, Mahankali et al 2023

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention⁠

“Trainable Transformer in Transformer ”, Panigrahi et al 2023

Trainable Transformer in Transformer⁠

“Supervised Pretraining Can Learn In-Context Reinforcement Learning ”, Lee et al 2023

Supervised Pretraining Can Learn In-Context Reinforcement Learning⁠

“Pretraining Task Diversity and the Emergence of Non-Bayesian In-Context Learning for Regression ”, Raventós et al 2023

Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression⁠

“Language Models Are Weak Learners ”, Manikandan et al 2023

Language models are weak learners⁠

“Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks ”, Chevalier-Boisvert et al 2023

Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks⁠

“Improving Long-Horizon Imitation Through Instruction Prediction ”, Hejna et al 2023

Improving Long-Horizon Imitation Through Instruction Prediction⁠

“Schema-Learning and Rebinding As Mechanisms of In-Context Learning and Emergence ”, Swaminathan et al 2023

Schema-learning and rebinding as mechanisms of in-context learning and emergence⁠

“RGD: Stochastic Re-Weighted Gradient Descent via Distributionally Robust Optimization ”, Kumar et al 2023

RGD: Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization⁠

“Transformers Learn to Implement Preconditioned Gradient Descent for In-Context Learning ”, Ahn et al 2023

Transformers learn to implement preconditioned gradient descent for in-context learning⁠

“Learning Transformer Programs ”, Friedman et al 2023

Learning Transformer Programs⁠

“Fundamental Limitations of Alignment in Large Language Models ”, Wolf et al 2023

Fundamental Limitations of Alignment in Large Language Models⁠

“How Well Do Large Language Models Perform in Arithmetic Tasks? ”, Yuan et al 2023

How well do Large Language Models perform in Arithmetic tasks?⁠

“Larger Language Models Do In-Context Learning Differently ”, Wei et al 2023

Larger language models do in-context learning differently⁠

“BiLD: Big Little Transformer Decoder ”, Kim et al 2023

BiLD: Big Little Transformer Decoder⁠

“Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery ”, Wen et al 2023

Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery⁠

“Looped Transformers As Programmable Computers ”, Giannou et al 2023

Looped Transformers as Programmable Computers⁠

“A Survey of Meta-Reinforcement Learning ”, Beck et al 2023

A Survey of Meta-Reinforcement Learning⁠

“Human-Like Systematic Generalization through a Meta-Learning Neural Network ”, Lake & Baroni 2023

Human-like systematic generalization through a meta-learning neural network⁠

“Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers ”, Dai et al 2022

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers⁠

“Unnatural Instructions: Tuning Language Models With (Almost) No Human Labor ”, Honovich et al 2022

Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor⁠

“Rethinking the Role of Scale for In-Context Learning: An Interpretability-Based Case Study at 66 Billion Scale ”, Bansal et al 2022

Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale⁠

“Transformers Learn In-Context by Gradient Descent ”, Oswald et al 2022

Transformers learn in-context by gradient descent⁠

“FWL: Meta-Learning Fast Weight Language Models ”, Clark et al 2022

FWL: Meta-Learning Fast Weight Language Models⁠

“What Learning Algorithm Is In-Context Learning? Investigations With Linear Models ”, Akyürek et al 2022

What learning algorithm is in-context learning? Investigations with linear models⁠

“Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models ”, Henderson et al 2022

Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models⁠

“VeLO: Training Versatile Learned Optimizers by Scaling Up ”, Metz et al 2022

VeLO: Training Versatile Learned Optimizers by Scaling Up⁠

“Mysteries of Mode Collapse § Inescapable Wedding Parties ”, Janus 2022

Mysteries of mode collapse § Inescapable wedding parties⁠

“BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning ”, Muennighoff et al 2022

BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning⁠

“ProMoT: Preserving In-Context Learning Ability in Large Language Model Fine-Tuning ”, Wang et al 2022

ProMoT: Preserving In-Context Learning ability in Large Language Model Fine-tuning⁠

“In-Context Reinforcement Learning With Algorithm Distillation ”, Laskin et al 2022

In-context Reinforcement Learning with Algorithm Distillation⁠

“SAP: Bidirectional Language Models Are Also Few-Shot Learners ”, Patel et al 2022

SAP: Bidirectional Language Models Are Also Few-shot Learners⁠

“`g.pt`: Learning to Learn With Generative Models of Neural Network Checkpoints ”, Peebles et al 2022

g.pt: Learning to Learn with Generative Models of Neural Network Checkpoints⁠

“Simulators ”, Janus 2022

⁠Simulators⁠

“AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model ”, Soltan et al 2022

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model⁠

“Few-Shot Adaptation Works With UnpredicTable Data ”, Chan et al 2022

Few-shot Adaptation Works with UnpredicTable Data⁠

“What Can Transformers Learn In-Context? A Case Study of Simple Function Classes ”, Garg et al 2022

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes⁠

“Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling ”, Nguyen & Grover 2022

Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling⁠

“TabPFN: Meta-Learning a Real-Time Tabular AutoML Method For Small Data ”, Hollmann et al 2022

TabPFN: Meta-Learning a Real-Time Tabular AutoML Method For Small Data⁠

“Offline RL Policies Should Be Trained to Be Adaptive ”, Ghosh et al 2022

Offline RL Policies Should be Trained to be Adaptive⁠

“Goal-Conditioned Generators of Deep Policies ”, Faccio et al 2022

Goal-Conditioned Generators of Deep Policies⁠

“Prompting Decision Transformer for Few-Shot Policy Generalization ”, Xu et al 2022

Prompting Decision Transformer for Few-Shot Policy Generalization⁠

“LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks ”, Dinh et al 2022

LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks⁠

“RHO-LOSS: Prioritized Training on Points That Are Learnable, Worth Learning, and Not Yet Learnt ”, Mindermann et al 2022

RHO-LOSS: Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt⁠

“NOAH: Neural Prompt Search ”, Zhang et al 2022

NOAH: Neural Prompt Search⁠

“Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions ”, Jiang et al 2022

Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions⁠

“Towards Learning Universal Hyperparameter Optimizers With Transformers ”, Chen et al 2022

Towards Learning Universal Hyperparameter Optimizers with Transformers⁠

“Instruction Induction: From Few Examples to Natural Language Task Descriptions ”, Honovich et al 2022

Instruction Induction: From Few Examples to Natural Language Task Descriptions⁠

“Gato: A Generalist Agent ”, Reed et al 2022

Gato: A Generalist Agent⁠

“UL2: Unifying Language Learning Paradigms ”, Tay et al 2022

UL2: Unifying Language Learning Paradigms⁠

“Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers ”, Chan et al 2022

Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers⁠

“Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks ”, Wang et al 2022

Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks⁠

“What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? ”, Wang et al 2022

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?⁠

“Effective Mutation Rate Adaptation through Group Elite Selection ”, Kumar et al 2022

Effective Mutation Rate Adaptation through Group Elite Selection⁠

“Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs ”, Akin et al 2022

Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs⁠

“Can Language Models Learn from Explanations in Context? ”, Lampinen et al 2022

Can language models learn from explanations in context?⁠

“Auto-Lambda: Disentangling Dynamic Task Relationships ”, Liu et al 2022

Auto-Lambda: Disentangling Dynamic Task Relationships⁠

“In-Context Learning and Induction Heads ”, Olsson et al 2022

⁠In-context Learning and Induction Heads⁠

“HyperMixer: An MLP-Based Low Cost Alternative to Transformers ”, Mai et al 2022

HyperMixer: An MLP-based Low Cost Alternative to Transformers⁠

“LiteTransformerSearch: Training-Free Neural Architecture Search for Efficient Language Models ”, Javaheripi et al 2022

LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models⁠

“Evolving Curricula With Regret-Based Environment Design ”, Parker-Holder et al 2022

Evolving Curricula with Regret-Based Environment Design⁠

“HyperPrompt: Prompt-Based Task-Conditioning of Transformers ”, He et al 2022

HyperPrompt: Prompt-based Task-Conditioning of Transformers⁠

“Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? ”, Min et al 2022

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?⁠

“All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL ”, Arulkumaran et al 2022

All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL⁠

“NeuPL: Neural Population Learning ”, Liu et al 2022

NeuPL: Neural Population Learning⁠

“Learning Synthetic Environments and Reward Networks for Reinforcement Learning ”, Ferreira et al 2022

Learning Synthetic Environments and Reward Networks for Reinforcement Learning⁠

“Datamodels: Predicting Predictions from Training Data ”, Ilyas et al 2022

Datamodels: Predicting Predictions from Training Data⁠

“From Data to Functa: Your Data Point Is a Function and You Should Treat It like One ”, Dupont et al 2022

From data to functa: Your data point is a function and you should treat it like one⁠

“Environment Generation for Zero-Shot Compositional Reinforcement Learning ”, Gur et al 2022

Environment Generation for Zero-Shot Compositional Reinforcement Learning⁠

“Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies ”, Gklezakos & Rao 2022

Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies⁠

“Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild ”, Miki et al 2022

Learning robust perceptive locomotion for quadrupedal robots in the wild⁠

“Automated Reinforcement Learning (AutoRL): A Survey and Open Problems ”, Parker-Holder et al 2022

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems⁠

“In Defense of the Unitary Scalarization for Deep Multi-Task Learning ”, Kurin et al 2022

In Defense of the Unitary Scalarization for Deep Multi-Task Learning⁠

“HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning ”, Zhmoginov et al 2022

HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning⁠

“Finding General Equilibria in Many-Agent Economic Simulations Using Deep Reinforcement Learning ”, Curry et al 2022

Finding General Equilibria in Many-Agent Economic Simulations Using Deep Reinforcement Learning⁠

“The Curse of Zero Task Diversity: On the Failure of Transfer Learning to Outperform MAML and Their Empirical Equivalence ”, Miranda et al 2021

The Curse of Zero Task Diversity: On the Failure of Transfer Learning to Outperform MAML and their Empirical Equivalence⁠

“A Mathematical Framework for Transformer Circuits ”, Elhage et al 2021

A Mathematical Framework for Transformer Circuits⁠

“PFNs: Transformers Can Do Bayesian Inference ”, Müller et al 2021

PFNs: Transformers Can Do Bayesian Inference⁠

“How to Learn and Represent Abstractions: An Investigation Using Symbolic Alchemy ”, AlKhamissi et al 2021

How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy⁠

“Noether Networks: Meta-Learning Useful Conserved Quantities ”, Alet et al 2021

Noether Networks: Meta-Learning Useful Conserved Quantities⁠

“A Rational Reinterpretation of Dual-Process Theories ”, Milli et al 2021

A rational reinterpretation of dual-process theories⁠

“A General Language Assistant As a Laboratory for Alignment ”, Askell et al 2021

A General Language Assistant as a Laboratory for Alignment⁠

“A Modern Self-Referential Weight Matrix That Learns to Modify Itself ”, Irie et al 2021

A Modern Self-Referential Weight Matrix That Learns to Modify Itself⁠

“A Survey of Generalization in Deep Reinforcement Learning ”, Kirk et al 2021

A Survey of Generalization in Deep Reinforcement Learning⁠

“Gradients Are Not All You Need ”, Metz et al 2021

Gradients are Not All You Need⁠

“An Explanation of In-Context Learning As Implicit Bayesian Inference ”, Xie et al 2021

An Explanation of In-context Learning as Implicit Bayesian Inference⁠

“Procedural Generalization by Planning With Self-Supervised World Models ”, Anand et al 2021

Procedural Generalization by Planning with Self-Supervised World Models⁠

“MetaICL: Learning to Learn In Context ”, Min et al 2021

MetaICL: Learning to Learn In Context⁠

“Logical Activation Functions: Logit-Space Equivalents of Probabilistic Boolean Operators ”, Lowe et al 2021

Logical Activation Functions: Logit-space equivalents of Probabilistic Boolean Operators⁠

“Shaking the Foundations: Delusions in Sequence Models for Interaction and Control ”, Ortega et al 2021

Shaking the foundations: delusions in sequence models for interaction and control⁠

“Meta-Learning, Social Cognition and Consciousness in Brains and Machines ”, Langdon et al 2021

Meta-learning, social cognition and consciousness in brains and machines⁠

“T0: Multitask Prompted Training Enables Zero-Shot Task Generalization ”, Sanh et al 2021

T0: Multitask Prompted Training Enables Zero-Shot Task Generalization⁠

“Replay-Guided Adversarial Environment Design ”, Jiang et al 2021

Replay-Guided Adversarial Environment Design⁠

“Embodied Intelligence via Learning and Evolution ”, Gupta et al 2021

Embodied intelligence via learning and evolution⁠

“Transformers Are Meta-Reinforcement Learners ”, Anonymous 2021

Transformers are Meta-Reinforcement Learners⁠

“Scalable Online Planning via Reinforcement Learning Fine-Tuning ”, Fickinger et al 2021

Scalable Online Planning via Reinforcement Learning Fine-Tuning⁠

“Dropout’s Dream Land: Generalization from Learned Simulators to Reality ”, Wellmer & Kwok 2021

Dropout’s Dream Land: Generalization from Learned Simulators to Reality⁠

“Is Curiosity All You Need? On the Utility of Emergent Behaviors from Curious Exploration ”, Groth et al 2021

Is Curiosity All You Need? On the Utility of Emergent Behaviors from Curious Exploration⁠

“Bootstrapped Meta-Learning ”, Flennerhag et al 2021

Bootstrapped Meta-Learning⁠

“The Sensory Neuron As a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning ”, Tang & Ha 2021

The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning⁠

“FLAN: Finetuned Language Models Are Zero-Shot Learners ”, Wei et al 2021

FLAN: Finetuned Language Models Are Zero-Shot Learners⁠

“The AI Economist: Optimal Economic Policy Design via Two-Level Deep Reinforcement Learning ”, Zheng et al 2021

The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning⁠

“Open-Ended Learning Leads to Generally Capable Agents ”, Team et al 2021

Open-Ended Learning Leads to Generally Capable Agents⁠

“Dataset Distillation With Infinitely Wide Convolutional Networks ”, Nguyen et al 2021

Dataset Distillation with Infinitely Wide Convolutional Networks⁠

“Why Generalization in RL Is Difficult: Epistemic POMDPs and Implicit Partial Observability ”, Ghosh et al 2021

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability⁠

“PonderNet: Learning to Ponder ”, Banino et al 2021

PonderNet: Learning to Ponder⁠

“Multimodal Few-Shot Learning With Frozen Language Models ”, Tsimpoukelli et al 2021

Multimodal Few-Shot Learning with Frozen Language Models⁠

“LHOPT: A Generalizable Approach to Learning Optimizers ”, Almeida et al 2021

LHOPT: A Generalizable Approach to Learning Optimizers⁠

“Towards Mental Time Travel: a Hierarchical Memory for Reinforcement Learning Agents ”, Lampinen et al 2021

Towards mental time travel: a hierarchical memory for reinforcement learning agents⁠

“A Full-Stack Accelerator Search Technique for Vision Applications ”, Zhang et al 2021

A Full-stack Accelerator Search Technique for Vision Applications⁠

“Reward Is Enough ”, Silver et al 2021

Reward is enough⁠

“Bayesian Optimization Is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020 ”, Turner et al 2021

Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020⁠

“CrossFit: A Few-Shot Learning Challenge for Cross-Task Generalization in NLP ”, Ye et al 2021

CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP⁠

“Podracer Architectures for Scalable Reinforcement Learning ”, Hessel et al 2021

Podracer architectures for scalable Reinforcement Learning⁠

“BLUR: Meta-Learning Bidirectional Update Rules ”, Sandler et al 2021

BLUR: Meta-Learning Bidirectional Update Rules⁠

“Asymmetric Self-Play for Automatic Goal Discovery in Robotic Manipulation ”, OpenAI et al 2021

Asymmetric self-play for automatic goal discovery in robotic manipulation⁠

“OmniNet: Omnidirectional Representations from Transformers ”, Tay et al 2021

OmniNet: Omnidirectional Representations from Transformers⁠

“Linear Transformers Are Secretly Fast Weight Programmers ”, Schlag et al 2021

Linear Transformers Are Secretly Fast Weight Programmers⁠

“Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm ”, Reynolds & McDonell 2021

Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm⁠

“ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution ”, Song et al 2021

ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution⁠

“Training Learned Optimizers With Randomly Initialized Learned Optimizers ”, Metz et al 2021

Training Learned Optimizers with Randomly Initialized Learned Optimizers⁠

“Evolving Reinforcement Learning Algorithms ”, Co-Reyes et al 2021

Evolving Reinforcement Learning Algorithms⁠

“Meta Pseudo Labels ”, Pham et al 2021

Meta Pseudo Labels⁠

“Meta Learning Backpropagation And Improving It ”, Kirsch & Schmidhuber 2020

Meta Learning Backpropagation And Improving It⁠

“Emergent Complexity and Zero-Shot Transfer via Unsupervised Environment Design [UED+PAIRED] ”, Dennis et al 2020

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [UED+PAIRED]⁠

“Scaling down Deep Learning ”, Greydanus 2020

Scaling down Deep Learning

“Reverse Engineering Learned Optimizers Reveals Known and Novel Mechanisms ”, Maheswaranathan et al 2020

Reverse engineering learned optimizers reveals known and novel mechanisms⁠

“Dataset Meta-Learning from Kernel Ridge-Regression ”, Nguyen et al 2020

Dataset Meta-Learning from Kernel Ridge-Regression⁠

“MELD: Meta-Reinforcement Learning from Images via Latent State Models ”, Zhao et al 2020

MELD: Meta-Reinforcement Learning from Images via Latent State Models⁠

“Meta-Trained Agents Implement Bayes-Optimal Agents ”, Mikulik et al 2020

Meta-trained agents implement Bayes-optimal agents⁠

“Learning Not to Learn: Nature versus Nurture in Silico ”, Lange & Sprekeler 2020

Learning not to learn: Nature versus nurture in silico⁠

“Prioritized Level Replay ”, Jiang et al 2020

Prioritized Level Replay⁠

“Tasks, Stability, Architecture, and Compute: Training More Effective Learned Optimizers, and Using Them to Train Themselves ”, Metz et al 2020

Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves⁠

“Hidden Incentives for Auto-Induced Distributional Shift ”, Krueger et al 2020

Hidden Incentives for Auto-Induced Distributional Shift⁠

“It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners ”, Schick & Schütze 2020

It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners⁠

“Grounded Language Learning Fast and Slow ”, Hill et al 2020

Grounded Language Learning Fast and Slow⁠

“Matt Botvinick on the Spontaneous Emergence of Learning Algorithms ”, Scholl 2020

Matt Botvinick on the spontaneous emergence of learning algorithms⁠

“Discovering Reinforcement Learning Algorithms ”, Oh et al 2020

Discovering Reinforcement Learning Algorithms⁠

“Deep Reinforcement Learning and Its Neuroscientific Implications ”, Botvinick 2020

Deep Reinforcement Learning and Its Neuroscientific Implications⁠

“Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions ”, Chang et al 2020

Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions⁠

“Rapid Task-Solving in Novel Environments ”, Ritter et al 2020

Rapid Task-Solving in Novel Environments⁠

“FBNetV3: Joint Architecture-Recipe Search Using Predictor Pretraining ”, Dai et al 2020

FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining⁠

“GPT-3: Language Models Are Few-Shot Learners ”, Brown et al 2020

GPT-3: Language Models are Few-Shot Learners⁠

“Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search ”, Rawal et al 2020

Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search⁠

“Automatic Discovery of Interpretable Planning Strategies ”, Skirzyński et al 2020

Automatic Discovery of Interpretable Planning Strategies⁠

“Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks ”, Schoettler et al 2020

Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks⁠

“A Comparison of Methods for Treatment Assignment With an Application to Playlist Generation ”, Fernández-Loría et al 2020

A Comparison of Methods for Treatment Assignment with an Application to Playlist Generation⁠

“Approximate Exploitability: Learning a Best Response in Large Games ”, Timbers et al 2020

Approximate exploitability: Learning a best response in large games⁠

“Meta-Learning in Neural Networks: A Survey ”, Hospedales et al 2020

Meta-Learning in Neural Networks: A Survey⁠

“Agent57: Outperforming the Atari Human Benchmark ”, Badia et al 2020

Agent57: Outperforming the Atari Human Benchmark⁠

“Designing Network Design Spaces ”, Radosavovic et al 2020

Designing Network Design Spaces⁠

“Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and Their Solutions ”, Wang et al 2020

Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions⁠

“Accelerating and Improving AlphaZero Using Population Based Training ”, Wu et al 2020

Accelerating and Improving AlphaZero Using Population Based Training⁠

“Meta-Learning Curiosity Algorithms ”, Alet et al 2020

Meta-learning curiosity algorithms⁠

“AutoML-Zero: Evolving Machine Learning Algorithms From Scratch ”, Real et al 2020

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch⁠

“AutoML-Zero: Open Source Code for the Paper: "AutoML-Zero: Evolving Machine Learning Algorithms From Scratch" ”, Real et al 2020

AutoML-Zero: Open source code for the paper: "AutoML-Zero: Evolving Machine Learning Algorithms From Scratch"⁠

“Effective Diversity in Population Based Reinforcement Learning ”, Parker-Holder et al 2020

Effective Diversity in Population Based Reinforcement Learning⁠

“AI Helps Warehouse Robots Pick Up New Tricks: Backed by Machine Learning Luminaries, Covariant.ai’s Bots Can Handle Jobs Previously Needing a Human Touch ”, Knight 2020

AI Helps Warehouse Robots Pick Up New Tricks: Backed by machine learning luminaries, Covariant.ai’s bots can handle jobs previously needing a human touch⁠

“Smooth Markets: A Basic Mechanism for Organizing Gradient-Based Learners ”, Balduzzi et al 2020

Smooth markets: A basic mechanism for organizing gradient-based learners⁠

“AutoML-Zero: Evolving Code That Learns ”, Real & Liang 2020

AutoML-Zero: Evolving Code that Learns⁠

“Learning Neural Activations ”, Minhas & Asif 2019

Learning Neural Activations⁠

“Meta-Learning without Memorization ”, Yin et al 2019

Meta-Learning without Memorization⁠

“MetaFun: Meta-Learning With Iterative Functional Updates ”, Xu et al 2019

MetaFun: Meta-Learning with Iterative Functional Updates⁠

“Leveraging Procedural Generation to Benchmark Reinforcement Learning ”, Cobbe et al 2019

Leveraging Procedural Generation to Benchmark Reinforcement Learning⁠

“Procgen Benchmark: We’re Releasing Procgen Benchmark, 16 Simple-To-Use Procedurally-Generated Environments Which Provide a Direct Measure of How Quickly a Reinforcement Learning Agent Learns Generalizable Skills ”, Cobbe et al 2019

Procgen Benchmark: We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills⁠

“Increasing Generality in Machine Learning through Procedural Content Generation ”, Risi & Togelius 2019

Increasing Generality in Machine Learning through Procedural Content Generation⁠

“SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning ”, Wang et al 2019

SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning⁠

“Optimizing Millions of Hyperparameters by Implicit Differentiation ”, Lorraine et al 2019

Optimizing Millions of Hyperparameters by Implicit Differentiation⁠

“Learning to Predict Without Looking Ahead: World Models Without Forward Prediction ”, Freeman et al 2019

Learning to Predict Without Looking Ahead: World Models Without Forward Prediction⁠

“Learning to Predict Without Looking Ahead: World Models Without Forward Prediction [Blog] ”, Freeman et al 2019

Learning to Predict Without Looking Ahead: World Models Without Forward Prediction [blog]⁠

“Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning ”, Yu et al 2019

Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning⁠

“Solving Rubik’s Cube With a Robot Hand ”, OpenAI et al 2019

Solving Rubik’s Cube with a Robot Hand⁠

“Solving Rubik’s Cube With a Robot Hand [Blog] ”, OpenAI 2019

Solving Rubik’s Cube with a Robot Hand [blog]⁠

“Gradient Descent: The Ultimate Optimizer ”, Chandra et al 2019

Gradient Descent: The Ultimate Optimizer⁠

“Data Valuation Using Reinforcement Learning ”, Yoon et al 2019

Data Valuation using Reinforcement Learning⁠

“Multiplicative Interactions and Where to Find Them ”, Jayakumar et al 2019

Multiplicative Interactions and Where to Find Them⁠

“ANIL: Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML ”, Raghu et al 2019

ANIL: Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML⁠

“Emergent Tool Use From Multi-Agent Autocurricula ”, Baker et al 2019

Emergent Tool Use From Multi-Agent Autocurricula⁠

“Meta-Learning With Implicit Gradients ”, Rajeswaran et al 2019

Meta-Learning with Implicit Gradients⁠

“A Critique of Pure Learning and What Artificial Neural Networks Can Learn from Animal Brains ”, Zador 2019

A critique of pure learning and what artificial neural networks can learn from animal brains⁠

“AutoML: A Survey of the State-Of-The-Art ”, He et al 2019

AutoML: A Survey of the State-of-the-Art⁠

“Metalearned Neural Memory ”, Munkhdalai et al 2019

Metalearned Neural Memory⁠

“Algorithms for Hyper-Parameter Optimization ”, Bergstra et al 2019

Algorithms for Hyper-Parameter Optimization⁠

“Evolving the Hearthstone Meta ”, Silva et al 2019

Evolving the Hearthstone Meta⁠

“Meta Reinforcement Learning ”, Weng 2019

Meta Reinforcement Learning⁠

“One Epoch Is All You Need ”, Komatsuzaki 2019

One Epoch Is All You Need⁠

“Compositional Generalization through Meta Sequence-To-Sequence Learning ”, Lake 2019

Compositional generalization through meta sequence-to-sequence learning⁠

“Risks from Learned Optimization in Advanced Machine Learning Systems ”, Hubinger et al 2019

Risks from Learned Optimization in Advanced Machine Learning Systems⁠

“ICML 2019 Notes ”, Abel 2019

ICML 2019 Notes⁠

“SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers ”, Fedorov et al 2019

SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers⁠

“AI-GAs: AI-Generating Algorithms, an Alternate Paradigm for Producing General Artificial Intelligence ”, Clune 2019

AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence⁠

“Alpha MAML: Adaptive Model-Agnostic Meta-Learning ”, Behl et al 2019

Alpha MAML: Adaptive Model-Agnostic Meta-Learning⁠

“Reinforcement Learning, Fast and Slow ”, Botvinick et al 2019

Reinforcement Learning, Fast and Slow⁠

“Meta Reinforcement Learning As Task Inference ”, Humplik et al 2019

Meta reinforcement learning as task inference⁠

“Learning Loss for Active Learning ”, Yoo & Kweon 2019

Learning Loss for Active Learning⁠

“Meta-Learning of Sequential Strategies ”, Ortega et al 2019

Meta-learning of Sequential Strategies⁠

“Searching for MobileNetV3 ”, Howard et al 2019

Searching for MobileNetV3⁠

“Meta-Learners’ Learning Dynamics Are unlike Learners’ ”, Rabinowitz 2019

Meta-learners’ learning dynamics are unlike learners’⁠

“Ray Interference: a Source of Plateaus in Deep Reinforcement Learning ”, Schaul et al 2019

Ray Interference: a Source of Plateaus in Deep Reinforcement Learning⁠

“AlphaX: EXploring Neural Architectures With Deep Neural Networks and Monte Carlo Tree Search ”, Wang et al 2019

AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search⁠

“Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables ”, Rakelly et al 2019

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables⁠

“Task2Vec: Task Embedding for Meta-Learning ”, Achille et al 2019

Task2Vec: Task Embedding for Meta-Learning⁠

“The Omniglot Challenge: a 3-Year Progress Report ”, Lake et al 2019

The Omniglot challenge: a 3-year progress report⁠

“FIGR: Few-Shot Image Generation With Reptile ”, Clouâtre & Demers 2019

FIGR: Few-shot Image Generation with Reptile⁠

“Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions ”, Wang et al 2019

Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions⁠

“Meta-Learning Neural Bloom Filters ”, Rae 2019

⁠Meta-Learning Neural Bloom Filters⁠ :

View PDF:

⁠/doc/www/proceedings.mlr.press/ffc205d2c6c2a415641f822e0c909bd847352e99.pdf⁠

“Malthusian Reinforcement Learning ”, Leibo et al 2018

Malthusian Reinforcement Learning⁠

“Quantifying Generalization in Reinforcement Learning ”, Cobbe et al 2018

Quantifying Generalization in Reinforcement Learning⁠

“An Introduction to Deep Reinforcement Learning ”, Francois-Lavet et al 2018

An Introduction to Deep Reinforcement Learning⁠

“Meta-Learning: Learning to Learn Fast ”, Weng 2018

Meta-Learning: Learning to Learn Fast⁠

“Evolving Space-Time Neural Architectures for Videos ”, Piergiovanni et al 2018

Evolving Space-Time Neural Architectures for Videos⁠

“Understanding and Correcting Pathologies in the Training of Learned Optimizers ”, Metz et al 2018

Understanding and correcting pathologies in the training of learned optimizers⁠

“BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning ”, Chevalier-Boisvert et al 2018

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning⁠

“Deep Reinforcement Learning ”, Li 2018

Deep Reinforcement Learning⁠

“Searching for Efficient Multi-Scale Architectures for Dense Image Prediction ”, Chen et al 2018

Searching for Efficient Multi-Scale Architectures for Dense Image Prediction⁠

“Backprop Evolution ”, Alber et al 2018

Backprop Evolution⁠

“Learning Dexterous In-Hand Manipulation ”, OpenAI et al 2018

Learning Dexterous In-Hand Manipulation⁠

“LEO: Meta-Learning With Latent Embedding Optimization ”, Rusu et al 2018

LEO: Meta-Learning with Latent Embedding Optimization⁠

“Automatically Composing Representation Transformations As a Means for Generalization ”, Chang et al 2018

Automatically Composing Representation Transformations as a Means for Generalization⁠

“Human-Level Performance in First-Person Multiplayer Games With Population-Based Deep Reinforcement Learning ”, Jaderberg et al 2018

Human-level performance in first-person multiplayer games with population-based deep reinforcement learning⁠

“Guided Evolutionary Strategies: Augmenting Random Search With Surrogate Gradients ”, Maheswaranathan et al 2018

Guided evolutionary strategies: Augmenting random search with surrogate gradients⁠

“RUDDER: Return Decomposition for Delayed Rewards ”, Arjona-Medina et al 2018

RUDDER: Return Decomposition for Delayed Rewards⁠

“Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning ”, Pang et al 2018

Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning⁠

“Fingerprint Policy Optimization for Robust Reinforcement Learning ”, Paul et al 2018

Fingerprint Policy Optimization for Robust Reinforcement Learning⁠

“AutoAugment: Learning Augmentation Policies from Data ”, Cubuk et al 2018

AutoAugment: Learning Augmentation Policies from Data⁠

“Meta-Gradient Reinforcement Learning ”, Xu et al 2018

Meta-Gradient Reinforcement Learning⁠

“Continuous Learning in a Hierarchical Multiscale Neural Network ”, Wolf et al 2018

Continuous Learning in a Hierarchical Multiscale Neural Network⁠

“Prefrontal Cortex As a Meta-Reinforcement Learning System ”, Wang et al 2018

Prefrontal cortex as a meta-reinforcement learning system⁠

“Meta-Learning Update Rules for Unsupervised Representation Learning ”, Metz et al 2018

Meta-Learning Update Rules for Unsupervised Representation Learning⁠

“Reviving and Improving Recurrent Back-Propagation ”, Liao et al 2018

Reviving and Improving Recurrent Back-Propagation⁠

“Kickstarting Deep Reinforcement Learning ”, Schmitt et al 2018

Kickstarting Deep Reinforcement Learning⁠

“Reptile/FOMAML: On First-Order Meta-Learning Algorithms ”, Nichol et al 2018

Reptile/FOMAML: On First-Order Meta-Learning Algorithms⁠

“Some Considerations on Learning to Explore via Meta-Reinforcement Learning ”, Stadie et al 2018

Some Considerations on Learning to Explore via Meta-Reinforcement Learning⁠

“One Big Net For Everything ”, Schmidhuber 2018

One Big Net For Everything⁠

“Machine Theory of Mind ”, Rabinowitz et al 2018

Machine Theory of Mind⁠

“Evolved Policy Gradients ”, Houthooft et al 2018

Evolved Policy Gradients⁠

“One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning ”, Yu et al 2018

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning⁠

“Rover Descent: Learning to Optimize by Learning to Navigate on Prototypical Loss Surfaces ”, Faury & Vasile 2018

Rover Descent: Learning to optimize by learning to navigate on prototypical loss surfaces⁠

“ScreenerNet: Learning Self-Paced Curriculum for Deep Neural Networks ”, Kim & Choi 2018

ScreenerNet: Learning Self-Paced Curriculum for Deep Neural Networks⁠

“Population Based Training of Neural Networks ”, Jaderberg et al 2017

Population Based Training of Neural Networks⁠

“BlockDrop: Dynamic Inference Paths in Residual Networks ”, Wu et al 2017

BlockDrop: Dynamic Inference Paths in Residual Networks⁠

“Learning to Select Computations ”, Callaway et al 2017

Learning to select computations⁠

“Learning to Generalize: Meta-Learning for Domain Generalization ”, Li et al 2017

Learning to Generalize: Meta-Learning for Domain Generalization⁠

“Efficient K-Shot Learning With Regularized Deep Networks ”, Yoo et al 2017

Efficient K-shot Learning with Regularized Deep Networks⁠

“Online Learning of a Memory for Learning Rates ”, Meier et al 2017

Online Learning of a Memory for Learning Rates⁠

“One-Shot Visual Imitation Learning via Meta-Learning ”, Finn et al 2017

One-Shot Visual Imitation Learning via Meta-Learning⁠

“Supervising Unsupervised Learning ”, Garg & Kalai 2017

Supervising Unsupervised Learning⁠

“Learning With Opponent-Learning Awareness ”, Foerster et al 2017

Learning with Opponent-Learning Awareness⁠

“SMASH: One-Shot Model Architecture Search through HyperNetworks ”, Brock et al 2017

SMASH: One-Shot Model Architecture Search through HyperNetworks⁠

“Stochastic Optimization With Bandit Sampling ”, Salehi et al 2017

Stochastic Optimization with Bandit Sampling⁠

“A Simple Neural Attentive Meta-Learner ”, Mishra et al 2017

A Simple Neural Attentive Meta-Learner⁠

“Reinforcement Learning for Learning Rate Control ”, Xu et al 2017

Reinforcement Learning for Learning Rate Control⁠

“Metacontrol for Adaptive Imagination-Based Optimization ”, Hamrick et al 2017

Metacontrol for Adaptive Imagination-Based Optimization⁠

“Deciding How to Decide: Dynamic Routing in Artificial Neural Networks ”, McGill & Perona 2017

Deciding How to Decide: Dynamic Routing in Artificial Neural Networks⁠

“Prototypical Networks for Few-Shot Learning ”, Snell et al 2017

Prototypical Networks for Few-shot Learning⁠

“Learned Optimizers That Scale and Generalize ”, Wichrowska et al 2017

Learned Optimizers that Scale and Generalize⁠

“MAML: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks ”, Finn et al 2017

MAML: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks⁠

“Learning to Optimize Neural Nets ”, Li & Malik 2017

Learning to Optimize Neural Nets⁠

“Understanding Synthetic Gradients and Decoupled Neural Interfaces ”, Czarnecki et al 2017

Understanding Synthetic Gradients and Decoupled Neural Interfaces⁠

“Optimization As a Model for Few-Shot Learning ”, Ravi & Larochelle 2017

Optimization as a Model for Few-Shot Learning⁠

“Learning to Superoptimize Programs ”, Bunel et al 2017

Learning to superoptimize programs⁠

“Discovering Objects and Their Relations from Entangled Scene Representations ”, Raposo et al 2017

Discovering objects and their relations from entangled scene representations⁠

“Google Vizier: A Service for Black-Box Optimization ”, Golovin 2017

⁠Google Vizier: A Service for Black-Box Optimization⁠

“An Actor-Critic Algorithm for Learning Rate Learning ”, Xu et al 2016

An Actor-critic Algorithm for Learning Rate Learning⁠

“A Bird’s Eye View of Synthetic Gradients ”, Greydanus 2016

⁠A Bird’s Eye View of Synthetic Gradients :

View External Link:

⁠https://greydanus.github.io/2016/11/26/synthetic-gradients/

“Learning to Reinforcement Learn ”, Wang et al 2016

Learning to reinforcement learn⁠

“Learning to Learn without Gradient Descent by Gradient Descent ”, Chen et al 2016

Learning to Learn without Gradient Descent by Gradient Descent⁠

“RL²: Fast Reinforcement Learning via Slow Reinforcement Learning ”, Duan et al 2016

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning⁠

“Designing Neural Network Architectures Using Reinforcement Learning ”, Baker et al 2016

Designing Neural Network Architectures using Reinforcement Learning⁠

“Using Fast Weights to Attend to the Recent Past ”, Ba et al 2016

Using Fast Weights to Attend to the Recent Past⁠

“HyperNetworks ”, Ha et al 2016

HyperNetworks⁠

“Decoupled Neural Interfaces Using Synthetic Gradients ”, Jaderberg et al 2016

Decoupled Neural Interfaces using Synthetic Gradients⁠

“Learning to Learn by Gradient Descent by Gradient Descent ”, Andrychowicz et al 2016

Learning to learn by gradient descent by gradient descent⁠

“Matching Networks for One Shot Learning ”, Vinyals et al 2016

Matching Networks for One Shot Learning⁠

“Learning to Optimize ”, Li & Malik 2016

Learning to Optimize⁠

“One-Shot Learning With Memory-Augmented Neural Networks ”, Santoro et al 2016

One-shot Learning with Memory-Augmented Neural Networks⁠

“Adaptive Computation Time for Recurrent Neural Networks ”, Graves 2016

Adaptive Computation Time for Recurrent Neural Networks⁠

“On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models ”, Schmidhuber 2015

On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models⁠

“Gradient-Based Hyperparameter Optimization through Reversible Learning ”, Maclaurin et al 2015

Gradient-based Hyperparameter Optimization through Reversible Learning⁠

“Machine Teaching: an Inverse Problem to Machine Learning and an Approach Toward Optimal Education ”, Zhu 2015b

Machine Teaching: an Inverse Problem to Machine Learning and an Approach Toward Optimal Education⁠

“Human-Level Concept Learning through Probabilistic Program Induction ”, Lake 2015

⁠Human-level concept learning through probabilistic program induction⁠ :

View PDF:

⁠/doc/www/www.cs.utoronto.ca/abf7f07b81284b90744432cca3bf0d8cee85e469.pdf⁠

“Robots That Can Adapt like Animals ”, Cully et al 2014

Robots that can adapt like animals⁠

“Deep Learning in Neural Networks: An Overview ”, Schmidhuber 2014

Deep Learning in Neural Networks: An Overview⁠

“Practical Bayesian Optimization of Machine Learning Algorithms ”, Snoek et al 2012

Practical Bayesian Optimization of Machine Learning Algorithms⁠

“Evolutionary Importance of Phenotypic Accommodation in Novel Environments: an Empirical Test of the Baldwin Effect ”, Badyaev 2009

Evolutionary importance of phenotypic accommodation in novel environments: an empirical test of the Baldwin effect⁠

“Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements ”, Schmidhuber 2003

⁠Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements⁠

“Optimal Ordered Problem Solver (OOPS) ”, Schmidhuber 2002

Optimal Ordered Problem Solver (OOPS)⁠

“Learning to Learn Using Gradient Descent ”, Hochreiter et al 2001

Learning to Learn Using Gradient Descent⁠

“On the Optimization of a Synaptic Learning Rule ”, Bengio et al 1997

On the Optimization of a Synaptic Learning Rule⁠

“Interactions between Learning and Evolution ”, Ackley & Littman 1992

Interactions between Learning and Evolution⁠

“Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks ”, Schmidhuber 1992b

Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks⁠

“Learning a Synaptic Learning Rule ”, Bengio et al 1991

Learning a synaptic learning rule⁠

“Reinforcement Learning: An Introduction § Designing Reward Signals ”, Sutton & Barto 2025 (page 491)

⁠Reinforcement Learning: An Introduction § Designing Reward Signals⁠ :

View PDF (27MB):

⁠/doc/www/incompleteideas.net/2ddcafc570cef087ed62b0113ee2917df3a4f33a.pdf#page=491⁠

“Exploring Hyperparameter Meta-Loss Landscapes With Jax ”

⁠Exploring hyperparameter meta-loss landscapes with Jax⁠ :

View HTML:

⁠/doc/www/lukemetz.com/f8c247b7a53735e17642638a18b45a312a3cf84f.html#google⁠

“Metalearning ”

Metalearning⁠

“Universal Search § OOPS and Other Incremental Variations ”

Universal search § OOPS and other incremental variations⁠

“Extrapolating to Unnatural Language Processing With GPT-3’s In-Context Learning: The Good, the Bad, and the Mysterious ”

⁠Extrapolating to Unnatural Language Processing with GPT-3’s In-context Learning: The Good, the Bad, and the Mysterious :

View HTML:

⁠/doc/www/ai.stanford.edu/53906a0a199a213fa1bce0b97ecad6b5063931e4.html⁠

“How Does In-Context Learning Work? A Framework for Understanding the Differences from Traditional Supervised Learning ”

⁠How does in-context learning work? A framework for understanding the differences from traditional supervised learning :

View HTML:

⁠/doc/www/ai.stanford.edu/bdf17c80e1ed5dc516811f03acef03415b220143.html⁠

“Rapid Motor Adaptation for Legged Robots ”

⁠Rapid Motor Adaptation for Legged Robots :

View External Link:

⁠https://ashish-kmr.github.io/rma-legged-robots/

“Collaborating With Humans Requires Understanding Them ”

⁠Collaborating with Humans Requires Understanding Them⁠ :

View External Link:

⁠https://bair.berkeley.edu/blog/2019/10/21/coordination/⁠

“Why Generalization in RL Is Difficult: Epistemic POMDPs and Implicit Partial Observability [Blog] ”

⁠Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability [blog]⁠

“Hypernetworks [Blog] ”, Ha 2025

⁠Hypernetworks [blog]

“Action and Perception As Divergence Minimization ”

⁠Action and Perception as Divergence Minimization :

View External Link:

⁠https://danijar.com/project/apd/

“AlphaStar: Mastering the Real-Time Strategy Game StarCraft II ”

⁠AlphaStar: Mastering the Real-Time Strategy Game StarCraft II⁠

“Prefrontal Cortex As a Meta-Reinforcement Learning System [Blog] ”

⁠Prefrontal cortex as a meta-reinforcement learning system [blog]⁠ :

View HTML:

⁠/doc/www/deepmind.google/c78ec48980d210d2a589b077f45a8d1f9e303dce.html⁠

“The Lie Comes First, the Worlds to Accommodate It ”

⁠the lie comes first, the worlds to accommodate it

“Sgdstore/experiments/omniglot at Master ”

⁠sgdstore/experiments/omniglot at master⁠ :

View HTML:

⁠/doc/www/github.com/864f04985a5c15c527c5b4465cc42d8707e7bbb5.html#omniglot⁠

“Curriculum For Reinforcement Learning ”

⁠Curriculum For Reinforcement Learning⁠ :

View HTML:

⁠/doc/www/lilianweng.github.io/82f88b3c03ad0b252a33823c09b770832a95bbcc.html#openai⁠

“Neural Architecture Search ”

⁠Neural Architecture Search⁠ :

View HTML:

⁠/doc/www/lilianweng.github.io/e6974ff95b5a7295d018b89e865e011d9f66369b.html#openai⁠

“MetaGenRL: Improving Generalization in Meta Reinforcement Learning ”

⁠MetaGenRL: Improving Generalization in Meta Reinforcement Learning :

View HTML:

⁠/doc/www/louiskirsch.com/cdfbfecb16cd0249c87142b4c650a487782f1399.html⁠

“The RetroInstruct Guide To Synthetic Text Data ”, Pressman 2025

⁠The RetroInstruct Guide To Synthetic Text Data

“2022: 25-Year Anniversary: LSTM (1997), All Computable Metaverses, Hierarchical Q-Learning, Adversarial Intrinsic Reinforcement Learning, Low-Complexity NNs, Low-Complexity Art, Meta-RL, Soccer Learning ”

⁠2022: 25-year anniversary: LSTM (1997), all computable metaverses, hierarchical Q-learning, adversarial intrinsic Reinforcement Learning, low-complexity NNs, low-complexity art, Meta-RL, soccer learning⁠ :

View HTML:

⁠https://people.idsia.ch/~juergen/25years1997.html⁠

“Metalearning or Learning to Learn Since 1987 ”

⁠Metalearning or Learning to Learn Since 1987⁠ :

View HTML:

⁠/doc/www/people.idsia.ch/76c24cf0db4abf4ce2b77d22182272d8e62d1a28.html⁠

“The Future of Artificial Intelligence Is Self-Organizing and Self-Assembling ”

The Future of Artificial Intelligence is Self-Organizing and Self-Assembling

“Domain-Adaptive Meta-Learning ”

⁠Domain-Adaptive Meta-Learning⁠ :

View HTML:

⁠/doc/www/sites.google.com/31439931d05db1bcf5dd08d9e9a09c6be5b970a0.html⁠

“How to Fix Reinforcement Learning ”

How to fix reinforcement learning⁠

“Introducing Adept ”

Introducing Adept

“Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes ”

⁠Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes⁠

“Risks from Learned Optimization: Introduction ”

⁠Risks from Learned Optimization: Introduction⁠ :

View External Link:

⁠https://www.lesswrong.com/posts/FkgsxrGf3QxhfLWHG/risks-from-learned-optimization-introduction⁠

“How Good Are LLMs at Doing ML on an Unknown Dataset? ”

⁠How good are LLMs at doing ML on an unknown dataset?⁠

“Early Situational Awareness and Its Implications, a Story ”

⁠Early situational awareness and its implications, a story⁠ :

View HTML:

⁠/doc/www/www.greaterwrong.com/20ea9a879c0915ecfa2f2f87dba168dc160967cb.html⁠

“RWKV Language Model ”

⁠RWKV Language Model

“AI Is Learning How to Create Itself ”

⁠AI is learning how to create itself⁠ :

View HTML:

⁠/doc/www/www.technologyreview.com/2874fc1c58ced20eb47a85ca46af67972fcf2ab4.html⁠

“Matt Botvinick: Neuroscience, Psychology, and AI at DeepMind ”

⁠Matt Botvinick: Neuroscience, Psychology, and AI at DeepMind⁠ :

⁠https://www.youtube.com/watch?v=3t06ajvBtl0⁠

“SMASH: One-Shot Model Architecture Search through HyperNetworks [Video] ”

⁠SMASH: One-Shot Model Architecture Search through HyperNetworks [video]⁠ :

⁠https://www.youtube.com/watch?v=79tmPL9AL48⁠

“Solving Rubik’s Cube With a Robot Hand: Perturbations ”

⁠Solving Rubik’s Cube with a Robot Hand: Perturbations⁠

“WELM ”

⁠WELM⁠

Wikipedia (14)

Miscellaneous

Bibliography

https://arxiv.org/abs/2503.14456: “RWKV-7 ‘Goose’ With Expressive Dynamic State Evolution ”⁠, Bo Peng⁠, Ruichong Zhang, Daniel Goldstein …, Eric Alcaide, Xingjian Du, Haowen Hou, Jiaju Lin, Jiaxing Liu, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, Christian Zhou-Zheng
link-bibliography⁠
https://arxiv.org/abs/2502.06807#openai: “Competitive Programming With Large Reasoning Models ”⁠, Ahmed El-Kishky, Alexander Wei, Andre Saraiva …, Borys Minaev, Daniel Selsam⁠, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Łukasz Kaiser⁠, ⁠Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese⁠, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou
link-bibliography⁠
https://arxiv.org/abs/2501.01956: “Metadata Conditioning Accelerates Language Model Pre-Training ”⁠, Tianyu Gao, Alexander Wettig, Luxi He …, Yihe Dong, Sadhika Malladi, Danqi Chen⁠
link-bibliography⁠
https://arxiv.org/abs/2412.06464#nvidia: “Gated Delta Networks: Improving Mamba-2 With Delta Rule ”⁠, Songlin Yang, Jan Kautz, Ali Hatamizadeh
link-bibliography⁠
https://arxiv.org/abs/2410.07095#openai: “MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering ”⁠, Jun Shern Chan, Neil Chowdhury, Oliver Jaffe …, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, ⁠Lilian Weng, ⁠Aleksander Madry
link-bibliography⁠
https://arxiv.org/abs/2406.13131: “When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models ”⁠, Ting-Yun Chang, Jesse Thomason, Robin Jia
link-bibliography⁠
https://arxiv.org/abs/2406.11233: “Probing the Decision Boundaries of In-Context Learning in Large Language Models ”⁠, Siyan Zhao, Tung Nguyen, Aditya Grover⁠
link-bibliography⁠
https://arxiv.org/abs/2405.07883: “Zero-Shot Tokenizer Transfer ”⁠, Benjamin Minixhofer, Edoardo Maria Ponti, Ivan Vulić⁠
link-bibliography⁠
https://arxiv.org/abs/2404.12699: “SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-Trained Models ”⁠, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen …, Liangming Xia, Yijie Bai, Haiqin Weng, Wenyuan Xu⁠
link-bibliography⁠
https://ieeexplore.ieee.org/abstract/document/10446522: “Revisiting the Equivalence of In-Context Learning and Gradient Descent: The Impact of Data Distribution ”⁠, Sadegh Mahdavi, Renjie Liao, Christos Thrampoulidis
link-bibliography⁠
https://arxiv.org/abs/2404.07544: “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples ”⁠, Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu
link-bibliography⁠
https://www.nature.com/articles/s41467-023-42875-2#deepmind: “Learning Few-Shot Imitation As Cultural Transmission ”⁠, Avishkar Bhoopchand, Bethanie Brownfield, Adrian Collister …, Agustin Dal Lago, Ashley Edwards, Richard Everett⁠, Alexandre Fréchette, Yanko Gitahy Oliveira, Edward Hughes, Kory W. Mathewson, Piermaria Mendolicchio, Julia Pawar, Miruna Pȋslar, Alex Platonov, Evan Senter, Sukhdeep Singh⁠, Alexander Zacherl, Lei M. Zhang
link-bibliography⁠
https://openreview.net/forum?id=psXVkKO9No#deepmind: “Self-AIXI: Self-Predictive Universal AI ”⁠, Elliot Catt, Jordi Grau-Moya, Marcus Hutter⁠ …, Matthew Aitchison, Tim Genewein, Gregoire Deletang, Li Kevin Wenliang, ⁠Joel Veness⁠
link-bibliography⁠
https://arxiv.org/abs/2308.09175#deepmind: “Diversifying AI: Towards Creative Chess With AlphaZero (AZ_db) ”⁠, Tom Zahavy, Vivek Veeriah, Shaobo Hou …, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut⁠, Demis Hassabis⁠, Satinder Singh⁠
link-bibliography⁠
https://arxiv.org/abs/2307.03381: “Teaching Arithmetic to Small Transformers ”⁠, Nayoung Lee, Kartik Sreenivasan, Jason D. Lee …, Kangwook Lee, Dimitris Papailiopoulos
link-bibliography⁠
https://arxiv.org/abs/2306.14892: “Supervised Pretraining Can Learn In-Context Reinforcement Learning ”⁠, Jonathan N. Lee, Annie Xie, Aldo Pacchiano …, Yash Chandak, Chelsea Finn⁠, Ofir Nachum, Emma Brunskill
link-bibliography⁠
https://arxiv.org/abs/2306.13831: “Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks ”⁠, Maxime Chevalier-Boisvert, Bolun Dai, Mark Towers …, Rodrigo de Lazcano, Lucas Willems, Salem Lahlou, Suman Pal, Pablo Samuel Castro, Jordan Terry
link-bibliography⁠
https://arxiv.org/abs/2307.01201#deepmind: “Schema-Learning and Rebinding As Mechanisms of In-Context Learning and Emergence ”⁠, Sivaramakrishnan Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju …, Murray Shanahan⁠, Miguel Lazaro-Gredilla, Dileep George⁠
link-bibliography⁠
https://arxiv.org/abs/2306.09222#google: “RGD: Stochastic Re-Weighted Gradient Descent via Distributionally Robust Optimization ”⁠, Ramnath Kumar, Kushal Majmundar, Dheeraj Nagaraj, Arun Sai Suggala
link-bibliography⁠
https://arxiv.org/abs/2304.02015#alibaba: “How Well Do Large Language Models Perform in Arithmetic Tasks? ”⁠, Zheng Yuan, Hongyi Yuan, Chuanqi Tan …, Wei Wang, Songfang Huang
link-bibliography⁠
https://arxiv.org/abs/2303.03846#google: “Larger Language Models Do In-Context Learning Differently ”⁠, Jerry Wei, Jason Wei, ⁠Yi Tay …, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, ⁠Denny Zhou, ⁠Tengyu Ma
link-bibliography⁠
https://arxiv.org/abs/2212.07677#google: “Transformers Learn In-Context by Gradient Descent ”⁠, Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo …, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov
link-bibliography⁠
https://arxiv.org/abs/2212.02475#google: “FWL: Meta-Learning Fast Weight Language Models ”⁠, Kevin Clark, Kelvin Guu, Ming-Wei Chang …, Panupong Pasupat, Geoffrey Hinton⁠, Mohammad Norouzi⁠
link-bibliography⁠
https://arxiv.org/abs/2211.15661#google: “What Learning Algorithm Is In-Context Learning? Investigations With Linear Models ”⁠, Ekin Akyürek, Dale Schuurmans, ⁠Jacob Andreas …, ⁠Tengyu Ma, ⁠Denny Zhou
link-bibliography⁠
https://arxiv.org/abs/2211.01786: “BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning ”⁠, ⁠Niklas Muennighoff, Thomas Wang⁠, Lintang Sutawika …, Adam Roberts⁠, ⁠Stella Biderman, Teven Le Scao⁠, M. Saiful Bari, ⁠Sheng Shen, Zheng-Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev⁠, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, ⁠Zaid Alyafeai, Albert Webson, Edward Raff, ⁠Colin Raffel
link-bibliography⁠
https://arxiv.org/abs/2209.14500: “SAP: Bidirectional Language Models Are Also Few-Shot Learners ”⁠, Ajay Patel, Bryan Li, Mohammad Sadegh Rasooli …, Noah Constant⁠, ⁠Colin Raffel, Chris Callison-Burch
link-bibliography⁠
https://arxiv.org/abs/2209.12892: “g.pt: Learning to Learn With Generative Models of Neural Network Checkpoints ”⁠, William Peebles, Ilija Radosavovic, Tim Brooks …, Alexei A. Efros⁠, Jitendra Malik⁠
link-bibliography⁠
https://arxiv.org/abs/2208.01448#amazon: “AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model ”⁠, Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald …, Rahul Gupta⁠, Wael Hamza, Haidar Khan⁠, Charith Peris, Stephen Rawls, Andy Rosenbaum, Anna Rumshisky, Chandana Satya Prakash, Mukund Sridhar, Fabian Triefenbach, Apurv Verma, Gokhan Tur, Prem Natarajan
link-bibliography⁠
https://arxiv.org/abs/2208.01066: “What Can Transformers Learn In-Context? A Case Study of Simple Function Classes ”⁠, Shivam Garg, Dimitris Tsipras, ⁠Percy Liang⁠, Gregory Valiant
link-bibliography⁠
https://arxiv.org/abs/2207.01848: “TabPFN: Meta-Learning a Real-Time Tabular AutoML Method For Small Data ”⁠, Noah Hollmann, Samuel Müller, Katharina Eggensperger, ⁠Frank Hutter
link-bibliography⁠
https://arxiv.org/abs/2206.13499: “Prompting Decision Transformer for Few-Shot Policy Generalization ”⁠, Mengdi Xu, Yikang Shen, Shun Zhang …, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum⁠, Chuang Gan
link-bibliography⁠
https://arxiv.org/abs/2206.07137: “RHO-LOSS: Prioritized Training on Points That Are Learnable, Worth Learning, and Not Yet Learnt ”⁠, Sören Mindermann, Jan Brauner, Muhammed Razzak …, Mrinank Sharma, Andreas Kirsch, Winnie Xu, Benedikt Höltgen, Aidan N. Gomez⁠, Adrien Morisot, Sebastian Farquhar, Yarin Gal
link-bibliography⁠
https://arxiv.org/abs/2205.13320#google: “Towards Learning Universal Hyperparameter Optimizers With Transformers ”⁠, Yutian Chen⁠, Xingyou Song, Chansoo Lee …, Zi Wang⁠, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc’aurelio Ranzato, Sagi Perel, Nando de Freitas⁠
link-bibliography⁠
https://arxiv.org/abs/2205.06175#deepmind: “Gato: A Generalist Agent ”⁠, Scott Reed, Konrad Zolna, Emilio Parisotto …, Sergio Gomez Colmenarejo, Alexander Novikov⁠, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay⁠, Jost Tobias Springenberg, Tom Eccles⁠, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess⁠, Yutian Chen⁠, Raia Hadsell, Oriol Vinyals⁠, Mahyar Bordbar, Nando de Freitas⁠
link-bibliography⁠
https://arxiv.org/abs/2205.05131#google: “UL2: Unifying Language Learning Paradigms ”⁠, ⁠Yi Tay, Mostafa Dehghani, Vinh Q. Tran …, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, ⁠Neil Houlsby, Donald Metzler
link-bibliography⁠
https://arxiv.org/abs/2204.07705: “Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks ”⁠, ⁠Yizhong Wang, ⁠Swaroop Mishra, Pegah Alipoormolabashi …, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Shailaja Keyur Sampat, Savan Doshi, Siddhartha Mishra, Sujan Reddy, Sumanta Patro, Tanay Dixit, Xudong Shen, Chitta Baral, Yejin Choi⁠, ⁠Noah A. Smith, ⁠Hannaneh Hajishirzi, ⁠Daniel Khashabi
link-bibliography⁠
https://arxiv.org/abs/2203.03691: “HyperMixer: An MLP-Based Low Cost Alternative to Transformers ”⁠, Florian Mai, Arnaud Pannatier, Fabio Fehr …, Haolin Chen, Francois Marelli, Francois Fleuret, James Henderson
link-bibliography⁠
https://arxiv.org/abs/2203.02094#microsoft: “LiteTransformerSearch: Training-Free Neural Architecture Search for Efficient Language Models ”⁠, Mojan Javaheripi, Gustavo H. de Rosa, Subhabrata Mukherjee …, Shital Shah, Tomasz L. Religa, Caio C. T. Mendes, Sebastien Bubeck⁠, Farinaz Koushanfar⁠, Debadeepta Dey
link-bibliography⁠
https://arxiv.org/abs/2203.00759: “HyperPrompt: Prompt-Based Task-Conditioning of Transformers ”⁠, Yun He, Huaixiu Steven Zheng, ⁠Yi Tay …, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, YaGuang Li, Zhao Chen⁠, Donald Metzler, Heng-Tze Cheng, Ed H. Chi⁠
link-bibliography⁠
https://arxiv.org/abs/2202.12837#facebook: “Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? ”⁠, Sewon Min, Xinxi Lyu, Ari Holtzman⁠ …, Mikel Artetxe⁠, Mike Lewis⁠, ⁠Hannaneh Hajishirzi, Luke Zettlemoyer⁠
link-bibliography⁠
https://arxiv.org/abs/2202.07415#deepmind: “NeuPL: Neural Population Learning ”⁠, Siqi Liu, Luke Marris, Daniel Hennes …, Josh Merel, Nicolas Heess⁠, ⁠Thore Graepel
link-bibliography⁠
2022-miki.pdf: “Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild ”⁠, Takahiro Miki, Joonho Lee, Jemin Hwangbo …, Lorenz Wellhausen, Vladlen Koltun⁠, Marco Hutter
link-bibliography⁠
https://arxiv.org/abs/2112.10510: “PFNs: Transformers Can Do Bayesian Inference ”⁠, Samuel Müller, Noah Hollmann, Sebastian Pineda Arango …, Josif Grabocka, ⁠Frank Hutter
link-bibliography⁠
https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment ”⁠, ⁠Amanda Askell, Yuntao Bai⁠, Anna Chen …, Dawn Drain, ⁠Deep Ganguli, Tom Henighan, ⁠Andy L. Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, ⁠Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez⁠, ⁠Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei⁠, Tom B. Brown⁠, ⁠Jack Clark⁠, Sam McCandlish⁠, Chris Olah, Jared Kaplan
link-bibliography⁠
https://arxiv.org/abs/2111.01587#deepmind: “Procedural Generalization by Planning With Self-Supervised World Models ”⁠, Ankesh Anand, Jacob Walker, Yazhe Li …, Eszter Vértes, Julian Schrittwieser, Sherjil Ozair, Théophane Weber, Jessica B. Hamrick
link-bibliography⁠
https://arxiv.org/abs/2106.00958#openai: “LHOPT: A Generalizable Approach to Learning Optimizers ”⁠, Diogo Almeida, Clemens Winter, Jie Tang⁠, Wojciech Zaremba⁠
link-bibliography⁠
https://www.sciencedirect.com/science/article/pii/S0004370221000862#deepmind: “Reward Is Enough ”⁠, David Silver⁠, Satinder Singh⁠, Doina Precup⁠, Richard S. Sutton⁠
link-bibliography⁠
https://arxiv.org/abs/2104.06272#deepmind: “Podracer Architectures for Scalable Reinforcement Learning ”⁠, Matteo Hessel, Manuel Kroiss, Aidan Clark …, Iurii Kemaev, John Quan⁠, Thomas Keck, Fabio Viola, Hado van Hasselt⁠
link-bibliography⁠
https://arxiv.org/abs/2103.01075#google: “OmniNet: Omnidirectional Representations from Transformers ”⁠, ⁠Yi Tay, Mostafa Dehghani, Vamsi Aribandi …, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler
link-bibliography⁠
https://arxiv.org/abs/2003.10580#google: “Meta Pseudo Labels ”⁠, Hieu Pham, Zihang Dai⁠, Qizhe Xie …, Minh-Thang Luong, Quoc V. Le⁠
link-bibliography⁠
https://www.lesswrong.com/posts/Wnqua6eQkewL3bqsF/matt-botvinick-on-the-spontaneous-emergence-of-learning: “Matt Botvinick on the Spontaneous Emergence of Learning Algorithms ”⁠, Adam Scholl
link-bibliography⁠
https://arxiv.org/abs/2003.06212: “Accelerating and Improving AlphaZero Using Population Based Training ”⁠, Ti-Rong Wu, Ting-Han Wei, I-Chen Wu⁠
link-bibliography⁠
https://openai.com/research/procgen-benchmark: “Procgen Benchmark: We’re Releasing Procgen Benchmark, 16 Simple-To-Use Procedurally-Generated Environments Which Provide a Direct Measure of How Quickly a Reinforcement Learning Agent Learns Generalizable Skills ”⁠, Karl Cobbe, Christopher Hesse, ⁠Jacob Hilton, ⁠John Schulman
link-bibliography⁠
https://lilianweng.github.io/lil-log/2019/06/23/meta-reinforcement-learning.html#openai: “Meta Reinforcement Learning ”⁠, ⁠Lilian Weng
link-bibliography⁠
https://arxiv.org/abs/1906.06669: “One Epoch Is All You Need ”⁠, Aran Komatsuzaki
link-bibliography⁠
https://david-abel.github.io/notes/icml_2019.pdf: “ICML 2019 Notes ”⁠, David Abel
link-bibliography⁠
https://arxiv.org/abs/1905.01320#deepmind: “Meta-Learners’ Learning Dynamics Are unlike Learners’ ”⁠, Neil C. Rabinowitz
link-bibliography⁠
https://arxiv.org/abs/1904.11455#deepmind: “Ray Interference: a Source of Plateaus in Deep Reinforcement Learning ”⁠, Tom Schaul, Diana Borsa, Joseph Modayil, ⁠Razvan Pascanu⁠
link-bibliography⁠
https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html#openai: “Meta-Learning: Learning to Learn Fast ”⁠, ⁠Lilian Weng
link-bibliography⁠
https://arxiv.org/abs/1806.07857: “RUDDER: Return Decomposition for Delayed Rewards ”⁠, Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich …, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter⁠
link-bibliography⁠
https://arxiv.org/abs/1805.09501#google: “AutoAugment: Learning Augmentation Policies from Data ”⁠, Ekin D. Cubuk, ⁠Barret Zoph, Dandelion Mane …, Vijay Vasudevan, Quoc V. Le⁠
link-bibliography⁠
https://arxiv.org/abs/1804.00222#google: “Meta-Learning Update Rules for Unsupervised Representation Learning ”⁠, Luke Metz⁠, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein⁠
link-bibliography⁠
https://arxiv.org/abs/1803.02999#openai: “Reptile/FOMAML: On First-Order Meta-Learning Algorithms ”⁠, Alex Nichol, Joshua Achiam, ⁠John Schulman
link-bibliography⁠
https://arxiv.org/abs/1708.05344: “SMASH: One-Shot Model Architecture Search through HyperNetworks ”⁠, Andrew Brock⁠, Theodore Lim, J. M. Ritchie, Nick Weston
link-bibliography⁠
https://arxiv.org/abs/1609.09106#google: “HyperNetworks ”⁠, ⁠David Ha, Andrew Dai, Quoc V. Le⁠
link-bibliography⁠
2015-zhu-2.pdf: “Machine Teaching: an Inverse Problem to Machine Learning and an Approach Toward Optimal Education ”⁠, Xiaojin Zhu
link-bibliography⁠
https://pmc.ncbi.nlm.nih.gov/articles/PMC2666683/: “Evolutionary Importance of Phenotypic Accommodation in Novel Environments: an Empirical Test of the Baldwin Effect ”⁠, Alexander V. Badyaev
link-bibliography⁠
https://arxiv.org/abs/cs/0207097#schmidhuber: “Optimal Ordered Problem Solver (OOPS) ”⁠, Jürgen Schmidhuber⁠
link-bibliography⁠
1991-bengio.pdf: “Learning a Synaptic Learning Rule ”⁠, Yoshua Bengio⁠, Samy Bengio⁠, Jocelyn Cloutier
link-bibliography⁠