WBE and DRL: a Middle Way of imitation learning from the human brain
Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation
State-space models can learn in-context by gradient descent
Thinking LLMs: General Instruction Following with Thought Generation
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Generating Diverse and Reliable Features for Few-Shot Learning
When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
Probing the Decision Boundaries of In-context Learning in Large Language Models
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
Discovering Preference Optimization Algorithms with and for Large Language Models
State Soup: In-Context Skill Learning, Retrieval and Mixing
Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
Auto Evol-Instruct: Automatic Instruction Evolving for Large Language Models
A Theoretical Understanding of Self-Correction through In-context Alignment
Position: Understanding LLMs Requires More Than Statistical Generalization
SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Revisiting the Equivalence of In-Context Learning and Gradient Descent: The Impact of Data Distribution
Best Practices and Lessons Learned on Synthetic Data for Language Models
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
How Well Can Transformers Emulate In-context Newton’s Method?
Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models
Rephrasing the Web (WRAP): A Recipe for Compute and Data-Efficient Language Modeling
Deep de Finetti: Recovering Topic Distributions from Large Language Models
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves
ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models
HyperFields: Towards Zero-Shot Generation of NeRFs from Text
Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models
Eureka: Human-Level Reward Design via Coding Large Language Models
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models
MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning
One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
Supervised Pretraining Can Learn In-Context Reinforcement Learning
Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression
Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks
Improving Long-Horizon Imitation Through Instruction Prediction
Schema-learning and rebinding as mechanisms of in-context learning and emergence
RGD: Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization
Transformers learn to implement preconditioned gradient descent for in-context learning
Fundamental Limitations of Alignment in Large Language Models
How well do Large Language Models perform in Arithmetic tasks?
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery
Human-like systematic generalization through a meta-learning neural network
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale
What learning algorithm is in-context learning? Investigations with linear models
Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models
BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning
ProMoT: Preserving In-Context Learning ability in Large Language Model Fine-tuning
In-context Reinforcement Learning with Algorithm Distillation
SAP: Bidirectional Language Models Are Also Few-shot Learners
g.pt: Learning to Learn with Generative Models of Neural Network Checkpoints
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling
TabPFN: Meta-Learning a Real-Time Tabular AutoML Method For Small Data
Prompting Decision Transformer for Few-Shot Policy Generalization
RHO-LOSS: Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
Towards Learning Universal Hyperparameter Optimizers with Transformers
Instruction Induction: From Few Examples to Natural Language Task Descriptions
Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers
Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Effective Mutation Rate Adaptation through Group Elite Selection
Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs
HyperMixer: An MLP-based Low Cost Alternative to Transformers
LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models
HyperPrompt: Prompt-based Task-Conditioning of Transformers
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL
Learning Synthetic Environments and Reward Networks for Reinforcement Learning
From data to functa: Your data point is a function and you should treat it like one
Environment Generation for Zero-Shot Compositional Reinforcement Learning
Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies
Learning robust perceptive locomotion for quadrupedal robots in the wild
Automated Reinforcement Learning (AutoRL): A Survey and Open Problems
In Defense of the Unitary Scalarization for Deep Multi-Task Learning
HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning
Finding General Equilibria in Many-Agent Economic Simulations Using Deep Reinforcement Learning
The Curse of Zero Task Diversity: On the Failure of Transfer Learning to Outperform MAML and their Empirical Equivalence
How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy
Noether Networks: Meta-Learning Useful Conserved Quantities
A General Language Assistant as a Laboratory for Alignment
A Modern Self-Referential Weight Matrix That Learns to Modify Itself
An Explanation of In-context Learning as Implicit Bayesian Inference
Procedural Generalization by Planning with Self-Supervised World Models
Logical Activation Functions: Logit-space equivalents of Probabilistic Boolean Operators
Shaking the foundations: delusions in sequence models for interaction and control
Meta-learning, social cognition and consciousness in brains and machines
T0: Multitask Prompted Training Enables Zero-Shot Task Generalization
Scalable Online Planning via Reinforcement Learning Fine-Tuning
Dropout’s Dream Land: Generalization from Learned Simulators to Reality
Is Curiosity All You Need? On the Utility of Emergent Behaviors from Curious Exploration
The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning
The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning
Dataset Distillation with Infinitely Wide Convolutional Networks
Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability
Towards mental time travel: a hierarchical memory for reinforcement learning agents
A Full-stack Accelerator Search Technique for Vision Applications
Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
Podracer architectures for scalable Reinforcement Learning
Asymmetric self-play for automatic goal discovery in robotic manipulation
OmniNet: Omnidirectional Representations from Transformers
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution
Training Learned Optimizers with Randomly Initialized Learned Optimizers
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Reverse engineering learned optimizers reveals known and novel mechanisms
MELD: Meta-Reinforcement Learning from Images via Latent State Models
Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves
Matt Botvinick on the spontaneous emergence of learning algorithms
Deep Reinforcement Learning and Its Neuroscientific Implications
Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions
FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining
Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search
Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks
A Comparison of Methods for Treatment Assignment with an Application to Playlist Generation
Approximate exploitability: Learning a best response in large games
Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions
Accelerating and Improving AlphaZero Using Population Based Training
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
AutoML-Zero: Open source code for the paper: "AutoML-Zero: Evolving Machine Learning Algorithms From Scratch"
Effective Diversity in Population Based Reinforcement Learning
AI Helps Warehouse Robots Pick Up New Tricks: Backed by machine learning luminaries, Covariant.ai’s bots can handle jobs previously needing a human touch
Smooth markets: A basic mechanism for organizing gradient-based learners
Leveraging Procedural Generation to Benchmark Reinforcement Learning
Procgen Benchmark: We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills
Increasing Generality in Machine Learning through Procedural Content Generation
SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning
Optimizing Millions of Hyperparameters by Implicit Differentiation
Learning to Predict Without Looking Ahead: World Models Without Forward Prediction
Learning to Predict Without Looking Ahead: World Models Without Forward Prediction [blog]
Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning
ANIL: Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
A critique of pure learning and what artificial neural networks can learn from animal brains
Compositional generalization through meta sequence-to-sequence learning
Risks from Learned Optimization in Advanced Machine Learning Systems
SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers
AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
Ray Interference: a Source of Plateaus in Deep Reinforcement Learning
AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions
Understanding and correcting pathologies in the training of learned optimizers
BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning
Searching for Efficient Multi-Scale Architectures for Dense Image Prediction
Automatically Composing Representation Transformations as a Means for Generalization
Human-level performance in first-person multiplayer games with population-based deep reinforcement learning
Guided evolutionary strategies: Augmenting random search with surrogate gradients
Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning
Fingerprint Policy Optimization for Robust Reinforcement Learning
Continuous Learning in a Hierarchical Multiscale Neural Network
Meta-Learning Update Rules for Unsupervised Representation Learning
Some Considerations on Learning to Explore via Meta-Reinforcement Learning
One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
Rover Descent: Learning to optimize by learning to navigate on prototypical loss surfaces
ScreenerNet: Learning Self-Paced Curriculum for Deep Neural Networks
Learning to Generalize: Meta-Learning for Domain Generalization
SMASH: One-Shot Model Architecture Search through HyperNetworks
Deciding How to Decide: Dynamic Routing in Artificial Neural Networks
MAML: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Understanding Synthetic Gradients and Decoupled Neural Interfaces
Discovering objects and their relations from entangled scene representations
Learning to Learn without Gradient Descent by Gradient Descent
RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
Designing Neural Network Architectures using Reinforcement Learning
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
Gradient-based Hyperparameter Optimization through Reversible Learning
Machine Teaching: an Inverse Problem to Machine Learning and an Approach Toward Optimal Education
Human-Level Concept Learning through Probabilistic Program Induction
Practical Bayesian Optimization of Machine Learning Algorithms
Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks
Reinforcement Learning: An Introduction § Designing Reward Signals
Extrapolating to Unnatural Language Processing With GPT-3’s In-Context Learning: The Good, the Bad, and the Mysterious
How Does In-Context Learning Work? A Framework for Understanding the Differences from Traditional Supervised Learning
Why Generalization in RL Is Difficult: Epistemic POMDPs and Implicit Partial Observability [Blog]
AlphaStar: Mastering the Real-Time Strategy Game StarCraft II
Prefrontal Cortex As a Meta-Reinforcement Learning System [Blog]
MetaGenRL: Improving Generalization in Meta Reinforcement Learning
2022: 25-Year Anniversary: LSTM (1997), All Computable Metaverses, Hierarchical Q-Learning, Adversarial Intrinsic Reinforcement Learning, Low-Complexity NNs, Low-Complexity Art, Meta-RL, Soccer Learning
The Future of Artificial Intelligence Is Self-Organizing and Self-Assembling
Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes
Matt Botvinick: Neuroscience, Psychology, and AI at DeepMind
SMASH: One-Shot Model Architecture Search through HyperNetworks [Video]
2023-lee-figure9-arithmeticcanbelearnedevenwithnoiseintheinnermonologuetranscripts.jpg
2022-patel-figure1-mt5fewshotpromptingwordbywordforneuralmachinetranslation.png
2020-real-googlebrain-automlzero-bestalgorithmannotation.mp4
2018-metz-appendix-figure1-detailedschematicdiagramofmetalearningarchitecture.png
2018-metz-figure1-schematicofmetalearningrepresentationsforunsupervisedlearning.jpg
2018-metz-figure5-generalizationofmetalearnedruletounseenlayesrunitsandactivations.jpg
2018-metz-figure6-learnedfiltersandrepresentationsofthemetalearnednet.jpg
https://ai.meta.com/blog/harmful-content-can-evolve-quickly-our-new-ai-system-adapts-to-tackle-it/
https://blog.waymo.com/2020/04/using-automated-data-augmentation-to.html#google
https://pages.ucsd.edu/~rbelew/courses/cogs184_w10/readings/HintonNowlan97.pdf
https://research.google/blog/introducing-flan-more-generalizable-language-models-with-instruction-fine-tuning/
https://research.google/blog/permutation-invariant-neural-networks-for-reinforcement-learning/
https://research.google/blog/training-machine-learning-models-more-efficiently-with-dataset-distillation/
https://www.alignmentforum.org/posts/YEioD8YLgxih3ydxP/why-simulator-ais-want-to-be-active-inference-ais
https://www.lesswrong.com/posts/QNQuWB3hS5FrGp5yZ/programmatic-backdoors-dnns-can-use-sgd-to-run-arbitrary
https://www.lesswrong.com/posts/Y4rrwkopoigaNGxmS/a-mind-needn-t-be-curious-to-reap-the-benefits-of-curiosity#XeAmDn3NsMqdF6Mij
https://www.lesswrong.com/posts/bC5xd7wQCnTDw7Kyx/getting-up-to-speed-on-the-speed-prior-in-2022
https://www.lesswrong.com/posts/ddR8dExcEFJKJtWvR/how-evolutionary-lineages-of-llms-can-plan-their-own-futur
https://www.lesswrong.com/posts/sY3a4Rfa48CgteBEm/chatgpt-can-learn-indirect-control
https://www.nature.com/articles/s41467-020-19244-4#deepmind
https://www.quantamagazine.org/researchers-build-ai-that-builds-ai-20220125/
https://www.reddit.com/r/reinforcementlearning/search/?q=flair%3AMeta-RL&sort=top&restrict_sr=on&t=all