Bibliography:

  1. ‘RL’ tag

  2. ‘continual learning’ tag

  3. ‘LM tokenization’ tag

  4. ‘inner monologue (AI)’ tag

  5. ‘hidden-information game’ tag

  6. ‘robotics’ tag

  7. ‘RL scaling’ tag

  8. Free-Play Periods for RL Agents

  9. WBE and DRL: a Middle Way of imitation learning from the human brain

  10. Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation

  11. State-space models can learn in-context by gradient descent

  12. Thinking LLMs: General Instruction Following with Thought Generation

  13. MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

  14. Contextual Document Embeddings

  15. Generating Diverse and Reliable Features for Few-Shot Learning

  16. When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models

  17. Probing the Decision Boundaries of In-context Learning in Large Language Models

  18. Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

  19. Discovering Preference Optimization Algorithms with and for Large Language Models

  20. State Soup: In-Context Skill Learning, Retrieval and Mixing

  21. Attention as a Hypernetwork

  22. BERTs are Generative In-Context Learners

  23. To Believe or Not to Believe Your LLM

  24. Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

  25. Auto Evol-Instruct: Automatic Instruction Evolving for Large Language Models

  26. A Theoretical Understanding of Self-Correction through In-context Alignment

  27. MLPs Learn In-Context

  28. Zero-Shot Tokenizer Transfer

  29. Position: Understanding LLMs Requires More Than Statistical Generalization

  30. SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models

  31. Many-Shot In-Context Learning

  32. Foundational Challenges in Assuring Alignment and Safety of Large Language Models

  33. Revisiting the Equivalence of In-Context Learning and Gradient Descent: The Impact of Data Distribution

  34. Best Practices and Lessons Learned on Synthetic Data for Language Models

  35. From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

  36. Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

  37. Evolutionary Optimization of Model Merging Recipes

  38. How Well Can Transformers Emulate In-context Newton’s Method?

  39. Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models

  40. Neural Network Parameter Diffusion

  41. The Matrix: A Bayesian learning model for LLMs

  42. Rephrasing the Web (WRAP): A Recipe for Compute and Data-Efficient Language Modeling

  43. An Information-Theoretic Analysis of In-Context Learning

  44. Deep de Finetti: Recovering Topic Distributions from Large Language Models

  45. Generative Multimodal Models are In-Context Learners

  46. VILA: On Pre-training for Visual Language Models

  47. Evolving Reservoirs for Meta Reinforcement Learning

  48. The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

  49. Learning few-shot imitation as cultural transmission

  50. In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering

  51. Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves

  52. ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models

  53. Self-AIXI: Self-Predictive Universal AI

  54. HyperFields: Towards Zero-Shot Generation of NeRFs from Text

  55. Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models

  56. Eureka: Human-Level Reward Design via Coding Large Language Models

  57. How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

  58. Motif: Intrinsic Motivation from Artificial Intelligence Feedback

  59. ExpeL: LLM Agents Are Experiential Learners

  60. Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)

  61. RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models

  62. CausalLM is not optimal for in-context learning

  63. MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

  64. Self Expanding Neural Networks

  65. Teaching Arithmetic to Small Transformers

  66. One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention

  67. Trainable Transformer in Transformer

  68. Supervised Pretraining Can Learn In-Context Reinforcement Learning

  69. Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression

  70. Language models are weak learners

  71. Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks

  72. Improving Long-Horizon Imitation Through Instruction Prediction

  73. Schema-learning and rebinding as mechanisms of in-context learning and emergence

  74. RGD: Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization

  75. Transformers learn to implement preconditioned gradient descent for in-context learning

  76. Learning Transformer Programs

  77. Fundamental Limitations of Alignment in Large Language Models

  78. How well do Large Language Models perform in Arithmetic tasks?

  79. Larger language models do in-context learning differently

  80. BiLD: Big Little Transformer Decoder

  81. Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery

  82. Looped Transformers as Programmable Computers

  83. A Survey of Meta-Reinforcement Learning

  84. Human-like systematic generalization through a meta-learning neural network

  85. Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers

  86. Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

  87. Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

  88. Transformers learn in-context by gradient descent

  89. FWL: Meta-Learning Fast Weight Language Models

  90. What learning algorithm is in-context learning? Investigations with linear models

  91. Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models

  92. VeLO: Training Versatile Learned Optimizers by Scaling Up

  93. Mysteries of mode collapse § Inescapable wedding parties

  94. BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning

  95. ProMoT: Preserving In-Context Learning ability in Large Language Model Fine-tuning

  96. In-context Reinforcement Learning with Algorithm Distillation

  97. SAP: Bidirectional Language Models Are Also Few-shot Learners

  98. g.pt: Learning to Learn with Generative Models of Neural Network Checkpoints

  99. AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

  100. Few-shot Adaptation Works with UnpredicTable Data

  101. What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

  102. Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling

  103. TabPFN: Meta-Learning a Real-Time Tabular AutoML Method For Small Data

  104. Offline RL Policies Should be Trained to be Adaptive

  105. Goal-Conditioned Generators of Deep Policies

  106. Prompting Decision Transformer for Few-Shot Policy Generalization

  107. RHO-LOSS: Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

  108. NOAH: Neural Prompt Search

  109. Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions

  110. Towards Learning Universal Hyperparameter Optimizers with Transformers

  111. Instruction Induction: From Few Examples to Natural Language Task Descriptions

  112. Gato: A Generalist Agent

  113. Unifying Language Learning Paradigms

  114. Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers

  115. Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

  116. What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

  117. Effective Mutation Rate Adaptation through Group Elite Selection

  118. Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs

  119. Can language models learn from explanations in context?

  120. Auto-Lambda: Disentangling Dynamic Task Relationships

  121. In-Context Learning and Induction Heads

  122. HyperMixer: An MLP-based Low Cost Alternative to Transformers

  123. LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models

  124. Evolving Curricula with Regret-Based Environment Design

  125. HyperPrompt: Prompt-based Task-Conditioning of Transformers

  126. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

  127. All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL

  128. NeuPL: Neural Population Learning

  129. Learning Synthetic Environments and Reward Networks for Reinforcement Learning

  130. Datamodels: Predicting Predictions from Training Data

  131. From data to functa: Your data point is a function and you should treat it like one

  132. Environment Generation for Zero-Shot Compositional Reinforcement Learning

  133. Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies

  134. Learning robust perceptive locomotion for quadrupedal robots in the wild

  135. Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

  136. In Defense of the Unitary Scalarization for Deep Multi-Task Learning

  137. HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning

  138. Finding General Equilibria in Many-Agent Economic Simulations Using Deep Reinforcement Learning

  139. The Curse of Low Task Diversity: On the Failure of Transfer Learning to Outperform MAML and their Empirical Equivalence

  140. A Mathematical Framework for Transformer Circuits

  141. PFNs: Transformers Can Do Bayesian Inference

  142. How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy

  143. Noether Networks: Meta-Learning Useful Conserved Quantities

  144. A rational reinterpretation of dual-process theories

  145. A General Language Assistant as a Laboratory for Alignment

  146. A Modern Self-Referential Weight Matrix That Learns to Modify Itself

  147. A Survey of Generalization in Deep Reinforcement Learning

  148. Gradients are Not All You Need

  149. An Explanation of In-context Learning as Implicit Bayesian Inference

  150. Procedural Generalization by Planning with Self-Supervised World Models

  151. MetaICL: Learning to Learn In Context

  152. Logical Activation Functions: Logit-space equivalents of Probabilistic Boolean Operators

  153. Shaking the foundations: delusions in sequence models for interaction and control

  154. Meta-learning, social cognition and consciousness in brains and machines

  155. T0: Multitask Prompted Training Enables Zero-Shot Task Generalization

  156. Replay-Guided Adversarial Environment Design

  157. Embodied intelligence via learning and evolution

  158. Transformers are Meta-Reinforcement Learners

  159. Scalable Online Planning via Reinforcement Learning Fine-Tuning

  160. Dropout’s Dream Land: Generalization from Learned Simulators to Reality

  161. Is Curiosity All You Need? On the Utility of Emergent Behaviors from Curious Exploration

  162. Bootstrapped Meta-Learning

  163. The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning

  164. FLAN: Finetuned Language Models Are Zero-Shot Learners

  165. The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning

  166. Open-Ended Learning Leads to Generally Capable Agents

  167. Dataset Distillation with Infinitely Wide Convolutional Networks

  168. Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

  169. PonderNet: Learning to Ponder

  170. Multimodal Few-Shot Learning with Frozen Language Models

  171. LHOPT: A Generalizable Approach to Learning Optimizers

  172. Towards mental time travel: a hierarchical memory for reinforcement learning agents

  173. A Full-stack Accelerator Search Technique for Vision Applications

  174. Reward is enough

  175. Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020

  176. CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

  177. Podracer architectures for scalable Reinforcement Learning

  178. BLUR: Meta-Learning Bidirectional Update Rules

  179. Asymmetric self-play for automatic goal discovery in robotic manipulation

  180. OmniNet: Omnidirectional Representations from Transformers

  181. Linear Transformers Are Secretly Fast Weight Programmers

  182. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm

  183. ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution

  184. Training Learned Optimizers with Randomly Initialized Learned Optimizers

  185. Evolving Reinforcement Learning Algorithms

  186. Meta Pseudo Labels

  187. Meta Learning Backpropagation And Improving It

  188. Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

  189. Scaling down Deep Learning

  190. Reverse engineering learned optimizers reveals known and novel mechanisms

  191. Dataset Meta-Learning from Kernel Ridge-Regression

  192. MELD: Meta-Reinforcement Learning from Images via Latent State Models

  193. Meta-trained agents implement Bayes-optimal agents

  194. Learning not to learn: Nature versus nurture in silico

  195. Prioritized Level Replay

  196. Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

  197. Hidden Incentives for Auto-Induced Distributional Shift

  198. Grounded Language Learning Fast and Slow

  199. Matt Botvinick on the spontaneous emergence of learning algorithms

  200. Discovering Reinforcement Learning Algorithms

  201. Deep Reinforcement Learning and Its Neuroscientific Implications

  202. Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

  203. Rapid Task-Solving in Novel Environments

  204. FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining

  205. GPT-3: Language Models are Few-Shot Learners

  206. Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search

  207. Automatic Discovery of Interpretable Planning Strategies

  208. Meta-Reinforcement Learning for Robotic Industrial Insertion Tasks

  209. A Comparison of Methods for Treatment Assignment with an Application to Playlist Generation

  210. Approximate exploitability: Learning a best response in large games

  211. Meta-Learning in Neural Networks: A Survey

  212. Agent57: Outperforming the Atari Human Benchmark

  213. Designing Network Design Spaces

  214. Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

  215. Accelerating and Improving AlphaZero Using Population Based Training

  216. Meta-learning curiosity algorithms

  217. AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

  218. AutoML-Zero: Open source code for the paper: "AutoML-Zero: Evolving Machine Learning Algorithms From Scratch"

  219. Effective Diversity in Population Based Reinforcement Learning

  220. AI Helps Warehouse Robots Pick Up New Tricks: Backed by machine learning luminaries, Covariant.ai’s bots can handle jobs previously needing a human touch

  221. Smooth markets: A basic mechanism for organizing gradient-based learners

  222. AutoML-Zero: Evolving Code That Learns

  223. Learning Neural Activations

  224. Meta-Learning without Memorization

  225. MetaFun: Meta-Learning with Iterative Functional Updates

  226. Leveraging Procedural Generation to Benchmark Reinforcement Learning

  227. Procgen Benchmark: We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills

  228. Increasing Generality in Machine Learning through Procedural Content Generation

  229. SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning

  230. Optimizing Millions of Hyperparameters by Implicit Differentiation

  231. Learning to Predict Without Looking Ahead: World Models Without Forward Prediction

  232. Learning to Predict Without Looking Ahead: World Models Without Forward Prediction [blog]

  233. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning

  234. Solving Rubik’s Cube with a Robot Hand

  235. Solving Rubik’s Cube with a Robot Hand [blog]

  236. Gradient Descent: The Ultimate Optimizer

  237. Data Valuation using Reinforcement Learning

  238. Multiplicative Interactions and Where to Find Them

  239. ANIL: Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML

  240. Emergent Tool Use From Multi-Agent Autocurricula

  241. Meta-Learning with Implicit Gradients

  242. A critique of pure learning and what artificial neural networks can learn from animal brains

  243. AutoML: A Survey of the State-of-the-Art

  244. Metalearned Neural Memory

  245. Algorithms for Hyper-Parameter Optimization

  246. Evolving the Hearthstone Meta

  247. Meta Reinforcement Learning

  248. One Epoch Is All You Need

  249. Compositional generalization through meta sequence-to-sequence learning

  250. Risks from Learned Optimization in Advanced Machine Learning Systems

  251. ICML 2019 Notes

  252. SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers

  253. AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

  254. Alpha MAML: Adaptive Model-Agnostic Meta-Learning

  255. Reinforcement Learning, Fast and Slow

  256. Meta reinforcement learning as task inference

  257. Learning Loss for Active Learning

  258. Meta-learning of Sequential Strategies

  259. Searching for MobileNetV3

  260. Meta-learners’ learning dynamics are unlike learners’

  261. Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

  262. AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search

  263. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables

  264. Task2Vec: Task Embedding for Meta-Learning

  265. The Omniglot challenge: a 3-year progress report

  266. FIGR: Few-shot Image Generation with Reptile

  267. Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions

  268. Meta-Learning Neural Bloom Filters

  269. ffc205d2c6c2a415641f822e0c909bd847352e99.pdf

  270. Malthusian Reinforcement Learning

  271. Quantifying Generalization in Reinforcement Learning

  272. An Introduction to Deep Reinforcement Learning

  273. Meta-Learning: Learning to Learn Fast

  274. Evolving Space-Time Neural Architectures for Videos

  275. Understanding and correcting pathologies in the training of learned optimizers

  276. BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

  277. Deep Reinforcement Learning

  278. Searching for Efficient Multi-Scale Architectures for Dense Image Prediction

  279. Backprop Evolution

  280. Learning Dexterous In-Hand Manipulation

  281. LEO: Meta-Learning with Latent Embedding Optimization

  282. Automatically Composing Representation Transformations as a Means for Generalization

  283. Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

  284. Guided evolutionary strategies: Augmenting random search with surrogate gradients

  285. RUDDER: Return Decomposition for Delayed Rewards

  286. Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning

  287. Fingerprint Policy Optimization for Robust Reinforcement Learning

  288. AutoAugment: Learning Augmentation Policies from Data

  289. Meta-Gradient Reinforcement Learning

  290. Continuous Learning in a Hierarchical Multiscale Neural Network

  291. Prefrontal cortex as a meta-reinforcement learning system

  292. Meta-Learning Update Rules for Unsupervised Representation Learning

  293. Reviving and Improving Recurrent Back-Propagation

  294. Kickstarting Deep Reinforcement Learning

  295. Reptile: On First-Order Meta-Learning Algorithms

  296. Some Considerations on Learning to Explore via Meta-Reinforcement Learning

  297. One Big Net For Everything

  298. Machine Theory of Mind

  299. Evolved Policy Gradients

  300. One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

  301. Rover Descent: Learning to optimize by learning to navigate on prototypical loss surfaces

  302. ScreenerNet: Learning Self-Paced Curriculum for Deep Neural Networks

  303. Population Based Training of Neural Networks

  304. BlockDrop: Dynamic Inference Paths in Residual Networks

  305. Learning to select computations

  306. Learning to Generalize: Meta-Learning for Domain Generalization

  307. Efficient K-shot Learning with Regularized Deep Networks

  308. Online Learning of a Memory for Learning Rates

  309. One-Shot Visual Imitation Learning via Meta-Learning

  310. Supervising Unsupervised Learning

  311. Learning with Opponent-Learning Awareness

  312. SMASH: One-Shot Model Architecture Search through HyperNetworks

  313. Stochastic Optimization with Bandit Sampling

  314. A Simple Neural Attentive Meta-Learner

  315. Reinforcement Learning for Learning Rate Control

  316. Metacontrol for Adaptive Imagination-Based Optimization

  317. Deciding How to Decide: Dynamic Routing in Artificial Neural Networks

  318. Prototypical Networks for Few-shot Learning

  319. Learned Optimizers that Scale and Generalize

  320. MAML: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

  321. Learning to Optimize Neural Nets

  322. Understanding Synthetic Gradients and Decoupled Neural Interfaces

  323. Optimization as a Model for Few-Shot Learning

  324. Learning to superoptimize programs

  325. Discovering objects and their relations from entangled scene representations

  326. Google Vizier: A Service for Black-Box Optimization

  327. An Actor-critic Algorithm for Learning Rate Learning

  328. A Bird’s Eye View of Synthetic Gradients

  329. Learning to reinforcement learn

  330. Learning to Learn without Gradient Descent by Gradient Descent

  331. RL2: Fast Reinforcement Learning via Slow Reinforcement Learning

  332. Designing Neural Network Architectures using Reinforcement Learning

  333. Using Fast Weights to Attend to the Recent Past

  334. HyperNetworks

  335. Decoupled Neural Interfaces using Synthetic Gradients

  336. Learning to learn by gradient descent by gradient descent

  337. Matching Networks for One Shot Learning

  338. Learning to Optimize

  339. One-shot Learning with Memory-Augmented Neural Networks

  340. Adaptive Computation Time for Recurrent Neural Networks

  341. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

  342. Gradient-based Hyperparameter Optimization through Reversible Learning

  343. Machine Teaching: an Inverse Problem to Machine Learning and an Approach Toward Optimal Education

  344. Human-Level Concept Learning through Probabilistic Program Induction

  345. abf7f07b81284b90744432cca3bf0d8cee85e469.pdf

  346. Robots that can adapt like animals

  347. Deep Learning in Neural Networks: An Overview

  348. Practical Bayesian Optimization of Machine Learning Algorithms

  349. Optimal Ordered Problem Solver (OOPS)

  350. Learning to Learn Using Gradient Descent

  351. On the Optimization of a Synaptic Learning Rule

  352. Interactions between Learning and Evolution

  353. Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks

  354. Learning a synaptic learning rule

  355. Reinforcement Learning: An Introduction § Designing Reward Signals

  356. 2ddcafc570cef087ed62b0113ee2917df3a4f33a.pdf#page=491

  357. Exploring Hyperparameter Meta-Loss Landscapes With Jax

  358. f8c247b7a53735e17642638a18b45a312a3cf84f.html#google

  359. Metalearning

  360. Universal Search § OOPS and Other Incremental Variations

  361. Extrapolating to Unnatural Language Processing With GPT-3’s In-Context Learning: The Good, the Bad, and the Mysterious

  362. 53906a0a199a213fa1bce0b97ecad6b5063931e4.html

  363. How Does In-Context Learning Work? A Framework for Understanding the Differences from Traditional Supervised Learning

  364. bdf17c80e1ed5dc516811f03acef03415b220143.html

  365. Rapid Motor Adaptation for Legged Robots

  366. Collaborating With Humans Requires Understanding Them

  367. Why Generalization in RL Is Difficult: Epistemic POMDPs and Implicit Partial Observability [Blog]

  368. Hypernetworks [Blog]

  369. Action and Perception As Divergence Minimization

  370. AlphaStar: Mastering the Real-Time Strategy Game StarCraft II

  371. Prefrontal Cortex As a Meta-Reinforcement Learning System [Blog]

  372. c78ec48980d210d2a589b077f45a8d1f9e303dce.html

  373. The Lie Comes First, the Worlds to Accommodate It

  374. Sgdstore/experiments/omniglot at Master

  375. 864f04985a5c15c527c5b4465cc42d8707e7bbb5.html#omniglot

  376. Curriculum For Reinforcement Learning

  377. 82f88b3c03ad0b252a33823c09b770832a95bbcc.html#openai

  378. Neural Architecture Search

  379. e6974ff95b5a7295d018b89e865e011d9f66369b.html#openai

  380. MetaGenRL: Improving Generalization in Meta Reinforcement Learning

  381. cdfbfecb16cd0249c87142b4c650a487782f1399.html

  382. 2022: 25-Year Anniversary: LSTM (1997), All Computable Metaverses, Hierarchical Q-Learning, Adversarial Intrinsic Reinforcement Learning, Low-Complexity NNs, Low-Complexity Art, Meta-RL, Soccer Learning

  383. Metalearning or Learning to Learn Since 1987

  384. 76c24cf0db4abf4ce2b77d22182272d8e62d1a28.html

  385. The Future of Artificial Intelligence Is Self-Organizing and Self-Assembling

  386. Domain-Adaptive Meta-Learning

  387. 31439931d05db1bcf5dd08d9e9a09c6be5b970a0.html

  388. How to Fix Reinforcement Learning

  389. Introducing Adept

  390. Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes

  391. Risks from Learned Optimization: Introduction

  392. How Good Are LLMs at Doing ML on an Unknown Dataset?

  393. Early Situational Awareness and Its Implications, a Story

  394. 20ea9a879c0915ecfa2f2f87dba168dc160967cb.html

  395. AI Is Learning How to Create Itself

  396. 2874fc1c58ced20eb47a85ca46af67972fcf2ab4.html

  397. Matt Botvinick: Neuroscience, Psychology, and AI at DeepMind

  398. SMASH: One-Shot Model Architecture Search through HyperNetworks [Video]

  399. Solving Rubik’s Cube With a Robot Hand: Perturbations

  400. WELM

  401. 2023-lee-figure9-arithmeticcanbelearnedevenwithnoiseintheinnermonologuetranscripts.jpg

  402. 2022-patel-figure1-mt5fewshotpromptingwordbywordforneuralmachinetranslation.png

  403. 2020-real-googlebrain-automlzero-bestalgorithmannotation.mp4

  404. 2018-metz-appendix-figure1-detailedschematicdiagramofmetalearningarchitecture.png

  405. 2018-metz-figure1-schematicofmetalearningrepresentationsforunsupervisedlearning.jpg

  406. 2018-metz-figure5-generalizationofmetalearnedruletounseenlayesrunitsandactivations.jpg

  407. 2018-metz-figure6-learnedfiltersandrepresentationsofthemetalearnednet.jpg

  408. https://ai.facebook.com/blog/harmful-content-can-evolve-quickly-our-new-ai-system-adapts-to-tackle-it

  409. https://ai.meta.com/blog/harmful-content-can-evolve-quickly-our-new-ai-system-adapts-to-tackle-it/

  410. https://blog.waymo.com/2020/04/using-automated-data-augmentation-to.html#google

  411. 15822ba1a05b1de73ae8418eed6271516139faf5.html#google

  412. https://openai.com/blog/reptile/

  413. https://openai.com/index/mle-bench/

  414. https://pages.ucsd.edu/~rbelew/courses/cogs184_w10/readings/HintonNowlan97.pdf

  415. eb17a655841741a210b6b187477db8d4b5106390.pdf

  416. https://plato.stanford.edu/entries/selection-units/

  417. https://research.google/blog/introducing-flan-more-generalizable-language-models-with-instruction-fine-tuning/

  418. https://research.google/blog/permutation-invariant-neural-networks-for-reinforcement-learning/

  419. https://research.google/blog/training-machine-learning-models-more-efficiently-with-dataset-distillation/

  420. https://www.alignmentforum.org/posts/YEioD8YLgxih3ydxP/why-simulator-ais-want-to-be-active-inference-ais

  421. https://www.deepmind.com/research/publications/alchemy

  422. https://www.lesswrong.com/posts/QNQuWB3hS5FrGp5yZ/programmatic-backdoors-dnns-can-use-sgd-to-run-arbitrary

  423. https://www.lesswrong.com/posts/Y4rrwkopoigaNGxmS/a-mind-needn-t-be-curious-to-reap-the-benefits-of-curiosity#XeAmDn3NsMqdF6Mij

  424. https://www.lesswrong.com/posts/bC5xd7wQCnTDw7Kyx/getting-up-to-speed-on-the-speed-prior-in-2022

  425. https://www.lesswrong.com/posts/ddR8dExcEFJKJtWvR/how-evolutionary-lineages-of-llms-can-plan-their-own-futur

  426. https://www.lesswrong.com/posts/sY3a4Rfa48CgteBEm/chatgpt-can-learn-indirect-control

  427. 149e0b85a156eeb708a4da09b8416d59170e30ac.html

  428. https://www.nature.com/articles/s41467-020-19244-4#deepmind

  429. https://www.nature.com/articles/s42256-018-0006-z#uber

  430. https://www.quantamagazine.org/researchers-build-ai-that-builds-ai-20220125/

  431. https://www.reddit.com/r/reinforcementlearning/search/?q=flair%3AMeta-RL&sort=top&restrict_sr=on&t=all

  432. e1043f593c393ed0947ea9af41a2009be24d6867.html

  433. https://x.com/SullyOmarr/status/1768744880673522083

  434. https://x.com/_jasonwei/status/1587858146948567041

  435. https://x.com/alexalbert__/status/1636488551817965568

  436. https://x.com/francoisfleuret/status/1714531085512544760

  437. https://x.com/goodside/status/1652496489241878533

  438. https://x.com/joshwhiton/status/1770870746010513571

  439. https://x.com/shinboson/status/1794570054165729303

  440. MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

  441. Lil'Log

  442. Homepage: Aleksander Mądry

  443. https://arxiv.org/abs/2410.07095#openai

  444. When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models

  445. https://arxiv.org/abs/2406.13131

  446. Probing the Decision Boundaries of In-context Learning in Large Language Models

  447. Aditya Grover

  448. https://arxiv.org/abs/2406.11233

  449. Zero-Shot Tokenizer Transfer

  450. https://arxiv.org/abs/2405.07883

  451. Revisiting the Equivalence of In-Context Learning and Gradient Descent: The Impact of Data Distribution

  452. https://ieeexplore.ieee.org/abstract/document/10446522

  453. From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

  454. https://arxiv.org/abs/2404.07544

  455. Rephrasing the Web (WRAP): A Recipe for Compute and Data-Efficient Language Modeling

  456. https://arxiv.org/abs/2401.16380#apple

  457. Learning few-shot imitation as cultural transmission

  458. https://www.nature.com/articles/s41467-023-42875-2#deepmind

  459. Self-AIXI: Self-Predictive Universal AI

  460. https://openreview.net/forum?id=psXVkKO9No#deepmind

  461. Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)

  462. https://arxiv.org/abs/2308.09175#deepmind

  463. Teaching Arithmetic to Small Transformers

  464. https://arxiv.org/abs/2307.03381

  465. Supervised Pretraining Can Learn In-Context Reinforcement Learning

  466. https://arxiv.org/abs/2306.14892

  467. Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks

  468. https://arxiv.org/abs/2306.13831

  469. Schema-learning and rebinding as mechanisms of in-context learning and emergence

  470. https://arxiv.org/abs/2307.01201#deepmind

  471. RGD: Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization

  472. https://arxiv.org/abs/2306.09222#google

  473. How well do Large Language Models perform in Arithmetic tasks?

  474. https://arxiv.org/abs/2304.02015#alibaba

  475. Larger language models do in-context learning differently

  476. Jason Wei

  477. Yi Tay

  478. https://arxiv.org/abs/2303.03846#google

  479. Transformers learn in-context by gradient descent

  480. https://arxiv.org/abs/2212.07677#google

  481. FWL: Meta-Learning Fast Weight Language Models

  482. https://arxiv.org/abs/2212.02475#google

  483. What learning algorithm is in-context learning? Investigations with linear models

  484. Jacob Andreas @ MIT

  485. https://arxiv.org/abs/2211.15661#google

  486. BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning

  487. Thomas Wang

  488. Stella Biderman

  489. Teven Le Scao

  490. Sheng Shen’s Homepage

  491. Colin Raffel

  492. https://arxiv.org/abs/2211.01786

  493. SAP: Bidirectional Language Models Are Also Few-shot Learners

  494. Colin Raffel

  495. https://arxiv.org/abs/2209.14500

  496. g.pt: Learning to Learn with Generative Models of Neural Network Checkpoints

  497. https://arxiv.org/abs/2209.12892

  498. AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

  499. https://arxiv.org/abs/2208.01448#amazon

  500. What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

  501. Percy Liang

  502. https://arxiv.org/abs/2208.01066

  503. TabPFN: Meta-Learning a Real-Time Tabular AutoML Method For Small Data

  504. Profile – Machine Learning Lab

  505. https://arxiv.org/abs/2207.01848

  506. Prompting Decision Transformer for Few-Shot Policy Generalization

  507. https://arxiv.org/abs/2206.13499

  508. RHO-LOSS: Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

  509. https://arxiv.org/abs/2206.07137

  510. Towards Learning Universal Hyperparameter Optimizers with Transformers

  511. https://arxiv.org/abs/2205.13320#google

  512. Gato: A Generalist Agent

  513. Nicolas Heess

  514. https://arxiv.org/abs/2205.06175#deepmind

  515. Unifying Language Learning Paradigms

  516. Yi Tay

  517. Neil Houlsby

  518. https://arxiv.org/abs/2205.05131#google

  519. Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

  520. Yizhong Wang—University of Washington

  521. Noah A. Smith

  522. Hannaneh Hajishirzi—University of Washington

  523. https://arxiv.org/abs/2204.07705

  524. HyperMixer: An MLP-based Low Cost Alternative to Transformers

  525. https://arxiv.org/abs/2203.03691

  526. LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models

  527. https://arxiv.org/abs/2203.02094#microsoft

  528. HyperPrompt: Prompt-based Task-Conditioning of Transformers

  529. Yi Tay

  530. https://arxiv.org/abs/2203.00759

  531. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

  532. Mike Lewis

  533. Hannaneh Hajishirzi—University of Washington

  534. Luke Zettlemoyer

  535. https://arxiv.org/abs/2202.12837#facebook

  536. NeuPL: Neural Population Learning

  537. Nicolas Heess

  538. https://arxiv.org/abs/2202.07415#deepmind

  539. Learning robust perceptive locomotion for quadrupedal robots in the wild

  540. /doc/reinforcement-learning/meta-learning/2022-miki.pdf

  541. PFNs: Transformers Can Do Bayesian Inference

  542. Profile – Machine Learning Lab

  543. https://arxiv.org/abs/2112.10510

  544. A General Language Assistant as a Laboratory for Alignment

  545. About Me

  546. Andy Jones

  547. https://jack-clark.net/about/

  548. Sam McCandlish

  549. Jared Kaplan

  550. https://arxiv.org/abs/2112.00861#anthropic

  551. Procedural Generalization by Planning with Self-Supervised World Models

  552. Julian Schrittwieser

  553. Sherjil Ozair

  554. https://arxiv.org/abs/2111.01587#deepmind

  555. LHOPT: A Generalizable Approach to Learning Optimizers

  556. https://arxiv.org/abs/2106.00958#openai

  557. Reward is enough

  558. https://www.sciencedirect.com/science/article/pii/S0004370221000862#deepmind

  559. Podracer architectures for scalable Reinforcement Learning

  560. https://arxiv.org/abs/2104.06272#deepmind

  561. OmniNet: Omnidirectional Representations from Transformers

  562. Yi Tay

  563. https://arxiv.org/abs/2103.01075#google

  564. Meta Pseudo Labels

  565. Zihang Dai

  566. https://arxiv.org/abs/2003.10580#google

  567. Scaling down Deep Learning

  568. About Sam Greydanus

  569. https://greydanus.github.io/2020/12/01/scaling-down/

  570. Matt Botvinick on the spontaneous emergence of learning algorithms

  571. https://www.lesswrong.com/posts/Wnqua6eQkewL3bqsF/matt-botvinick-on-the-spontaneous-emergence-of-learning

  572. Accelerating and Improving AlphaZero Using Population Based Training

  573. https://arxiv.org/abs/2003.06212

  574. Procgen Benchmark: We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills

  575. Jacob Hilton's Homepage

  576. John Schulman’s Homepage

  577. https://openai.com/research/procgen-benchmark

  578. One Epoch Is All You Need

  579. https://arxiv.org/abs/1906.06669

  580. ICML 2019 Notes

  581. https://david-abel.github.io/notes/icml_2019.pdf

  582. Meta-learners’ learning dynamics are unlike learners’

  583. https://arxiv.org/abs/1905.01320#deepmind

  584. Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

  585. https://sites.google.com/view/razp/home

  586. https://arxiv.org/abs/1904.11455#deepmind

  587. RUDDER: Return Decomposition for Delayed Rewards

  588. https://arxiv.org/abs/1806.07857

  589. AutoAugment: Learning Augmentation Policies from Data

  590. Barret Zoph

  591. https://arxiv.org/abs/1805.09501#google

  592. Meta-Learning Update Rules for Unsupervised Representation Learning

  593. Luke Metz

  594. Jascha Sohl-Dickstein

  595. https://arxiv.org/abs/1804.00222#google

  596. Reptile: On First-Order Meta-Learning Algorithms

  597. John Schulman’s Homepage

  598. https://arxiv.org/abs/1803.02999#openai

  599. SMASH: One-Shot Model Architecture Search through HyperNetworks

  600. https://arxiv.org/abs/1708.05344

  601. Machine Teaching: an Inverse Problem to Machine Learning and an Approach Toward Optimal Education

  602. /doc/ai/2015-zhu-2.pdf

  603. Optimal Ordered Problem Solver (OOPS)

  604. https://arxiv.org/abs/cs/0207097#schmidhuber

  605. Learning a synaptic learning rule

  606. /doc/reinforcement-learning/meta-learning/1991-bengio.pdf