Bibliography:

  1. ‘RL’ tag

  2. ‘brain imitation learning’ tag

  3. ‘knowledge distillation’ tag

  4. ‘AlphaStar’ tag

  5. ‘Decision Transformer’ tag

  6. ‘offline RL’ tag

  7. ‘preference learning’ tag

  8. ‘robotics’ tag

  9. GPT-3 Creative Fiction

  10. The Scaling Hypothesis

  11. A Revolution in How Robots Learn

  12. Data Scaling Laws in Imitation Learning for Robotic Manipulation

  13. Motor Physics: Safety Implications of Geared Motors

  14. GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

  15. Earnings call: Tesla Discusses Q1 2024 Challenges and AI Expansion

  16. Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (Searchformer)

  17. Grandmaster-Level Chess Without Search

  18. Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

  19. Vision-Language Models as a Source of Rewards

  20. Self-Supervised Behavior Cloned Transformers are Path Crawlers for Text Games

  21. Learning few-shot imitation as cultural transmission

  22. Calibrated Language Models Must Hallucinate

  23. Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero

  24. Beyond Memorization: Violating Privacy Via Inference with Large Language Models

  25. ReST: Reinforced Self-Training for Language Modeling

  26. AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

  27. Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc

  28. Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior

  29. Android in the Wild: A Large-Scale Dataset for Android Device Control

  30. GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models

  31. ChessGPT: Bridging Policy Learning and Language Modeling

  32. SequenceMatch: Imitation Learning for Autoregressive Sequence Modeling with Backtracking

  33. Survival Instinct in Offline Reinforcement Learning

  34. Thought Cloning: Learning to Think while Acting by Imitating Human Thinking

  35. Let’s Verify Step by Step

  36. The False Promise of Imitating Proprietary LLMs

  37. LIMA: Less Is More for Alignment

  38. Revisiting the Minimalist Approach to Offline Reinforcement Learning

  39. ACT: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

  40. MimicPlay: Long-Horizon Imitation Learning by Watching Human Play

  41. Toolformer: Language Models Can Teach Themselves to Use Tools

  42. Conditioning Predictive Models: Risks and Strategies

  43. Imitating Human Behavior with Diffusion Models

  44. Solving math word problems with process- and outcome-based feedback

  45. CICERO: Human-level play in the game of Diplomacy by combining language models with strategic reasoning

  46. Token Turing Machines

  47. Dungeons and Data: A Large-Scale NetHack Dataset

  48. In-context Reinforcement Learning with Algorithm Distillation

  49. Scaling Laws for Reward Model Overoptimization

  50. Human-AI Coordination via Human-Regularized Search and Learning

  51. Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

  52. Nearest Neighbor Non-autoregressive Text Generation

  53. Generative Personas That Behave and Experience Like Humans

  54. Diffusion-QL: Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

  55. Limitations of Language Models in Arithmetic and Symbolic Induction

  56. Improved Policy Optimization for Online Imitation Learning

  57. Watch and Match: Supercharging Imitation with Regularized Optimal Transport

  58. Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

  59. Large-Scale Retrieval for Reinforcement Learning

  60. Boosting Search Engines with Interactive Agents

  61. Housekeep: Tidying Virtual Households using Commonsense Reasoning

  62. When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?

  63. Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale

  64. Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning

  65. Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning

  66. Inferring Rewards from Language in Context

  67. Robot peels banana with goal-conditioned dual-action deep imitation learning

  68. The Unsurprising Effectiveness of Pre-Trained Vision Models for Control

  69. VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning

  70. LID: Pre-Trained Language Models for Interactive Decision-Making

  71. Conditional Imitation Learning for Multi-Agent Games

  72. Amortized Noisy Channel Neural Machine Translation

  73. WebGPT: Browser-assisted question-answering with human feedback

  74. Modeling Strong and Human-Like Gameplay with KL-Regularized Search

  75. JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

  76. A General Language Assistant as a Laboratory for Alignment

  77. AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale

  78. RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning

  79. BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning

  80. Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies

  81. SafetyNet: Safe planning for real-world self-driving vehicles using machine-learned policies

  82. TrufLL: Learning Natural Language Generation from Scratch

  83. Relating Neural Text Degeneration to Exposure Bias

  84. Learning to Navigate Sidewalks in Outdoor Environments

  85. PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks

  86. Implicit Behavioral Cloning

  87. DexMV: Imitation Learning for Dexterous Manipulation from Human Videos

  88. Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs

  89. A Minimalist Approach to Offline Reinforcement Learning

  90. Hyperparameter Selection for Imitation Learning

  91. From Motor Control to Team Play in Simulated Humanoid Football

  92. On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning

  93. Counter-Strike Deathmatch with Large-Scale Behavioral Cloning

  94. Fully General Online Imitation Learning

  95. The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors

  96. Meta Learning Backpropagation And Improving It

  97. SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II

  98. Imitating Interactive Intelligence

  99. TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game

  100. RetinaGAN: An Object-aware Approach to Sim-to-Real Transfer

  101. Emergent Social Learning via Multi-agent Reinforcement Learning

  102. The NetHack Learning Environment

  103. Automatic Discovery of Interpretable Planning Strategies

  104. Learning Agile Robotic Locomotion Skills by Imitating Animals

  105. Reinforcement Learning for Combinatorial Optimization: A Survey

  106. Bayesian REX: Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

  107. AI Helps Warehouse Robots Pick Up New Tricks: Backed by machine learning luminaries, Covariant.ai’s bots can handle jobs previously needing a human touch

  108. Deep Bayesian Reward Learning from Preferences

  109. Learning Norms from Stories: A Prior for Value Aligned Agents

  110. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

  111. Learning to Reason in Large Theories without Imitation

  112. The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors

  113. Go-Explore: a New Approach for Hard-Exploration Problems

  114. Hierarchical Reinforcement Learning for Multi-agent MOBA Game

  115. ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst

  116. Reward learning from human preferences and demonstrations in Atari

  117. Language GANs Falling Short

  118. Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow

  119. Human-Like Playtesting with Deep Learning

  120. Convergence of Value Aggregation for Imitation Learning

  121. Policy Optimization by Genetic Distillation

  122. Learning to Play Chess with Minimal Lookahead and Deep Value Neural Networks

  123. DropoutDAgger: A Bayesian Approach to Safe Imitation Learning

  124. One-Shot Visual Imitation Learning via Meta-Learning

  125. Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration

  126. Learning human behaviors from motion capture by adversarial imitation

  127. Grammatical Error Correction with Neural Reinforcement Learning

  128. Path Integral Networks: End-to-End Differentiable Optimal Control

  129. Gated-Attention Architectures for Task-Oriented Language Grounding

  130. Visual Semantic Planning using Deep Successor Representations

  131. A Deep Reinforced Model for Abstractive Summarization

  132. One-Shot Imitation Learning

  133. Model-based Adversarial Imitation Learning

  134. A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models

  135. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

  136. Generative Adversarial Imitation Learning

  137. Mastering the game of Go with deep neural networks and tree search

  138. An Invitation to Imitation

  139. DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

  140. The hidden structure of overimitation

  141. Google DeepMind’s Grandmaster-Level Chess Without Search

  142. Language Models Model Us

  143. Sony’s Racing Car AI Just Destroyed Its Human Competitors—By Being Nice (and Fast)

  146. 2023-lee-figure6-sampleefficiencyofvariousinnermonologueformatsshowingmoredetailedisbetterforimitationlearning.png

  147. https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html

  148. https://bair.berkeley.edu/blog/2022/04/25/rl-or-bc/

  149. https://generallyintelligent.substack.com/p/fine-tuning-mistral-7b-on-magic-the

  150. https://github.com/Farama-Foundation/D4RL

  151. https://github.com/openai/prm800k

  152. https://github.com/thomasahle/fastchess

  153. https://mobile-aloha.github.io/

  154. https://www.reddit.com/r/MachineLearning/comments/18u31w8/r_large_language_models_world_chess_championship/

  155. https://www.reddit.com/r/reinforcementlearning/search/?q=flair%3AI&restrict_sr=on&sort=new

  157. https://www.youtube.com/watch?v=hhiLw5Q_UFg&t=1098s

  158. https://x.com/CupiaBart/status/1793930355617259811

  159. https://x.com/prerationalist/status/1732571243407151445

  160. Grandmaster-Level Chess Without Search

  161. https%253A%252F%252Farxiv.org%252Fabs%252F2402.04494%2523deepmind.html

  162. Learning few-shot imitation as cultural transmission

  163. https%253A%252F%252Fwww.nature.com%252Farticles%252Fs41467-023-42875-2%2523deepmind.html

  164. Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero

  165. https%253A%252F%252Farxiv.org%252Fabs%252F2310.16410%2523deepmind.html

  166. Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc

  167. https%253A%252F%252Farxiv.org%252Fabs%252F2308.04445.html

  168. SequenceMatch: Imitation Learning for Autoregressive Sequence Modeling with Backtracking

  169. Stefano Ermon

  170. https%253A%252F%252Farxiv.org%252Fabs%252F2306.05426.html

  171. Thought Cloning: Learning to Think while Acting by Imitating Human Thinking

  172. Jeff Clune—Professor—Computer Science—University of British Columbia

  173. https%253A%252F%252Farxiv.org%252Fabs%252F2306.00323.html

  174. Let’s Verify Step by Step

  175. Jan Leike

  176. John Schulman’s Homepage

  177. https%253A%252F%252Farxiv.org%252Fabs%252F2305.20050%2523openai.html

  178. The False Promise of Imitating Proprietary LLMs

  179. Sergey Levine

  180. https%253A%252F%252Farxiv.org%252Fabs%252F2305.15717.html

  181. Revisiting the Minimalist Approach to Offline Reinforcement Learning

  182. https%253A%252F%252Farxiv.org%252Fabs%252F2305.09836.html

  183. ACT: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

  184. Sergey Levine

  185. https%253A%252F%252Farxiv.org%252Fabs%252F2304.13705.html

  186. CICERO: Human-level play in the game of Diplomacy by combining language models with strategic reasoning

  187. Mike Lewis

  188. %252Fdoc%252Freinforcement-learning%252Fimperfect-information%252Fdiplomacy%252F2022-bakhtin.pdf.html

  189. Scaling Laws for Reward Model Overoptimization

  190. Leo Gao

  191. John Schulman’s Homepage

  192. Jacob Hilton's Homepage

  193. https%253A%252F%252Farxiv.org%252Fabs%252F2210.10760%2523openai.html

  194. Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

  195. Hannaneh Hajishirzi—University of Washington

  196. https%253A%252F%252Farxiv.org%252Fabs%252F2210.01241.html

  197. Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

  198. Jeff Clune—Professor—Computer Science—University of British Columbia

  199. https%253A%252F%252Farxiv.org%252Fabs%252F2206.11795%2523openai.html

  200. Large-Scale Retrieval for Reinforcement Learning

  201. https%253A%252F%252Farxiv.org%252Fabs%252F2206.05314%2523deepmind.html

  202. Boosting Search Engines with Interactive Agents

  203. https%253A%252F%252Fopenreview.net%252Fforum%253Fid%253D0ZbPmmB61g%2523google.html

  204. Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale

  205. https%253A%252F%252Farxiv.org%252Fabs%252F2204.03514%2523facebook.html

  206. WebGPT: Browser-assisted question-answering with human feedback

  207. Jacob Hilton's Homepage

  208. Gretchen Krueger

  209. John Schulman’s Homepage

  210. https%253A%252F%252Farxiv.org%252Fabs%252F2112.09332%2523openai.html

  211. A General Language Assistant as a Laboratory for Alignment

  212. About Me

  213. Andy Jones

  214. https://jack-clark.net/about/

  215. Sam McCandlish

  216. Jared Kaplan

  217. https%253A%252F%252Farxiv.org%252Fabs%252F2112.00861%2523anthropic.html

  218. From Motor Control to Team Play in Simulated Humanoid Football

  219. Guy Lever

  220. Nicolas Heess

  221. https%253A%252F%252Farxiv.org%252Fabs%252F2105.12196%2523deepmind.html

  222. The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors

  223. John Schulman’s Homepage

  224. https%253A%252F%252Farxiv.org%252Fabs%252F2101.11071.html

  225. Imitating Interactive Intelligence

  226. Language Understanding Grounded in Perception and Action

  227. https%253A%252F%252Farxiv.org%252Fabs%252F2012.05672%2523deepmind.html

  228. TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game

  229. https%253A%252F%252Farxiv.org%252Fabs%252F2011.13729%2523tencent.html

  230. Language GANs Falling Short

  231. https%253A%252F%252Farxiv.org%252Fabs%252F1811.02549.html

  232. Human-Like Playtesting with Deep Learning

  233. %252Fdoc%252Freinforcement-learning%252Fimitation-learning%252F2018-gudmundsson.pdf.html

  234. Learning to Play Chess with Minimal Lookahead and Deep Value Neural Networks

  235. %252Fdoc%252Freinforcement-learning%252Fchess%252F2017-sabatelli.pdf%2523page%253D3.html