Bibliography:

  1. ‘model-based RL’ tag

  2. ‘discrete diffusion model’ tag

  3. ‘instruct-tuning LLMs’ tag

  4. ‘offline RL’ tag

  5. ‘truesight (stylometrics)’ tag

  6. Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

  7. Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models

  8. A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task

  9. Robust agents learn causal world models

  10. diff History for Neural Language Agents

  11. Responsibility & Safety: Our approach

  12. Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)

  13. PASTA: Pretrained Action-State Transformer Agents

  14. Supervised Pretraining Can Learn In-Context Reinforcement Learning

  15. Direct Preference Optimization (DPO): Your Language Model is Secretly a Reward Model

  16. Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

  17. Learning Humanoid Locomotion with Transformers

  18. Pretraining Language Models with Human Preferences

  19. Conditioning Predictive Models: Risks and Strategies

  20. Language Models as Agent Models

  21. In-context Reinforcement Learning with Algorithm Distillation

  22. Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

  23. g.pt: Learning to Learn with Generative Models of Neural Network Checkpoints

  24. Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space

  25. Goal-Conditioned Generators of Deep Policies

  26. Demis Hassabis: DeepMind—AI, Superintelligence & the Future of Humanity § Turing Test

  27. Prompting Decision Transformer for Few-Shot Policy Generalization

  28. Boosting Search Engines with Interactive Agents

  29. When does return-conditioned supervised learning work for offline reinforcement learning?

  30. You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments

  31. MAT: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

  32. Multi-Game Decision Transformers

  33. Quark: Controllable Text Generation with Reinforced Unlearning

  34. Planning with Diffusion for Flexible Behavior Synthesis

  35. Gato: A Generalist Agent

  36. Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?

  37. All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL

  38. Learning Relative Return Policies With Upside-Down Reinforcement Learning

  39. NeuPL: Neural Population Learning

  40. ODT: Online Decision Transformer

  41. Jury Learning: Integrating Dissenting Voices into Machine Learning Models

  42. Can Wikipedia Help Offline Reinforcement Learning?

  43. In Defense of the Unitary Scalarization for Deep Multi-Task Learning

  44. Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

  45. Shaking the foundations: delusions in sequence models for interaction and control

  46. Trajectory Transformer: Reinforcement Learning as One Big Sequence Modeling Problem

  47. Decision Transformer: Reinforcement Learning via Sequence Modeling

  48. baller2vec++: A Look-Ahead Multi-Entity Transformer For Modeling Coordinated Agents

  49. The Go Transformer: Natural Language Modeling for Game Play

  50. Transformers Play Chess

  51. A Very Unlikely Chess Game

  52. Reinforcement Learning Upside Down: Don’t Predict Rewards—Just Map Them to Actions

  53. Training Agents using Upside-Down Reinforcement Learning (UDRL)

  54. Reward Hacking Behavior Can Generalize across Tasks

  55. Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

  57. Interview With Robert Kralisch on Simulators

  58. TalkRL: The Reinforcement Learning Podcast: Aravind Srinivas 2: Aravind Srinivas, Research Scientist at OpenAI, Returns to Talk Decision Transformer, VideoGPT, Choosing Problems, and Explore vs Exploit in Research Careers

  59. Supplementary Video for Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

  60. design#future-tag-features

  61. Lee et al 2022, Figure 1: Multi-Game Decision Transformer performance vs. competitors on 41 Atari games

  62. Lee et al 2022, Figure 15: larger Multi-Game Decision Transformers are more data/sample-efficient

  63. Lee et al 2022, Figure 3: causal-transformer Decision Transformer architecture

  64. Lee et al 2022, Figure 5: Multi-Game DT scaling with model parameter size

  65. Lee et al 2022, Figure 7: Multi-Game Decision Transformer improves over expert demonstrations on many ALE games

  66. Reed et al 2022, Figure 1: Gato, a generalist agent trained on 604 tasks

  67. Reed et al 2022, Figure 10: robotics finetuning sample-efficiency by model scaling

  68. Reed et al 2022, Figure 2: training architecture of the Gato Decision Transformer

  69. Reed et al 2022, Figure 5: Gato performance on the control-task distribution

  70. Reed et al 2022, Figure 8: Gato token-model log-scaling curves

  71. https://adamkarvonen.github.io/machine_learning/2024/03/20/chess-gpt-interventions.html

  73. https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/

  74. https://kzl.github.io/assets/decision_transformer.pdf

  76. https://laion.ai/blog/strategic-game-dataset/

  77. https://research.google/blog/training-generalist-agents-with-multi-game-decision-transformers/

  78. https://sites.google.com/view/multi-game-transformers

  80. https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post

  81. https://www.lesswrong.com/posts/F6vH6fr8ngo7csDdf/chess-as-a-case-study-in-hidden-capabilities-in-chatgpt

  82. https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse#pfHTedu4GKaWoxD5K

  83. https://www.lesswrong.com/posts/ukTLGe5CQq9w8FMne/inducing-unprompted-misalignment-in-llms

  84. https://www.nature.com/articles/s41586-023-06647-8

  85. https://www.reddit.com/r/mlscaling/comments/vq6qh1/demis_hassabis_gato_is_our_most_general_agent_so/ienfekn/

  87. https://x.com/goodside/status/1558622567635865600

  88. Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)

  89. https://arxiv.org/abs/2308.09175#deepmind

  90. Supervised Pretraining Can Learn In-Context Reinforcement Learning

  91. https://arxiv.org/abs/2306.14892

  92. g.pt: Learning to Learn with Generative Models of Neural Network Checkpoints

  93. https://arxiv.org/abs/2209.12892

  94. Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space

  95. https://arxiv.org/abs/2208.10291

  96. Prompting Decision Transformer for Few-Shot Policy Generalization

  97. https://arxiv.org/abs/2206.13499

  98. Boosting Search Engines with Interactive Agents

  99. https://openreview.net/forum?id=0ZbPmmB61g#google

  100. MAT: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

  101. https://arxiv.org/abs/2205.14953

  102. Multi-Game Decision Transformers

  103. https://evjang.com/about/

  104. Igor Mordatch

  105. https://arxiv.org/abs/2205.15241#google

  106. Gato: A Generalist Agent

  107. Nicolas Heess

  108. https://arxiv.org/abs/2205.06175#deepmind

  109. NeuPL: Neural Population Learning

  110. Nicolas Heess

  111. https://arxiv.org/abs/2202.07415#deepmind

  112. Trajectory Transformer: Reinforcement Learning as One Big Sequence Modeling Problem

  113. Sergey Levine

  114. https://trajectory-transformer.github.io/

  115. Decision Transformer: Reinforcement Learning via Sequence Modeling

  116. Aravind Rajeswaran

  117. Kimin Lee

  118. Aditya Grover

  119. Michael (misha) Laskin

  120. Aravind Srinivas

  121. Igor Mordatch

  122. https://sites.google.com/berkeley.edu/decision-transformer

  123. baller2vec++: A Look-Ahead Multi-Entity Transformer For Modeling Coordinated Agents

  124. https://arxiv.org/abs/2104.11980

  125. Transformers Play Chess

  126. https://github.com/ricsonc/transformers-play-chess/blob/master/README.md

  127. A Very Unlikely Chess Game

  128. Scott Alexander

  129. https://slatestarcodex.com/2020/01/06/a-very-unlikely-chess-game/