See Also
Links
- “Direct Preference Optimization (DPO): Your Language Model Is Secretly a Reward Model”, Rafailov et al 2023
- “Think Before You Act: Unified Policy for Interleaving Language Reasoning With Actions”, Mezghani et al 2023
- “Learning Humanoid Locomotion With Transformers”, Radosavovic et al 2023
- “Pretraining Language Models With Human Preferences”, Korbak et al 2023
- “Conditioning Predictive Models: Risks and Strategies”, Hubinger et al 2023
- “g.pt: Learning to Learn With Generative Models of Neural Network Checkpoints”, Peebles et al 2022
- “Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space”, Jiang et al 2022
- “Goal-Conditioned Generators of Deep Policies”, Faccio et al 2022
- “Demis Hassabis: DeepMind—AI, Superintelligence & the Future of Humanity § Turing Test”, Hassabis & Fridman 2022
- “Prompting Decision Transformer for Few-Shot Policy Generalization”, Xu et al 2022
- “Boosting Search Engines With Interactive Agents”, Ciaramita et al 2022
- “When Does Return-conditioned Supervised Learning Work for Offline Reinforcement Learning?”, Brandfonbrener et al 2022
- “You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments”, Paster et al 2022
- “Multi-Game Decision Transformers”, Lee et al 2022
- “MAT: Multi-Agent Reinforcement Learning Is a Sequence Modeling Problem”, Wen et al 2022
- “Quark: Controllable Text Generation With Reinforced Unlearning”, Lu et al 2022
- “Planning With Diffusion for Flexible Behavior Synthesis”, Janner et al 2022
- “Gato: A Generalist Agent”, Reed et al 2022
- “Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?”, Cui et al 2022
- “All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL”, Arulkumaran et al 2022
- “Learning Relative Return Policies With Upside-Down Reinforcement Learning”, Ashley et al 2022
- “NeuPL: Neural Population Learning”, Liu et al 2022
- “ODT: Online Decision Transformer”, Zheng et al 2022
- “Can Wikipedia Help Offline Reinforcement Learning?”, Reid et al 2022
- “In Defense of the Unitary Scalarization for Deep Multi-Task Learning”, Kurin et al 2022
- “Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks”, Meng et al 2021
- “Shaking the Foundations: Delusions in Sequence Models for Interaction and Control”, Ortega et al 2021
- “Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem”, Janner et al 2021
- “Decision Transformer: Reinforcement Learning via Sequence Modeling”, Chen et al 2021
- “Baller2vec++: A Look-Ahead Multi-Entity Transformer For Modeling Coordinated Agents”, Alcorn & Nguyen 2021
- “The Go Transformer: Natural Language Modeling for Game Play”, Ciolino et al 2020
- “Transformers Play Chess”, Cheng 2020
- “A Very Unlikely Chess Game”, Alexander 2020
- “Training Agents Using Upside-Down Reinforcement Learning (UDRL)”, Srivastava et al 2019
- “Reinforcement Learning Upside Down: Don’t Predict Rewards—Just Map Them to Actions”, Schmidhuber 2019
- “TalkRL: The Reinforcement Learning Podcast: Aravind Srinivas 2: Aravind Srinivas, Research Scientist at OpenAI, Returns to Talk Decision Transformer, VideoGPT, Choosing Problems, and Explore vs Exploit in Research Careers”
- Miscellaneous
- Link Bibliography
- https://arxiv.org/abs/2209.12892: “g.pt: Learning to Learn With Generative Models of Neural Network Checkpoints”, William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik
- https://arxiv.org/abs/2208.10291: “Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space”, Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian
- https://arxiv.org/abs/2206.13499: “Prompting Decision Transformer for Few-Shot Policy Generalization”, Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan
- https://openreview.net/forum?id=0ZbPmmB61g#google: “Boosting Search Engines With Interactive Agents”
- https://arxiv.org/abs/2205.15241#google: “Multi-Game Decision Transformers”
- https://arxiv.org/abs/2205.14953: “MAT: Multi-Agent Reinforcement Learning Is a Sequence Modeling Problem”, Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang
- https://arxiv.org/abs/2205.06175#deepmind: “Gato: A Generalist Agent”
- https://arxiv.org/abs/2202.07415#deepmind: “NeuPL: Neural Population Learning”, Siqi Liu, Luke Marris, Daniel Hennes, Josh Merel, Nicolas Heess, Thore Graepel
- https://trajectory-transformer.github.io/: “Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem”, Michael Janner, Qiyang Colin Li, Sergey Levine
- https://sites.google.com/berkeley.edu/decision-transformer: “Decision Transformer: Reinforcement Learning via Sequence Modeling”
- https://arxiv.org/abs/2104.11980: “Baller2vec++: A Look-Ahead Multi-Entity Transformer For Modeling Coordinated Agents”, Michael A. Alcorn, Anh Nguyen
- https://github.com/ricsonc/transformers-play-chess/blob/master/README.md: “Transformers Play Chess”, Ricson Cheng
- https://slatestarcodex.com/2020/01/06/a-very-unlikely-chess-game/: “A Very Unlikely Chess Game”, Scott Alexander