‘Decision Transformer’ directory

See Also

Links

“Safety Pretraining: Toward the Next Generation of Safe AI”, Maini et al 2025

“Diffusion Forcing: Next-Token Prediction Meets Full-Sequence Diffusion”, Chen et al 2024

“Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models”, Karvonen 2024

“A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task”, Brinkmann et al 2024

“Robust Agents Learn Causal World Models”, Richens & Everitt 2024

“Diff History for Neural Language Agents”, Piterbarg et al 2023

“Responsibility & Safety: Our Approach”, DeepMind 2023

“Diversifying AI: Towards Creative Chess With AlphaZero (AZ_db)”, Zahavy et al 2023

“PASTA: Pretrained Action-State Transformer Agents”, Boige et al 2023

“Supervised Pretraining Can Learn In-Context Reinforcement Learning”, Lee et al 2023

“Direct Preference Optimization (DPO): Your Language Model Is Secretly a Reward Model”, Rafailov et al 2023

“DPO § 6.4: Validating GPT-4 Judgments With Human Judgments”, Rafailov et al 2023 (page 10)

“Think Before You Act: Unified Policy for Interleaving Language Reasoning With Actions”, Mezghani et al 2023

“Learning Humanoid Locomotion With Transformers”, Radosavovic et al 2023

“Pretraining Language Models With Human Preferences”, Korbak et al 2023

“Conditioning Predictive Models: Risks and Strategies”, Hubinger et al 2023

“Language Models As Agent Models”, Andreas 2022

“In-Context Reinforcement Learning With Algorithm Distillation”, Laskin et al 2022

“Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task”, Li et al 2022

“`g.pt`: Learning to Learn With Generative Models of Neural Network Checkpoints”, Peebles et al 2022

“Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space”, Jiang et al 2022

“Goal-Conditioned Generators of Deep Policies”, Faccio et al 2022

“Demis Hassabis: DeepMind—AI, Superintelligence & the Future of Humanity § Turing Test”, Hassabis & Fridman 2022

“Prompting Decision Transformer for Few-Shot Policy Generalization”, Xu et al 2022

“Boosting Search Engines With Interactive Agents”, Ciaramita et al 2022

“When Does Return-Conditioned Supervised Learning Work for Offline Reinforcement Learning?”, Brandfonbrener et al 2022

“You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments”, Paster et al 2022

“MAT: Multi-Agent Reinforcement Learning Is a Sequence Modeling Problem”, Wen et al 2022

“Multi-Game Decision Transformers”, Lee et al 2022

“Quark: Controllable Text Generation With Reinforced Unlearning”, Lu et al 2022

“Planning With Diffusion for Flexible Behavior Synthesis”, Janner et al 2022

“Gato: A Generalist Agent”, Reed et al 2022

“Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?”, Cui et al 2022

“All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL”, Arulkumaran et al 2022

“Learning Relative Return Policies With Upside-Down Reinforcement Learning”, Ashley et al 2022

“NeuPL: Neural Population Learning”, Liu et al 2022

“ODT: Online Decision Transformer”, Zheng et al 2022

“Jury Learning: Integrating Dissenting Voices into Machine Learning Models”, Gordon et al 2022

“Can Wikipedia Help Offline Reinforcement Learning?”, Reid et al 2022

“In Defense of the Unitary Scalarization for Deep Multi-Task Learning”, Kurin et al 2022

“Offline Pre-Trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks”, Meng et al 2021

“Shaking the Foundations: Delusions in Sequence Models for Interaction and Control”, Ortega et al 2021

“Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem”, Janner et al 2021

“Decision Transformer: Reinforcement Learning via Sequence Modeling”, Chen et al 2021

“Baller2vec++: A Look-Ahead Multi-Entity Transformer For Modeling Coordinated Agents”, Alcorn & Nguyen 2021

“The Go Transformer: Natural Language Modeling for Game Play”, Ciolino et al 2020

“Transformers Play Chess”, Cheng 2020

“A Very Unlikely Chess Game”, Alexander 2020

“Reinforcement Learning Upside Down: Don’t Predict Rewards—Just Map Them to Actions”, Schmidhuber 2019

“Training Agents Using Upside-Down Reinforcement Learning (UDRL)”, Srivastava et al 2019

“Reward Hacking Behavior Can Generalize across Tasks”

“Evidence of Learned Look-Ahead in a Chess-Playing Neural Network”

“Interview With Robert Kralisch on Simulators”

“TalkRL: The Reinforcement Learning Podcast: Aravind Srinivas 2: Aravind Srinivas, Research Scientist at OpenAI, Returns to Talk Decision Transformer, VideoGPT, Choosing Problems, and Explore vs Exploit in Research Careers”

“Supplementary Video for Do As I Can, Not As I Say: Grounding Language in Robotic Affordances”

https://www.youtube.com/watch?v=ysFav0b472w

Sort By Magic

Annotations are sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, the sort uses the embedding of each annotation to build a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
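
The nearest-neighbor progression described above can be illustrated with a minimal sketch. This is not the site's actual implementation: the function names, the choice of cosine similarity, the greedy chaining strategy, and the toy 3-dimensional embeddings are all assumptions made for illustration, and the clustering and auto-labeling steps are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sort_by_similarity(items):
    """Greedy nearest-neighbor ordering (an assumed strategy).

    items: list of (title, embedding) pairs, newest first.
    Starts from the first (newest) item and repeatedly appends the
    not-yet-used item most similar to the one just appended.
    """
    if not items:
        return []
    remaining = list(items)
    current = remaining.pop(0)
    ordered = [current[0]]
    while remaining:
        nearest = max(remaining, key=lambda item: cosine(current[1], item[1]))
        remaining.remove(nearest)
        ordered.append(nearest[0])
        current = nearest
    return ordered


# Hypothetical toy embeddings; real annotation embeddings would be high-dimensional.
toy = [
    ("Safety Pretraining", [0.9, 0.1, 0.0]),
    ("Decision Transformer", [0.0, 0.2, 0.9]),
    ("Pretraining Language Models With Human Preferences", [0.8, 0.3, 0.1]),
    ("Trajectory Transformer", [0.1, 0.1, 0.95]),
]

for title in sort_by_similarity(toy):
    print(title)
```

With these toy vectors, the two pretraining-related items end up adjacent, followed by the two sequence-modeling items, which is the kind of topic progression the description refers to.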

`emergent-models safety-ai divergent-learning preference-optimization game-play`

`trajectory-synthesis decision-planning sequence-modeling multi-agent diffusion-planning trajectory-transformer`

`upside-down-rl`

Miscellaneous

Bibliography

https://arxiv.org/abs/2308.09175#deepmind: “Diversifying AI: Towards Creative Chess With AlphaZero (AZ_db)”, Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh

https://arxiv.org/abs/2306.14892: “Supervised Pretraining Can Learn In-Context Reinforcement Learning”, Jonathan N. Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill

https://arxiv.org/pdf/2305.18290#page=10: “DPO § 6.4: Validating GPT-4 Judgments With Human Judgments”, Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn

https://arxiv.org/abs/2209.12892: “g.pt: Learning to Learn With Generative Models of Neural Network Checkpoints”, William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik

https://arxiv.org/abs/2208.10291: “Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space”, Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian

https://arxiv.org/abs/2206.13499: “Prompting Decision Transformer for Few-Shot Policy Generalization”, Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan

https://openreview.net/forum?id=0ZbPmmB61g#google: “Boosting Search Engines With Interactive Agents”, Massimiliano Ciaramita, Leonard Adolphs, Michelle Chen Huebscher, Sascha Rothe, Christian Buck, Thomas Hofmann, Yannic Kilcher, Lasse Espeholt, Pier Giuseppe Sessa, Lierni Sestorain, Benjamin Börschinger

https://arxiv.org/abs/2205.14953: “MAT: Multi-Agent Reinforcement Learning Is a Sequence Modeling Problem”, Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang

https://arxiv.org/abs/2205.15241#google: “Multi-Game Decision Transformers”, Kuang-Huei Lee, Ofir Nachum, Mengjiao Yang, Lisa Lee, Daniel Freeman, Winnie Xu, Sergio Guadarrama, Ian Fischer, Eric Jang, Henryk Michalewski, Igor Mordatch

https://arxiv.org/abs/2205.06175#deepmind: “Gato: A Generalist Agent”, Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas

https://arxiv.org/abs/2202.07415#deepmind: “NeuPL: Neural Population Learning”, Siqi Liu, Luke Marris, Daniel Hennes, Josh Merel, Nicolas Heess, Thore Graepel

https://trajectory-transformer.github.io/: “Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem”, Michael Janner, Qiyang Colin Li, Sergey Levine

https://sites.google.com/berkeley.edu/decision-transformer: “Decision Transformer: Reinforcement Learning via Sequence Modeling”, Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch

https://arxiv.org/abs/2104.11980: “Baller2vec++: A Look-Ahead Multi-Entity Transformer For Modeling Coordinated Agents”, Michael A. Alcorn, Anh Nguyen

https://github.com/ricsonc/transformers-play-chess/blob/master/README.md: “Transformers Play Chess”, Ricson Cheng

https://slatestarcodex.com/2020/01/06/a-very-unlikely-chess-game/: “A Very Unlikely Chess Game”, Scott Alexander
