Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models
A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task
Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)
Supervised Pretraining Can Learn In-Context Reinforcement Learning
Direct Preference Optimization (DPO): Your Language Model is Secretly a Reward Model
Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions
In-context Reinforcement Learning with Algorithm Distillation
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
g.pt: Learning to Learn with Generative Models of Neural Network Checkpoints
Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space
Demis Hassabis: DeepMind—AI, Superintelligence & the Future of Humanity § Turing Test
Prompting Decision Transformer for Few-Shot Policy Generalization
When does return-conditioned supervised learning work for offline reinforcement learning?
You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments
MAT: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Quark: Controllable Text Generation with Reinforced Unlearning
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL
Learning Relative Return Policies With Upside-Down Reinforcement Learning
Jury Learning: Integrating Dissenting Voices into Machine Learning Models
In Defense of the Unitary Scalarization for Deep Multi-Task Learning
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks
Shaking the foundations: delusions in sequence models for interaction and control
Trajectory Transformer: Reinforcement Learning as One Big Sequence Modeling Problem
Decision Transformer: Reinforcement Learning via Sequence Modeling
baller2vec++: A Look-Ahead Multi-Entity Transformer For Modeling Coordinated Agents
The Go Transformer: Natural Language Modeling for Game Play
Reinforcement Learning Upside Down: Don’t Predict Rewards—Just Map Them to Actions
Training Agents using Upside-Down Reinforcement Learning (UDRL)
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
TalkRL: The Reinforcement Learning Podcast: Aravind Srinivas 2: Aravind Srinivas, Research Scientist at OpenAI, Returns to Talk Decision Transformer, VideoGPT, Choosing Problems, and Explore vs Exploit in Research Careers
Supplementary Video for Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Lee et al. 2022, Figure 1: Multi-Game Decision Transformer performance vs. competitors on 41 Atari games
Lee et al. 2022, Figure 15: Larger Multi-Game Decision Transformers are more data/sample-efficient
Lee et al. 2022, Figure 3: Causal transformer (Decision Transformer) architecture
Lee et al. 2022, Figure 5: Multi-Game DT scaling with model parameter size
Lee et al. 2022, Figure 7: Multi-Game Decision Transformer improves over expert demonstrations on many ALE games
Reed et al. 2022, Figure 1: Gato, a generalist agent trained on 604 tasks
Reed et al. 2022, Figure 10: Robotics fine-tuning sample efficiency by model scale
Reed et al. 2022, Figure 2: Training architecture of Gato (Decision Transformer)
Reed et al. 2022, Figure 5: Gato performance distribution on control tasks
https://adamkarvonen.github.io/machine_learning/2024/03/20/chess-gpt-interventions.html
https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/
https://research.google/blog/training-generalist-agents-with-multi-game-decision-transformers/
https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post
https://www.lesswrong.com/posts/F6vH6fr8ngo7csDdf/chess-as-a-case-study-in-hidden-capabilities-in-chatgpt
https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse#pfHTedu4GKaWoxD5K
https://www.lesswrong.com/posts/ukTLGe5CQq9w8FMne/inducing-unprompted-misalignment-in-llms
https://www.reddit.com/r/mlscaling/comments/vq6qh1/demis_hassabis_gato_is_our_most_general_agent_so/ienfekn/
https://arxiv.org/abs/2308.09175#deepmind
https://openreview.net/forum?id=0ZbPmmB61g#google
https://arxiv.org/abs/2205.15241#google
https://arxiv.org/abs/2205.06175#deepmind
https://arxiv.org/abs/2202.07415#deepmind
https://trajectory-transformer.github.io/
https://sites.google.com/berkeley.edu/decision-transformer
https://github.com/ricsonc/transformers-play-chess/blob/master/README.md
https://slatestarcodex.com/2020/01/06/a-very-unlikely-chess-game/