Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models (ReST^EM)
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning
What Are Dreams For? Converging lines of research suggest that we might be misunderstanding something we do every night of our lives
Reinforced Self-Training (ReST) for Language Modeling
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior
Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
Twitching in Sensorimotor Development from Sleeping Rats to Robots
BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations
Improving Language Models with Advantage-based Offline Policy Gradients
Revisiting the Minimalist Approach to Offline Reinforcement Learning
Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions
Off-the-Grid MARL (OG-MARL): Datasets with Baselines for Offline Multi-Agent Reinforcement Learning
Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes
In-context Reinforcement Learning with Algorithm Distillation
CORL: Research-oriented Deep Offline Reinforcement Learning Library
Diffusion-QL: Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
Prompting Decision Transformer for Few-Shot Policy Generalization
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Offline RL for Natural Language Generation with Implicit Language Q Learning
When does return-conditioned supervised learning work for offline reinforcement learning?
Newton’s method for reinforcement learning and model predictive control
You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments
When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning (ExORL)
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks
A Workflow for Offline Model-Free Robotic Reinforcement Learning
Conservative Objective Models for Effective Offline Model-Based Optimization
What are the Statistical Limits of Offline RL with Linear Function Approximation?
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
D4RL: Datasets for Deep Data-Driven Reinforcement Learning
Q✱ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
Scaling data-driven robotics with reward sketching and batch reinforcement learning
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
https://jacobbuckman.com/2020-11-30-conceptual-fundamentals-of-offline-rl/
https://netflixtechblog.com/learning-a-personalized-homepage-aa8ec670359a#1c3e
https://proceedings.neurips.cc/paper/2014/file/8bb88f80d334b1869781beb89f7b73be-Paper.pdf
https://sites.google.com/view/offlinerltutorial-neurips2020/home