Bibliography:

  1. ‘RL’ tag

  2. ‘imitation learning’ tag

  3. ‘Decision Transformer’ tag

  4. ‘robotics’ tag

  5. Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

  6. Dataset Reset Policy Optimization for RLHF

  7. Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots

  8. Vision-Language Models as a Source of Rewards

  9. Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models (ReSTEM)

  10. Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations

  11. Course Correcting Koopman Representations

  12. Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

  13. Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning

  14. What Are Dreams For? Converging lines of research suggest that we might be misunderstanding something we do every night of our lives

  15. ReST: Reinforced Self-Training (ReST) for Language Modeling

  16. AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

  17. Learning to Model the World with Language

  18. Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior

  19. PASTA: Pretrained Action-State Transformer Agents

  20. Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching

  21. Twitching in Sensorimotor Development from Sleeping Rats to Robots

  22. Survival Instinct in Offline Reinforcement Learning

  23. BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations

  24. Improving Language Models with Advantage-based Offline Policy Gradients

  25. Revisiting the Minimalist Approach to Offline Reinforcement Learning

  26. Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

  27. Off-the-Grid MARL (OG-MARL): Datasets with Baselines for Offline Multi-Agent Reinforcement Learning

  28. Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes

  29. Dungeons and Data: A Large-Scale NetHack Dataset

  30. In-context Reinforcement Learning with Algorithm Distillation

  31. CORL: Research-oriented Deep Offline Reinforcement Learning Library

  32. Diffusion-QL: Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

  33. Offline RL Policies Should be Trained to be Adaptive

  34. Prompting Decision Transformer for Few-Shot Policy Generalization

  35. Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

  36. Large-Scale Retrieval for Reinforcement Learning

  37. Offline RL for Natural Language Generation with Implicit Language Q Learning

  38. When does return-conditioned supervised learning work for offline reinforcement learning?

  39. Newton’s method for reinforcement learning and model predictive control

  40. You Can’t Count on Luck: Why Decision Transformers Fail in Stochastic Environments

  41. Multi-Game Decision Transformers

  42. When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?

  43. Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning (ExORL)

  44. Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

  45. A Workflow for Offline Model-Free Robotic Reinforcement Learning

  46. Conservative Objective Models for Effective Offline Model-Based Optimization

  47. A Minimalist Approach to Offline Reinforcement Learning

  48. Is Pessimism Provably Efficient for Offline RL?

  49. What are the Statistical Limits of Offline RL with Linear Function Approximation?

  50. MOPO: Model-based Offline Policy Optimization

  51. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

  52. D4RL: Datasets for Deep Data-Driven Reinforcement Learning

  53. Q Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison

  54. Scaling data-driven robotics with reward sketching and batch reinforcement learning

  55. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

  56. The Netflix Recommender System

  57. https://bair.berkeley.edu/blog/2021/10/25/coms_mbo/

  58. https://bair.berkeley.edu/blog/2022/04/25/rl-or-bc/

  59. https://github.com/Farama-Foundation/D4RL

  60. https://github.com/hanjuku-kaso/awesome-offline-rl

  61. https://github.com/tinkoff-ai/CORL

  62. https://jacobbuckman.com/2020-11-30-conceptual-fundamentals-of-offline-rl/

  63. https://netflixtechblog.com/learning-a-personalized-homepage-aa8ec670359a#1c3e

  64. https://paperswithcode.com/task/offline-rl

  65. https://proceedings.neurips.cc/paper/2014/file/8bb88f80d334b1869781beb89f7b73be-Paper.pdf

  66. https://sites.google.com/view/offlinerltutorial-neurips2020/home

  67. https://arxiv.org/abs/2404.08495

  68. Abhishek Kumar

  69. Igor Mordatch

  70. Behnam Neyshabur

  71. Jascha Sohl-Dickstein

  72. https://arxiv.org/abs/2312.06585#deepmind

  73. https://arxiv.org/abs/2305.09836

  74. https://arxiv.org/abs/2206.13499

  75. Jeff Clune, Professor, Computer Science, University of British Columbia

  76. https://arxiv.org/abs/2206.11795#openai

  77. https://arxiv.org/abs/2206.05314#deepmind

  78. https://evjang.com/about/

  79. https://arxiv.org/abs/2205.15241#google

  80. /doc/reinforcement-learning/exploration/2015-gomezuribe.pdf