Data Scaling Laws in Imitation Learning for Robotic Manipulation
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents
Earnings call: Tesla Discusses Q1 2024 Challenges and AI Expansion
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (Searchformer)
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Self-Supervised Behavior Cloned Transformers are Path Crawlers for Text Games
Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
ReST: Reinforced Self-Training for Language Modeling
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning
Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior
Android in the Wild: A Large-Scale Dataset for Android Device Control
GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models
SequenceMatch: Imitation Learning for Autoregressive Sequence Modeling with Backtracking
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Revisiting the Minimalist Approach to Offline Reinforcement Learning
ACT: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
MimicPlay: Long-Horizon Imitation Learning by Watching Human Play
Toolformer: Language Models Can Teach Themselves to Use Tools
Solving math word problems with process- and outcome-based feedback
CICERO: Human-level play in the game of Diplomacy by combining language models with strategic reasoning
In-context Reinforcement Learning with Algorithm Distillation
Human-AI Coordination via Human-Regularized Search and Learning
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Generative Personas That Behave and Experience Like Humans
Diffusion-QL: Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
Limitations of Language Models in Arithmetic and Symbolic Induction
Improved Policy Optimization for Online Imitation Learning
Watch and Match: Supercharging Imitation with Regularized Optimal Transport
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Housekeep: Tidying Virtual Households using Commonsense Reasoning
When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning
Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning
Robot peels banana with goal-conditioned dual-action deep imitation learning
The Unsurprising Effectiveness of Pre-Trained Vision Models for Control
VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning
LID: Pre-Trained Language Models for Interactive Decision-Making
WebGPT: Browser-assisted question-answering with human feedback
Modeling Strong and Human-Like Gameplay with KL-Regularized Search
JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning
A General Language Assistant as a Laboratory for Alignment
AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale
RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies
SafetyNet: Safe planning for real-world self-driving vehicles using machine-learned policies
PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks
DexMV: Imitation Learning for Dexterous Manipulation from Human Videos
Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs
From Motor Control to Team Play in Simulated Humanoid Football
On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning
Counter-Strike Deathmatch with Large-Scale Behavioral Cloning
The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors
SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II
TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game
RetinaGAN: An Object-aware Approach to Sim-to-Real Transfer
Emergent Social Learning via Multi-agent Reinforcement Learning
Learning Agile Robotic Locomotion Skills by Imitating Animals
Reinforcement Learning for Combinatorial Optimization: A Survey
Bayesian REX: Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences
AI Helps Warehouse Robots Pick Up New Tricks: Backed by machine learning luminaries, Covariant.ai’s bots can handle jobs previously needing a human touch
Learning Norms from Stories: A Prior for Value Aligned Agents
Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors
Hierarchical Reinforcement Learning for Multi-agent MOBA Game
ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
Reward learning from human preferences and demonstrations in Atari
Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
Learning to Play Chess with Minimal Lookahead and Deep Value Neural Networks
DropoutDAgger: A Bayesian Approach to Safe Imitation Learning
Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration
Learning human behaviors from motion capture by adversarial imitation
Grammatical Error Correction with Neural Reinforcement Learning
Path Integral Networks: End-to-End Differentiable Optimal Control
Gated-Attention Architectures for Task-Oriented Language Grounding
Visual Semantic Planning using Deep Successor Representations
A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
Mastering the game of Go with deep neural networks and tree search
DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Sony’s Racing Car AI Just Destroyed Its Human Competitors—By Being Nice (and Fast)
2023 Lee, Figure 6: sample efficiency of various inner-monologue formats, showing that more detailed formats are better for imitation learning
https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html
https://generallyintelligent.substack.com/p/fine-tuning-mistral-7b-on-magic-the
https://www.reddit.com/r/MachineLearning/comments/18u31w8/r_large_language_models_world_chess_championship/
https://www.reddit.com/r/reinforcementlearning/search/?q=flair%3AI&restrict_sr=on&sort=new