“‘Model-Based RL’ Tag”, 2018-12-12
Bibliography for tag reinforcement-learning/model, most recent first: 7 related tags, 207 annotations, & 30 links (parent).
- “Centaur: a Foundation Model of Human Cognition”, et al 2024
- “Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making”, et al 2024
- “Interpretable Contrastive Monte Carlo Tree Search Reasoning”, et al 2024
- “OpenAI Co-Founder Sutskever’s New Safety-Focused AI Startup SSI Raises $1 Billion”, et al 2024
- “The Brain Simulates Actions and Their Consequences during REM Sleep”, 2024
- “Solving Path of Exile Item Crafting With Value Iteration”, 2024
- “Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models”, et al 2024
- “DT-VIN: Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning”, et al 2024
- “MCTSr: Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine With LLaMA-3-8B”, et al 2024
- “Safety Alignment Should Be Made More Than Just a Few Tokens Deep”, et al 2024
- “Can Language Models Serve As Text-Based World Simulators?”, et al 2024
- “Evaluating the World Model Implicit in a Generative Model”, et al 2024
- “OmegaPRM: Improve Mathematical Reasoning in Language Models by Automated Process Supervision”, et al 2024
- “Diffusion On Syntax Trees For Program Synthesis”, et al 2024
- “DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ”, et al 2024
- “DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data”, et al 2024
- “Amit’s A✱ Pages”, 2024
- “From r to Q✱: Your Language Model Is Secretly a Q-Function”, et al 2024
- “Identifying General Reaction Conditions by Bandit Optimization”, et al 2024b
- “Gradient-Based Planning With World Models”, V et al 2023
- “ReCoRe: Regularized Contrastive Representation Learning of World Model”, et al 2023
- “Can a Transformer Represent a Kalman Filter?”, 2023
- “Self-Supervised Behavior Cloned Transformers Are Path Crawlers for Text Games”, 2023
- “Why Won’t OpenAI Say What the Q✱ Algorithm Is? Supposed AI Breakthroughs Are Frequently Veiled in Secrecy, Hindering Scientific Consensus”, 2023
- “Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations”, et al 2023
- “The Neural Basis of Mental Navigation in Rats: A Brain–machine Interface Demonstrates Volitional Control of Hippocampal Activity”, 2023
- “Volitional Activation of Remote Place Representations With a Hippocampal Brain–machine Interface”, et al 2023
- “Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion”, et al 2023
- “Self-AIXI: Self-Predictive Universal AI”, et al 2023
- “Othello Is Solved”, 2023
- “Course Correcting Koopman Representations”, et al 2023
- “Predictive Auxiliary Objectives in Deep RL Mimic Learning in the Brain”, 2023
- “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, et al 2023
- “Comparative Study of Model-Based and Model-Free Reinforcement Learning Control Performance in HVAC Systems”, 2023
- “Learning to Model the World With Language”, et al 2023
- “Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations”, et al 2023
- “Fighting Uncertainty With Gradients: Offline Reinforcement Learning via Diffusion Score Matching”, et al 2023
- “Improving Long-Horizon Imitation Through Instruction Prediction”, et al 2023
- “When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming (CDHF)”, et al 2023
- “Long-Term Value of Exploration: Measurements, Findings and Algorithms”, et al 2023
- “Emergence of Belief-Like Representations through Reinforcement Learning”, et al 2023
- “Six Experiments in Action Minimization”, 2023
- “Finding Paths of Least Action With Gradient Descent”, 2023
- “MimicPlay: Long-Horizon Imitation Learning by Watching Human Play”, et al 2023
- “Graph Schemas As Abstractions for Transfer Learning, Inference, and Planning”, et al 2023
- “John Carmack’s ‘Different Path’ to Artificial General Intelligence”, 2023
- “DreamerV3: Mastering Diverse Domains through World Models”, et al 2023
- “Merging Enzymatic and Synthetic Chemistry With Computational Synthesis Planning”, et al 2022
- “PALMER: Perception-Action Loop With Memory for Long-Horizon Planning”, et al 2022
- “Space Is a Latent [CSCG] Sequence: Structured Sequence Learning As a Unified Theory of Representation in the Hippocampus”, et al 2022
- “CICERO: Human-Level Play in the Game of Diplomacy by Combining Language Models With Strategic Reasoning”, et al 2022
- “Online Learning and Bandits With Queried Hints”, et al 2022
- “E3B: Exploration via Elliptical Episodic Bonuses”, et al 2022
- “Creating a Dynamic Quadrupedal Robotic Goalkeeper With Reinforcement Learning”, et al 2022
- “Top-Down Design of Protein Nanomaterials With Reinforcement Learning”, et al 2022
- “Simplifying Model-Based RL: Learning Representations, Latent-Space Models, and Policies With One Objective (ALM)”, et al 2022
- “IRIS: Transformers Are Sample-Efficient World Models”, et al 2022
- “LGE: Cell-Free Latent Go-Explore”, Gallouédec & 2022
- “LaTTe: Language Trajectory TransformEr”, et al 2022
- “PI-ARS: Accelerating Evolution-Learned Visual-Locomotion With Predictive Information Representations”, et al 2022
- “Learning With Combinatorial Optimization Layers: a Probabilistic Approach”, et al 2022
- “Spatial Representation by Ramping Activity of Neurons in the Retrohippocampal Cortex”, et al 2022
- “Inner Monologue: Embodied Reasoning through Planning With Language Models”, et al 2022
- “LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, et al 2022
- “DayDreamer: World Models for Physical Robot Learning”, et al 2022
- “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, et al 2022
- “GODEL: Large-Scale Pre-Training for Goal-Directed Dialog”, et al 2022
- “BYOL-Explore: Exploration by Bootstrapped Prediction”, et al 2022
- “Director: Deep Hierarchical Planning from Pixels”, et al 2022
- “Flexible Diffusion Modeling of Long Videos”, et al 2022
- “Housekeep: Tidying Virtual Households Using Commonsense Reasoning”, et al 2022
- “Semantic Exploration from Language Abstractions and Pretrained Representations”, et al 2022
- “Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning”, et al 2022
- “Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”, et al 2022
- “Reinforcement Learning With Action-Free Pre-Training from Videos”, et al 2022
- “On-The-Fly Strategy Adaptation for Ad-Hoc Agent Coordination”, et al 2022
- “VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning”, Borja- et al 2022
- “Learning Synthetic Environments and Reward Networks for Reinforcement Learning”, et al 2022
- “How to Build a Cognitive Map: Insights from Models of the Hippocampal Formation”, et al 2022
- “LID: Pre-Trained Language Models for Interactive Decision-Making”, et al 2022
- “Rotting Infinitely Many-Armed Bandits”, et al 2022
- “Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents”, et al 2022
- “What Is the Point of Computers? A Question for Pure Mathematicians”, 2021
- “An Experimental Design Perspective on Model-Based Reinforcement Learning”, et al 2021
- “Reinforcement Learning on Human Decision Models for Uniquely Collaborative AI Teammates”, 2021
- “Learning Representations for Pixel-Based Control: What Matters and Why?”, et al 2021
- “Learning Behaviors through Physics-Driven Latent Imagination”, et al 2021
- “Is Bang-Bang Control All You Need? Solving Continuous Control With Bernoulli Policies”, et al 2021
- “Skill Induction and Planning With Latent Language”, et al 2021
- “Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks”, et al 2021
- “TrufLL: Learning Natural Language Generation from Scratch”, et al 2021
- “Dropout’s Dream Land: Generalization from Learned Simulators to Reality”, 2021
- “FitVid: Overfitting in Pixel-Level Video Prediction”, et al 2021
- “Brax—A Differentiable Physics Engine for Large Scale Rigid Body Simulation”, et al 2021
- “A Graph Placement Methodology for Fast Chip Design”, et al 2021
- “Planning for Novelty: Width-Based Algorithms for Common Problems in Control, Planning and Reinforcement Learning”, 2021
- “The Whole Prefrontal Cortex Is Premotor Cortex”, 2021
- “PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World”, et al 2021
- “Constructions in Combinatorics via Neural Networks”, 2021
- “Machine Translation Decoding beyond Beam Search”, et al 2021
- “Learning What To Do by Simulating the Past”, et al 2021
- “Waymo Simulated Driving Behavior in Reconstructed Fatal Crashes within an Autonomous Vehicle Operating Domain”, et al 2021
- “Latent Imagination Facilitates Zero-Shot Transfer in Autonomous Racing”, et al 2021
- “Replaying Real Life: How the Waymo Driver Avoids Fatal Human Crashes”, 2021
- “Learning Chess Blindfolded: Evaluating Language Models on State Tracking”, et al 2021
- “COMBO: Conservative Offline Model-Based Policy Optimization”, et al 2021
- “A✱ Search Without Expansions: Learning Heuristic Functions With Deep Q-Networks”, et al 2021
- “ViNG: Learning Open-World Navigation With Visual Goals”, et al 2020
- “Inductive Biases for Deep Learning of Higher-Level Cognition”, 2020
- “Multimodal Dynamics Modeling for Off-Road Autonomous Vehicles”, et al 2020
- “Targeting for Long-Term Outcomes”, et al 2020
- “What Are the Statistical Limits of Offline RL With Linear Function Approximation?”, et al 2020
- “A Time Leap Challenge for SAT Solving”, et al 2020
- “The Overfitted Brain: Dreams Evolved to Assist Generalization”, 2020
- “RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning”, et al 2020
- “Mathematical Reasoning via Self-Supervised Skip-Tree Training”, et al 2020
- “MOPO: Model-Based Offline Policy Optimization”, et al 2020
- “Learning to Simulate Dynamic Environments With GameGAN”, et al 2020
- “Planning to Explore via Self-Supervised World Models”, et al 2020
- “Learning to Simulate Dynamic Environments With GameGAN [Homepage]”, et al 2020
- “Reinforcement Learning With Augmented Data”, et al 2020
- “Learning to Fly via Deep Model-Based Reinforcement Learning”, Becker- et al 2020
- “Introducing Dreamer: Scalable Reinforcement Learning Using World Models”, 2020
- “Reinforcement Learning for Combinatorial Optimization: A Survey”, et al 2020
- “Learning to Prove Theorems by Learning to Generate Theorems”, 2020
- “The Gambler’s Problem and Beyond”, et al 2019
- “Combining Q-Learning and Search With Amortized Value Estimates”, et al 2019
- “Dream to Control: Learning Behaviors by Latent Imagination”, et al 2019
- “Approximate Inference in Discrete Distributions With Monte Carlo Tree Search and Value Functions”, et al 2019
- “Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?”, et al 2019
- “Designing Agent Incentives to Avoid Reward Tampering”, et al 2019
- “An Application of Reinforcement Learning to Aerobatic Helicopter Flight”, et al 2019
- “When to Trust Your Model: Model-Based Policy Optimization (MBPO)”, et al 2019
- “VISR: Fast Task Inference With Variational Intrinsic Successor Features”, et al 2019
- “Learning to Reason in Large Theories without Imitation”, et al 2019
- “Biasing MCTS With Features for General Games”, et al 2019
- “Bayesian Layers: A Module for Neural Network Uncertainty”, et al 2018
- “PlaNet: Learning Latent Dynamics for Planning from Pixels”, et al 2018
- “Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning”, et al 2018
- “Human-Like Playtesting With Deep Learning”, et al 2018
- “General Value Function Networks”, et al 2018
- “Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search”, et al 2018
- “The Alignment Problem for Bayesian History-Based Reinforcement Learners”, 2018
- “Neural Scene Representation and Rendering”, et al 2018
- “Deep Reinforcement Learning in a Handful of Trials Using Probabilistic Dynamics Models”, et al 2018
- “Mining Gold from Implicit Models to Improve Likelihood-Free Inference”, et al 2018
- “Learning to Optimize Tensor Programs”, et al 2018
- “Reinforcement Learning and Control As Probabilistic Inference: Tutorial and Review”, 2018
- “Estimate and Replace: A Novel Approach to Integrating Deep Neural Networks With Existing Applications”, et al 2018
- “World Models”, 2018
- “Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling”, et al 2018
- “Differentiable Dynamic Programming for Structured Prediction and Attention”, 2018
- “How to Explore Chemical Space Using Algorithms and Automation”, et al 2018
- “Planning Chemical Syntheses With Deep Neural Networks and Symbolic AI”, et al 2018
- “Generalization Guides Human Exploration in Vast Decision Spaces”, et al 2018
- “Safe Policy Search With Gaussian Process Models”, et al 2017
- “Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics”, 2017
- “Analogical-Based Bayesian Optimization”, et al 2017
- “A Game-Theoretic Analysis of the Off-Switch Game”, et al 2017
- “Neural Network Dynamics for Model-Based Deep Reinforcement Learning With Model-Free Fine-Tuning”, et al 2017
- “Learning Transferable Architectures for Scalable Image Recognition”, et al 2017
- “Learning Model-Based Planning from Scratch”, et al 2017
- “Value Prediction Network”, et al 2017
- “Path Integral Networks: End-To-End Differentiable Optimal Control”, et al 2017
- “Visual Semantic Planning Using Deep Successor Representations”, et al 2017
- “AIXIjs: A Software Demo for General Reinforcement Learning”, 2017
- “Metacontrol for Adaptive Imagination-Based Optimization”, et al 2017
- “DeepArchitect: Automatically Designing and Training Deep Architectures”, 2017
- “Stochastic Constraint Programming As Reinforcement Learning”, et al 2017
- “Recurrent Environment Simulators”, et al 2017
- “Prediction and Control With Temporal Segment Models”, et al 2017
- “Rotting Bandits”, et al 2017
- “The Kelly Coin-Flipping Game: Exact Solutions”, et al 2017
- “The Hippocampus As a Predictive Map”, et al 2017
- “The Predictron: End-To-End Learning and Planning”, et al 2016
- “Model-Based Adversarial Imitation Learning”, et al 2016
- “DeepMath: Deep Sequence Models for Premise Selection”, et al 2016
- “Value Iteration Networks”, et al 2016
- “On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models”, 2015
- “Classical Planning Algorithms on the Atari Video Games”, et al 2015
- “Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-Armed Bandit Problem With Multiple Plays”, et al 2015
- “Compress and Control”, et al 2014
- “Learning to Win by Reading Manuals in a Monte-Carlo Framework”, et al 2014
- “Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science”, 2013
- “Model-Based Bayesian Exploration”, et al 2013
- “PUCT: Continuous Upper Confidence Trees With Polynomial Exploration-Consistency”, et al 2013
- “Planning As Satisfiability: Heuristics”, 2012
- “Width and Serialization of Classical Planning Problems”, 2012
- “An Empirical Evaluation of Thompson Sampling”, 2011
- “Monte-Carlo Planning in Large POMDPs”, 2010
- “A Monte Carlo AIXI Approximation”, et al 2009
- “Evolution And Episodic Memory: An Analysis And Demonstration Of A Social Function Of Episodic Recollection”, et al 2009
- “Resilient Machines Through Continuous Self-Modeling”, et al 2006
- “Policy Mining: Learning Decision Policies from Fixed Sets of Data”, 2003
- “The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions”, 2002
- “Iterative Widening”, 2001
- “Abstract Proof Search”, 2000
- “A Critique of Pure Reason”, 1987
- “Human Window on the World”, 1985
- “Why the Law of Effect Will Not Go Away”, 1974
- “Getting the World Record in HATETRIS”
- “Solving Probabilistic Tic-Tac-Toe”, 2024
- “Approximate Bayes Optimal Policy Search Using Neural Networks”
- “Embodying Addiction: A Predictive Processing Account”
- “Introducing ‘Computer Use’, a New Claude 3.5 Sonnet, and Claude 3.5 Haiku”, 2024
- “Developing a Computer Use Model”, 2024
- “Best-Of-n With Misaligned Reward Models for Math Reasoning”
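Several entries above (Value Iteration Networks, “DT-VIN: Scaling Value Iteration Networks to 5000 Layers”, “Solving Path of Exile Item Crafting With Value Iteration”, “Solving Probabilistic Tic-Tac-Toe”) build on tabular value iteration, the basic dynamic-programming primitive of model-based planning. As a minimal sketch, the toy 2-state MDP below is purely illustrative and not drawn from any cited paper:

```python
# Minimal tabular value iteration: repeatedly apply the Bellman optimality
# backup until the value function stops changing.

def value_iteration(n_states, n_actions, P, R, gamma=0.9, tol=1e-8):
    """P[s][a] = list of (probability, next_state); R[s][a] = expected reward."""
    V = [0.0] * n_states
    while True:
        V_new = [
            max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# Toy 2-state, 2-action MDP: in state 0, action 0 safely earns reward 1,
# while action 1 earns reward 2 but risks falling into the absorbing
# zero-reward state 1.
P = [[[(1.0, 0)], [(0.5, 0), (0.5, 1)]],   # state 0
     [[(1.0, 1)], [(1.0, 1)]]]             # state 1 (absorbing)
R = [[1.0, 2.0],
     [0.0, 0.0]]
V = value_iteration(2, 2, P, R)
# V[0] converges to 1/(1-0.9) = 10 (the safe action dominates); V[1] = 0.
```

The contraction property of the discounted Bellman operator guarantees convergence here; the neural variants above (VIN, DT-VIN) unroll exactly this backup as network layers.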