See Also

Links
- “Why Won’t OpenAI Say What the Q✱ Algorithm Is? Supposed AI Breakthroughs Are Frequently Veiled in Secrecy, Hindering Scientific Consensus”, Hao 2023
- “Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations”, Hong et al 2023
- “Predictive Auxiliary Objectives in Deep RL Mimic Learning in the Brain”, Fang & Stachenfeld 2023
- “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Zhou et al 2023
- “When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming (CDHF)”, Mozannar et al 2023
- “MimicPlay: Long-Horizon Imitation Learning by Watching Human Play”, Wang et al 2023
- “DreamerV3: Mastering Diverse Domains through World Models”, Hafner et al 2023
- “Merging Enzymatic and Synthetic Chemistry With Computational Synthesis Planning”, Levin et al 2022
- “PALMER: Perception-Action Loop With Memory for Long-Horizon Planning”, Beker et al 2022
- “CICERO: Human-level Play in the Game of Diplomacy by Combining Language Models With Strategic Reasoning”, Meta FAIR Diplomacy Team 2022
- “Online Learning and Bandits With Queried Hints”, Bhaskara et al 2022
- “E3B: Exploration via Elliptical Episodic Bonuses”, Henaff et al 2022
- “Creating a Dynamic Quadrupedal Robotic Goalkeeper With Reinforcement Learning”, Huang et al 2022
- “Top-down Design of Protein Nanomaterials With Reinforcement Learning”, Lutz et al 2022
- “Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies With One Objective (ALM)”, Ghugare et al 2022
- “IRIS: Transformers Are Sample-Efficient World Models”, Micheli et al 2022
- “LaTTe: Language Trajectory TransformEr”, Bucker et al 2022
- “Learning With Combinatorial Optimization Layers: a Probabilistic Approach”, Dalle et al 2022
- “PI-ARS: Accelerating Evolution-Learned Visual-Locomotion With Predictive Information Representations”, Lee et al 2022
- “Spatial Representation by Ramping Activity of Neurons in the Retrohippocampal Cortex”, Tennant et al 2022
- “Inner Monologue: Embodied Reasoning through Planning With Language Models”, Huang et al 2022
- “LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Shah et al 2022
- “DayDreamer: World Models for Physical Robot Learning”, Wu et al 2022
- “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, Baker et al 2022
- “BYOL-Explore: Exploration by Bootstrapped Prediction”, Guo et al 2022
- “Director: Deep Hierarchical Planning from Pixels”, Hafner et al 2022
- “Flexible Diffusion Modeling of Long Videos”, Harvey et al 2022
- “Housekeep: Tidying Virtual Households Using Commonsense Reasoning”, Kant et al 2022
- “Semantic Exploration from Language Abstractions and Pretrained Representations”, Tam et al 2022
- “Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning”, Valassakis et al 2022
- “Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”, Ahn et al 2022
- “Reinforcement Learning With Action-Free Pre-Training from Videos”, Seo et al 2022
- “On-the-fly Strategy Adaptation for Ad-hoc Agent Coordination”, Zand et al 2022
- “VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning”, Borja-Diaz et al 2022
- “Learning Synthetic Environments and Reward Networks for Reinforcement Learning”, Ferreira et al 2022
- “LID: Pre-Trained Language Models for Interactive Decision-Making”, Li et al 2022
- “How to Build a Cognitive Map: Insights from Models of the Hippocampal Formation”, Whittington et al 2022
- “Rotting Infinitely Many-armed Bandits”, Kim et al 2022
- “Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents”, Huang et al 2022
- “What Is the Point of Computers? A Question for Pure Mathematicians”, Buzzard 2021
- “An Experimental Design Perspective on Model-Based Reinforcement Learning”, Mehta et al 2021
- “Reinforcement Learning on Human Decision Models for Uniquely Collaborative AI Teammates”, Kantack 2021
- “Learning Representations for Pixel-based Control: What Matters and Why?”, Tomar et al 2021
- “Learning Behaviors through Physics-driven Latent Imagination”, Richard et al 2021
- “Is Bang-Bang Control All You Need? Solving Continuous Control With Bernoulli Policies”, Seyde et al 2021
- “Skill Induction and Planning With Latent Language”, Sharma et al 2021
- “Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks”, Wu et al 2021
- “TrufLL: Learning Natural Language Generation from Scratch”, Donati et al 2021
- “Dropout’s Dream Land: Generalization from Learned Simulators to Reality”, Wellmer & Kwok 2021
- “Brax—A Differentiable Physics Engine for Large Scale Rigid Body Simulation”, Freeman et al 2021
- “FitVid: Overfitting in Pixel-Level Video Prediction”, Babaeizadeh et al 2021
- “A Graph Placement Methodology for Fast Chip Design”, Mirhoseini et al 2021
- “The Whole Prefrontal Cortex Is Premotor Cortex”, Fine & Hayden 2021
- “PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World”, Zellers et al 2021
- “Constructions in Combinatorics via Neural Networks”, Wagner 2021
- “Machine Translation Decoding beyond Beam Search”, Leblond et al 2021
- “Replaying Real Life: How the Waymo Driver Avoids Fatal Human Crashes”, Waymo 2021
- “Latent Imagination Facilitates Zero-Shot Transfer in Autonomous Racing”, Brunnbauer et al 2021
- “Waymo Simulated Driving Behavior in Reconstructed Fatal Crashes within an Autonomous Vehicle Operating Domain”, Scanlon et al 2021
- “COMBO: Conservative Offline Model-Based Policy Optimization”, Yu et al 2021
- “A* Search Without Expansions: Learning Heuristic Functions With Deep Q-Networks”, Agostinelli et al 2021
- “ViNG: Learning Open-World Navigation With Visual Goals”, Shah et al 2020
- “Targeting for Long-term Outcomes”, Yang et al 2020
- “What Are the Statistical Limits of Offline RL With Linear Function Approximation?”, Wang et al 2020
- “A Time Leap Challenge for SAT Solving”, Fichte et al 2020
- “The Overfitted Brain: Dreams Evolved to Assist Generalization”, Hoel 2020
- “RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning”, Gulcehre et al 2020
- “MOPO: Model-based Offline Policy Optimization”, Yu et al 2020
- “Learning to Simulate Dynamic Environments With GameGAN”, Kim et al 2020
- “Planning to Explore via Self-Supervised World Models”, Sekar et al 2020
- “Learning to Simulate Dynamic Environments With GameGAN [homepage]”, Kim et al 2020
- “Reinforcement Learning With Augmented Data”, Laskin et al 2020
- “Learning to Fly via Deep Model-Based Reinforcement Learning”, Becker-Ehmck et al 2020
- “Introducing Dreamer: Scalable Reinforcement Learning Using World Models”, Hafner 2020
- “Learning to Prove Theorems by Learning to Generate Theorems”, Wang & Deng 2020
- “The Gambler’s Problem and Beyond”, Wang et al 2019
- “Dream to Control: Learning Behaviors by Latent Imagination”, Hafner et al 2019
- “Approximate Inference in Discrete Distributions With Monte Carlo Tree Search and Value Functions”, Buesing et al 2019
- “Designing Agent Incentives to Avoid Reward Tampering”, Everitt et al 2019
- “An Application of Reinforcement Learning to Aerobatic Helicopter Flight”, Abbeel et al 2019
- “When to Trust Your Model: Model-Based Policy Optimization (MBPO)”, Janner et al 2019
- “Fast Task Inference With Variational Intrinsic Successor Features”, Hansen et al 2019
- “Bayesian Layers: A Module for Neural Network Uncertainty”, Tran et al 2018
- “PlaNet: Learning Latent Dynamics for Planning from Pixels”, Hafner et al 2018
- “Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning”, Foerster et al 2018
- “Human-Like Playtesting With Deep Learning”, Gudmundsson et al 2018
- “Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search”, Zela et al 2018
- “General Value Function Networks”, Schlegel et al 2018
- “The Alignment Problem for Bayesian History-Based Reinforcement Learners”, Everitt & Hutter 2018
- “Neural Scene Representation and Rendering”, Eslami et al 2018
- “Mining Gold from Implicit Models to Improve Likelihood-free Inference”, Brehmer et al 2018
- “Deep Reinforcement Learning in a Handful of Trials Using Probabilistic Dynamics Models”, Chua et al 2018
- “Learning to Optimize Tensor Programs”, Chen et al 2018
- “Estimate and Replace: A Novel Approach to Integrating Deep Neural Networks With Existing Applications”, Hadash et al 2018
- “World Models”, Ha & Schmidhuber 2018
- “Differentiable Dynamic Programming for Structured Prediction and Attention”, Mensch & Blondel 2018
- “Generalization Guides Human Exploration in Vast Decision Spaces”, Wu et al 2018
- “Planning Chemical Syntheses With Deep Neural Networks and Symbolic AI”, Segler et al 2018
- “How to Explore Chemical Space Using Algorithms and Automation”, Gromski et al 2018
- “Safe Policy Search With Gaussian Process Models”, Polymenakos et al 2017
- “Analogical-based Bayesian Optimization”, Le et al 2017
- “A Game-Theoretic Analysis of the Off-Switch Game”, Wängberg et al 2017
- “Neural Network Dynamics for Model-Based Deep Reinforcement Learning With Model-Free Fine-Tuning”, Nagabandi et al 2017
- “Learning Transferable Architectures for Scalable Image Recognition”, Zoph et al 2017
- “Learning Model-based Planning from Scratch”, Pascanu et al 2017
- “Value Prediction Network”, Oh et al 2017
- “Path Integral Networks: End-to-End Differentiable Optimal Control”, Okada et al 2017
- “Visual Semantic Planning Using Deep Successor Representations”, Zhu et al 2017
- “AIXIjs: A Software Demo for General Reinforcement Learning”, Aslanides 2017
- “DeepArchitect: Automatically Designing and Training Deep Architectures”, Negrinho & Gordon 2017
- “Stochastic Constraint Programming As Reinforcement Learning”, Prestwich et al 2017
- “Recurrent Environment Simulators”, Chiappa et al 2017
- “Prediction and Control With Temporal Segment Models”, Mishra et al 2017
- “Rotting Bandits”, Levine et al 2017
- “The Kelly Coin-Flipping Game: Exact Solutions”, Branwen et al 2017
- “The Hippocampus As a Predictive Map”, Stachenfeld et al 2017
- “The Predictron: End-To-End Learning and Planning”, Silver et al 2016
- “Model-based Adversarial Imitation Learning”, Baram et al 2016
- “DeepMath—Deep Sequence Models for Premise Selection”, Alemi et al 2016
- “Value Iteration Networks”, Tamar et al 2016
- “On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models”, Schmidhuber 2015
- “Resorting Media Ratings”, Gwern 2015
- “Compress and Control”, Veness et al 2014
- “Learning to Win by Reading Manuals in a Monte-Carlo Framework”, Branavan et al 2014
- “Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science”, Clark 2013
- “PUCT: Continuous Upper Confidence Trees With Polynomial Exploration-Consistency”, Auger et al 2013
- “Planning As Satisfiability: Heuristics”, Rintanen 2012
- “Monte-Carlo Planning in Large POMDPs”, Silver & Veness 2010
- “A Monte Carlo AIXI Approximation”, Veness et al 2009
- “Evolution And Episodic Memory: An Analysis And Demonstration Of A Social Function Of Episodic Recollection”, Klein et al 2009
- “Resilient Machines Through Continuous Self-Modeling”, Bongard et al 2006
- “Policy Mining: Learning Decision Policies from Fixed Sets of Data”, Zadrozny 2003
- “The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions”, Schmidhuber 2002
- “A Critique of Pure Reason”, McDermott 1987
- “Human Window on the World”, Michie 1985
Wikipedia
Miscellaneous
- /doc/reinforcement-learning/model/2020-hafner-dreamer-threephasearchitecture.png
- /doc/reinforcement-learning/model/2020-hafner-dreamer-modelpredictions.png
- /doc/reinforcement-learning/model/2020-hafner-dreamer-learninganimation.mp4
- /doc/reinforcement-learning/model/2010-silver-figure1-illustrationofpomcpmctssearchoverapomdp.png
- https://github.com/KeeyanGhoreshi/PokemonFireredSingleSequence
- https://if50.substack.com/p/christopher-strachey-and-the-dawn
- https://netflixtechblog.com/artwork-personalization-c589f074ad76
- https://www.lesswrong.com/posts/ZwshvqiqCvXPsZEct/the-learning-theoretic-agenda-status-2023
- https://www.lesswrong.com/posts/nmxzr2zsjNtjaHh7x/actually-othello-gpt-has-a-linear-emergent-world
Link Bibliography
- https://arxiv.org/abs/2310.04406: “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang
- https://arxiv.org/abs/2306.04930#microsoft: “When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming (CDHF)”, Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz
- https://arxiv.org/abs/2301.04104#deepmind: “DreamerV3: Mastering Diverse Domains through World Models”, Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap
- https://www.nature.com/articles/s41467-022-35422-y: “Merging Enzymatic and Synthetic Chemistry With Computational Synthesis Planning”, Itai Levin, Mengjie Liu, Christopher A. Voigt, Connor W. Coley
- https://www.science.org/doi/10.1126/science.ade9097#facebook: “CICERO: Human-level Play in the Game of Diplomacy by Combining Language Models With Strategic Reasoning”
- https://arxiv.org/abs/2209.08466: “Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies With One Objective (ALM)”, Raj Ghugare, Homanga Bharadhwaj, Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov
- https://arxiv.org/abs/2209.00588: “IRIS: Transformers Are Sample-Efficient World Models”, Vincent Micheli, Eloi Alonso, François Fleuret
- https://arxiv.org/abs/2207.05608#google: “Inner Monologue: Embodied Reasoning through Planning With Language Models”
- https://arxiv.org/abs/2207.04429: “LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Dhruv Shah, Blazej Osinski, Brian Ichter, Sergey Levine
- https://arxiv.org/abs/2206.14176: “DayDreamer: World Models for Physical Robot Learning”, Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel
- https://arxiv.org/abs/2206.11795#openai: “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”
- https://arxiv.org/abs/2206.04114#google: “Director: Deep Hierarchical Planning from Pixels”, Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
- https://arxiv.org/abs/2204.05080#deepmind: “Semantic Exploration from Language Abstractions and Pretrained Representations”
- https://arxiv.org/abs/2204.01691#google: “Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”
- https://arxiv.org/abs/2106.13281#google: “Brax—A Differentiable Physics Engine for Large Scale Rigid Body Simulation”, C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, Olivier Bachem
- https://waymo.com/blog/2021/03/replaying-real-life.html: “Replaying Real Life: How the Waymo Driver Avoids Fatal Human Crashes”, Waymo
- https://blog.research.google/2020/03/introducing-dreamer-scalable.html: “Introducing Dreamer: Scalable Reinforcement Learning Using World Models”, Danijar Hafner
- 2018-gudmundsson.pdf: “Human-Like Playtesting With Deep Learning”
- 2018-everitt.pdf: “The Alignment Problem for Bayesian History-Based Reinforcement Learners”, Tom Everitt, Marcus Hutter
- coin-flip: “The Kelly Coin-Flipping Game: Exact Solutions”, Gwern Branwen, Arthur Breitman, nshepperd, FeepingCreature, Gurkenglas
- resorter: “Resorting Media Ratings”, Gwern
- 2010-silver.pdf: “Monte-Carlo Planning in Large POMDPs”, David Silver, Joel Veness