- See Also
Links
- “MimicPlay: Long-Horizon Imitation Learning by Watching Human Play”, et al 2023
- “MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, et al 2023
- “DreamerV3: Mastering Diverse Domains through World Models”, et al 2023
- “Curiosity in Hindsight”, et al 2022
- “E3B: Exploration via Elliptical Episodic Bonuses”, et al 2022
- “Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners”, et al 2022
- “A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning”, et al 2022
- “Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space”, et al 2022
- “Value-free Random Exploration Is Linked to Impulsivity”, 2022
- “Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling”, 2022
- “The Cost of Information Acquisition by Natural Selection”, et al 2022
- “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, et al 2022
- “BYOL-Explore: Exploration by Bootstrapped Prediction”, et al 2022
- “Multi-Objective Hyperparameter Optimization—An Overview”, et al 2022
- “Director: Deep Hierarchical Planning from Pixels”, et al 2022
- “Boosting Search Engines With Interactive Agents”, et al 2022
- “Towards Learning Universal Hyperparameter Optimizers With Transformers”, et al 2022
- “Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments”, et al 2022
- “Effective Mutation Rate Adaptation through Group Elite Selection”, et al 2022
- “Semantic Exploration from Language Abstractions and Pretrained Representations”, et al 2022
- “Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, et al 2022
- “CLIP on Wheels (CoW): Zero-Shot Object Navigation As Object Localization and Exploration”, et al 2022
- “Policy Improvement by Planning With Gumbel”, et al 2022
- “VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning”, Borja-Diaz et al 2022
- “Learning Causal Overhypotheses through Exploration in Children and Computational Models”, et al 2022
- “Policy Learning and Evaluation With Randomized Quasi-Monte Carlo”, et al 2022
- “NeuPL: Neural Population Learning”, et al 2022
- “ODT: Online Decision Transformer”, et al 2022
- “EvoJAX: Hardware-Accelerated Neuroevolution”, et al 2022
- “LID: Pre-Trained Language Models for Interactive Decision-Making”, et al 2022
- “Accelerated Quality-Diversity for Robotics through Massive Parallelism”, et al 2022
- “Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning”, et al 2022
- “Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination”, 2022
- “Evolution Gym: A Large-Scale Benchmark for Evolving Soft Robots”, et al 2022
- “Environment Generation for Zero-Shot Compositional Reinforcement Learning”, et al 2022
- “Safe Deep RL in 3D Environments Using Human Feedback”, et al 2022
- “Automated Reinforcement Learning (AutoRL): A Survey and Open Problems”, Parker-Holder et al 2022
- “Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination”, et al 2021
- “The Costs and Benefits of Dispersal in Small Populations”, 2021
- “The Geometry of Decision-making in Individuals and Collectives”, et al 2021
- “An Experimental Design Perspective on Model-Based Reinforcement Learning”, et al 2021
- “JueWu-MC: Playing Minecraft With Sample-efficient Hierarchical Reinforcement Learning”, et al 2021
- “Correspondence between Neuroevolution and Gradient Descent”, et al 2021
- “Procedural Generalization by Planning With Self-Supervised World Models”, et al 2021
- “URLB: Unsupervised Reinforcement Learning Benchmark”, et al 2021
- “Mastering Atari Games With Limited Data”, et al 2021
- “Discovering and Achieving Goals via World Models”, et al 2021
- “The Structure of Genotype-phenotype Maps Makes Fitness Landscapes Navigable”, et al 2021
- “Monkey Plays Pac-Man With Compositional Strategies and Hierarchical Decision-making”, et al 2021
- “A Review of the Gumbel-max Trick and Its Extensions for Discrete Stochasticity in Machine Learning”, et al 2021
- “Neural Autopilot and Context-sensitivity of Habits”, 2021
- “TrufLL: Learning Natural Language Generation from Scratch”, et al 2021
- “Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration”, et al 2021
- “Bootstrapped Meta-Learning”, et al 2021
- “Open-Ended Learning Leads to Generally Capable Agents”, et al 2021
- “Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs”, et al 2021
- “Why Generalization in RL Is Difficult: Epistemic POMDPs and Implicit Partial Observability”, et al 2021
- “Imitation-driven Cultural Collapse”, Duran-Nebreda 2021
- “Multi-task Curriculum Learning in a Complex, Visual, Hard-exploration Domain: Minecraft”, et al 2021
- “Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem”, et al 2021
- “From Motor Control to Team Play in Simulated Humanoid Football”, et al 2021
- “Reward Is Enough”, et al 2021
- “Intelligence and Unambitiousness Using Algorithmic Information Theory”, et al 2021
- “Principled Exploration via Optimistic Bootstrapping and Backward Induction”, et al 2021
- “Deep Bandits Show-Off: Simple and Efficient Exploration With Deep Networks”, 2021
- “On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning”, et al 2021
- “What Are Bayesian Neural Network Posteriors Really Like?”, et al 2021
- “Epistemic Autonomy: Self-supervised Learning in the Mammalian Hippocampus”, Santos-Pata et al 2021
- “Bayesian Optimization Is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020”, et al 2021
- “Flexible Modulation of Sequence Generation in the Entorhinal-hippocampal System”, et al 2021
- “Reinforcement Learning, Bit by Bit”, et al 2021
- “Asymmetric Self-play for Automatic Goal Discovery in Robotic Manipulation”, OpenAI et al 2021
- “Informational Herding, Optimal Experimentation, and Contrarianism”, et al 2021
- “Go-Explore: First Return, Then Explore”, et al 2021
- “TacticZero: Learning to Prove Theorems from Scratch With Deep Reinforcement Learning”, et al 2021
- “Proof Artifact Co-training for Theorem Proving With Language Models”, et al 2021
- “Is Pessimism Provably Efficient for Offline RL?”, et al 2020
- “Imitating Interactive Intelligence”, et al 2020
- “Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian”, Parker-Holder et al 2020
- “Meta-trained Agents Implement Bayes-optimal Agents”, et al 2020
- “Learning Not to Learn: Nature versus Nurture in Silico”, 2020
- “The Child As Hacker”, et al 2020
- “The Temporal Dynamics of Opportunity Costs: A Normative Account of Cognitive Fatigue and Boredom”, et al 2020
- “Assessing Game Balance With AlphaZero: Exploring Alternative Rule Sets in Chess”, et al 2020
- “The Overfitted Brain: Dreams Evolved to Assist Generalization”, 2020
- “Exploration Strategies in Deep Reinforcement Learning”, 2020
- “Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search”, et al 2020
- “Automatic Discovery of Interpretable Planning Strategies”, et al 2020
- “Planning to Explore via Self-Supervised World Models”, et al 2020
- “Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems”, et al 2020
- “First Return, Then Explore”, et al 2020
- “Approximate Exploitability: Learning a Best Response in Large Games”, et al 2020
- “Real World Games Look Like Spinning Tops”, et al 2020
- “Agent57: Outperforming the Human Atari Benchmark”, et al 2020
- “Agent57: Outperforming the Atari Human Benchmark”, et al 2020
- “Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and Their Solutions”, et al 2020
- “Meta-learning Curiosity Algorithms”, et al 2020
- “Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey”, et al 2020
- “AutoML-Zero: Evolving Machine Learning Algorithms From Scratch”, et al 2020
- “Never Give Up: Learning Directed Exploration Strategies”, et al 2020
- “Effective Diversity in Population Based Reinforcement Learning”, Parker-Holder et al 2020
- “Near-perfect Point-goal Navigation from 2.5 Billion Frames of Experience”, 2020
- “Learning Human Objectives by Evaluating Hypothetical Behavior”, et al 2019
- “Optimal Policies Tend to Seek Power”, et al 2019
- “DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames”, et al 2019
- “Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, et al 2019
- “Emergent Tool Use From Multi-Agent Autocurricula”, et al 2019
- “R2D3: Making Efficient Use of Demonstrations to Solve Hard Exploration Problems”, et al 2019
- “Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment”, et al 2019
- “A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment”, et al 2019
- “An Optimistic Perspective on Offline Reinforcement Learning”, et al 2019
- “Meta Reinforcement Learning”, 2019
- “Search on the Replay Buffer: Bridging Planning and Reinforcement Learning”, et al 2019
- “ICML 2019 Notes”, 2019
- “Human-level Performance in 3D Multiplayer Games With Population-based Reinforcement Learning”, et al 2019
- “AI-GAs: AI-generating Algorithms, an Alternate Paradigm for Producing General Artificial Intelligence”, 2019
- “Reinforcement Learning, Fast and Slow”, et al 2019
- “Meta Reinforcement Learning As Task Inference”, et al 2019
- “Meta-learning of Sequential Strategies”, et al 2019
- “π-IW: Deep Policies for Width-Based Planning in Pixel Domains”, et al 2019
- “Learning To Follow Directions in Street View”, et al 2019
- “A Generalized Framework for Population Based Training”, et al 2019
- “Go-Explore: a New Approach for Hard-Exploration Problems”, et al 2019
- “Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions”, et al 2019
- “V-Fuzz: Vulnerability-Oriented Evolutionary Fuzzing”, et al 2019
- “Is the FDA Too Conservative or Too Aggressive?: A Bayesian Decision Analysis of Clinical Trial Design”, et al 2019
- “Machine-learning-guided Directed Evolution for Protein Engineering”, et al 2019
- “Common Neural Code for Reward and Information Value”, 2019
- “Enjoy It Again: Repeat Experiences Are Less Repetitive Than People Think”, 2019
- “The Bayesian Superorganism III: Externalized Memories Facilitate Distributed Sampling”, et al 2018
- “Evolutionary-Neural Hybrid Agents for Architecture Search”, et al 2018
- “Exploration in the Wild”, et al 2018
- “Off-Policy Deep Reinforcement Learning without Exploration”, et al 2018
- “An Introduction to Deep Reinforcement Learning”, Francois-Lavet et al 2018
- “The Bayesian Superorganism I: Collective Probability Estimation”, et al 2018
- “Exploration by Random Network Distillation”, et al 2018
- “Computational Noise in Reward-guided Learning Drives Behavioral Variability in Volatile Environments”, et al 2018
- “RND: Large-Scale Study of Curiosity-Driven Learning”, et al 2018
- “Visual Reinforcement Learning With Imagined Goals”, et al 2018
- “Is Q-learning Provably Efficient?”, et al 2018
- “Improving Width-based Planning With Compact Policies”, et al 2018
- “Construction of Arbitrarily Strong Amplifiers of Natural Selection Using Evolutionary Graph Theory”, et al 2018
- “Re-evaluating Evaluation”, et al 2018
- “Mix&Match—Agent Curricula for Reinforcement Learning”, et al 2018
- “Observe and Look Further: Achieving Consistent Performance on Atari”, et al 2018
- “Playing Hard Exploration Games by Watching YouTube”, et al 2018
- “Generalization and Search in Risky Environments”, et al 2018
- “Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution”, et al 2018
- “Learning to Navigate in Cities Without a Map”, et al 2018
- “The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities”, et al 2018
- “Some Considerations on Learning to Explore via Meta-Reinforcement Learning”, et al 2018
- “One Big Net For Everything”, 2018
- “Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari”, et al 2018
- “Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration”, et al 2018
- “Learning to Search With MCTSnets”, et al 2018
- “Learning and Querying Fast Generative Models for Reinforcement Learning”, et al 2018
- “Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning”, et al 2018
- “Safe Exploration in Continuous Action Spaces”, et al 2018
- “Deep Reinforcement Fuzzing”, et al 2018
- “Generalization Guides Human Exploration in Vast Decision Spaces”, et al 2018
- “Planning Chemical Syntheses With Deep Neural Networks and Symbolic AI”, et al 2018
- “Innovation and Cumulative Culture through Tweaks and Leaps in Online Programming Contests”, et al 2018
- “Finding Competitive Network Architectures Within a Day Using UCT”, 2017
- “A Flexible Approach to Automated RNN Architecture Generation”, et al 2017
- “Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning”, et al 2017
- “Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents”, et al 2017
- “The Paradoxical Sustainability of Periodic Migration and Habitat Destruction”, 2017
- “Posterior Sampling for Large Scale Reinforcement Learning”, et al 2017
- “Policy Optimization by Genetic Distillation”, 2017
- “Emergent Complexity via Multi-Agent Competition”, et al 2017
- “An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits”, 2017
- “The Uncertainty Bellman Equation and Exploration”, et al 2017
- “Changing Their Tune: How Consumers’ Adoption of Online Streaming Affects Music Consumption and Discovery”, et al 2017
- “A Rational Choice Framework for Collective Behavior”, 2017
- “Imagination-Augmented Agents for Deep Reinforcement Learning”, et al 2017
- “Distral: Robust Multitask Reinforcement Learning”, et al 2017
- “The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously”, et al 2017
- “Emergence of Locomotion Behaviours in Rich Environments”, et al 2017
- “Noisy Networks for Exploration”, et al 2017
- “CAN: Creative Adversarial Networks, Generating ‘Art’ by Learning About Styles and Deviating from Style Norms”, et al 2017
- “Device Placement Optimization With Reinforcement Learning”, et al 2017
- “Scalable Generalized Linear Bandits: Online Computation and Hashing”, et al 2017
- “Ask the Right Questions: Active Question Reformulation With Reinforcement Learning”, et al 2017
- “DeepXplore: Automated Whitebox Testing of Deep Learning Systems”, et al 2017
- “Recurrent Environment Simulators”, et al 2017
- “Learned Optimizers That Scale and Generalize”, et al 2017
- “Evolution Strategies As a Scalable Alternative to Reinforcement Learning”, et al 2017
- “Large-Scale Evolution of Image Classifiers”, et al 2017
- “CoDeepNEAT: Evolving Deep Neural Networks”, et al 2017
- “Neural Combinatorial Optimization With Reinforcement Learning”, et al 2017
- “Neural Data Filter for Bootstrapping Stochastic Gradient Descent”, et al 2017
- “Search in Patchy Media: Exploitation-exploration Tradeoff”
- “Towards Information-Seeking Agents”, et al 2016
- “Learning to Learn without Gradient Descent by Gradient Descent”, et al 2016
- “Learning to Perform Physics Experiments via Deep Reinforcement Learning”, et al 2016
- “Neural Architecture Search With Reinforcement Learning”, 2016
- “Combating Reinforcement Learning’s Sisyphean Curse With Intrinsic Fear”, et al 2016
- “Bayesian Reinforcement Learning: A Survey”, et al 2016
- “Human Collective Intelligence As Distributed Bayesian Inference”, et al 2016
- “Universal Darwinism As a Process of Bayesian Inference”, 2016
- “Unifying Count-Based Exploration and Intrinsic Motivation”, et al 2016
- “Candy Japan’s New Box A/B Test”, 2016
- “D-TS: Double Thompson Sampling for Dueling Bandits”, 2016
- “Deep Exploration via Bootstrapped DQN”, et al 2016
- “Online Batch Selection for Faster Training of Neural Networks”, 2015
- “The Psychology and Neuroscience of Curiosity”, 2015
- “What My Deep Model Doesn’t Know…”, 2015
- “Thompson Sampling With the Online Bootstrap”, 2014
- “On the Complexity of Best Arm Identification in Multi-Armed Bandit Models”, et al 2014
- “Freeze-Thaw Bayesian Optimization”, et al 2014
- “Search for the Wreckage of Air France Flight AF 447”, et al 2014
- “(More) Efficient Reinforcement Learning via Posterior Sampling”, et al 2013
- “PUCT: Continuous Upper Confidence Trees With Polynomial Exploration-Consistency”, et al 2013
- “Experimental Design for Partially Observed Markov Decision Processes”, 2012
- “Learning Is Planning: near Bayes-optimal Reinforcement Learning via Monte-Carlo Tree Search”, 2012
- “Abandoning Objectives: Evolution Through the Search for Novelty Alone”, 2011
- “PILCO: A Model-Based and Data-Efficient Approach to Policy Search”, 2011
- “Formal Theory of Creativity & Fun & Intrinsic Motivation (1990–2010)”, 2010
- “The Epistemic Benefit of Transient Diversity”, 2009
- “Specialization Effect and Its Influence on Memory and Problem Solving in Expert Chess Players”, et al 2009
- “Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes”, 2008
- “Pure Exploration for Multi-Armed Bandit Problems”, et al 2008
- “Towards Efficient Evolutionary Design of Autonomous Robots”, 2008
- “Bayesian Adaptive Exploration”, 2003
- “NEAT: Evolving Neural Networks through Augmenting Topologies”, 2002
- “A Bayesian Framework for Reinforcement Learning”, 2000
- “Case Studies in Evolutionary Experimentation and Computation”, 2000
- “Efficient Progressive Sampling”, et al 1999b
- “Interactions between Learning and Evolution”, 1992
- “Evolution Strategy: Nature’s Way of Optimization”, 1989
- “Evolutionsstrategien”, 1977
- “Evolutionsstrategie: Optimierung Technischer Systeme Nach Prinzipien Der Biologischen Evolution”, 1973
- “Montezuma’s Revenge Solved by Go-Explore, a New Algorithm for Hard-Exploration Problems (Sets Records on Pitfall, Too)”
- “Reinforcement Learning With Prediction-Based Rewards”
- “Conditions for Mathematical Equivalence of Stochastic Gradient Descent and Natural Selection”
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Links
“MimicPlay: Long-Horizon Imitation Learning by Watching Human Play”, et al 2023 (2023-02-24)
“MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, et al 2023 (2023-02-12)
“DreamerV3: Mastering Diverse Domains through World Models”, et al 2023 (2023-01-10)
“Curiosity in Hindsight”, et al 2022 (2022-11-18)
“E3B: Exploration via Elliptical Episodic Bonuses”, et al 2022 (2022-10-11)
“Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners”, et al 2022 (2022-09-05)
“A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning”, et al 2022 (2022-08-23)
“Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space”, et al 2022 (2022-08-22)
“Value-free Random Exploration Is Linked to Impulsivity”, 2022 (2022-08-04)
“Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling”, 2022 (2022-07-09)
“The Cost of Information Acquisition by Natural Selection”, et al 2022 (2022-07-03)
“Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, et al 2022 (2022-06-23)
“BYOL-Explore: Exploration by Bootstrapped Prediction”, et al 2022 (2022-06-16)
“Multi-Objective Hyperparameter Optimization—An Overview”, et al 2022 (2022-06-15)
“Director: Deep Hierarchical Planning from Pixels”, et al 2022 (2022-06-08)
“Boosting Search Engines With Interactive Agents”, et al 2022 (2022-06-04)
“Towards Learning Universal Hyperparameter Optimizers With Transformers”, et al 2022 (2022-05-26)
“Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments”, et al 2022 (2022-05-14)
“Effective Mutation Rate Adaptation through Group Elite Selection”, et al 2022 (2022-04-11)
“Semantic Exploration from Language Abstractions and Pretrained Representations”, et al 2022 (2022-04-08)
“Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, et al 2022 (2022-04-07)
“CLIP on Wheels (CoW): Zero-Shot Object Navigation As Object Localization and Exploration”, et al 2022 (2022-03-20)
“Policy Improvement by Planning With Gumbel”, et al 2022 (2022-03-04)
“VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning”, Borja-Diaz et al 2022 (2022-03-01)
“Learning Causal Overhypotheses through Exploration in Children and Computational Models”, et al 2022 (2022-02-21)
“Policy Learning and Evaluation With Randomized Quasi-Monte Carlo”, et al 2022 (2022-02-16)
“NeuPL: Neural Population Learning”, et al 2022 (2022-02-15)
“ODT: Online Decision Transformer”, et al 2022 (2022-02-11)
“EvoJAX: Hardware-Accelerated Neuroevolution”, et al 2022 (2022-02-10)
“LID: Pre-Trained Language Models for Interactive Decision-Making”, et al 2022 (2022-02-03)
“Accelerated Quality-Diversity for Robotics through Massive Parallelism”, et al 2022 (2022-02-02)
“Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning”, et al 2022 (2022-01-31)
“Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination”, 2022 (2022-01-28)
“Evolution Gym: A Large-Scale Benchmark for Evolving Soft Robots”, et al 2022 (2022-01-24)
“Environment Generation for Zero-Shot Compositional Reinforcement Learning”, et al 2022 (2022-01-21)
“Safe Deep RL in 3D Environments Using Human Feedback”, et al 2022 (2022-01-20)
“Automated Reinforcement Learning (AutoRL): A Survey and Open Problems”, Parker-Holder et al 2022 (2022-01-11)
“Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination”, et al 2021 (2021-12-22)
“The Costs and Benefits of Dispersal in Small Populations”, 2021 (2021-12-16)
“The Geometry of Decision-making in Individuals and Collectives”, et al 2021 (2021-12-14)
“An Experimental Design Perspective on Model-Based Reinforcement Learning”, et al 2021 (2021-12-09)
“JueWu-MC: Playing Minecraft With Sample-efficient Hierarchical Reinforcement Learning”, et al 2021 (2021-12-07)
“Correspondence between Neuroevolution and Gradient Descent”, et al 2021 (2021-11-02)
“Procedural Generalization by Planning With Self-Supervised World Models”, et al 2021 (2021-11-02)
“URLB: Unsupervised Reinforcement Learning Benchmark”, et al 2021 (2021-10-31)
“Mastering Atari Games With Limited Data”, et al 2021 (2021-10-30)
“Discovering and Achieving Goals via World Models”, et al 2021 (2021-10-18)
“The Structure of Genotype-phenotype Maps Makes Fitness Landscapes Navigable”, et al 2021 (2021-10-12)
“Monkey Plays Pac-Man With Compositional Strategies and Hierarchical Decision-making”, et al 2021 (2021-10-04)
“A Review of the Gumbel-max Trick and Its Extensions for Discrete Stochasticity in Machine Learning”, et al 2021 (2021-10-04)
“Neural Autopilot and Context-sensitivity of Habits”, 2021 (2021-10-01)
“TrufLL: Learning Natural Language Generation from Scratch”, et al 2021 (2021-09-20)
“Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration”, et al 2021 (2021-09-17)
“Bootstrapped Meta-Learning”, et al 2021 (2021-09-09)
“Open-Ended Learning Leads to Generally Capable Agents”, et al 2021 (2021-07-27)
“Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs”, et al 2021 (2021-07-21)
“Why Generalization in RL Is Difficult: Epistemic POMDPs and Implicit Partial Observability”, et al 2021 (2021-07-13)
“Imitation-driven Cultural Collapse”, Duran-Nebreda 2021 (2021-07-12)
“Multi-task Curriculum Learning in a Complex, Visual, Hard-exploration Domain: Minecraft”, et al 2021 (2021-06-28)
“Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem”, et al 2021 (2021-06-03)
“From Motor Control to Team Play in Simulated Humanoid Football”, et al 2021 (2021-05-25)
“Reward Is Enough”, et al 2021 (2021-05-24)
“Intelligence and Unambitiousness Using Algorithmic Information Theory”, et al 2021 (2021-05-13)
“Principled Exploration via Optimistic Bootstrapping and Backward Induction”, et al 2021 (2021-05-13)
“Deep Bandits Show-Off: Simple and Efficient Exploration With Deep Networks”, 2021 (2021-05-10)
“On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning”, et al 2021 (2021-05-04)
“What Are Bayesian Neural Network Posteriors Really Like?”, et al 2021 (2021-04-29)
“Epistemic Autonomy: Self-supervised Learning in the Mammalian Hippocampus”, Santos-Pata et al 2021 (2021-04-24)
“Bayesian Optimization Is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020”, et al 2021 (2021-04-20)
“Flexible Modulation of Sequence Generation in the Entorhinal-hippocampal System”, et al 2021 (2021-04-12)
“Reinforcement Learning, Bit by Bit”, et al 2021 (2021-03-06)
“Asymmetric Self-play for Automatic Goal Discovery in Robotic Manipulation”, OpenAI et al 2021 (2021-03-05)
“Informational Herding, Optimal Experimentation, and Contrarianism”, et al 2021 (2021-02-25)
“Go-Explore: First Return, Then Explore”, et al 2021 (2021-02-24)
“TacticZero: Learning to Prove Theorems from Scratch With Deep Reinforcement Learning”, et al 2021 (2021-02-19)
“Proof Artifact Co-training for Theorem Proving With Language Models”, et al 2021 (2021-02-11)
“Is Pessimism Provably Efficient for Offline RL?”, et al 2020 (2020-12-30)
“Imitating Interactive Intelligence”, et al 2020 (2020-12-10)
“Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian”, Parker-Et Al 2020
“Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian”, 2020-11-12 (similar)
“Meta-trained Agents Implement Bayes-optimal Agents”, Et Al 2020
“Meta-trained agents implement Bayes-optimal agents”, 2020-10-21 ( ; similar)
“Learning Not to Learn: Nature versus Nurture in Silico”, 2020
“Learning not to learn: Nature versus nurture in silico”, 2020-10-09 ( ; backlinks; similar)
“The Child As Hacker”, Et Al 2020
“The Child as Hacker”, 2020-10-01 ( ; similar)
“The Temporal Dynamics of Opportunity Costs: A Normative Account of Cognitive Fatigue and Boredom”, Et Al 2020
“The Temporal Dynamics of Opportunity Costs: A Normative Account of Cognitive Fatigue and Boredom”, 2020-09-09 ( ; backlinks; similar)
“Assessing Game Balance With AlphaZero: Exploring Alternative Rule Sets in Chess”, Et Al 2020
“Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess”, 2020-09-09 ( ; similar)
“The Overfitted Brain: Dreams Evolved to Assist Generalization”, 2020
“The Overfitted Brain: Dreams evolved to assist generalization”, 2020-07-19 ( ; backlinks; similar)
“Exploration Strategies in Deep Reinforcement Learning”, 2020
“Exploration Strategies in Deep Reinforcement Learning”, 2020-06-07 (similar)
“Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search”, Et Al 2020
“Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search”, 2020-05-27 ( ; similar)
“Automatic Discovery of Interpretable Planning Strategies”, Et Al 2020
“Automatic Discovery of Interpretable Planning Strategies”, 2020-05-24 ( ; similar)
“Planning to Explore via Self-Supervised World Models”, Et Al 2020
“Planning to Explore via Self-Supervised World Models”, 2020-05-12 (similar)
“Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems”, Et Al 2020
“Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems”, 2020-05-04 (backlinks; similar)
“First Return, Then Explore”, Et Al 2020
“First return, then explore”, 2020-04-27 ( ; similar)
“Approximate Exploitability: Learning a Best Response in Large Games”, Et Al 2020
“Approximate exploitability: Learning a best response in large games”, 2020-04-20 ( ; similar)
“Real World Games Look Like Spinning Tops”, Et Al 2020
“Real World Games Look Like Spinning Tops”, 2020-04-20 ( ; similar)
“Agent57: Outperforming the Human Atari Benchmark”, Et Al 2020
“Agent57: Outperforming the human Atari benchmark”, 2020-03-31 ( ; backlinks; similar; bibliography)
“Agent57: Outperforming the Atari Human Benchmark”, Et Al 2020
“Agent57: Outperforming the Atari Human Benchmark”, 2020-03-30 ( ; similar)
“Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and Their Solutions”, Et Al 2020 (2020-03-19)
“Meta-learning Curiosity Algorithms”, Et Al 2020 (2020-03-11)
“Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey”, Et Al 2020 (2020-03-10)
“AutoML-Zero: Evolving Machine Learning Algorithms From Scratch”, Et Al 2020 (2020-03-06)
“Never Give Up: Learning Directed Exploration Strategies”, Et Al 2020 (2020-02-14)
“Effective Diversity in Population Based Reinforcement Learning”, Parker-Et Al 2020 (2020-02-03)
“Near-perfect Point-goal Navigation from 2.5 Billion Frames of Experience”, 2020 (2020-01-21)
“Learning Human Objectives by Evaluating Hypothetical Behavior”, Et Al 2019 (2019-12-05)
“Optimal Policies Tend to Seek Power”, Et Al 2019 (2019-12-03)
“DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames”, Et Al 2019 (2019-11-01)
“Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, Et Al 2019 (2019-09-17)
“Emergent Tool Use From Multi-Agent Autocurricula”, Et Al 2019 (2019-09-17)
“R2D3: Making Efficient Use of Demonstrations to Solve Hard Exploration Problems”, Et Al 2019 (2019-09-03)
“Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment”, Et Al 2019 (2019-08-06)
“A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment”, Et Al 2019 (2019-07-26)
“An Optimistic Perspective on Offline Reinforcement Learning”, Et Al 2019 (2019-07-10)
“Meta Reinforcement Learning”, 2019 (2019-06-23)
“Search on the Replay Buffer: Bridging Planning and Reinforcement Learning”, Et Al 2019 (2019-06-12)
“ICML 2019 Notes”, 2019 (2019-06)
“Human-level Performance in 3D Multiplayer Games With Population-based Reinforcement Learning”, Et Al 2019 (2019-05-31)
“AI-GAs: AI-generating Algorithms, an Alternate Paradigm for Producing General Artificial Intelligence”, 2019 (2019-05-27)
“Reinforcement Learning, Fast and Slow”, Et Al 2019 (2019-05-16)
“Meta Reinforcement Learning As Task Inference”, Et Al 2019 (2019-05-15)
“Meta-learning of Sequential Strategies”, Et Al 2019 (2019-05-08)
“π-IW: Deep Policies for Width-Based Planning in Pixel Domains”, Et Al 2019 (2019-04-12)
“Learning To Follow Directions in Street View”, Et Al 2019 (2019-03-01)
“A Generalized Framework for Population Based Training”, Et Al 2019 (2019-02-05)
“Go-Explore: a New Approach for Hard-Exploration Problems”, Et Al 2019 (2019-01-30)
“Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions”, Et Al 2019 (2019-01-07)
“V-Fuzz: Vulnerability-Oriented Evolutionary Fuzzing”, Et Al 2019 (2019-01-04)
“Is the FDA Too Conservative or Too Aggressive?: A Bayesian Decision Analysis of Clinical Trial Design”, Et Al 2019 (2019-01-04)
“Machine-learning-guided Directed Evolution for Protein Engineering”, Et Al 2019
“Common Neural Code for Reward and Information Value”, 2019
“Enjoy It Again: Repeat Experiences Are Less Repetitive Than People Think”, 2019
“The Bayesian Superorganism III: Externalized Memories Facilitate Distributed Sampling”, Et Al 2018 (2018-12-21)
“Evolutionary-Neural Hybrid Agents for Architecture Search”, Et Al 2018 (2018-12-21)
“Exploration in the Wild”, Et Al 2018 (2018-12-14)
“Off-Policy Deep Reinforcement Learning without Exploration”, Et Al 2018 (2018-12-07)
“An Introduction to Deep Reinforcement Learning”, Francois-Et Al 2018 (2018-11-30)
“The Bayesian Superorganism I: Collective Probability Estimation”, Et Al 2018 (2018-11-12)
“Exploration by Random Network Distillation”, Et Al 2018 (2018-10-30)
“Computational Noise in Reward-guided Learning Drives Behavioral Variability in Volatile Environments”, Et Al 2018 (2018-10-11)
“RND: Large-Scale Study of Curiosity-Driven Learning”, Et Al 2018 (2018-08-13)
“Visual Reinforcement Learning With Imagined Goals”, Et Al 2018 (2018-07-12)
“Is Q-learning Provably Efficient?”, Et Al 2018 (2018-07-10)
“Improving Width-based Planning With Compact Policies”, Et Al 2018 (2018-06-15)
“Construction of Arbitrarily Strong Amplifiers of Natural Selection Using Evolutionary Graph Theory”, Et Al 2018 (2018-06-14)
“Re-evaluating Evaluation”, Et Al 2018 (2018-06-07)
“Mix&Match—Agent Curricula for Reinforcement Learning”, Et Al 2018 (2018-06-05)
“Observe and Look Further: Achieving Consistent Performance on Atari”, Et Al 2018 (2018-05-29)
“Playing Hard Exploration Games by Watching YouTube”, Et Al 2018 (2018-05-29)
“Generalization and Search in Risky Environments”, Et Al 2018 (2018-05-14)
“Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution”, Et Al 2018 (2018-04-24)
“Learning to Navigate in Cities Without a Map”, Et Al 2018 (2018-03-31)
“The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities”, Et Al 2018 (2018-03-09)
“Some Considerations on Learning to Explore via Meta-Reinforcement Learning”, Et Al 2018 (2018-03-03)
“One Big Net For Everything”, 2018 (2018-02-24)
“Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari”, Et Al 2018 (2018-02-24)
“Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration”, Et Al 2018 (2018-02-24)
“Learning to Search With MCTSnets”, Et Al 2018 (2018-02-13)
“Learning and Querying Fast Generative Models for Reinforcement Learning”, Et Al 2018 (2018-02-08)
“Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning”, Et Al 2018 (2018-01-26)
“Safe Exploration in Continuous Action Spaces”, Et Al 2018 (2018-01-26)
“Deep Reinforcement Fuzzing”, Et Al 2018 (2018-01-14)
“Generalization Guides Human Exploration in Vast Decision Spaces”, Et Al 2018
“Planning Chemical Syntheses With Deep Neural Networks and Symbolic AI”, Et Al 2018
“Innovation and Cumulative Culture through Tweaks and Leaps in Online Programming Contests”, Et Al 2018
“Finding Competitive Network Architectures Within a Day Using UCT”, 2017 (2017-12-20)
“A Flexible Approach to Automated RNN Architecture Generation”, Et Al 2017 (2017-12-20)
“Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning”, Et Al 2017 (2017-12-18)
“Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents”, Et Al 2017 (2017-12-18)
“The Paradoxical Sustainability of Periodic Migration and Habitat Destruction”, 2017 (2017-11-29)
“Posterior Sampling for Large Scale Reinforcement Learning”, Et Al 2017 (2017-11-21)
“Policy Optimization by Genetic Distillation”, 2017 (2017-11-03)
“Emergent Complexity via Multi-Agent Competition”, Et Al 2017 (2017-10-10)
“An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits”, 2017 (2017-10-08)
“The Uncertainty Bellman Equation and Exploration”, Et Al 2017 (2017-09-15)
“Changing Their Tune: How Consumers’ Adoption of Online Streaming Affects Music Consumption and Discovery”, Et Al 2017 (2017-09-11)
“A Rational Choice Framework for Collective Behavior”, 2017 (2017-09)
“Imagination-Augmented Agents for Deep Reinforcement Learning”, Et Al 2017 (2017-07-19)
“Distral: Robust Multitask Reinforcement Learning”, Et Al 2017 (2017-07-13)
“The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously”, Et Al 2017 (2017-07-11)
“Emergence of Locomotion Behaviours in Rich Environments”, Et Al 2017 (2017-07-07)
“Noisy Networks for Exploration”, Et Al 2017 (2017-06-30)
“CAN: Creative Adversarial Networks, Generating “Art” by Learning About Styles and Deviating from Style Norms”, Et Al 2017 (2017-06-21)
“Device Placement Optimization With Reinforcement Learning”, Et Al 2017 (2017-06-13)
“Scalable Generalized Linear Bandits: Online Computation and Hashing”, Et Al 2017 (2017-06-01)
“Ask the Right Questions: Active Question Reformulation With Reinforcement Learning”, Et Al 2017 (2017-05-22)
“DeepXplore: Automated Whitebox Testing of Deep Learning Systems”, Et Al 2017 (2017-05-18)
“Recurrent Environment Simulators”, Et Al 2017 (2017-04-07)
“Learned Optimizers That Scale and Generalize”, Et Al 2017 (2017-03-14)
“Evolution Strategies As a Scalable Alternative to Reinforcement Learning”, Et Al 2017 (2017-03-10)
“Large-Scale Evolution of Image Classifiers”, Et Al 2017 (2017-03-03)
“CoDeepNEAT: Evolving Deep Neural Networks”, Et Al 2017 (2017-03-01)
“Neural Combinatorial Optimization With Reinforcement Learning”, Et Al 2017 (2017-02-17)
“Neural Data Filter for Bootstrapping Stochastic Gradient Descent”, Et Al 2017 (2017-01-20)
“Search in Patchy Media: Exploitation-exploration Tradeoff”
“Towards Information-Seeking Agents”, Et Al 2016 (2016-12-08)
“Learning to Learn without Gradient Descent by Gradient Descent”, Et Al 2016 (2016-11-11)
“Learning to Perform Physics Experiments via Deep Reinforcement Learning”, Et Al 2016 (2016-11-06)
“Neural Architecture Search With Reinforcement Learning”, 2016 (2016-11-05)
“Combating Reinforcement Learning’s Sisyphean Curse With Intrinsic Fear”, Et Al 2016 (2016-11-03)
“Bayesian Reinforcement Learning: A Survey”, Et Al 2016 (2016-09-14)
“Human Collective Intelligence As Distributed Bayesian Inference”, Et Al 2016 (2016-08-05)
“Universal Darwinism As a Process of Bayesian Inference”, 2016 (2016-06-25)
“Unifying Count-Based Exploration and Intrinsic Motivation”, Et Al 2016 (2016-06-06)
“Candy Japan’s New Box A/B Test”, 2016 (2016-05-06)
“D-TS: Double Thompson Sampling for Dueling Bandits”, 2016 (2016-04-25)
“Deep Exploration via Bootstrapped DQN”, Et Al 2016 (2016-02-15)
“Online Batch Selection for Faster Training of Neural Networks”, 2015 (2015-11-19)
“The Psychology and Neuroscience of Curiosity”, 2015
“What My Deep Model Doesn’t Know…”, 2015
“Thompson Sampling With the Online Bootstrap”, 2014 (2014-10-15)
“On the Complexity of Best Arm Identification in Multi-Armed Bandit Models”, Et Al 2014 (2014-07-16)
“Freeze-Thaw Bayesian Optimization”, Et Al 2014 (2014-06-16)
“Search for the Wreckage of Air France Flight AF 447”, Et Al 2014 (2014-05-19)
“(More) Efficient Reinforcement Learning via Posterior Sampling”, Et Al 2013 (2013-06-04)
“PUCT: Continuous Upper Confidence Trees With Polynomial Exploration-Consistency”, Et Al 2013
“Experimental Design for Partially Observed Markov Decision Processes”, 2012 (2012-09-18)
“Learning Is Planning: near Bayes-optimal Reinforcement Learning via Monte-Carlo Tree Search”, 2012 (2012-02-14)
“Abandoning Objectives: Evolution Through the Search for Novelty Alone”, 2011 (2011-06-01)
“PILCO: A Model-Based and Data-Efficient Approach to Policy Search”, 2011 (2011-06-01)
“Formal Theory of Creativity & Fun & Intrinsic Motivation (1990–2010)”, 2010
“The Epistemic Benefit of Transient Diversity”, 2009 (2009-10-22)
“Specialization Effect and Its Influence on Memory and Problem Solving in Expert Chess Players”, Et Al 2009 (2009-07-23)
“Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes”, 2008 (2008-12-23)
“Pure Exploration for Multi-Armed Bandit Problems”, Et Al 2008 (2008-02-19)
“Towards Efficient Evolutionary Design of Autonomous Robots”, 2008 (2008-01)
“Bayesian Adaptive Exploration”, 2003
“NEAT: Evolving Neural Networks through Augmenting Topologies”, 2002 (2002-06-01)
“A Bayesian Framework for Reinforcement Learning”, 2000 (2000-06-28)
“Case Studies in Evolutionary Experimentation and Computation”, 2000 (2000-06-09)
“Efficient Progressive Sampling”, Et Al 1999b (1999-08-01)
“Interactions between Learning and Evolution”, 1992
“Evolution Strategy: Nature’s Way of Optimization”, 1989
“Evolutionsstrategien”, 1977
“Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution”, 1973
“Montezuma’s Revenge Solved by Go-Explore, a New Algorithm for Hard-Exploration Problems (Sets Records on Pitfall, Too)”
“Reinforcement Learning With Prediction-Based Rewards”
“Conditions for Mathematical Equivalence of Stochastic Gradient Descent and Natural Selection”
Wikipedia
Miscellaneous
Link Bibliography
“MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, Shyam Sudhakaran, Miguel González-Duque, Claire Glanois, Matthias Freiberger, Elias Najarro, Sebastian Risi: https://arxiv.org/abs/2302.05981
“DreamerV3: Mastering Diverse Domains through World Models”, Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap: https://arxiv.org/abs/2301.04104#deepmind
“Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners”: https://arxiv.org/abs/2209.01975
“Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space”, Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian: https://arxiv.org/abs/2208.10291
“Value-free Random Exploration Is Linked to Impulsivity”, Magda Dubois, Tobias U. Hauser: https://www.nature.com/articles/s41467-022-31918-9
“Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”: https://arxiv.org/abs/2206.11795#openai
“Director: Deep Hierarchical Planning from Pixels”, Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel: https://arxiv.org/abs/2206.04114#google
“Boosting Search Engines With Interactive Agents”: https://openreview.net/forum?id=0ZbPmmB61g#google
“Towards Learning Universal Hyperparameter Optimizers With Transformers”: https://arxiv.org/abs/2205.13320#google
“Semantic Exploration from Language Abstractions and Pretrained Representations”: https://arxiv.org/abs/2204.05080#deepmind
“Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, Ram Ramrakhya, Eric Undersander, Dhruv Batra, Abhishek Das: https://arxiv.org/abs/2204.03514#facebook
“Policy Improvement by Planning With Gumbel”, Ivo Danihelka, Arthur Guez, Julian Schrittwieser, David Silver: https://openreview.net/forum?id=bERaNdoegnO#deepmind
“NeuPL: Neural Population Learning”, Siqi Liu, Luke Marris, Daniel Hennes, Josh Merel, Nicolas Heess, Thore Graepel: https://arxiv.org/abs/2202.07415#deepmind
“Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination”, Rui Zhao, Jinming Song, Hu Haifeng, Yang Gao, Yi Wu, Zhongqian Sun, Yang Wei: https://arxiv.org/abs/2112.11701#tencent
“Procedural Generalization by Planning With Self-Supervised World Models”: https://arxiv.org/abs/2111.01587#deepmind
“Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem”, Michael Janner, Qiyang Colin Li, Sergey Levine: https://trajectory-transformer.github.io/
“From Motor Control to Team Play in Simulated Humanoid Football”: https://arxiv.org/abs/2105.12196#deepmind
“Reward Is Enough”, David Silver, Satinder Singh, Doina Precup, Richard S. Sutton: https://www.sciencedirect.com/science/article/pii/S0004370221000862#deepmind
“Go-Explore: First Return, Then Explore”, Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune: 2021-ecoffet.pdf#uber
“Imitating Interactive Intelligence”: https://arxiv.org/abs/2012.05672#deepmind
“Agent57: Outperforming the Human Atari Benchmark”, Adrià Puigdomènech, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell: https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark
“DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames”, Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra: https://arxiv.org/abs/1911.00357#facebook
“Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch: https://openai.com/blog/emergent-tool-use/#surprisingbehaviors
“ICML 2019 Notes”, David Abel: https://david-abel.github.io/notes/icml_2019.pdf
“Human-level Performance in 3D Multiplayer Games With Population-based Reinforcement Learning”: 2019-jaderberg.pdf#deepmind
“Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari”, Patryk Chrabaszcz, Ilya Loshchilov, Frank Hutter: https://arxiv.org/abs/1802.08842
“Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning”, Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, Jeff Clune: https://arxiv.org/abs/1712.06567#uber
“Candy Japan’s New Box A/B Test”, Gwern Branwen: candy-japan
“Specialization Effect and Its Influence on Memory and Problem Solving in Expert Chess Players”, Merim Bilalić, Peter McLeod, Fernand Gobet: https://onlinelibrary.wiley.com/doi/10.1111/j.1551-6709.2009.01030.x