SimpleStrat: Diversifying Language Model Generation with Stratification
Intelligent Go-Explore (IGE): Standing on the Shoulders of Giant Foundation Models
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Self-Supervised Behavior Cloned Transformers are Path Crawlers for Text Games
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
Let Models Speak Ciphers: Multiagent Debate through Embeddings
Language Reward Modulation for Pretraining Reinforcement Learning
Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)
Supervised Pretraining Can Learn In-Context Reinforcement Learning
Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery
Long-Term Value of Exploration: Measurements, Findings and Algorithms
Inducing anxiety in GPT-3.5 increases exploration and bias
Reflexion: Language Agents with Verbal Reinforcement Learning
MimicPlay: Long-Horizon Imitation Learning by Watching Human Play
MarioGPT: Open-Ended Text2Level Generation through Large Language Models
AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong
Effect of lysergic acid diethylamide (LSD) on reinforcement learning in humans
In-context Reinforcement Learning with Algorithm Distillation
Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners
A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning
Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space
Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Towards Learning Universal Hyperparameter Optimizers with Transformers
Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments
Effective Mutation Rate Adaptation through Group Elite Selection
Semantic Exploration from Language Abstractions and Pretrained Representations
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
CLIP on Wheels (CoW): Zero-Shot Object Navigation as Object Localization and Exploration
VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning
Learning Causal Overhypotheses through Exploration in Children and Computational Models
Policy Learning and Evaluation with Randomized Quasi-Monte Carlo
LID: Pre-Trained Language Models for Interactive Decision-Making
Accelerated Quality-Diversity for Robotics through Massive Parallelism
Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning (ExORL)
Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination
Evolution Gym: A Large-Scale Benchmark for Evolving Soft Robots
Environment Generation for Zero-Shot Compositional Reinforcement Learning
Automated Reinforcement Learning (AutoRL): A Survey and Open Problems
Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination
The geometry of decision-making in individuals and collectives
An Experimental Design Perspective on Model-Based Reinforcement Learning
JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning
Procedural Generalization by Planning with Self-Supervised World Models
Correspondence between neuroevolution and gradient descent
The structure of genotype-phenotype maps makes fitness landscapes navigable
A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning
Monkey Plays Pac-Man with Compositional Strategies and Hierarchical Decision-making
Algorithmic Balancing of Familiarity, Similarity, & Discovery in Music Recommendations
Is Curiosity All You Need? On the Utility of Emergent Behaviors from Curious Exploration
Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs
Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability
Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft
Planning for Novelty: Width-Based Algorithms for Common Problems in Control, Planning and Reinforcement Learning
Trajectory Transformer: Reinforcement Learning as One Big Sequence Modeling Problem
From Motor Control to Team Play in Simulated Humanoid Football
Principled Exploration via Optimistic Bootstrapping and Backward Induction
Intelligence and Unambitiousness Using Algorithmic Information Theory
Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks
On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning
Epistemic Autonomy: Self-supervised Learning in the Mammalian Hippocampus
Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020
Flexible modulation of sequence generation in the entorhinal-hippocampal system
Asymmetric self-play for automatic goal discovery in robotic manipulation
Informational Herding, Optimal Experimentation, and Contrarianism
TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning
Proof Artifact Co-training for Theorem Proving with Language Models
The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors
MAP-Elites Enables Powerful Stepping Stones and Diversity for Modular Robotics
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian
Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess
The Temporal Dynamics of Opportunity Costs: A Normative Account of Cognitive Fatigue and Boredom
The Overfitted Brain: Dreams evolved to assist generalization
Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Approximate exploitability: Learning a best response in large games
Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions
Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Effective Diversity in Population Based Reinforcement Learning
Near-perfect point-goal navigation from 2.5 billion frames of experience
microbatchGAN: Stimulating Diversity with Multi-Adversarial Discrimination
Learning Human Objectives by Evaluating Hypothetical Behavior
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
Emergent Tool Use from Multi-Agent Interaction § Surprising behavior
R2D3: Making Efficient Use of Demonstrations to Solve Hard Exploration Problems
Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment
A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment
An Optimistic Perspective on Offline Reinforcement Learning
Search on the Replay Buffer: Bridging Planning and Reinforcement Learning
Human-level performance in 3D multiplayer games with population-based reinforcement learning
AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors
π-IW: Deep Policies for Width-Based Planning in Pixel Domains
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions
Is the FDA too conservative or too aggressive?: A Bayesian decision analysis of clinical trial design
Machine-Learning-Guided Directed Evolution for Protein Engineering
Enjoy it again: Repeat experiences are less repetitive than people think
The Bayesian Superorganism III: externalized memories facilitate distributed sampling
Off-Policy Deep Reinforcement Learning without Exploration
The Bayesian Superorganism I: collective probability estimation
Computational noise in reward-guided learning drives behavioral variability in volatile environments
Construction of arbitrarily strong amplifiers of natural selection using evolutionary graph theory
Observe and Look Further: Achieving Consistent Performance on Atari
Toward Diverse Text Generation with Inverse Reinforcement Learning
Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution
The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities
Some Considerations on Learning to Explore via Meta-Reinforcement Learning
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari
Learning and Querying Fast Generative Models for Reinforcement Learning
Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning
Planning Chemical Syntheses With Deep Neural Networks and Symbolic AI
Generalization Guides Human Exploration in Vast Decision Spaces
Innovation and cumulative culture through tweaks and leaps in online programming contests
A Flexible Approach to Automated RNN Architecture Generation
Finding Competitive Network Architectures Within a Day Using UCT
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
The paradoxical sustainability of periodic migration and habitat destruction
An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits
Changing Their Tune: How Consumers’ Adoption of Online Streaming Affects Music Consumption and Discovery
Imagination-Augmented Agents for Deep Reinforcement Learning
The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously
CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms
Towards Synthesizing Complex Programs from Input-Output Examples
Scalable Generalized Linear Bandits: Online Computation and Hashing
DeepXplore: Automated Whitebox Testing of Deep Learning Systems
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Neural Combinatorial Optimization with Reinforcement Learning
Neural Data Filter for Bootstrapping Stochastic Gradient Descent
Exploration and exploitation of Victorian science in Darwin’s reading notebooks
Learning to Learn without Gradient Descent by Gradient Descent
Learning to Perform Physics Experiments via Deep Reinforcement Learning
Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear
Human collective intelligence as distributed Bayesian inference
Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
Online Batch Selection for Faster Training of Neural Networks
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models
(More) Efficient Reinforcement Learning via Posterior Sampling
PUCT: Continuous Upper Confidence Trees with Polynomial Exploration-Consistency
Experimental design for Partially Observed Markov Decision Processes
Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
Abandoning Objectives: Evolution Through the Search for Novelty Alone
Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments
Formal Theory of Creativity & Fun & Intrinsic Motivation (1990–2010)
Specialization Effect and Its Influence on Memory and Problem Solving in Expert Chess Players
Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes
Exploiting Open-Endedness to Solve Problems Through the Search for Novelty
Towards Efficient Evolutionary Design of Autonomous Robots
ALPS: the age-layered population structure for reducing the problem of premature convergence
NEAT: Evolving Neural Networks through Augmenting Topologies
Case studies in evolutionary experimentation and computation
The Analysis of Sequential Experiments with Feedback to Subjects
Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution [Evolution Strategy: Optimization of Technical Systems According to Principles of Biological Evolution]
Brian Christian on Computer Science Algorithms That Tackle Fundamental and Universal Problems
Goodhart’s Law, Diversity and a Series of Seemingly Unrelated Toy Problems
Why Generalization in RL Is Difficult: Epistemic POMDPs and Implicit Partial Observability [Blog]
An Experimental Design Perspective on Model-Based Reinforcement Learning [Blog]
Safety-First AI for Autonomous Data Center Cooling and Industrial Control
Why Testing Self-Driving Cars in SF Is Challenging but Necessary
Conditions for Mathematical Equivalence of Stochastic Gradient Descent and Natural Selection
Probable Points and Credible Intervals, Part 2: Decision Theory
Montezuma's Revenge Solved by Go-Explore, a New Algorithm for Hard-Exploration Problems (Sets Records on Pitfall, Too)
2023-hafner-figure1-dreamerv3outperformsbaselinesinsampleefficiencyonmanytasks.png
2023-hafner-figure6-dreamerv3scaleswellinbothdatarepeatsandmodelsize.png
2023-mehrotra-figure8-abtestofhighlightingunpopularartistsonspotifyincreasingtheirpercentilepopularity.jpg
2022-ramrakhya-figure1b-habitatobjectnavlogscalinginhumandemonstrationdata.jpg
2022-ramrakhya-figure5-scalingcurvesofimitationlearningvsreinforcementlearningonhabitat.jpg
2022-ramrakhya-figure7-scalingcurvesofimitationlearningonpickandplace.jpg
2021-mehrotra-figure3-highlightingunpopularartistsonspotifyincreasestheirpopularity.jpg
2020-interactiveagentsgroup-figure15-scalingandtransfer.jpg
2019-jaderberg-figure2-agentarchitectureandbenchmarking.jpg
2019-jaderberg-figure3-knowledgerepresentationtsneandbehavior.jpg
2019-jaderberg-figure4-progressionofagentduringtraining.jpg
2018-such-table1-geneticalgorithmsvsdqnar3crandomsearchevolutionstrategiesonatariale.png
2015-gomezuribe-figure4-effectivecatalogsizeofnetflixbydefaultvspersonalizedratings.jpg
http://vision.psych.umn.edu/groups/schraterlab/dearden98bayesian.pdf
https://deepmind.google/discover/blog/capture-the-flag-the-emergence-of-complex-cooperative-agents/
https://engineeringideas.substack.com/p/review-of-why-greatness-cannot-be
https://nathanieltravis.com/2022/01/17/is-human-behavior-just-elaborate-running-and-tumbling/
https://openai.com/blog/learning-montezumas-revenge-from-a-single-demonstration/
https://patentimages.storage.googleapis.com/57/53/22/91b8a6792dbb1e/US20180204116A1.pdf#deepmind
https://people.idsia.ch/~juergen/FKI-126-90_(revised)bw_ocr.pdf
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/f52ac33bc9d1adecd3a8037a7009b185fd934f0e.pdf
https://tor-lattimore.com/downloads/book/book.pdf#page=412
https://www.freaktakes.com/p/the-past-and-present-of-computer
https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/
https://www.nature.com/articles/s41467-020-19244-4#deepmind
https://www.quantamagazine.org/clever-machines-learn-how-to-be-curious-20170919/
https://www.quantamagazine.org/random-search-wired-into-animals-may-help-them-hunt-20200611/
https://www.reddit.com/r/MachineLearning/comments/a0nnp7/r_montezumas_revenge_solved_by_goexplore_a_new/
Intelligent Go-Explore (IGE): Standing on the Shoulders of Giant Foundation Models
Jeff Clune—Professor—Computer Science—University of British Columbia
https://arxiv.org/abs/2310.03882#deepmind
Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)
https://arxiv.org/abs/2308.09175#deepmind
Supervised Pretraining Can Learn In-Context Reinforcement Learning
MarioGPT: Open-Ended Text2Level Generation through Large Language Models
https://arxiv.org/abs/2301.04104#deepmind
Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners
Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space
https://www.nature.com/articles/s41467-022-31918-9
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Jeff Clune—Professor—Computer Science—University of British Columbia
https://arxiv.org/abs/2206.11795#openai
https://arxiv.org/abs/2206.04114#google
https://openreview.net/forum?id=0ZbPmmB61g#google
Towards Learning Universal Hyperparameter Optimizers with Transformers
https://arxiv.org/abs/2205.13320#google
Semantic Exploration from Language Abstractions and Pretrained Representations
https://arxiv.org/abs/2204.05080#deepmind
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
https://arxiv.org/abs/2204.03514#facebook
https://openreview.net/forum?id=bERaNdoegnO#deepmind
https://arxiv.org/abs/2202.07415#deepmind
https://arxiv.org/abs/2202.05008#google
Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination
https://arxiv.org/abs/2112.11701#tencent
Procedural Generalization by Planning with Self-Supervised World Models
https://arxiv.org/abs/2111.01587#deepmind
Algorithmic Balancing of Familiarity, Similarity, & Discovery in Music Recommendations
/doc/reinforcement-learning/exploration/2021-mehrotra.pdf#spotify
Trajectory Transformer: Reinforcement Learning as One Big Sequence Modeling Problem
https://trajectory-transformer.github.io/
From Motor Control to Team Play in Simulated Humanoid Football
https://arxiv.org/abs/2105.12196#deepmind
https://www.sciencedirect.com/science/article/pii/S0004370221000862#deepmind
Jeff Clune—Professor—Computer Science—University of British Columbia
/doc/reinforcement-learning/exploration/2021-ecoffet.pdf#uber
The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors
https://arxiv.org/abs/2012.05672#deepmind
Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess
https://arxiv.org/abs/2009.04374#deepmind
https://deepmind.google/discover/blog/agent57-outperforming-the-human-atari-benchmark/
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
https://arxiv.org/abs/1911.00357#facebook
Emergent Tool Use from Multi-Agent Interaction § Surprising behavior
https://openai.com/research/emergent-tool-use#surprisingbehaviors
https://david-abel.github.io/notes/icml_2019.pdf
Human-level performance in 3D multiplayer games with population-based reinforcement learning
/doc/reinforcement-learning/exploration/2019-jaderberg.pdf#deepmind
Construction of arbitrarily strong amplifiers of natural selection using evolutionary graph theory
https://www.nature.com/articles/s42003-018-0078-7
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
Jeff Clune—Professor—Computer Science—University of British Columbia
https://arxiv.org/abs/1712.06567#uber
/doc/reinforcement-learning/exploration/2015-gomezuribe.pdf
/doc/reinforcement-learning/exploration/2010-schmidt.pdf
/doc/reinforcement-learning/model/2010-silver.pdf
Specialization Effect and Its Influence on Memory and Problem Solving in Expert Chess Players
https://onlinelibrary.wiley.com/doi/10.1111/j.1551-6709.2009.01030.x
ALPS: the age-layered population structure for reducing the problem of premature convergence
/doc/reinforcement-learning/exploration/2006-hornby.pdf