Bibliography:

  1. ‘RL’ tag

  2. ‘data pruning’ tag

  3. ‘active learning’ tag

  4. ‘NN sampling’ tag

  5. ‘novelty U-curve’ tag

  6. ‘hidden-information game’ tag

  7. ‘MARL’ tag

  8. ‘NetHack AI’ tag

  9. ‘robotics’ tag

  10. Benchmarking LLM Diversity & Creativity

  11. Can You Unsort Lists for Diversity?

  12. Number Search Engine via NN Embeddings

  13. Novelty Nets: Classifier Anti-Guidance

  14. Free-Play Periods for RL Agents

  15. Candy Japan’s new box A/B test

  16. SimpleStrat: Diversifying Language Model Generation with Stratification

  17. Learning Formal Mathematics From Intrinsic Motivation

  18. Intelligent Go-Explore (IGE): Standing on the Shoulders of Giant Foundation Models

  19. Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

  20. Self-Supervised Behavior Cloned Transformers are Path Crawlers for Text Games

  21. Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations

  22. QDAIF: Quality-Diversity through AI Feedback

  23. Beyond Memorization: Violating Privacy Via Inference with Large Language Models

  24. Let Models Speak Ciphers: Multiagent Debate through Embeddings

  25. Small batch deep reinforcement learning

  26. Language Reward Modulation for Pretraining Reinforcement Learning

  27. Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)

  28. Supervised Pretraining Can Learn In-Context Reinforcement Learning

  29. Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery

  30. You And Your Research

  31. Long-Term Value of Exploration: Measurements, Findings and Algorithms

  32. Inducing anxiety in GPT-3.5 increases exploration and bias

  33. Reflexion: Language Agents with Verbal Reinforcement Learning

  34. MimicPlay: Long-Horizon Imitation Learning by Watching Human Play

  35. MarioGPT: Open-Ended Text2Level Generation through Large Language Models

  36. DreamerV3: Mastering Diverse Domains through World Models

  37. AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong

  38. Effect of lysergic acid diethylamide (LSD) on reinforcement learning in humans

  39. Curiosity in hindsight

  40. In-context Reinforcement Learning with Algorithm Distillation

  41. E3B: Exploration via Elliptical Episodic Bonuses

  42. Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners

  43. LGE: Cell-Free Latent Go-Explore

  44. A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning

  45. Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space

  46. Value-free random exploration is linked to impulsivity

  47. Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling

  48. The cost of information acquisition by natural selection

  49. Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

  50. BYOL-Explore: Exploration by Bootstrapped Prediction

  51. Multi-Objective Hyperparameter Optimization—An Overview

  52. Director: Deep Hierarchical Planning from Pixels

  53. Boosting Search Engines with Interactive Agents

  54. Towards Learning Universal Hyperparameter Optimizers with Transformers

  55. Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments

  56. Effective Mutation Rate Adaptation through Group Elite Selection

  57. Semantic Exploration from Language Abstractions and Pretrained Representations

  58. Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale

  59. CLIP on Wheels (CoW): Zero-Shot Object Navigation as Object Localization and Exploration

  60. Policy improvement by planning with Gumbel

  61. Evolving Curricula with Regret-Based Environment Design

  62. VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning

  63. Learning Causal Overhypotheses through Exploration in Children and Computational Models

  64. Policy Learning and Evaluation with Randomized Quasi-Monte Carlo

  65. NeuPL: Neural Population Learning

  66. ODT: Online Decision Transformer

  67. EvoJAX: Hardware-Accelerated Neuroevolution

  68. LID: Pre-Trained Language Models for Interactive Decision-Making

  69. Accelerated Quality-Diversity for Robotics through Massive Parallelism

  70. Rotting Infinitely Many-armed Bandits

  71. Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning (ExORL)

  72. Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination

  73. Evolution Gym: A Large-Scale Benchmark for Evolving Soft Robots

  74. Environment Generation for Zero-Shot Compositional Reinforcement Learning

  75. Safe Deep RL in 3D Environments using Human Feedback

  76. Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

  77. Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination

  78. The costs and benefits of dispersal in small populations

  79. The geometry of decision-making in individuals and collectives

  80. An Experimental Design Perspective on Model-Based Reinforcement Learning

  81. JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

  82. Procedural Generalization by Planning with Self-Supervised World Models

  83. Correspondence between neuroevolution and gradient descent

  84. URLB: Unsupervised Reinforcement Learning Benchmark

  85. Mastering Atari Games with Limited Data

  86. Discovering and Achieving Goals via World Models

  87. The structure of genotype-phenotype maps makes fitness landscapes navigable

  88. Replay-Guided Adversarial Environment Design

  89. A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning

  90. Monkey Plays Pac-Man with Compositional Strategies and Hierarchical Decision-making

  91. Neural autopilot and context-sensitivity of habits

  92. Algorithmic Balancing of Familiarity, Similarity, & Discovery in Music Recommendations

  93. TrufLL: Learning Natural Language Generation from Scratch

  94. Is Curiosity All You Need? On the Utility of Emergent Behaviors from Curious Exploration

  95. Bootstrapped Meta-Learning

  96. Open-Ended Learning Leads to Generally Capable Agents

  97. Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs

  98. Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

  99. Imitation-driven Cultural Collapse

  100. Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft

  101. Learning to hesitate

  102. Planning for Novelty: Width-Based Algorithms for Common Problems in Control, Planning and Reinforcement Learning

  103. Trajectory Transformer: Reinforcement Learning as One Big Sequence Modeling Problem

  104. From Motor Control to Team Play in Simulated Humanoid Football

  105. Reward is enough

  106. Principled Exploration via Optimistic Bootstrapping and Backward Induction

  107. Intelligence and Unambitiousness Using Algorithmic Information Theory

  108. Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks

  109. On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning

  110. What Are Bayesian Neural Network Posteriors Really Like?

  111. Epistemic Autonomy: Self-supervised Learning in the Mammalian Hippocampus

  112. Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020

  113. Flexible modulation of sequence generation in the entorhinal-hippocampal system

  114. Reinforcement Learning, Bit by Bit

  115. Asymmetric self-play for automatic goal discovery in robotic manipulation

  116. Informational Herding, Optimal Experimentation, and Contrarianism

  117. Go-Explore: First return, then explore

  118. TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning

  119. Proof Artifact Co-training for Theorem Proving with Language Models

  120. The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors

  121. Curriculum Learning: A Survey

  122. MAP-Elites Enables Powerful Stepping Stones and Diversity for Modular Robotics

  123. Is Pessimism Provably Efficient for Offline RL?

  124. Monte-Carlo Graph Search for AlphaZero

  125. Imitating Interactive Intelligence

  126. Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

  127. Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

  128. Meta-trained agents implement Bayes-optimal agents

  129. Learning not to learn: Nature versus nurture in silico

  130. The Child as Hacker

  131. Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess

  132. The Temporal Dynamics of Opportunity Costs: A Normative Account of Cognitive Fatigue and Boredom

  133. The Overfitted Brain: Dreams evolved to assist generalization

  134. The NetHack Learning Environment

  135. Exploration Strategies in Deep Reinforcement Learning

  136. Synthetic Petri Dish: A Novel Surrogate Model for Rapid Architecture Search

  137. Automatic Discovery of Interpretable Planning Strategies

  138. IJON: Exploring Deep State Spaces via Fuzzing

  139. Planning to Explore via Self-Supervised World Models

  140. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

  141. Pitfalls of learning a reward function online

  142. First return, then explore

  143. Real World Games Look Like Spinning Tops

  144. Approximate exploitability: Learning a best response in large games

  145. Agent57: Outperforming the human Atari benchmark

  146. Agent57: Outperforming the Atari Human Benchmark

  147. Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

  148. Meta-learning curiosity algorithms

  149. Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

  150. AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

  151. Never Give Up: Learning Directed Exploration Strategies

  152. Effective Diversity in Population Based Reinforcement Learning

  153. Near-perfect point-goal navigation from 2.5 billion frames of experience

  154. microbatchGAN: Stimulating Diversity with Multi-Adversarial Discrimination

  155. Learning Human Objectives by Evaluating Hypothetical Behavior

  156. Optimal Policies Tend to Seek Power

  157. DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

  158. Emergent Tool Use From Multi-Agent Autocurricula

  159. Emergent Tool Use from Multi-Agent Interaction § Surprising behavior

  160. R2D3: Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

  161. Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment

  162. A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment

  163. An Optimistic Perspective on Offline Reinforcement Learning

  164. Meta Reinforcement Learning

  165. Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

  166. ICML 2019 Notes

  167. Human-level performance in 3D multiplayer games with population-based reinforcement learning

  168. AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

  169. Learning to Reason in Large Theories without Imitation

  170. Reinforcement Learning, Fast and Slow

  171. Meta reinforcement learning as task inference

  172. Meta-learning of Sequential Strategies

  173. The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors

  174. π-IW: Deep Policies for Width-Based Planning in Pixel Domains

  175. Learning To Follow Directions in Street View

  176. A Generalized Framework for Population Based Training

  177. Go-Explore: a New Approach for Hard-Exploration Problems

  178. Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions

  179. Is the FDA too conservative or too aggressive?: A Bayesian decision analysis of clinical trial design

  180. V-Fuzz: Vulnerability-Oriented Evolutionary Fuzzing

  181. Common neural code for reward and information value

  182. Machine-Learning-Guided Directed Evolution for Protein Engineering

  183. Enjoy it again: Repeat experiences are less repetitive than people think

  184. Evolutionary-Neural Hybrid Agents for Architecture Search

  185. The Bayesian Superorganism III: externalized memories facilitate distributed sampling

  186. Exploration in the wild

  187. Off-Policy Deep Reinforcement Learning without Exploration

  188. An Introduction to Deep Reinforcement Learning

  189. The Bayesian Superorganism I: collective probability estimation

  190. Exploration by Random Network Distillation

  191. Computational noise in reward-guided learning drives behavioral variability in volatile environments

  192. RND: Large-Scale Study of Curiosity-Driven Learning

  193. Visual Reinforcement Learning with Imagined Goals

  194. Is Q-learning Provably Efficient?

  195. Improving width-based planning with compact policies

  196. Construction of arbitrarily strong amplifiers of natural selection using evolutionary graph theory

  197. Re-evaluating Evaluation

  198. DVRL: Deep Variational Reinforcement Learning for POMDPs

  199. Mix&Match—Agent Curricula for Reinforcement Learning

  200. Playing hard exploration games by watching YouTube

  201. Observe and Look Further: Achieving Consistent Performance on Atari

  202. Generalization and search in risky environments

  203. Toward Diverse Text Generation with Inverse Reinforcement Learning

  204. Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution

  205. Learning to Navigate in Cities Without a Map

  206. The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities

  207. Some Considerations on Learning to Explore via Meta-Reinforcement Learning

  208. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

  209. Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

  210. Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

  211. One Big Net For Everything

  212. Learning to Search with MCTSnets

  213. Learning and Querying Fast Generative Models for Reinforcement Learning

  214. Safe Exploration in Continuous Action Spaces

  215. Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning

  216. Deep Reinforcement Fuzzing

  217. Planning Chemical Syntheses With Deep Neural Networks and Symbolic AI

  218. Generalization Guides Human Exploration in Vast Decision Spaces

  219. Innovation and cumulative culture through tweaks and leaps in online programming contests

  220. A Flexible Approach to Automated RNN Architecture Generation

  221. Finding Competitive Network Architectures Within a Day Using UCT

  222. Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents

  223. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

  224. The paradoxical sustainability of periodic migration and habitat destruction

  225. Posterior Sampling for Large Scale Reinforcement Learning

  226. Policy Optimization by Genetic Distillation

  227. Emergent Complexity via Multi-Agent Competition

  228. An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits

  229. The Uncertainty Bellman Equation and Exploration

  230. Changing Their Tune: How Consumers’ Adoption of Online Streaming Affects Music Consumption and Discovery

  231. A Rational Choice Framework for Collective Behavior

  232. Imagination-Augmented Agents for Deep Reinforcement Learning

  233. Distral: Robust Multitask Reinforcement Learning

  234. The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously

  235. Emergence of Locomotion behaviors in Rich Environments

  236. Noisy Networks for Exploration

  237. CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms

  238. Device Placement Optimization with Reinforcement Learning

  239. Towards Synthesizing Complex Programs from Input-Output Examples

  240. Scalable Generalized Linear Bandits: Online Computation and Hashing

  241. DeepXplore: Automated Whitebox Testing of Deep Learning Systems

  242. Recurrent Environment Simulators

  243. Learned Optimizers that Scale and Generalize

  244. Evolution Strategies as a Scalable Alternative to Reinforcement Learning

  245. Large-Scale Evolution of Image Classifiers

  246. CoDeepNEAT: Evolving Deep Neural Networks

  247. Rotting Bandits

  248. Neural Combinatorial Optimization with Reinforcement Learning

  249. Neural Data Filter for Bootstrapping Stochastic Gradient Descent

  250. Search in Patchy Media: Exploitation-Exploration Tradeoff

  251. Towards Information-Seeking Agents

  252. Exploration and exploitation of Victorian science in Darwin’s reading notebooks

  253. Learning to Learn without Gradient Descent by Gradient Descent

  254. Learning to Perform Physics Experiments via Deep Reinforcement Learning

  255. Neural Architecture Search with Reinforcement Learning

  256. Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear

  257. Bayesian Reinforcement Learning: A Survey

  258. Human collective intelligence as distributed Bayesian inference

  259. Universal Darwinism as a process of Bayesian inference

  260. Unifying Count-Based Exploration and Intrinsic Motivation

  261. D-TS: Double Thompson Sampling for Dueling Bandits

  262. Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning

  263. Deep Exploration via Bootstrapped DQN

  264. The Netflix Recommender System

  265. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

  266. Online Batch Selection for Faster Training of Neural Networks

  267. MAP-Elites: Illuminating search spaces by mapping elites

  268. What My Deep Model Doesn't Know...

  269. The Psychology and Neuroscience of Curiosity

  270. Thompson sampling with the online bootstrap

  271. On the Complexity of Best Arm Identification in Multi-Armed Bandit Models

  272. Robots that can adapt like animals

  273. Freeze-Thaw Bayesian Optimization

  274. Search for the Wreckage of Air France Flight AF 447

  275. (More) Efficient Reinforcement Learning via Posterior Sampling

  276. Model-Based Bayesian Exploration

  277. PUCT: Continuous Upper Confidence Trees with Polynomial Exploration-Consistency

  278. (More) Efficient Reinforcement Learning via Posterior Sampling [PSRL]

  279. 629d1cdd4ed81ada32f4b9638501d15d3d891556.pdf#deepmind

  280. Experimental design for Partially Observed Markov Decision Processes

  281. Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search

  282. PILCO: A Model-Based and Data-Efficient Approach to Policy Search

  283. Abandoning Objectives: Evolution Through the Search for Novelty Alone

  284. Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments

  285. Age-fitness pareto optimization

  286. Monte-Carlo Planning in Large POMDPs

  287. Formal Theory of Creativity & Fun & Intrinsic Motivation (1990–2010)

  288. The Epistemic Benefit of Transient Diversity

  289. Specialization Effect and Its Influence on Memory and Problem Solving in Expert Chess Players

  290. Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes

  291. Pure Exploration for Multi-Armed Bandit Problems

  292. Exploiting Open-Endedness to Solve Problems Through the Search for Novelty

  293. Towards Efficient Evolutionary Design of Autonomous Robots

  294. Resilient Machines Through Continuous Self-Modeling

  295. ALPS: the age-layered population structure for reducing the problem of premature convergence

  296. Bayesian Adaptive Exploration

  297. NEAT: Evolving Neural Networks through Augmenting Topologies

  298. A Bayesian Framework for Reinforcement Learning

  299. Case studies in evolutionary experimentation and computation

  300. Efficient Progressive Sampling

  301. Evolving 3D Morphology and Behavior by Competition

  302. b5f0d7b48d78560834dc391504a8c36149d6e802.pdf

  303. Interactions between Learning and Evolution

  304. Evolution Strategy: Nature’s Way of Optimization

  305. The Analysis of Sequential Experiments with Feedback to Subjects

  306. Evolutionsstrategien

  307. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution

  308. The Usefulness of Useless Knowledge

  309. Curiosity Killed the Mario

  310. Brian Christian on Computer Science Algorithms That Tackle Fundamental and Universal Problems

  311. e38fdd2e3d19fef222bb1222fbe63da6a0fe9ea6.html

  312. Solving Zelda With the Antithesis SDK

  313. 974bd67fde19017d4a9c5879b69275021e18cdd5.html

  314. Goodhart’s Law, Diversity and a Series of Seemingly Unrelated Toy Problems

  315. 4ed8cac77930aa2bfd9d9d87fc0c6320d627666f.html

  316. Why Generalization in RL Is Difficult: Epistemic POMDPs and Implicit Partial Observability [Blog]

  317. Bayesian Optimization Book

  318. Temporal Difference Learning and TD-Gammon

  319. An Experimental Design Perspective on Model-Based Reinforcement Learning [Blog]

  320. f081a958085e5690489c18884fa913a7190d0688.html

  321. Safety-First AI for Autonomous Data Center Cooling and Industrial Control

  322. Pulling JPEGs out of Thin Air

  323. d7117e8daef1d2b88752309178476d7d627b8908.html

  324. Curriculum For Reinforcement Learning

  325. 82f88b3c03ad0b252a33823c09b770832a95bbcc.html#openai

  326. Why Testing Self-Driving Cars in SF Is Challenging but Necessary

  327. Reinforcement Learning With Prediction-Based Rewards

  328. Prompting Diverse Ideas: Increasing AI Idea Variance

  329. You Need a Novelty Budget

  330. ChatGPT As Muse, Not Oracle

  331. Conditions for Mathematical Equivalence of Stochastic Gradient Descent and Natural Selection

  332. Probable Points and Credible Intervals, Part 2: Decision Theory

  333. AI Is Learning How to Create Itself

  334. 2874fc1c58ced20eb47a85ca46af67972fcf2ab4.html

  335. Montezuma's Revenge Solved by Go-Explore, a New Algorithm for Hard-Exploration Problems (Sets Records on Pitfall, Too)

  336. Monkeys Play Pac-Man

  337. Playing Montezuma's Revenge With Intrinsic Motivation

  338. design#future-tag-features


  339. 2023-hafner-figure1-dreamerv3outperformsbaselinesinsampleefficiencyonmanytasks.png

  340. 2023-hafner-figure6-dreamerv3scaleswellinbothdatarepeatsandmodelsize.png

  341. 2023-mehrotra-figure8-abtestofhighlightingunpopularartistsonspotifyincreasingtheirpercentilepopularity.jpg

  342. 2022-ramrakhya-figure1b-habitatobjectnavlogscalinginhumandemonstrationdata.jpg

  343. 2022-ramrakhya-figure5-scalingcurvesofimitationlearningvsreinforcementlearningonhabitat.jpg

  344. 2022-ramrakhya-figure7-scalingcurvesofimitationlearningonpickandplace.jpg

  345. 2021-mehrotra-figure3-highlightingunpopularartistsonspotifyincreasestheirpopularity.jpg

  346. 2020-interactiveagentsgroup-figure15-scalingandtransfer.jpg

  347. 2019-jaderberg-figure1-ctftaskandtraining.jpg

  348. 2019-jaderberg-figure2-agentarchitectureandbenchmarking.jpg

  349. 2019-jaderberg-figure3-knowledgerepresentationtsneandbehavior.jpg

  350. 2019-jaderberg-figure4-progressionofagentduringtraining.jpg

  351. 2018-such-table1-geneticalgorithmsvsdqnar3crandomsearchevolutionstrategiesonatariale.png

  352. 2015-gomezuribe-figure4-effectivecatalogsizeofnetflixbydefaultvspersonalizedratings.jpg

  353. http://bayesiandeeplearning.org/2017/papers/57.pdf

  354. 9542903facdeee8a78641311471be0bc6fb2a482.pdf

  355. http://incompleteideas.net/book/ebook/node45.html

  356. 528eb890c177e0a5d26694aea7131d449ca9022e.html

  357. http://vision.psych.umn.edu/groups/schraterlab/dearden98bayesian.pdf

  358. https://deepmind.google/discover/blog/capture-the-flag-the-emergence-of-complex-cooperative-agents/

  359. https://engineeringideas.substack.com/p/review-of-why-greatness-cannot-be

  360. https://jsomers.net/e-coli-chemotaxis/

  361. https://nathanieltravis.com/2022/01/17/is-human-behavior-just-elaborate-running-and-tumbling/

  362. https://onlinelibrary.wiley.com/doi/full/10.1002/bdm.2360

  363. https://openai.com/blog/learning-montezumas-revenge-from-a-single-demonstration/

  364. https://openai.com/research/vpt

  365. https://patentimages.storage.googleapis.com/57/53/22/91b8a6792dbb1e/US20180204116A1.pdf#deepmind

  366. https://people.idsia.ch/~juergen/FKI-126-90_(revised)bw_ocr.pdf

  367. 666a90978bf0db98cb2afd8edf854870779a6f0c.pdf

  368. https://storage.googleapis.com/pub-tools-public-publication-data/pdf/f52ac33bc9d1adecd3a8037a7009b185fd934f0e.pdf

  369. 75839823347df79e49d65c3912f17f35e466a648.pdf

  370. https://tor-lattimore.com/downloads/book/book.pdf#page=412

  371. 762d4ee657af5f8ac4c3f5096fac3c5ba87c71f0.pdf#page=412

  372. https://www.freaktakes.com/p/the-past-and-present-of-computer

  373. https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

  374. https://www.nature.com/articles/s41467-020-19244-4#deepmind

  375. https://www.nature.com/articles/s41534-019-0241-0

  376. https://www.quantamagazine.org/clever-machines-learn-how-to-be-curious-20170919/

  377. f68e4e5dc1dd15828679d7cabe5b568097d1264e.html

  378. https://www.quantamagazine.org/random-search-wired-into-animals-may-help-them-hunt-20200611/

  379. 4ba297594338ba879254f1e055fd62c491eca45f.html

  380. https://www.reddit.com/r/MachineLearning/comments/a0nnp7/r_montezumas_revenge_solved_by_goexplore_a_new/

  381. df2f27d90614f7333aae0b60f9acbe4e34cfa8cd.html

  382. https://www.youtube.com/watch?v=DcYLT37ImBY

  383. https://x.com/Aella_Girl/status/1594144435000213505

  384. https://x.com/Omorfiamorphism/status/1564633854119477257

  385. https://x.com/kenneth0stanley/status/1733571230803058920

  386. https://x.com/pirroh/status/1694516986561307022
