‘RL exploration’ directory

Annotations sorted by machine learning into ⁠inferred 'tags'⁠. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.

Wikipedia (19)

Miscellaneous

Bibliography

https://arxiv.org/abs/2405.15143: “Intelligent Go-Explore (IGE): Standing on the Shoulders of Giant Foundation Models ”⁠, Cong Lu, Shengran Hu, ⁠Jeff Clune
link-bibliography⁠
https://arxiv.org/abs/2310.03882#deepmind: “Small Batch Deep Reinforcement Learning ”⁠, Johan Obando-Ceron, Marc G. Bellemare, Pablo Samuel Castro
link-bibliography⁠
https://arxiv.org/abs/2308.09175#deepmind: “Diversifying AI: Towards Creative Chess With AlphaZero (AZ_db) ”⁠, Tom Zahavy, Vivek Veeriah, Shaobo Hou …, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut⁠, Demis Hassabis⁠, Satinder Singh⁠
link-bibliography⁠
https://arxiv.org/abs/2306.14892: “Supervised Pretraining Can Learn In-Context Reinforcement Learning ”⁠, Jonathan N. Lee, Annie Xie, Aldo Pacchiano …, Yash Chandak, Chelsea Finn⁠, Ofir Nachum, Emma Brunskill
link-bibliography⁠
1986-hamming: “You And Your Research ”⁠, Richard W. Hamming⁠
link-bibliography⁠
https://arxiv.org/abs/2302.05981: “MarioGPT: Open-Ended Text2Level Generation through Large Language Models ”⁠, Shyam Sudhakaran, Miguel González-Duque, Claire Glanois …, Matthias Freiberger, Elias Najarro, Sebastian Risi
link-bibliography⁠
https://arxiv.org/abs/2301.04104#deepmind: “DreamerV3: Mastering Diverse Domains through World Models ”⁠, Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap⁠
link-bibliography⁠
https://arxiv.org/abs/2209.01975: “Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners ”⁠, Hongjin Su, Jungo Kasai, Chen Henry Wu …, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf⁠, Luke Zettlemoyer⁠, Noah Smith⁠, Tao Yu
link-bibliography⁠
https://arxiv.org/abs/2208.10291: “Trajectory Autoencoding Planner: Efficient Planning in a Compact Latent Action Space ”⁠, Zhengyao Jiang, Tianjun Zhang, Michael Janner …, Yueying Li, ⁠Tim Rocktäschel, Edward Grefenstette, ⁠Yuandong Tian
link-bibliography⁠
https://www.nature.com/articles/s41467-022-31918-9: “Value-Free Random Exploration Is Linked to Impulsivity ”⁠, Magda Dubois, Tobias U. Hauser
link-bibliography⁠
https://arxiv.org/abs/2206.11795#openai: “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos ”⁠, Bowen Baker, Ilge Akkaya, Peter Zhokhov …, Joost Huizinga, Jie Tang⁠, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, ⁠Jeff Clune
link-bibliography⁠
https://arxiv.org/abs/2206.04114#google: “Director: Deep Hierarchical Planning from Pixels ”⁠, Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel⁠
link-bibliography⁠
https://openreview.net/forum?id=0ZbPmmB61g#google: “Boosting Search Engines With Interactive Agents ”⁠, Massimiliano Ciaramita, Leonard Adolphs, Michelle Chen Huebscher …, Sascha Rothe, Christian Buck, Thomas Hofmann⁠, Yannic Kilcher⁠, Lasse Espeholt, Pier Giuseppe Sessa, Lierni Sestorain, Benjamin Börschinger
link-bibliography⁠
https://arxiv.org/abs/2205.13320#google: “Towards Learning Universal Hyperparameter Optimizers With Transformers ”⁠, Yutian Chen⁠, Xingyou Song, Chansoo Lee …, Zi Wang⁠, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc’aurelio Ranzato, Sagi Perel, Nando de Freitas⁠
link-bibliography⁠
https://arxiv.org/abs/2204.05080#deepmind: “Semantic Exploration from Language Abstractions and Pretrained Representations ”⁠, Allison C. Tam, Neil C. Rabinowitz, Andrew K. Lampinen …, Nicholas A. Roy, Stephanie C. Y. Chan, D. J. Strouse, Jane X. Wang, Andrea Banino, ⁠Felix Hill
link-bibliography⁠
https://arxiv.org/abs/2204.03514#facebook: “Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale ”⁠, Ram Ramrakhya, Eric Undersander, Dhruv Batra, Abhishek Das
link-bibliography⁠
https://openreview.net/forum?id=bERaNdoegnO#deepmind: “Policy Improvement by Planning With Gumbel ”⁠, Ivo Danihelka, Arthur Guez⁠, Julian Schrittwieser, David Silver⁠
link-bibliography⁠
https://arxiv.org/abs/2202.07415#deepmind: “NeuPL: Neural Population Learning ”⁠, Siqi Liu, Luke Marris, Daniel Hennes …, Josh Merel, Nicolas Heess⁠, ⁠Thore Graepel
link-bibliography⁠
https://arxiv.org/abs/2202.05008#google: “EvoJAX: Hardware-Accelerated Neuroevolution ”⁠, Yujin Tang, Yingtao Tian, ⁠David Ha
link-bibliography⁠
https://arxiv.org/abs/2112.11701#tencent: “Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination ”⁠, Rui Zhao, Jinming Song, Hu Haifeng⁠ …, Yang Gao⁠, Yi Wu⁠, Zhongqian Sun, Yang Wei
link-bibliography⁠
https://arxiv.org/abs/2111.01587#deepmind: “Procedural Generalization by Planning With Self-Supervised World Models ”⁠, Ankesh Anand, Jacob Walker, Yazhe Li …, Eszter Vértes, Julian Schrittwieser, Sherjil Ozair, Théophane Weber, Jessica B. Hamrick
link-bibliography⁠
https://arxiv.org/abs/2111.00210: “Mastering Atari Games With Limited Data ”⁠, Weirui Ye, Shaohuai Liu, Thanard Kurutach …, Pieter Abbeel⁠, Yang Gao⁠
link-bibliography⁠
2021-mehrotra.pdf#spotify: “Algorithmic Balancing of Familiarity, Similarity, & Discovery in Music Recommendations ”⁠, Rishabh Mehrotra
link-bibliography⁠
https://trajectory-transformer.github.io/: “Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem ”, Michael Janner, Qiyang Colin Li, Sergey Levine⁠
link-bibliography⁠
https://arxiv.org/abs/2105.12196#deepmind: “From Motor Control to Team Play in Simulated Humanoid Football ”⁠, Siqi Liu, Guy Lever⁠, Zhe Wang …, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, ⁠Thore Graepel, Nicolas Heess⁠
link-bibliography⁠
https://www.sciencedirect.com/science/article/pii/S0004370221000862#deepmind: “Reward Is Enough ”⁠, David Silver⁠, Satinder Singh⁠, Doina Precup⁠, Richard S. Sutton⁠
link-bibliography⁠
2021-ecoffet.pdf#uber: “Go-Explore: First Return, Then Explore ”⁠, Adrien Ecoffet, Joost Huizinga, Joel Lehman⁠ …, Kenneth O. Stanley⁠, ⁠Jeff Clune
link-bibliography⁠
https://arxiv.org/abs/2101.11071: “The MineRL 2020 Competition on Sample Efficient Reinforcement Learning Using Human Priors ”⁠, William H. Guss, Mario Ynocente Castro, Sam Devlin …, Brandon Houghton, Noboru Sean Kuno, Crissman Loomis, Stephanie Milani, Sharada Mohanty, Keisuke Nakata, Ruslan Salakhutdinov⁠, ⁠John Schulman, Shinya Shiroshita, Nicholay Topin, Avinash Ummadisingu, Oriol Vinyals⁠
link-bibliography⁠
https://arxiv.org/abs/2012.05672#deepmind: “Imitating Interactive Intelligence ”⁠, Josh Abramson⁠, Arun Ahuja, Arthur Brussee …, Federico Carnevale, Mary Cassin, Stephen Clark, Andrew Dudzik, Petko Georgiev, Aurelia Guy, Tim Harley, ⁠Felix Hill, Alden Hung, Zachary Kenton, Jessica Landon, Timothy Lillicrap⁠, Kory Mathewson, Alistair Muldal, Adam Santoro⁠, Nikolay Savinov, Vikrant Varma, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu
link-bibliography⁠
https://arxiv.org/abs/2009.04374#deepmind: “Assessing Game Balance With AlphaZero: Exploring Alternative Rule Sets in Chess ”⁠, Nenad Tomašev, Ulrich Paquet, Demis Hassabis⁠, Vladimir Kramnik⁠
link-bibliography⁠
https://deepmind.google/discover/blog/agent57-outperforming-the-human-atari-benchmark/: “Agent57: Outperforming the Human Atari Benchmark ”⁠, Adrià Puigdomènech, Bilal Piot, Steven Kapturowski …, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell⁠
link-bibliography⁠
https://arxiv.org/abs/1911.00357#facebook: “DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames ”⁠, Erik Wijmans, Abhishek Kadian, Ari Morcos …, Stefan Lee, Irfan Essa⁠, Devi Parikh⁠, Manolis Savva, Dhruv Batra
link-bibliography⁠
https://openai.com/research/emergent-tool-use#surprisingbehaviors: “Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior ”⁠, Bowen Baker, Ingmar Kanitscheider, Todor Markov …, Yi Wu⁠, Glenn Powell, Bob McGrew⁠, Igor Mordatch⁠
link-bibliography⁠
https://lilianweng.github.io/lil-log/2019/06/23/meta-reinforcement-learning.html#openai: “Meta Reinforcement Learning ”⁠, ⁠Lilian Weng
link-bibliography⁠
https://david-abel.github.io/notes/icml_2019.pdf: “ICML 2019 Notes ”⁠, David Abel
link-bibliography⁠
2019-jaderberg.pdf#deepmind: “Human-Level Performance in 3D Multiplayer Games With Population-Based Reinforcement Learning ”⁠, Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning …, Luke Marris, Guy Lever⁠, Antonio Garcia Castañeda, Charles Beattie⁠, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver⁠, Demis Hassabis⁠, Koray Kavukcuoglu⁠, ⁠Thore Graepel
link-bibliography⁠
https://arxiv.org/abs/1806.05898: “Improving Width-Based Planning With Compact Policies ”⁠, Miquel Junyent, Anders Jonsson⁠, Vicenç Gómez
link-bibliography⁠
https://www.nature.com/articles/s42003-018-0078-7: “Construction of Arbitrarily Strong Amplifiers of Natural Selection Using Evolutionary Graph Theory ”⁠, Andreas Pavlogiannis, Josef Tkadlec, Krishnendu Chatterjee⁠, Martin A. Nowak⁠
link-bibliography⁠
https://arxiv.org/abs/1802.08842: “Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari ”⁠, Patryk Chrabaszcz, ⁠Ilya Loshchilov, ⁠Frank Hutter
link-bibliography⁠
https://arxiv.org/abs/1712.06567#uber: “Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning ”⁠, Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti …, Joel Lehman⁠, Kenneth O. Stanley⁠, ⁠Jeff Clune
link-bibliography⁠
2015-gomezuribe.pdf: “The Netflix Recommender System ”⁠, Carlos A. Gomez-Uribe, Neil Hunt
link-bibliography⁠
2010-schmidt.pdf: “Age-Fitness Pareto Optimization ”⁠, Michael D. Schmidt, Hod Lipson⁠
link-bibliography⁠
2010-silver.pdf: “Monte-Carlo Planning in Large POMDPs ”⁠, David Silver⁠, ⁠Joel Veness⁠
link-bibliography⁠
https://onlinelibrary.wiley.com/doi/10.1111/j.1551-6709.2009.01030.x: “Specialization Effect and Its Influence on Memory and Problem Solving in Expert Chess Players ”⁠, Merim Bilalić, Peter McLeod⁠, Fernand Gobet⁠
link-bibliography⁠
2006-hornby.pdf: “ALPS: the Age-Layered Population Structure for Reducing the Problem of Premature Convergence ”⁠, Gregory S. Hornby
link-bibliography⁠