- See Also
- Gwern
- Links
- “Data Scaling Laws in Imitation Learning for Robotic Manipulation”, Lin et al 2024
- “Motor Physics: Safety Implications of Geared Motors”, Jang 2024
- “GUI-WORLD: A Dataset for GUI-Oriented Multimodal LLM-Based Agents”, Chen et al 2024
- “Earnings Call: Tesla Discusses Q1 2024 Challenges and AI Expansion”, Abdulkadir 2024
- “Beyond A✱: Better Planning With Transformers via Search Dynamics Bootstrapping (Searchformer)”, Lehnert et al 2024
- “Grandmaster-Level Chess Without Search”, Ruoss et al 2024
- “Mobile ALOHA: Learning Bimanual Mobile Manipulation With Low-Cost Whole-Body Teleoperation”, Fu et al 2024
- “Vision-Language Models As a Source of Rewards”, Baumli et al 2023
- “Self-Supervised Behavior Cloned Transformers Are Path Crawlers for Text Games”, Wang & Jansen 2023
- “Learning Few-Shot Imitation As Cultural Transmission”, Bhoopchand et al 2023
- “Calibrated Language Models Must Hallucinate”, Kalai & Vempala 2023
- “Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero”, Schut et al 2023
- “Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
- “ReST: Reinforced Self-Training (ReST) for Language Modeling”, Gulcehre et al 2023
- “AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning”, Mathieu et al 2023
- “Getting from Generative AI to Trustworthy AI: What LLMs Might Learn from Cyc”, Lenat & Marcus 2023
- “Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior”, Block et al 2023
- “Android in the Wild: A Large-Scale Dataset for Android Device Control”, Rawles et al 2023
- “GKD: Generalized Knowledge Distillation for Auto-Regressive Sequence Models”, Agarwal et al 2023
- “ChessGPT: Bridging Policy Learning and Language Modeling”, Feng et al 2023
- “SequenceMatch: Imitation Learning for Autoregressive Sequence Modeling With Backtracking”, Cundy & Ermon 2023
- “Survival Instinct in Offline Reinforcement Learning”, Li et al 2023
- “Thought Cloning: Learning to Think While Acting by Imitating Human Thinking”, Hu & Clune 2023
- “Let’s Verify Step by Step”, Lightman et al 2023
- “The False Promise of Imitating Proprietary LLMs”, Gudibande et al 2023
- “LIMA: Less Is More for Alignment”, Zhou et al 2023
- “Revisiting the Minimalist Approach to Offline Reinforcement Learning”, Tarasov et al 2023
- “ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware”, Zhao et al 2023
- “MimicPlay: Long-Horizon Imitation Learning by Watching Human Play”, Wang et al 2023
- “Toolformer: Language Models Can Teach Themselves to Use Tools”, Schick et al 2023
- “Conditioning Predictive Models: Risks and Strategies”, Hubinger et al 2023
- “Imitating Human Behavior With Diffusion Models”, Pearce et al 2023
- “Solving Math Word Problems With Process & Outcome-Based Feedback”, Uesato et al 2022
- “CICERO: Human-Level Play in the Game of Diplomacy by Combining Language Models With Strategic Reasoning”, Bakhtin et al 2022
- “Token Turing Machines”, Ryoo et al 2022
- “Dungeons and Data: A Large-Scale NetHack Dataset”, Hambro et al 2022
- “In-Context Reinforcement Learning With Algorithm Distillation”, Laskin et al 2022
- “Scaling Laws for Reward Model Overoptimization”, Gao et al 2022
- “Human-AI Coordination via Human-Regularized Search and Learning”, Hu et al 2022
- “Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, Ramamurthy et al 2022
- “Nearest Neighbor Non-Autoregressive Text Generation”, Niwa et al 2022
- “Generative Personas That Behave and Experience Like Humans”, Barthet et al 2022
- “Diffusion-QL: Diffusion Policies As an Expressive Policy Class for Offline Reinforcement Learning”, Wang et al 2022
- “Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022
- “Improved Policy Optimization for Online Imitation Learning”, Lavington et al 2022
- “Watch and Match: Supercharging Imitation With Regularized Optimal Transport”, Haldar et al 2022
- “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, Baker et al 2022
- “Large-Scale Retrieval for Reinforcement Learning”, Humphreys et al 2022
- “Boosting Search Engines With Interactive Agents”, Ciaramita et al 2022
- “Housekeep: Tidying Virtual Households Using Commonsense Reasoning”, Kant et al 2022
- “When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?”, Kumar et al 2022
- “Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, Ramrakhya et al 2022
- “Imitating, Fast and Slow: Robust Learning from Demonstrations via Decision-Time Planning”, Qi et al 2022
- “Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning”, Valassakis et al 2022
- “Inferring Rewards from Language in Context”, Lin et al 2022
- “Robot Peels Banana With Goal-Conditioned Dual-Action Deep Imitation Learning”, Kim et al 2022
- “The Unsurprising Effectiveness of Pre-Trained Vision Models for Control”, Parisi et al 2022
- “VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning”, Borja-Diaz et al 2022
- “LID: Pre-Trained Language Models for Interactive Decision-Making”, Li et al 2022
- “Conditional Imitation Learning for Multi-Agent Games”, Shih et al 2022
- “Amortized Noisy Channel Neural Machine Translation”, Pang et al 2021
- “WebGPT: Browser-Assisted Question-Answering With Human Feedback”, Nakano et al 2021
- “Modeling Strong and Human-Like Gameplay With KL-Regularized Search”, Jacob et al 2021
- “JueWu-MC: Playing Minecraft With Sample-Efficient Hierarchical Reinforcement Learning”, Lin et al 2021
- “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
- “AW-Opt: Learning Robotic Skills With Imitation and Reinforcement at Scale”, Lu et al 2021
- “RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning”, Ramos et al 2021
- “BC-Z: Zero-Shot Task Generalization With Robotic Imitation Learning”, Jang et al 2021
- “Is Bang-Bang Control All You Need? Solving Continuous Control With Bernoulli Policies”, Seyde et al 2021
- “SafetyNet: Safe Planning for Real-World Self-Driving Vehicles Using Machine-Learned Policies”, Vitelli et al 2021
- “TrufLL: Learning Natural Language Generation from Scratch”, Donati et al 2021
- “Relating Neural Text Degeneration to Exposure Bias”, Chiang & Chen 2021
- “Learning to Navigate Sidewalks in Outdoor Environments”, Sorokin et al 2021
- “PlaTe: Visually-Grounded Planning With Transformers in Procedural Tasks”, Sun et al 2021
- “Implicit Behavioral Cloning”, Florence et al 2021
- “DexMV: Imitation Learning for Dexterous Manipulation from Human Videos”, Qin et al 2021
- “Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs”, Sonnerat et al 2021
- “A Minimalist Approach to Offline Reinforcement Learning”, Fujimoto & Gu 2021
- “Hyperparameter Selection for Imitation Learning”, Hussenot et al 2021
- “From Motor Control to Team Play in Simulated Humanoid Football”, Liu et al 2021
- “On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning”, Vischer et al 2021
- “Counter-Strike Deathmatch With Large-Scale Behavioral Cloning”, Pearce & Zhu 2021
- “Fully General Online Imitation Learning”, Cohen et al 2021
- “The MineRL 2020 Competition on Sample Efficient Reinforcement Learning Using Human Priors”, Guss et al 2021
- “Meta Learning Backpropagation And Improving It”, Kirsch & Schmidhuber 2020
- “SCC: an Efficient Deep Reinforcement Learning Agent Mastering the Game of StarCraft II”, Wang et al 2020
- “Imitating Interactive Intelligence”, Abramson et al 2020
- “TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game”, Han et al 2020
- “RetinaGAN: An Object-Aware Approach to Sim-To-Real Transfer”, Ho et al 2020
- “Emergent Social Learning via Multi-Agent Reinforcement Learning”, Ndousse et al 2020
- “Automatic Discovery of Interpretable Planning Strategies”, Skirzyński et al 2020
- “Learning Agile Robotic Locomotion Skills by Imitating Animals”, Peng et al 2020
- “Reinforcement Learning for Combinatorial Optimization: A Survey”, Mazyavkina et al 2020
- “Bayesian REX: Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences”, Brown et al 2020
- “AI Helps Warehouse Robots Pick Up New Tricks: Backed by Machine Learning Luminaries, Covariant.ai’s Bots Can Handle Jobs Previously Needing a Human Touch”, Knight 2020
- “Deep Bayesian Reward Learning from Preferences”, Brown & Niekum 2019
- “Learning Norms from Stories: A Prior for Value Aligned Agents”, Frazier et al 2019
- “Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?”, Du et al 2019
- “Learning to Reason in Large Theories without Imitation”, Bansal et al 2019
- “The MineRL 2019 Competition on Sample Efficient Reinforcement Learning Using Human Priors”, Guss et al 2019
- “Go-Explore: a New Approach for Hard-Exploration Problems”, Ecoffet et al 2019
- “Hierarchical Reinforcement Learning for Multi-Agent MOBA Game”, Zhang et al 2019
- “ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst”, Bansal et al 2018
- “Reward Learning from Human Preferences and Demonstrations in Atari”, Ibarz et al 2018
- “Language GANs Falling Short”, Caccia et al 2018
- “Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow”, Peng et al 2018
- “Human-Like Playtesting With Deep Learning”, Gudmundsson et al 2018
- “Convergence of Value Aggregation for Imitation Learning”, Cheng & Boots 2018
- “Policy Optimization by Genetic Distillation”, Gangwani & Peng 2017
- “Learning to Play Chess With Minimal Lookahead and Deep Value Neural Networks”, Sabatelli 2017 (page 3)
- “DropoutDAgger: A Bayesian Approach to Safe Imitation Learning”, Menda et al 2017
- “One-Shot Visual Imitation Learning via Meta-Learning”, Finn et al 2017
- “Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration”, Rahmatizadeh et al 2017
- “Learning Human Behaviors from Motion Capture by Adversarial Imitation”, Merel et al 2017
- “Grammatical Error Correction With Neural Reinforcement Learning”, Sakaguchi et al 2017
- “Path Integral Networks: End-To-End Differentiable Optimal Control”, Okada et al 2017
- “Gated-Attention Architectures for Task-Oriented Language Grounding”, Chaplot et al 2017
- “Visual Semantic Planning Using Deep Successor Representations”, Zhu et al 2017
- “A Deep Reinforced Model for Abstractive Summarization”, Paulus et al 2017
- “One-Shot Imitation Learning”, Duan et al 2017
- “Model-Based Adversarial Imitation Learning”, Baram et al 2016
- “A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models”, Finn et al 2016
- “SeqGAN: Sequence Generative Adversarial Nets With Policy Gradient”, Yu et al 2016
- “Generative Adversarial Imitation Learning”, Ho & Ermon 2016
- “Mastering the Game of Go With Deep Neural Networks and Tree Search”, Silver et al 2016
- “An Invitation to Imitation”, Bagnell 2015
- “DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning”, Ross et al 2010
- “The Hidden Structure of Overimitation”, Lyons et al 2007
- “Google DeepMind’s Grandmaster-Level Chess Without Search”
- “Language Models Model Us”
- “Sony’s Racing Car AI Just Destroyed Its Human Competitors—By Being Nice (and Fast)”
- Sort By Magic
- Miscellaneous
- Bibliography
See Also
Gwern
“GPT-3 Creative Fiction”, Gwern 2020
“The Scaling Hypothesis”, Gwern 2020
Links
“Data Scaling Laws in Imitation Learning for Robotic Manipulation”, Lin et al 2024
“Motor Physics: Safety Implications of Geared Motors”, Jang 2024
“GUI-WORLD: A Dataset for GUI-Oriented Multimodal LLM-Based Agents”, Chen et al 2024
“Earnings Call: Tesla Discusses Q1 2024 Challenges and AI Expansion”, Abdulkadir 2024
“Beyond A✱: Better Planning With Transformers via Search Dynamics Bootstrapping (Searchformer)”, Lehnert et al 2024
“Grandmaster-Level Chess Without Search”, Ruoss et al 2024
“Mobile ALOHA: Learning Bimanual Mobile Manipulation With Low-Cost Whole-Body Teleoperation”, Fu et al 2024
“Vision-Language Models As a Source of Rewards”, Baumli et al 2023
“Self-Supervised Behavior Cloned Transformers Are Path Crawlers for Text Games”, Wang & Jansen 2023
“Learning Few-Shot Imitation As Cultural Transmission”, Bhoopchand et al 2023
“Calibrated Language Models Must Hallucinate”, Kalai & Vempala 2023
“Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero”, Schut et al 2023
“Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
“ReST: Reinforced Self-Training (ReST) for Language Modeling”, Gulcehre et al 2023
“AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning”, Mathieu et al 2023
“Getting from Generative AI to Trustworthy AI: What LLMs Might Learn from Cyc”, Lenat & Marcus 2023
“Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior”, Block et al 2023
“Android in the Wild: A Large-Scale Dataset for Android Device Control”, Rawles et al 2023
“GKD: Generalized Knowledge Distillation for Auto-Regressive Sequence Models”, Agarwal et al 2023
“ChessGPT: Bridging Policy Learning and Language Modeling”, Feng et al 2023
“SequenceMatch: Imitation Learning for Autoregressive Sequence Modeling With Backtracking”, Cundy & Ermon 2023
“Survival Instinct in Offline Reinforcement Learning”, Li et al 2023
“Thought Cloning: Learning to Think While Acting by Imitating Human Thinking”, Hu & Clune 2023
“Let’s Verify Step by Step”, Lightman et al 2023
“The False Promise of Imitating Proprietary LLMs”, Gudibande et al 2023
“LIMA: Less Is More for Alignment”, Zhou et al 2023
“Revisiting the Minimalist Approach to Offline Reinforcement Learning”, Tarasov et al 2023
“ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware”, Zhao et al 2023
“MimicPlay: Long-Horizon Imitation Learning by Watching Human Play”, Wang et al 2023
“Toolformer: Language Models Can Teach Themselves to Use Tools”, Schick et al 2023
“Conditioning Predictive Models: Risks and Strategies”, Hubinger et al 2023
“Imitating Human Behavior With Diffusion Models”, Pearce et al 2023
“Solving Math Word Problems With Process & Outcome-Based Feedback”, Uesato et al 2022
“CICERO: Human-Level Play in the Game of Diplomacy by Combining Language Models With Strategic Reasoning”, Bakhtin et al 2022
“Token Turing Machines”, Ryoo et al 2022
“Dungeons and Data: A Large-Scale NetHack Dataset”, Hambro et al 2022
“In-Context Reinforcement Learning With Algorithm Distillation”, Laskin et al 2022
“Scaling Laws for Reward Model Overoptimization”, Gao et al 2022
“Human-AI Coordination via Human-Regularized Search and Learning”, Hu et al 2022
“Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, Ramamurthy et al 2022
“Nearest Neighbor Non-Autoregressive Text Generation”, Niwa et al 2022
“Generative Personas That Behave and Experience Like Humans”, Barthet et al 2022
“Diffusion-QL: Diffusion Policies As an Expressive Policy Class for Offline Reinforcement Learning”, Wang et al 2022
“Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022
“Improved Policy Optimization for Online Imitation Learning”, Lavington et al 2022
“Watch and Match: Supercharging Imitation With Regularized Optimal Transport”, Haldar et al 2022
“Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, Baker et al 2022
“Large-Scale Retrieval for Reinforcement Learning”, Humphreys et al 2022
“Boosting Search Engines With Interactive Agents”, Ciaramita et al 2022
“Housekeep: Tidying Virtual Households Using Commonsense Reasoning”, Kant et al 2022
“When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?”, Kumar et al 2022
“Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, Ramrakhya et al 2022
“Imitating, Fast and Slow: Robust Learning from Demonstrations via Decision-Time Planning”, Qi et al 2022
“Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning”, Valassakis et al 2022
“Inferring Rewards from Language in Context”, Lin et al 2022
“Robot Peels Banana With Goal-Conditioned Dual-Action Deep Imitation Learning”, Kim et al 2022
“The Unsurprising Effectiveness of Pre-Trained Vision Models for Control”, Parisi et al 2022
“VAPO: Affordance Learning from Play for Sample-Efficient Policy Learning”, Borja-Diaz et al 2022
“LID: Pre-Trained Language Models for Interactive Decision-Making”, Li et al 2022
“Conditional Imitation Learning for Multi-Agent Games”, Shih et al 2022
“Amortized Noisy Channel Neural Machine Translation”, Pang et al 2021
“WebGPT: Browser-Assisted Question-Answering With Human Feedback”, Nakano et al 2021
“Modeling Strong and Human-Like Gameplay With KL-Regularized Search”, Jacob et al 2021
“JueWu-MC: Playing Minecraft With Sample-Efficient Hierarchical Reinforcement Learning”, Lin et al 2021
“A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
“AW-Opt: Learning Robotic Skills With Imitation and Reinforcement at Scale”, Lu et al 2021
“RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning”, Ramos et al 2021
“BC-Z: Zero-Shot Task Generalization With Robotic Imitation Learning”, Jang et al 2021
“Is Bang-Bang Control All You Need? Solving Continuous Control With Bernoulli Policies”, Seyde et al 2021
“SafetyNet: Safe Planning for Real-World Self-Driving Vehicles Using Machine-Learned Policies”, Vitelli et al 2021
“TrufLL: Learning Natural Language Generation from Scratch”, Donati et al 2021
“Relating Neural Text Degeneration to Exposure Bias”, Chiang & Chen 2021
“Learning to Navigate Sidewalks in Outdoor Environments”, Sorokin et al 2021
“PlaTe: Visually-Grounded Planning With Transformers in Procedural Tasks”, Sun et al 2021
“Implicit Behavioral Cloning”, Florence et al 2021
“DexMV: Imitation Learning for Dexterous Manipulation from Human Videos”, Qin et al 2021
“Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs”, Sonnerat et al 2021
“A Minimalist Approach to Offline Reinforcement Learning”, Fujimoto & Gu 2021
“Hyperparameter Selection for Imitation Learning”, Hussenot et al 2021
“From Motor Control to Team Play in Simulated Humanoid Football”, Liu et al 2021
“On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning”, Vischer et al 2021
“Counter-Strike Deathmatch With Large-Scale Behavioral Cloning”, Pearce & Zhu 2021
“Fully General Online Imitation Learning”, Cohen et al 2021
“The MineRL 2020 Competition on Sample Efficient Reinforcement Learning Using Human Priors”, Guss et al 2021
“Meta Learning Backpropagation And Improving It”, Kirsch & Schmidhuber 2020
“SCC: an Efficient Deep Reinforcement Learning Agent Mastering the Game of StarCraft II”, Wang et al 2020
“Imitating Interactive Intelligence”, Abramson et al 2020
“TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game”, Han et al 2020
“RetinaGAN: An Object-Aware Approach to Sim-To-Real Transfer”, Ho et al 2020
“Emergent Social Learning via Multi-Agent Reinforcement Learning”, Ndousse et al 2020
“Automatic Discovery of Interpretable Planning Strategies”, Skirzyński et al 2020
“Learning Agile Robotic Locomotion Skills by Imitating Animals”, Peng et al 2020
“Reinforcement Learning for Combinatorial Optimization: A Survey”, Mazyavkina et al 2020
“Bayesian REX: Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences”, Brown et al 2020
“AI Helps Warehouse Robots Pick Up New Tricks: Backed by Machine Learning Luminaries, Covariant.ai’s Bots Can Handle Jobs Previously Needing a Human Touch”, Knight 2020
“Deep Bayesian Reward Learning from Preferences”, Brown & Niekum 2019
“Learning Norms from Stories: A Prior for Value Aligned Agents”, Frazier et al 2019
“Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?”, Du et al 2019
“Learning to Reason in Large Theories without Imitation”, Bansal et al 2019
“The MineRL 2019 Competition on Sample Efficient Reinforcement Learning Using Human Priors”, Guss et al 2019
“Go-Explore: a New Approach for Hard-Exploration Problems”, Ecoffet et al 2019
“Hierarchical Reinforcement Learning for Multi-Agent MOBA Game”, Zhang et al 2019
“ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst”, Bansal et al 2018
“Reward Learning from Human Preferences and Demonstrations in Atari”, Ibarz et al 2018
“Language GANs Falling Short”, Caccia et al 2018
“Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow”, Peng et al 2018
“Human-Like Playtesting With Deep Learning”, Gudmundsson et al 2018
“Convergence of Value Aggregation for Imitation Learning”, Cheng & Boots 2018
“Policy Optimization by Genetic Distillation”, Gangwani & Peng 2017
“Learning to Play Chess With Minimal Lookahead and Deep Value Neural Networks”, Sabatelli 2017 (page 3)
“DropoutDAgger: A Bayesian Approach to Safe Imitation Learning”, Menda et al 2017
“One-Shot Visual Imitation Learning via Meta-Learning”, Finn et al 2017
“Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration”, Rahmatizadeh et al 2017
“Learning Human Behaviors from Motion Capture by Adversarial Imitation”, Merel et al 2017
“Grammatical Error Correction With Neural Reinforcement Learning”, Sakaguchi et al 2017
“Path Integral Networks: End-To-End Differentiable Optimal Control”, Okada et al 2017
“Gated-Attention Architectures for Task-Oriented Language Grounding”, Chaplot et al 2017
“Visual Semantic Planning Using Deep Successor Representations”, Zhu et al 2017
“A Deep Reinforced Model for Abstractive Summarization”, Paulus et al 2017
“One-Shot Imitation Learning”, Duan et al 2017
“Model-Based Adversarial Imitation Learning”, Baram et al 2016
“A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models”, Finn et al 2016
“SeqGAN: Sequence Generative Adversarial Nets With Policy Gradient”, Yu et al 2016
“Generative Adversarial Imitation Learning”, Ho & Ermon 2016
“Mastering the Game of Go With Deep Neural Networks and Tree Search”, Silver et al 2016
“An Invitation to Imitation”, Bagnell 2015
“DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning”, Ross et al 2010
“The Hidden Structure of Overimitation”, Lyons et al 2007
“Google DeepMind’s Grandmaster-Level Chess Without Search”
“Language Models Model Us”
“Sony’s Racing Car AI Just Destroyed Its Human Competitors—By Being Nice (and Fast)”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
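As a rough illustration of the nearest-neighbor ordering just described, here is a minimal sketch (hypothetical Python, not the actual Gwern.net implementation, which additionally clusters the list into sections & auto-labels them): starting from the newest annotation, greedily chain annotations by embedding similarity so that adjacent entries share a topic.

```python
import numpy as np

def sort_by_similarity(embeddings: np.ndarray) -> list[int]:
    """Greedy nearest-neighbor chain: start from the newest annotation
    (index 0) and repeatedly append the most-similar unvisited one."""
    # Normalize rows so dot products are cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order, visited = [0], {0}
    while len(order) < len(emb):
        sims = emb @ emb[order[-1]]    # similarity of every annotation to the last-placed one
        sims[list(visited)] = -np.inf  # never revisit an already-placed annotation
        nxt = int(np.argmax(sims))
        order.append(nxt)
        visited.add(nxt)
    return order

# Toy usage: 5 random 8-dimensional "annotation embeddings".
rng = np.random.default_rng(0)
print(sort_by_similarity(rng.normal(size=(5, 8))))
```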
reward-learning
mind-mimicking
imitation-strategies
Miscellaneous
- https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html
- https://bair.berkeley.edu/blog/2022/04/25/rl-or-bc/
- https://generallyintelligent.substack.com/p/fine-tuning-mistral-7b-on-magic-the
- https://mobile-aloha.github.io/
- https://www.reddit.com/r/reinforcementlearning/search/?q=flair%3AI&restrict_sr=on&sort=new
Bibliography
- https://arxiv.org/abs/2402.04494#deepmind: “Grandmaster-Level Chess Without Search”, Ruoss et al 2024
- https://www.nature.com/articles/s41467-023-42875-2#deepmind: “Learning Few-Shot Imitation As Cultural Transmission”, Bhoopchand et al 2023
- https://arxiv.org/abs/2310.16410#deepmind: “Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero”, Schut et al 2023
- https://arxiv.org/abs/2308.04445: “Getting from Generative AI to Trustworthy AI: What LLMs Might Learn from Cyc”, Lenat & Marcus 2023
- https://arxiv.org/abs/2306.05426: “SequenceMatch: Imitation Learning for Autoregressive Sequence Modeling With Backtracking”, Cundy & Ermon 2023
- https://arxiv.org/abs/2306.00323: “Thought Cloning: Learning to Think While Acting by Imitating Human Thinking”, Hu & Clune 2023
- https://arxiv.org/abs/2305.20050#openai: “Let’s Verify Step by Step”, Lightman et al 2023
- https://arxiv.org/abs/2305.15717: “The False Promise of Imitating Proprietary LLMs”, Gudibande et al 2023
- https://arxiv.org/abs/2305.09836: “Revisiting the Minimalist Approach to Offline Reinforcement Learning”, Tarasov et al 2023
- https://arxiv.org/abs/2304.13705: “ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware”, Zhao et al 2023
- 2022-bakhtin.pdf: “CICERO: Human-Level Play in the Game of Diplomacy by Combining Language Models With Strategic Reasoning”, Bakhtin et al 2022
- https://arxiv.org/abs/2210.10760#openai: “Scaling Laws for Reward Model Overoptimization”, Gao et al 2022
- https://arxiv.org/abs/2210.01241: “Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, Ramamurthy et al 2022
- https://arxiv.org/abs/2206.11795#openai: “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, Baker et al 2022
- https://arxiv.org/abs/2206.05314#deepmind: “Large-Scale Retrieval for Reinforcement Learning”, Humphreys et al 2022
- https://openreview.net/forum?id=0ZbPmmB61g#google: “Boosting Search Engines With Interactive Agents”, Ciaramita et al 2022
- https://arxiv.org/abs/2204.03514#facebook: “Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, Ramrakhya et al 2022
- https://arxiv.org/abs/2112.09332#openai: “WebGPT: Browser-Assisted Question-Answering With Human Feedback”, Nakano et al 2021
- https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
- https://arxiv.org/abs/2105.12196#deepmind: “From Motor Control to Team Play in Simulated Humanoid Football”, Liu et al 2021
- https://arxiv.org/abs/2101.11071: “The MineRL 2020 Competition on Sample Efficient Reinforcement Learning Using Human Priors”, Guss et al 2021
- https://arxiv.org/abs/2012.05672#deepmind: “Imitating Interactive Intelligence”, Abramson et al 2020
- https://arxiv.org/abs/2011.13729#tencent: “TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game”, Han et al 2020
- https://arxiv.org/abs/1811.02549: “Language GANs Falling Short”, Caccia et al 2018
- 2018-gudmundsson.pdf: “Human-Like Playtesting With Deep Learning”, Gudmundsson et al 2018
- 2017-sabatelli.pdf#page=3: “Learning to Play Chess With Minimal Lookahead and Deep Value Neural Networks”, Sabatelli 2017