- See Also
- Links
- “JaxMARL: Multi-Agent RL Environments in JAX”, Rutherford et al 2023
- “Diversifying AI: Towards Creative Chess With AlphaZero (AZdb)”, Zahavy et al 2023
- “Deep RL at Scale: Sorting Waste in Office Buildings With a Fleet of Mobile Manipulators”, Herzog et al 2023
- “Scaling Laws for Single-agent Reinforcement Learning”, Hilton et al 2023
- “DreamerV3: Mastering Diverse Domains through World Models”, Hafner et al 2023
- “Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes”, Kumar et al 2022
- “VeLO: Training Versatile Learned Optimizers by Scaling Up”, Metz et al 2022
- “Scaling Laws for Reward Model Overoptimization”, Gao et al 2022
- “SAP: Bidirectional Language Models Are Also Few-shot Learners”, Patel et al 2022
- “g.pt: Learning to Learn With Generative Models of Neural Network Checkpoints”, Peebles et al 2022
- “Human-level Atari 200× Faster”, Kapturowski et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- “AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model”, Soltan et al 2022
- “TextWorldExpress: Simulating Text Games at One Million Steps Per Second”, Jansen & Côté 2022
- “Demis Hassabis: DeepMind—AI, Superintelligence & the Future of Humanity § Turing Test”, Hassabis & Fridman 2022
- “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, Baker et al 2022
- “Multi-Game Decision Transformers”, Lee et al 2022
- “Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL)”, Caccia et al 2022
- “CT0: Fine-tuned Language Models Are Continual Learners”, Scialom et al 2022
- “Flexible Diffusion Modeling of Long Videos”, Harvey et al 2022
- “Instruction Induction: From Few Examples to Natural Language Task Descriptions”, Honovich et al 2022
- “Gato: A Generalist Agent”, Reed et al 2022
- “Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers”, Chan et al 2022
- “Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, Ramrakhya et al 2022
- “Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”, Ahn et al 2022
- “Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”, Zeng et al 2022
- “It Looks Like You’re Trying To Take Over The World”, Gwern 2022
- “InstructGPT: Training Language Models to Follow Instructions With Human Feedback”, Ouyang et al 2022
- “A Data-driven Approach for Learning to Control Computers”, Humphreys et al 2022
- “EvoJAX: Hardware-Accelerated Neuroevolution”, Tang et al 2022
- “Accelerated Quality-Diversity for Robotics through Massive Parallelism”, Lim et al 2022
- “Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning (ExORL)”, Yarats et al 2022
- “Can Wikipedia Help Offline Reinforcement Learning?”, Reid et al 2022
- “In Defense of the Unitary Scalarization for Deep Multi-Task Learning”, Kurin et al 2022
- “The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models”, Pan et al 2022
- “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Hilton et al 2021
- “WebGPT: Browser-assisted Question-answering With Human Feedback”, Nakano et al 2021
- “Acquisition of Chess Knowledge in AlphaZero”, McGrath et al 2021
- “AW-Opt: Learning Robotic Skills With Imitation and Reinforcement at Scale”, Lu et al 2021
- “An Explanation of In-context Learning As Implicit Bayesian Inference”, Xie et al 2021
- “Procedural Generalization by Planning With Self-Supervised World Models”, Anand et al 2021
- “MetaICL: Learning to Learn In Context”, Min et al 2021
- “T0: Multitask Prompted Training Enables Zero-Shot Task Generalization”, Sanh et al 2021
- “Collaborating With Humans without Human Data”, Strouse et al 2021
- “Bridge Data: Boosting Generalization of Robotic Skills With Cross-Domain Datasets”, Ebert et al 2021
- “Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning”, Rudin et al 2021
- “Recursively Summarizing Books With Human Feedback”, Wu et al 2021
- “FLAN: Finetuned Language Models Are Zero-Shot Learners”, Wei et al 2021
- “Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation”, Nair et al 2021
- “WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU”, Lan et al 2021
- “Multi-Task Self-Training for Learning General Representations”, Ghiasi et al 2021
- “Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning”, Makoviychuk et al 2021
- “Open-Ended Learning Leads to Generally Capable Agents”, Team et al 2021
- “Megaverse: Simulating Embodied Agents at One Million Experiences per Second”, Petrenko et al 2021
- “Evaluating Large Language Models Trained on Code”, Chen et al 2021
- “PES: Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Vicol et al 2021
- “Multimodal Few-Shot Learning With Frozen Language Models”, Tsimpoukelli et al 2021
- “Brax—A Differentiable Physics Engine for Large Scale Rigid Body Simulation”, Freeman et al 2021
- “PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World”, Zellers et al 2021
- “From Motor Control to Team Play in Simulated Humanoid Football”, Liu et al 2021
- “Reward Is Enough”, Silver et al 2021
- “MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, Schrittwieser et al 2021
- “Podracer Architectures for Scalable Reinforcement Learning”, Hessel et al 2021
- “Scaling Scaling Laws With Board Games”, Jones 2021
- “Large Batch Simulation for Deep Reinforcement Learning”, Shacklett et al 2021
- “Training Larger Networks for Deep Reinforcement Learning”, Ota et al 2021
- “Investment vs. Reward in a Competitive Knapsack Problem”, Neumann & Gros 2021
- “NNUE: The Neural Network of the Stockfish Chess Engine”, Goucher 2021
- “Imitating Interactive Intelligence”, Abramson et al 2020
- “Scaling down Deep Learning”, Greydanus 2020
- “Understanding RL Vision: With Diverse Environments, We Can Analyze, Diagnose and Edit Deep Reinforcement Learning Models Using Attribution”, Hilton et al 2020
- “Meta-trained Agents Implement Bayes-optimal Agents”, Mikulik et al 2020
- “Measuring Progress in Deep Reinforcement Learning Sample Efficiency”, Anonymous 2020
- “Learning to Summarize from Human Feedback”, Stiennon et al 2020
- “Measuring Hardware Overhang”, hippke 2020
- “Sample Factory: Egocentric 3D Control from Pixels at 100,000 FPS With Asynchronous Reinforcement Learning”, Petrenko et al 2020
- “The Scaling Hypothesis”, Gwern 2020
- “Agent57: Outperforming the Human Atari Benchmark”, Puigdomènech et al 2020
- “Deep Neuroethology of a Virtual Rodent”, Merel et al 2020
- “Near-perfect Point-goal Navigation from 2.5 Billion Frames of Experience”, Wijmans & Kadian 2020
- “Procgen Benchmark: We’re Releasing Procgen Benchmark, 16 Simple-to-use Procedurally-generated Environments Which Provide a Direct Measure of How Quickly a Reinforcement Learning Agent Learns Generalizable Skills”, Cobbe et al 2019
- “DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames”, Wijmans et al 2019
- “Grandmaster Level in StarCraft II Using Multi-agent Reinforcement Learning”, Vinyals et al 2019
- “Solving Rubik’s Cube With a Robot Hand”, OpenAI et al 2019
- “Fine-Tuning Language Models from Human Preferences”, Ziegler et al 2019
- “Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, Baker et al 2019
- “Meta Reinforcement Learning”, Weng 2019
- “Human-level Performance in 3D Multiplayer Games With Population-based Reinforcement Learning”, Jaderberg et al 2019
- “AI-GAs: AI-generating Algorithms, an Alternate Paradigm for Producing General Artificial Intelligence”, Clune 2019
- “Meta-learning of Sequential Strategies”, Ortega et al 2019
- “Habitat: A Platform for Embodied AI Research”, Savva et al 2019
- “The Bitter Lesson”, Sutton 2019
- “Benchmarking Classic and Learned Navigation in Complex 3D Environments”, Mishkin et al 2019
- “Dota 2 With Large Scale Deep Reinforcement Learning: §4.3: Batch Size”, Berner 2019 (page 13)
- “How AI Training Scales”, McCandlish et al 2018
- “An Empirical Model of Large-Batch Training”, McCandlish et al 2018
- “Bayesian Layers: A Module for Neural Network Uncertainty”, Tran et al 2018
- “Quantifying Generalization in Reinforcement Learning”, Cobbe et al 2018
- “One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets With RL”, Paine et al 2018
- “Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias”, Gupta et al 2018
- “Human-level Performance in First-person Multiplayer Games With Population-based Deep Reinforcement Learning”, Jaderberg et al 2018
- “QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation”, Kalashnikov et al 2018
- “Playing Atari With Six Neurons”, Cuccu et al 2018
- “AI and Compute”, Amodei et al 2018
- “Accelerated Methods for Deep Reinforcement Learning”, Stooke & Abbeel 2018
- “One Big Net For Everything”, Schmidhuber 2018
- “Interactive Grounded Language Acquisition and Generalization in a 2D World”, Yu et al 2018
- “Emergence of Locomotion Behaviors in Rich Environments”, Heess et al 2017
- “Deep Reinforcement Learning from Human Preferences”, Christiano et al 2017
- “Research Ideas”, Gwern 2017
- “Evolution Strategies As a Scalable Alternative to Reinforcement Learning”, Salimans et al 2017
- “Why Tool AIs Want to Be Agent AIs”, Gwern 2016
- “On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models”, Schmidhuber 2015
- “Gorila: Massively Parallel Methods for Deep Reinforcement Learning”, Nair et al 2015
- “Algorithmic Progress in Six Domains”, Grace 2013
- “Robot Predictions Evolution”, Moravec 2004
- “When Will Computer Hardware Match the Human Brain?”, Moravec 1998
- “Human Window on the World”, Michie 1985
- “Submission #6347: Chef Stef’s NES Arkanoid warpless in 11:11.18”
- “Training a CUDA TDS Ant Using C++ ARS Linear Policy: The Video Is Real-time, After a Few Minutes (in the 30 Million Steps) the Training Curve Is Flat (I Trained Until a Billion Steps). Note That This Ant Is PD Control, and Not Identical to Either MuJoCo or PyBullet Ant, so the Training Curves Are Not Comparable Yet. Will Fix That.”
- “Ilya Sutskever: Deep Learning | AI Podcast #94 With Lex Fridman”
- “If you want to solve a hard problem in reinforcement learning, you just scale. It’s just gonna work, just like supervised learning. It’s the same, the same story exactly. It was kind of hard to believe that supervised learning can do all those things, but it’s not just vision, it’s everything, and the same thing seems to hold for reinforcement learning, provided you have a lot of experience.”
- Sort By Magic
- Miscellaneous
- Link Bibliography
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
- scaling
- generalizing-ai
- human-feedback
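The embedding-based ordering described above is not published as code; a minimal sketch of one plausible implementation, assuming a greedy nearest-neighbor chain over cosine similarities (the function name `magic_sort` and the toy 2-dimensional embeddings are illustrative, not the site’s actual pipeline):

```python
import numpy as np

def magic_sort(embeddings: np.ndarray) -> list[int]:
    """Greedy nearest-neighbor chain over annotation embeddings.

    Starting from the newest annotation (index 0), repeatedly hop to the
    most-similar unvisited annotation, yielding a smooth topic progression.
    """
    # Unit-normalize rows so dot products are cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    unvisited = set(range(len(emb)))
    order = [0]            # begin with the newest annotation
    unvisited.remove(0)
    while unvisited:
        cur = order[-1]
        # Next item: the unvisited annotation nearest the current one.
        best = max(unvisited, key=lambda i: float(emb[cur] @ emb[i]))
        order.append(best)
        unvisited.remove(best)
    return order

# Toy example: item 2 is closer to item 0 than item 1 is, so it comes next.
toy = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.4]])
print(magic_sort(toy))  # → [0, 2, 1]
```

A greedy chain like this is only a heuristic (it can strand outliers at the end), which may be why the result is additionally clustered into labeled sections for browsing.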
Miscellaneous
- https://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-chess/
- https://blog.research.google/2023/01/google-research-2022-beyond-language.html
- https://jdlm.info/articles/2018/03/18/markov-decision-process-2048.html
- https://www.anthropic.com/index/anthropics-responsible-scaling-policy
- https://www.deepmind.com/publications/open-ended-learning-leads-to-generally-capable-agents
- https://www.lesswrong.com/posts/65qmEJHDw3vw69tKm/proposal-scaling-laws-for-rl-generalization
Link Bibliography
- https://arxiv.org/abs/2311.10090: “JaxMARL: Multi-Agent RL Environments in JAX”
- https://arxiv.org/abs/2301.04104#deepmind: “DreamerV3: Mastering Diverse Domains through World Models”, Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap
- https://arxiv.org/abs/2210.10760#openai: “Scaling Laws for Reward Model Overoptimization”, Leo Gao, John Schulman, Jacob Hilton
- https://arxiv.org/abs/2209.14500: “SAP: Bidirectional Language Models Are Also Few-shot Learners”, Ajay Patel, Bryan Li, Mohammad Sadegh Rasooli, Noah Constant, Colin Raffel, Chris Callison-Burch
- https://arxiv.org/abs/2209.12892: “g.pt: Learning to Learn With Generative Models of Neural Network Checkpoints”, William Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik
- https://arxiv.org/abs/2209.07550#deepmind: “Human-level Atari 200× Faster”, Steven Kapturowski, Víctor Campos, Ray Jiang, Nemanja Rakićević, Hado van Hasselt, Charles Blundell, Adrià Puigdomènech Badia
- https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”
- https://arxiv.org/abs/2208.01448#amazon: “AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model”
- https://arxiv.org/abs/2206.11795#openai: “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”
- https://arxiv.org/abs/2205.15241#google: “Multi-Game Decision Transformers”
- https://arxiv.org/abs/2205.12393: “CT0: Fine-tuned Language Models Are Continual Learners”, Thomas Scialom, Tuhin Chakrabarty, Smaranda Muresan
- https://arxiv.org/abs/2205.06175#deepmind: “Gato: A Generalist Agent”
- https://arxiv.org/abs/2204.03514#facebook: “Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, Ram Ramrakhya, Eric Undersander, Dhruv Batra, Abhishek Das
- https://arxiv.org/abs/2204.01691#google: “Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”
- https://arxiv.org/abs/2204.00598#google: “Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”
- clippy: “It Looks Like You’re Trying To Take Over The World”, Gwern
- https://arxiv.org/abs/2201.03544: “The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models”, Alexander Pan, Kush Bhatia, Jacob Steinhardt
- https://openai.com/research/webgpt: “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Jacob Hilton, Suchir Balaji, Reiichiro Nakano, John Schulman
- https://arxiv.org/abs/2112.09332#openai: “WebGPT: Browser-assisted Question-answering With Human Feedback”
- https://arxiv.org/abs/2111.09259#deepmind: “Acquisition of Chess Knowledge in AlphaZero”
- https://arxiv.org/abs/2111.01587#deepmind: “Procedural Generalization by Planning With Self-Supervised World Models”
- https://arxiv.org/abs/2109.10862#openai: “Recursively Summarizing Books With Human Feedback”, Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano
- https://proceedings.mlr.press/v139/vicol21a.html: “PES: Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Paul Vicol, Luke Metz, Jascha Sohl-Dickstein
- https://arxiv.org/abs/2106.13281#google: “Brax—A Differentiable Physics Engine for Large Scale Rigid Body Simulation”, C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, Olivier Bachem
- https://arxiv.org/abs/2105.12196#deepmind: “From Motor Control to Team Play in Simulated Humanoid Football”
- https://www.sciencedirect.com/science/article/pii/S0004370221000862#deepmind: “Reward Is Enough”, David Silver, Satinder Singh, Doina Precup, Richard S. Sutton
- https://arxiv.org/abs/2104.06294#deepmind: “MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, Julian Schrittwieser, Thomas Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver
- https://arxiv.org/abs/2104.06272#deepmind: “Podracer Architectures for Scalable Reinforcement Learning”, Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt
- https://arxiv.org/abs/2012.05672#deepmind: “Imitating Interactive Intelligence”
- https://greydanus.github.io/2020/12/01/scaling-down/: “Scaling down Deep Learning”, Sam Greydanus
- scaling-hypothesis: “The Scaling Hypothesis”, Gwern
- https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark: “Agent57: Outperforming the Human Atari Benchmark”, Adrià Puigdomènech, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell
- https://openreview.net/forum?id=SyxrxR4KPS#deepmind: “Deep Neuroethology of a Virtual Rodent”, Josh Merel, Diego Aldarondo, Jesse Marshall, Yuval Tassa, Greg Wayne, Bence Olveczky (DM/Harvard)
- https://openai.com/research/procgen-benchmark: “Procgen Benchmark: We’re Releasing Procgen Benchmark, 16 Simple-to-use Procedurally-generated Environments Which Provide a Direct Measure of How Quickly a Reinforcement Learning Agent Learns Generalizable Skills”, Karl Cobbe, Christopher Hesse, Jacob Hilton, John Schulman
- https://arxiv.org/abs/1911.00357#facebook: “DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames”, Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra
- 2019-vinyals.pdf#deepmind: “Grandmaster Level in StarCraft II Using Multi-agent Reinforcement Learning”
- https://openai.com/research/emergent-tool-use#surprisingbehaviors: “Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch
- 2019-jaderberg.pdf#deepmind: “Human-level Performance in 3D Multiplayer Games With Population-based Reinforcement Learning”
- https://arxiv.org/abs/1904.01201#facebook: “Habitat: A Platform for Embodied AI Research”
- https://openai.com/research/how-ai-training-scales: “How AI Training Scales”, Sam McCandlish, Jared Kaplan, Dario Amodei
- https://openai.com/research/ai-and-compute: “AI and Compute”, Dario Amodei, Danny Hernandez, Girish Sastry, Jack Clark, Greg Brockman, Ilya Sutskever
- idea: “Research Ideas”, Gwern
- tool-ai: “Why Tool AIs Want to Be Agent AIs”, Gwern
- https://web.archive.org/web/20230718144747/https://frc.ri.cmu.edu/~hpm/project.archive/robot.papers/2004/Predictions.html: “Robot Predictions Evolution”, Hans Moravec