See Also

Links
- “Scaling Laws for Single-agent Reinforcement Learning”, Hilton et al 2023
- “DreamerV3: Mastering Diverse Domains through World Models”, Hafner et al 2023
- “VeLO: Training Versatile Learned Optimizers by Scaling Up”, Metz et al 2022
- “Scaling Laws for Reward Model Overoptimization”, Gao et al 2022
- “SAP: Bidirectional Language Models Are Also Few-shot Learners”, Patel et al 2022
- “Human-level Atari 200× Faster”, Kapturowski et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- “AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model”, Soltan et al 2022
- “TextWorldExpress: Simulating Text Games at One Million Steps Per Second”, Jansen & Côté 2022
- “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, Baker et al 2022
- “Multi-Game Decision Transformers”, Lee et al 2022
- “Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL)”, Caccia et al 2022
- “CT0: Fine-tuned Language Models Are Continual Learners”, Scialom et al 2022
- “Flexible Diffusion Modeling of Long Videos”, Harvey et al 2022
- “Instruction Induction: From Few Examples to Natural Language Task Descriptions”, Honovich et al 2022
- “Gato: A Generalist Agent”, Reed et al 2022
- “Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers”, Chan et al 2022
- “Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, Ramrakhya et al 2022
- “Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”, Ahn et al 2022
- “Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”, Zeng et al 2022
- “It Looks Like You’re Trying To Take Over The World”, Gwern 2022
- “InstructGPT: Training Language Models to Follow Instructions With Human Feedback”, Ouyang et al 2022
- “A Data-driven Approach for Learning to Control Computers”, Humphreys et al 2022
- “EvoJAX: Hardware-Accelerated Neuroevolution”, Tang et al 2022
- “Accelerated Quality-Diversity for Robotics through Massive Parallelism”, Lim et al 2022
- “Can Wikipedia Help Offline Reinforcement Learning?”, Reid et al 2022
- “In Defense of the Unitary Scalarization for Deep Multi-Task Learning”, Kurin et al 2022
- “The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models”, Pan et al 2022
- “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Hilton et al 2021
- “WebGPT: Browser-assisted Question-answering With Human Feedback”, Nakano et al 2021
- “Acquisition of Chess Knowledge in AlphaZero”, McGrath et al 2021
- “AW-Opt: Learning Robotic Skills With Imitation and Reinforcement at Scale”, Lu et al 2021
- “An Explanation of In-context Learning As Implicit Bayesian Inference”, Xie et al 2021
- “Procedural Generalization by Planning With Self-Supervised World Models”, Anand et al 2021
- “MetaICL: Learning to Learn In Context”, Min et al 2021
- “T0: Multitask Prompted Training Enables Zero-Shot Task Generalization”, Sanh et al 2021
- “Collaborating With Humans without Human Data”, Strouse et al 2021
- “Bridge Data: Boosting Generalization of Robotic Skills With Cross-Domain Datasets”, Ebert et al 2021
- “Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning”, Rudin et al 2021
- “Recursively Summarizing Books With Human Feedback”, Wu et al 2021
- “FLAN: Finetuned Language Models Are Zero-Shot Learners”, Wei et al 2021
- “Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation”, Nair et al 2021
- “WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU”, Lan et al 2021
- “Multi-Task Self-Training for Learning General Representations”, Ghiasi et al 2021
- “Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning”, Makoviychuk et al 2021
- “Open-Ended Learning Leads to Generally Capable Agents”, Open-Ended Learning Team 2021
- “Megaverse: Simulating Embodied Agents at One Million Experiences per Second”, Petrenko et al 2021
- “Evaluating Large Language Models Trained on Code”, Chen et al 2021
- “Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Vicol et al 2021
- “Multimodal Few-Shot Learning With Frozen Language Models”, Tsimpoukelli et al 2021
- “Brax—A Differentiable Physics Engine for Large Scale Rigid Body Simulation”, Freeman et al 2021
- “PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World”, Zellers et al 2021
- “From Motor Control to Team Play in Simulated Humanoid Football”, Liu et al 2021
- “Reward Is Enough”, Silver et al 2021
- “MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, Schrittwieser et al 2021
- “Podracer Architectures for Scalable Reinforcement Learning”, Hessel et al 2021
- “Scaling Scaling Laws With Board Games”, Jones 2021
- “Large Batch Simulation for Deep Reinforcement Learning”, Shacklett et al 2021
- “Investment vs. Reward in a Competitive Knapsack Problem”, 2021
- “NNUE: The Neural Network of the Stockfish Chess Engine”, 2021
- “Imitating Interactive Intelligence”, Abramson et al 2020
- “Scaling down Deep Learning”, Greydanus 2020
- “Understanding RL Vision: With Diverse Environments, We Can Analyze, Diagnose and Edit Deep Reinforcement Learning Models Using Attribution”, Hilton et al 2020
- “Meta-trained Agents Implement Bayes-optimal Agents”, Mikulik et al 2020
- “Measuring Progress in Deep Reinforcement Learning Sample Efficiency”, 2020
- “Learning to Summarize from Human Feedback”, Stiennon et al 2020
- “Measuring Hardware Overhang”, Hippke 2020
- “Sample Factory: Egocentric 3D Control from Pixels at 100,000 FPS With Asynchronous Reinforcement Learning”, Petrenko et al 2020
- “The Scaling Hypothesis”, Gwern 2020
- “Agent57: Outperforming the Human Atari Benchmark”, Badia et al 2020
- “Deep Neuroethology of a Virtual Rodent”, Merel et al 2020
- “Near-perfect Point-goal Navigation from 2.5 Billion Frames of Experience”, 2020
- “Procgen Benchmark: We’re Releasing Procgen Benchmark, 16 Simple-to-use Procedurally-generated Environments Which Provide a Direct Measure of How Quickly a Reinforcement Learning Agent Learns Generalizable Skills”, Cobbe et al 2019
- “DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames”, Wijmans et al 2019
- “Grandmaster Level in StarCraft II Using Multi-agent Reinforcement Learning”, Vinyals et al 2019
- “Solving Rubik’s Cube With a Robot Hand”, OpenAI et al 2019
- “Fine-Tuning Language Models from Human Preferences”, Ziegler et al 2019
- “Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, Baker et al 2019
- “Meta Reinforcement Learning”, Weng 2019
- “Human-level Performance in 3D Multiplayer Games With Population-based Reinforcement Learning”, Jaderberg et al 2019
- “AI-GAs: AI-generating Algorithms, an Alternate Paradigm for Producing General Artificial Intelligence”, Clune 2019
- “Meta-learning of Sequential Strategies”, Ortega et al 2019
- “Habitat: A Platform for Embodied AI Research”, Savva et al 2019
- “The Bitter Lesson”, Sutton 2019
- “Benchmarking Classic and Learned Navigation in Complex 3D Environments”, Mishkin et al 2019
- “Dota 2 With Large Scale Deep Reinforcement Learning: §4.3: Batch Size”, Berner et al 2019 (page 13)
- “How AI Training Scales”, McCandlish et al 2018
- “An Empirical Model of Large-Batch Training”, McCandlish et al 2018
- “Bayesian Layers: A Module for Neural Network Uncertainty”, Tran et al 2018
- “Quantifying Generalization in Reinforcement Learning”, Cobbe et al 2018
- “One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets With RL”, Paine et al 2018
- “Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias”, Gupta et al 2018
- “Human-level Performance in First-person Multiplayer Games With Population-based Deep Reinforcement Learning”, Jaderberg et al 2018
- “QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation”, Kalashnikov et al 2018
- “Playing Atari With Six Neurons”, Cuccu et al 2018
- “AI and Compute”, Amodei et al 2018
- “One Big Net For Everything”, Schmidhuber 2018
- “Interactive Grounded Language Acquisition and Generalization in a 2D World”, Yu et al 2018
- “Emergence of Locomotion Behaviors in Rich Environments”, Heess et al 2017
- “Deep Reinforcement Learning from Human Preferences”, Christiano et al 2017
- “Research Ideas”, Gwern 2017
- “Evolution Strategies As a Scalable Alternative to Reinforcement Learning”, Salimans et al 2017
- “Why Tool AIs Want to Be Agent AIs”, Gwern 2016
- “On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models”, Schmidhuber 2015
- “Gorila: Massively Parallel Methods for Deep Reinforcement Learning”, Nair et al 2015
- “Algorithmic Progress in Six Domains”, Grace 2013
- “Robot Predictions Evolution”, Moravec 2004
- “When Will Computer Hardware Match the Human Brain?”, Moravec 1998
- “Human Window on the World”, 1985
- “Submission #6347: Chef Stef’s NES Arkanoid ‘Warpless’ In 11:11.18”
- “Training a CUDA TDS Ant Using C++ ARS Linear Policy: The Video Is Real-time, After a Few Minutes (in the 30 Million Steps) the Training Curve Is Flat (I Trained Until a Billion Steps). Note That This Ant Is PD Control, and Not Identical to Either MuJoCo or PyBullet Ant, so the Training Curves Are Not Comparable Yet. Will Fix That.”
- “Ilya Sutskever: Deep Learning | AI Podcast #94 With Lex Fridman”
- “If You Want to Solve a Hard Problem in Reinforcement Learning, You Just Scale. It’s Just Gonna Work Just like Supervised Learning. It’s the Same, the Same Story Exactly. It Was Kind of Hard to Believe That Supervised Learning Can Do All Those Things, but It’s Not Just Vision, It’s Everything and the Same Thing Seems to Hold for Reinforcement Learning Provided You Have a Lot of Experience.”
Miscellaneous
- https://ai.googleblog.com/2023/01/google-research-2022-beyond-language.html
- https://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-chess/
- https://jdlm.info/articles/2018/03/18/markov-decision-process-2048.html
- https://www.deepmind.com/publications/open-ended-learning-leads-to-generally-capable-agents
- https://www.lesswrong.com/posts/65qmEJHDw3vw69tKm/proposal-scaling-laws-for-rl-generalization
Link Bibliography
- https://arxiv.org/abs/2301.04104#deepmind: “DreamerV3: Mastering Diverse Domains through World Models”, Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap
- https://arxiv.org/abs/2210.10760#openai: “Scaling Laws for Reward Model Overoptimization”, Leo Gao, John Schulman, Jacob Hilton
- https://arxiv.org/abs/2209.14500: “SAP: Bidirectional Language Models Are Also Few-shot Learners”, Ajay Patel, Bryan Li, Mohammad Sadegh Rasooli, Noah Constant, Colin Raffel, Chris Callison-Burch
- https://arxiv.org/abs/2209.07550#deepmind: “Human-level Atari 200× Faster”, Steven Kapturowski, Víctor Campos, Ray Jiang, Nemanja Rakićević, Hado van Hasselt, Charles Blundell, Adrià Puigdomènech Badia
- https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”
- https://arxiv.org/abs/2208.01448#amazon: “AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model”
- https://arxiv.org/abs/2206.11795#openai: “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”
- https://arxiv.org/abs/2205.15241#google: “Multi-Game Decision Transformers”
- https://arxiv.org/abs/2205.12393#facebook: “CT0: Fine-tuned Language Models Are Continual Learners”, Thomas Scialom, Tuhin Chakrabarty, Smaranda Muresan
- https://arxiv.org/abs/2205.06175#deepmind: “Gato: A Generalist Agent”
- https://arxiv.org/abs/2204.03514#facebook: “Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, Ram Ramrakhya, Eric Undersander, Dhruv Batra, Abhishek Das
- https://arxiv.org/abs/2204.01691#google: “Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”
- https://arxiv.org/abs/2204.00598#google: “Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”
- clippy: “It Looks Like You’re Trying To Take Over The World”, Gwern
- https://arxiv.org/abs/2201.03544: “The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models”, Alexander Pan, Kush Bhatia, Jacob Steinhardt
- https://openai.com/research/webgpt: “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Jacob Hilton, Suchir Balaji, Reiichiro Nakano, John Schulman
- https://arxiv.org/abs/2112.09332#openai: “WebGPT: Browser-assisted Question-answering With Human Feedback”
- https://arxiv.org/abs/2111.09259#deepmind: “Acquisition of Chess Knowledge in AlphaZero”
- https://arxiv.org/abs/2111.01587#deepmind: “Procedural Generalization by Planning With Self-Supervised World Models”
- https://arxiv.org/abs/2109.10862#openai: “Recursively Summarizing Books With Human Feedback”, Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano
- https://proceedings.mlr.press/v139/vicol21a.html: “Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Paul Vicol, Luke Metz, Jascha Sohl-Dickstein
- https://arxiv.org/abs/2106.13281#google: “Brax—A Differentiable Physics Engine for Large Scale Rigid Body Simulation”, C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, Olivier Bachem
- https://arxiv.org/abs/2105.12196#deepmind: “From Motor Control to Team Play in Simulated Humanoid Football”
- https://www.sciencedirect.com/science/article/pii/S0004370221000862#deepmind: “Reward Is Enough”, David Silver, Satinder Singh, Doina Precup, Richard S. Sutton
- https://arxiv.org/abs/2104.06294#deepmind: “MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, Julian Schrittwieser, Thomas Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver
- https://arxiv.org/abs/2104.06272#deepmind: “Podracer Architectures for Scalable Reinforcement Learning”, Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt
- https://arxiv.org/abs/2012.05672#deepmind: “Imitating Interactive Intelligence”
- https://greydanus.github.io/2020/12/01/scaling-down/: “Scaling down Deep Learning”, Sam Greydanus
- scaling-hypothesis: “The Scaling Hypothesis”, Gwern
- https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark: “Agent57: Outperforming the Human Atari Benchmark”, Adrià Puigdomènech, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell
- https://openreview.net/forum?id=SyxrxR4KPS#deepmind: “Deep Neuroethology of a Virtual Rodent”, Josh Merel, Diego Aldarondo, Jesse Marshall, Yuval Tassa, Greg Wayne, Bence Olveczky (DM/Harvard)
- https://openai.com/research/procgen-benchmark: “Procgen Benchmark: We’re Releasing Procgen Benchmark, 16 Simple-to-use Procedurally-generated Environments Which Provide a Direct Measure of How Quickly a Reinforcement Learning Agent Learns Generalizable Skills”, Karl Cobbe, Christopher Hesse, Jacob Hilton, John Schulman
- https://arxiv.org/abs/1911.00357#facebook: “DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames”, Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra
- 2019-vinyals.pdf#deepmind: “Grandmaster Level in StarCraft II Using Multi-agent Reinforcement Learning”
- https://openai.com/research/emergent-tool-use#surprisingbehaviors: “Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch
- 2019-jaderberg.pdf#deepmind: “Human-level Performance in 3D Multiplayer Games With Population-based Reinforcement Learning”
- https://arxiv.org/abs/1904.01201#facebook: “Habitat: A Platform for Embodied AI Research”
- https://openai.com/research/how-ai-training-scales: “How AI Training Scales”, Sam McCandlish, Jared Kaplan, Dario Amodei
- https://openai.com/research/ai-and-compute: “AI and Compute”, Dario Amodei, Danny Hernandez, Girish Sastry, Jack Clark, Greg Brockman, Ilya Sutskever
- idea: “Research Ideas”, Gwern Branwen
- tool-ai: “Why Tool AIs Want to Be Agent AIs”, Gwern
- https://www.frc.ri.cmu.edu/~hpm/project.archive/robot.papers/2004/Predictions.html: “Robot Predictions Evolution”, Hans Moravec