Data Scaling Laws in Imitation Learning for Robotic Manipulation
AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
JEST: Data curation via joint example selection further accelerates multimodal learning
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)
Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators
Emergence of belief-like representations through reinforcement learning
Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes
SAP: Bidirectional Language Models Are Also Few-shot Learners
g.pt: Learning to Learn with Generative Models of Neural Network Checkpoints
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
TextWorldExpress: Simulating Text Games at One Million Steps Per Second
Demis Hassabis: DeepMind—AI, Superintelligence & the Future of Humanity § Turing Test
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL)
Instruction Induction: From Few Examples to Natural Language Task Descriptions
Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
InstructGPT: Training language models to follow instructions with human feedback
Accelerated Quality-Diversity for Robotics through Massive Parallelism
Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning (ExORL)
In Defense of the Unitary Scalarization for Deep Multi-Task Learning
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
WebGPT: Browser-assisted question-answering with human feedback
WebGPT: Improving the factual accuracy of language models through web browsing
AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale
An Explanation of In-context Learning as Implicit Bayesian Inference
Procedural Generalization by Planning with Self-Supervised World Models
T0: Multitask Prompted Training Enables Zero-Shot Task Generalization
Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU
Multi-Task Self-Training for Learning General Representations
Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning
Megaverse: Simulating Embodied Agents at One Million Experiences per Second
PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
Brax—A Differentiable Physics Engine for Large Scale Rigid Body Simulation
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World
From Motor Control to Team Play in Simulated Humanoid Football
Podracer architectures for scalable Reinforcement Learning
MuZero Unplugged: Online and Offline Reinforcement Learning by Planning with a Learned Model
Understanding RL Vision: With diverse environments, we can analyze, diagnose and edit deep reinforcement learning models using attribution
Measuring Progress in Deep Reinforcement Learning Sample Efficiency
Sample Factory: Egocentric 3D Control from Pixels at 100,000 FPS with Asynchronous Reinforcement Learning
Near-perfect point-goal navigation from 2.5 billion frames of experience
Procgen Benchmark: We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
Grandmaster level in StarCraft II using multi-agent reinforcement learning
Emergent Tool Use from Multi-Agent Interaction § Surprising behavior
Human-level performance in 3D multiplayer games with population-based reinforcement learning
AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
Benchmarking Classic and Learned Navigation in Complex 3D Environments
Dota 2 With Large Scale Deep Reinforcement Learning: §4.3: Batch Size
One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL
Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias
Human-level performance in first-person multiplayer games with population-based deep reinforcement learning
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
Interactive Grounded Language Acquisition and Generalization in a 2D World
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
Gorila: Massively Parallel Methods for Deep Reinforcement Learning
Trading Off Compute in Training and Inference § MCTS Scaling
Submission #6347: Chef Stef’s NES Arkanoid warpless in 11:11.18
[The Addictiveness & Adversarialness of Playing against LeelaQueenOdds]
Training a CUDA TDS Ant using a C++ ARS linear policy: the video is real-time; after a few minutes (around the 30-million-step mark) the training curve is flat (I trained until a billion steps). Note that this Ant is PD-controlled and not identical to either the MuJoCo or PyBullet Ant, so the training curves are not comparable yet. Will fix that.
Ilya Sutskever: Deep Learning | AI Podcast #94 With Lex Fridman
Target-Driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning [Video]
"If you want to solve a hard problem in reinforcement learning, you just scale. It's just gonna work, just like supervised learning. It's the same story exactly. It was kind of hard to believe that supervised learning could do all those things, but it's not just vision, it's everything; and the same thing seems to hold for reinforcement learning, provided you have a lot of experience."
How AI Training Scales (McCandlish 2018, OpenAI): gradient noise scale, scale vs. batch size [figure]
https://clemenswinter.com/2021/03/24/mastering-real-time-strategy-games-with-deep-reinforcement-learning-mere-mortal-edition/
https://community.arm.com/arm-community-blogs/b/high-performance-computing-blog/posts/deep-learning-episode-4-supercomputer-vs-pong-ii
https://jdlm.info/articles/2018/03/18/markov-decision-process-2048.html
https://research.google/blog/google-research-2022-beyond-language-vision-and-generative-models/
https://www.anthropic.com/index/anthropics-responsible-scaling-policy
https://www.lesswrong.com/posts/65qmEJHDw3vw69tKm/proposal-scaling-laws-for-rl-generalization
https://www.lesswrong.com/posts/65qmEJHDw3vw69tKm/proposal-scaling-laws-for-rl-generalization?commentId=wMerfGZfPHerdzDAi
https://www.lesswrong.com/posts/bwyKCQD7PFWKhELMr/by-default-gpts-think-in-plain-sight#zfzHshctWZYo8JkLe
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
https://arxiv.org/abs/2410.07095#openai
https://yellow-apartment-148.notion.site/AI-Search-The-Bitter-er-Lesson-44c11acd27294f4495c3de778cd09c8d
https://arxiv.org/abs/2402.04494#deepmind
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
https://arxiv.org/abs/2401.05566#anthropic
Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)
https://arxiv.org/abs/2308.09175#deepmind
https://arxiv.org/abs/2301.04104#deepmind
https://arxiv.org/abs/2210.10760#openai
SAP: Bidirectional Language Models Are Also Few-shot Learners
g.pt: Learning to Learn with Generative Models of Neural Network Checkpoints
https://arxiv.org/abs/2209.07550#deepmind
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
https://www.anthropic.com/red_teaming.pdf
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
https://arxiv.org/abs/2208.01448#amazon
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Jeff Clune—Professor—Computer Science—University of British Columbia
https://arxiv.org/abs/2206.11795#openai
https://arxiv.org/abs/2205.15241#google
https://arxiv.org/abs/2205.06175#deepmind
Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
https://arxiv.org/abs/2204.03514#facebook
Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances
https://arxiv.org/abs/2204.01691#google
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
https://arxiv.org/abs/2204.00598#google
https://arxiv.org/abs/2202.05008#google
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
WebGPT: Browser-assisted question-answering with human feedback
https://arxiv.org/abs/2112.09332#openai
WebGPT: Improving the factual accuracy of language models through web browsing
https://openai.com/research/webgpt
https://arxiv.org/abs/2111.09259#deepmind
Procedural Generalization by Planning with Self-Supervised World Models
https://arxiv.org/abs/2111.01587#deepmind
https://arxiv.org/abs/2109.10862#openai
PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
https://proceedings.mlr.press/v139/vicol21a.html
Brax—A Differentiable Physics Engine for Large Scale Rigid Body Simulation
https://arxiv.org/abs/2106.13281#google
From Motor Control to Team Play in Simulated Humanoid Football
https://arxiv.org/abs/2105.12196#deepmind
https://www.sciencedirect.com/science/article/pii/S0004370221000862#deepmind
Podracer architectures for scalable Reinforcement Learning
https://arxiv.org/abs/2104.06272#deepmind
MuZero Unplugged: Online and Offline Reinforcement Learning by Planning with a Learned Model
https://arxiv.org/abs/2104.06294#deepmind
https://arxiv.org/abs/2012.05672#deepmind
https://greydanus.github.io/2020/12/01/scaling-down/
Measuring Progress in Deep Reinforcement Learning Sample Efficiency
https://deepmind.google/discover/blog/agent57-outperforming-the-human-atari-benchmark/
https://openreview.net/forum?id=SyxrxR4KPS#deepmind
Procgen Benchmark: We’re releasing Procgen Benchmark, 16 simple-to-use procedurally-generated environments which provide a direct measure of how quickly a reinforcement learning agent learns generalizable skills
https://openai.com/research/procgen-benchmark
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
https://arxiv.org/abs/1911.00357#facebook
Grandmaster level in StarCraft II using multi-agent reinforcement learning
/doc/reinforcement-learning/model-free/alphastar/2019-vinyals.pdf#deepmind
Emergent Tool Use from Multi-Agent Interaction § Surprising behavior
https://openai.com/research/emergent-tool-use#surprisingbehaviors
Human-level performance in 3D multiplayer games with population-based reinforcement learning
/doc/reinforcement-learning/exploration/2019-jaderberg.pdf#deepmind
https://arxiv.org/abs/1904.01201#facebook
https://openai.com/research/how-ai-training-scales
https://openai.com/research/ai-and-compute
https://web.archive.org/web/20230718144747/https://frc.ri.cmu.edu/~hpm/project.archive/robot.papers/2004/Predictions.html
https://jetpress.org/volume1/moravec.htm
Wikipedia Bibliography: