“‘RL Scaling’ Tag”, 2019-09-09:
Bibliography for tag reinforcement-learning/scaling, most recent first: 12 related tags, 142 annotations, & 21 links (parent).
- See Also
- Gwern
- Links
- “Data Scaling Laws in Imitation Learning for Robotic Manipulation”, et al 2024
- “AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II”, 2024
- “MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering”, et al 2024
- “NAVIX: Scaling MiniGrid Environments With JAX”, et al 2024
- “JEST: Data Curation via Joint Example Selection Further Accelerates Multimodal Learning”, et al 2024
- “AI Search: The Bitter-Er Lesson”, 2024
- “Foundational Challenges in Assuring Alignment and Safety of Large Language Models”, et al 2024
- “Simple and Scalable Strategies to Continually Pre-Train Large Language Models”, et al 2024
- “Robust Agents Learn Causal World Models”, 2024
- “Grandmaster-Level Chess Without Search”, et al 2024
- “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, et al 2024
- “Vision-Language Models As a Source of Rewards”, et al 2023
- “JaxMARL: Multi-Agent RL Environments in JAX”, et al 2023
- “Diversifying AI: Towards Creative Chess With AlphaZero (AZdb)”, et al 2023
- “Deep RL at Scale: Sorting Waste in Office Buildings With a Fleet of Mobile Manipulators”, et al 2023
- “Emergence of Belief-Like Representations through Reinforcement Learning”, et al 2023
- “Scaling Laws for Single-Agent Reinforcement Learning”, et al 2023
- “DreamerV3: Mastering Diverse Domains through World Models”, et al 2023
- “Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes”, et al 2022
- “VeLO: Training Versatile Learned Optimizers by Scaling Up”, et al 2022
- “Broken Neural Scaling Laws”, et al 2022
- “Scaling Laws for Reward Model Overoptimization”, et al 2022
- “SAP: Bidirectional Language Models Are Also Few-Shot Learners”, et al 2022
- “
g.pt: Learning to Learn With Generative Models of Neural Network Checkpoints”, et al 2022- “Human-Level Atari 200× Faster”, et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, et al 2022
- “AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model”, et al 2022
- “TextWorldExpress: Simulating Text Games at One Million Steps Per Second”, Jansen & 2022
- “Demis Hassabis: DeepMind—AI, Superintelligence & the Future of Humanity § Turing Test”, 2022
- “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, et al 2022
- “Multi-Game Decision Transformers”, et al 2022
- “Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL)”, et al 2022
- “CT0: Fine-Tuned Language Models Are Continual Learners”, et al 2022
- “Flexible Diffusion Modeling of Long Videos”, et al 2022
- “Instruction Induction: From Few Examples to Natural Language Task Descriptions”, et al 2022
- “Gato: A Generalist Agent”, et al 2022
- “Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers”, et al 2022
- “Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale”, et al 2022
- “Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”, et al 2022
- “Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”, et al 2022
- “InstructGPT: Training Language Models to Follow Instructions With Human Feedback”, et al 2022
- “A Data-Driven Approach for Learning to Control Computers”, et al 2022
- “EvoJAX: Hardware-Accelerated Neuroevolution”, et al 2022
- “Accelerated Quality-Diversity for Robotics through Massive Parallelism”, et al 2022
- “Don’t Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning (ExORL)”, et al 2022
- “Can Wikipedia Help Offline Reinforcement Learning?”, et al 2022
- “In Defense of the Unitary Scalarization for Deep Multi-Task Learning”, et al 2022
- “The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models”, et al 2022
- “WebGPT: Browser-Assisted Question-Answering With Human Feedback”, et al 2021
- “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, et al 2021
- “Acquisition of Chess Knowledge in AlphaZero”, et al 2021
- “AW-Opt: Learning Robotic Skills With Imitation and Reinforcement at Scale”, et al 2021
- “An Explanation of In-Context Learning As Implicit Bayesian Inference”, et al 2021
- “Procedural Generalization by Planning With Self-Supervised World Models”, et al 2021
- “MetaICL: Learning to Learn In Context”, et al 2021
- “Collaborating With Humans without Human Data”, et al 2021
- “T0: Multitask Prompted Training Enables Zero-Shot Task Generalization”, et al 2021
- “Bridge Data: Boosting Generalization of Robotic Skills With Cross-Domain Datasets”, et al 2021
- “Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning”, et al 2021
- “Recursively Summarizing Books With Human Feedback”, et al 2021
- “FLAN: Finetuned Language Models Are Zero-Shot Learners”, et al 2021
- “Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation”, et al 2021
- “WarpDrive: Extremely Fast End-To-End Deep Multi-Agent Reinforcement Learning on a GPU”, et al 2021
- “Multi-Task Self-Training for Learning General Representations”, et al 2021
- “Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning”, et al 2021
- “Open-Ended Learning Leads to Generally Capable Agents”, et al 2021
- “Megaverse: Simulating Embodied Agents at One Million Experiences per Second”, et al 2021
- “Evaluating Large Language Models Trained on Code”, et al 2021
- “PES: Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, et al 2021
- “Multimodal Few-Shot Learning With Frozen Language Models”, et al 2021
- “Brax—A Differentiable Physics Engine for Large Scale Rigid Body Simulation”, et al 2021
- “PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World”, et al 2021
- “From Motor Control to Team Play in Simulated Humanoid Football”, et al 2021
- “Reward Is Enough”, et al 2021
- “Podracer Architectures for Scalable Reinforcement Learning”, et al 2021
- “MuZero Unplugged: Online and Offline Reinforcement Learning by Planning With a Learned Model”, et al 2021
- “Scaling Scaling Laws With Board Games”, 2021
- “Large Batch Simulation for Deep Reinforcement Learning”, et al 2021
- “Stockfish and Lc0, Test at Different Number of Nodes”, 2021
- “Training Larger Networks for Deep Reinforcement Learning”, et al 2021
- “Investment vs. Reward in a Competitive Knapsack Problem”, 2021
- “NNUE: The Neural Network of the Stockfish Chess Engine”, 2021
- “Imitating Interactive Intelligence”, et al 2020
- “Scaling down Deep Learning”, 2020
- “Understanding RL Vision: With Diverse Environments, We Can Analyze, Diagnose and Edit Deep Reinforcement Learning Models Using Attribution”, et al 2020
- “Meta-Trained Agents Implement Bayes-Optimal Agents”, et al 2020
- “Measuring Progress in Deep Reinforcement Learning Sample Efficiency”, 2020
- “Learning to Summarize from Human Feedback”, et al 2020
- “Measuring Hardware Overhang”, hippke 2020
- “Sample Factory: Egocentric 3D Control from Pixels at 100,000 FPS With Asynchronous Reinforcement Learning”, et al 2020
- “Real World Games Look Like Spinning Tops”, et al 2020
- “Agent57: Outperforming the Human Atari Benchmark”, et al 2020
- “Deep Neuroethology of a Virtual Rodent”, et al 2020
- “Near-Perfect Point-Goal Navigation from 2.5 Billion Frames of Experience”, 2020
- “Procgen Benchmark: We’re Releasing Procgen Benchmark, 16 Simple-To-Use Procedurally-Generated Environments Which Provide a Direct Measure of How Quickly a Reinforcement Learning Agent Learns Generalizable Skills”, et al 2019
- “DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames”, et al 2019
- “Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning”, et al 2019
- “Solving Rubik’s Cube With a Robot Hand”, OpenAI et al 2019
- “Fine-Tuning Language Models from Human Preferences”, et al 2019
- “Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, et al 2019
- “Meta Reinforcement Learning”, 2019
- “Human-Level Performance in 3D Multiplayer Games With Population-Based Reinforcement Learning”, et al 2019
- “AI-GAs: AI-Generating Algorithms, an Alternate Paradigm for Producing General Artificial Intelligence”, 2019
- “Meta-Learning of Sequential Strategies”, et al 2019
- “Habitat: A Platform for Embodied AI Research”, et al 2019
- “The Bitter Lesson”, 2019
- “Benchmarking Classic and Learned Navigation in Complex 3D Environments”, et al 2019
- “Dota 2 With Large Scale Deep Reinforcement Learning: §4.3: Batch Size”, 2019 (page 13)
- “An Empirical Model of Large-Batch Training”, et al 2018
- “How AI Training Scales”, et al 2018
- “Bayesian Layers: A Module for Neural Network Uncertainty”, et al 2018
- “Quantifying Generalization in Reinforcement Learning”, et al 2018
- “One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets With RL”, et al 2018
- “Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias”, et al 2018
- “Human-Level Performance in First-Person Multiplayer Games With Population-Based Deep Reinforcement Learning”, et al 2018
- “QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation”, et al 2018
- “Playing Atari With Six Neurons”, et al 2018
- “AI and Compute”, et al 2018
- “Accelerated Methods for Deep Reinforcement Learning”, 2018
- “One Big Net For Everything”, 2018
- “Interactive Grounded Language Acquisition and Generalization in a 2D World”, et al 2018
- “Emergence of Locomotion Behaviors in Rich Environments”, et al 2017
- “Deep Reinforcement Learning from Human Preferences”, et al 2017
- “Evolution Strategies As a Scalable Alternative to Reinforcement Learning”, et al 2017
- “On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models”, 2015
- “Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning”, et al 2015
- “Gorila: Massively Parallel Methods for Deep Reinforcement Learning”, et al 2015
- “Algorithmic Progress in Six Domains”, 2013
- “Robot Predictions Evolution”, 2004
- “When Will Computer Hardware Match the Human Brain?”, 1998
- “Human Window on the World”, 1985
- “Time for AI to Cross the Human Performance Range in Chess”
- “Eric Jang”
- “Trading Off Compute in Training and Inference”
- “Trading Off Compute in Training and Inference § MCTS Scaling”
- “Submission #6347: Chef Stef’s NES Arkanoid Warpless in 11:11.18”
- “[The Addictiveness & Adversarialness of Playing against LeelaQueenOdds]”, 2024
- “Training a CUDA TDS Ant Using C++ ARS Linear Policy: The Video Is Real-Time, After a Few Minutes (in the 30 Million Steps) the Training Curve Is Flat (I Trained Until a Billion Steps). Note That This Ant Is PD Control, and Not Identical to Either MuJoCo or PyBullet Ant, so the Training Curves Are Not Comparable Yet. Will Fix That.”
- “Ilya Sutskever: Deep Learning | AI Podcast #94 With Lex Fridman”
- “Target-Driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning [Video]”
- “If You Want to Solve a Hard Problem in Reinforcement Learning, You Just Scale. It’s Just Gonna Work Just like Supervised Learning. It’s the Same, the Same Story Exactly. It Was Kind of Hard to Believe That Supervised Learning Can Do All Those Things, but It’s Not Just Vision, It’s Everything and the Same Thing Seems to Hold for Reinforcement Learning Provided You Have a Lot of Experience.”
- Wikipedia
- Miscellaneous
- Bibliography