‘MARL’ directory

Gwern

“Evolution As Backstop for Reinforcement Learning”, Gwern 2018

“Fashion Cycles”, Gwern 2021

Links

“Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment”

https://bair.berkeley.edu/blog/2025/03/25/rl-av-smoothing/

“Training Language Models for Social Deduction With Multi-Agent Reinforcement Learning”, Sarkar et al 2025

“Just How Many Robots Can One Person Control at Once? A DARPA Project Overturns Long-Standing Assumptions”, Hampson 2025

“Deployment of an Aerial Multi-Agent System for Automated Task Execution in Large-Scale Underground Mining Environments”, Dahlquist et al 2025

“Multiagent Finetuning: Self Improvement With Diverse Reasoning Chains”, Subramaniam et al 2025

“Human-Like Bots for Tactical Shooters Using Compute-Efficient Sensors”, Justesen et al 2024

“Learning to Move Like Professional Counter-Strike Players”, Durst et al 2024

“On Scalable Oversight With Weak LLMs Judging Strong LLMs”, Kenton et al 2024

“Foundational Challenges in Assuring Alignment and Safety of Large Language Models”, Anwar et al 2024

“Algorithmic Collusion by Large Language Models”, Fish et al 2024

“Reinforcement Learning Based Oscillation Dampening: Scaling up Single-Agent RL Algorithms to a 100 AV Highway Field Operational Test”, Jang et al 2024

“Automatic Design of Stigmergy-Based Behaviors for Robot Swarms”, Salman et al 2024

“From Reinforcement Learning to Agency: Frameworks for Understanding Basal Cognition”, Seifert et al 2024

“Classical Sorting Algorithms As a Model of Morphogenesis: Self-Sorting Arrays Reveal Unexpected Competencies in a Minimal Model of Basal Intelligence”, Zhang et al 2023

“PRER: Modeling Complex Mathematical Reasoning via Large Language Model Based MathAgent”, Liao et al 2023

“Generative Agent-Based Modeling With Actions Grounded in Physical, Social, or Digital Space Using Concordia”, Vezhnevets et al 2023

“Learning Few-Shot Imitation As Cultural Transmission”, Bhoopchand et al 2023

“JaxMARL: Multi-Agent RL Environments in JAX”, Rutherford et al 2023

“Large Language Models Can Strategically Deceive Their Users When Put Under Pressure”, Scheurer et al 2023

“Neural MMO 2.0: A Massively Multi-Task Addition to Massively Multi-Agent Learning”, Suárez et al 2023

“Let Models Speak Ciphers: Multiagent Debate through Embeddings”, Pham et al 2023

“AI Deception: A Survey of Examples, Risks, and Potential Solutions”, Park et al 2023

“Diversifying AI: Towards Creative Chess With AlphaZero (AZ_db)”, Zahavy et al 2023

“Can A Single Human Supervise A Swarm of 100 Heterogeneous Robots?”, Adams et al 2023

“Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models”, O’Gara 2023

“Combining Human Expertise With Artificial Intelligence: Experimental Evidence from Radiology”, Agarwal et al 2023

“Posterior Sampling for Multi-Agent Reinforcement Learning: Solving Extensive Games With Imperfect Information”, Zhou et al 2023

“Reinforcement Learning in Newcomb-Like Environments”, Bell et al 2023

“Learning Agile Soccer Skills for a Bipedal Robot With Deep Reinforcement Learning”, Haarnoja et al 2023

“Multi-Party Chat (MultiLIGHT): Conversational Agents in Group Settings With Humans and Models”, Wei et al 2023

“Off-The-Grid MARL (OG-MARL): Datasets With Baselines for Offline Multi-Agent Reinforcement Learning”, Formanek et al 2023

“Learning to Control and Coordinate Mixed Traffic Through Robot Vehicles at Complex and Unsignalized Intersections”, Wang et al 2023

“Melting Pot 2.0”, Agapiou et al 2022

“CICERO: Human-Level Play in the Game of Diplomacy by Combining Language Models With Strategic Reasoning”, Bakhtin et al 2022

“Over-Communicate No More: Situated RL Agents Learn Concise Communication Protocols”, Kalinowska et al 2022

“Human-AI Coordination via Human-Regularized Search and Learning”, Hu et al 2022

“Game Theoretic Rating in N-Player General-Sum Games With Equilibria”, Marris et al 2022

“Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally Inattentive Reinforcement Learning”, Anonymous 2022

“Neural Payoff Machines: Predicting Fair and Stable Payoff Allocations Among Team Members”, Cornelisse et al 2022

“Social Simulacra: Creating Populated Prototypes for Social Computing Systems”, Park et al 2022

“DeepNash: Mastering the Game of Stratego With Model-Free Multiagent Reinforcement Learning”, Perolat et al 2022

“Fleet-DAgger: Interactive Robot Fleet Learning With Scalable Human Supervision”, Hoque et al 2022

“Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning”, Fu et al 2022

“MAT: Multi-Agent Reinforcement Learning Is a Sequence Modeling Problem”, Wen et al 2022

“First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization”, Reddy et al 2022

“Emergent Bartering Behavior in Multi-Agent Reinforcement Learning”, Johanson et al 2022

“NeuPL: Neural Population Learning”, Liu et al 2022

“Uncalibrated Models Can Improve Human-AI Collaboration”, Vodrahalli et al 2022

“Human-Centered Mechanism Design With Democratic AI”, Koster et al 2022

“Hidden Agenda: a Social Deduction Game With Diverse Learned Equilibria”, Kopparapu et al 2022

“Finding General Equilibria in Many-Agent Economic Simulations Using Deep Reinforcement Learning”, Curry et al 2022

“Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination”, Zhao et al 2021

“Modeling Strong and Human-Like Gameplay With KL-Regularized Search”, Jacob et al 2021

“Offline Pre-Trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks”, Meng et al 2021

“Player of Games”, Schmid et al 2021

“Collective Intelligence for Deep Learning: A Survey of Recent Developments”, Ha & Tang 2021

“Learning to Ground Multi-Agent Communication With Autoencoders”, Lin et al 2021

“Meta-Learning, Social Cognition and Consciousness in Brains and Machines”, Langdon et al 2021

“Collaborating With Humans without Human Data”, Strouse et al 2021

“The Neural MMO Platform for Massively Multiagent Research”, Suarez et al 2021

“Replay-Guided Adversarial Environment Design”, Jiang et al 2021

“DORA: No-Press Diplomacy from Scratch”, Bakhtin et al 2021

“Embodied Intelligence via Learning and Evolution”, Gupta et al 2021

“Trust Region Policy Optimization in Multi-Agent Reinforcement Learning”, Kuba et al 2021

“WarpDrive: Extremely Fast End-To-End Deep Multi-Agent Reinforcement Learning on a GPU”, Lan et al 2021

“The AI Economist: Optimal Economic Policy Design via Two-Level Deep Reinforcement Learning”, Zheng et al 2021

“Open-Ended Learning Leads to Generally Capable Agents”, Open-Ended Learning Team et al 2021

“Megaverse: Simulating Embodied Agents at One Million Experiences per Second”, Petrenko et al 2021

“Scalable Evaluation of Multi-Agent Reinforcement Learning With Melting Pot”, Leibo et al 2021

“From Motor Control to Team Play in Simulated Humanoid Football”, Liu et al 2021

“Cooperative AI Foundation (CAIF)”, CAIF 2021

“baller2vec++: A Look-Ahead Multi-Entity Transformer For Modeling Coordinated Agents”, Alcorn & Nguyen 2021

“Neural Tree Expansion for Multi-Robot Planning in Non-Cooperative Environments”, Riviere et al 2021

“Multitasking Inhibits Semantic Drift”, Jacob et al 2021

“Asymmetric Self-Play for Automatic Goal Discovery in Robotic Manipulation”, OpenAI et al 2021

“Reinforcement Learning for Datacenter Congestion Control”, Tessler et al 2021

“baller2vec: A Multi-Entity Transformer For Multi-Agent Spatiotemporal Modeling”, Alcorn & Nguyen 2021

“UPDeT: Universal Multi-Agent Reinforcement Learning via Policy Decoupling With Transformers”, Hu et al 2021

“Imitating Interactive Intelligence”, Abramson et al 2020

“Towards Playing Full MOBA Games With Deep Reinforcement Learning”, Ye et al 2020

“TLeague: A Framework for Competitive Self-Play Based Distributed Multi-Agent Reinforcement Learning”, Sun et al 2020

“Emergent Road Rules In Multi-Agent Driving Environments”, Pal et al 2020

“Reinforcement Learning for Optimization of COVID-19 Mitigation Policies”, Kompella et al 2020

“Human-Level Performance in No-Press Diplomacy via Equilibrium Search”, Gray et al 2020

“Emergent Social Learning via Multi-Agent Reinforcement Learning”, Ndousse et al 2020

“Grounded Language Learning Fast and Slow”, Hill et al 2020

“ReBeL: Combining Deep Reinforcement Learning and Search for Imperfect-Information Games”, Brown et al 2020

“Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions [Blog]”, Chang & Kaushik 2020

“One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control”, Huang et al 2020

“Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions”, Chang et al 2020

“Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks”, Papoudakis et al 2020

“Learning to Play No-Press Diplomacy With Best Response Policy Iteration”, Anthony et al 2020

“Real World Games Look Like Spinning Tops”, Czarnecki et al 2020

“Approximate Exploitability: Learning a Best Response in Large Games”, Timbers et al 2020

“Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and Their Solutions”, Wang et al 2020

“Social Diversity and Social Preferences in Mixed-Motive Reinforcement Learning”, McKee et al 2020

“Effective Diversity in Population Based Reinforcement Learning”, Parker-Holder et al 2020

“Towards Learning Multi-Agent Negotiations via Self-Play”, Tang 2020

“Smooth Markets: A Basic Mechanism for Organizing Gradient-Based Learners”, Balduzzi et al 2020

“microbatchGAN: Stimulating Diversity With Multi-Adversarial Discrimination”, Mordido et al 2020

“Learning by Cheating”, Chen et al 2019

“Increasing Generality in Machine Learning through Procedural Content Generation”, Risi & Togelius 2019

“Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms”, Zhang et al 2019

“Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning”, Vinyals et al 2019

“Multiplayer AlphaZero”, Petosa & Balch 2019

“Stabilizing Generative Adversarial Networks: A Survey”, Wiatrak et al 2019

“Emergent Tool Use From Multi-Agent Autocurricula”, Baker et al 2019

“Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, Baker et al 2019

“No Press Diplomacy: Modeling Multi-Agent Gameplay”, Paquette et al 2019

“A Review of Cooperative Multi-Agent Deep Reinforcement Learning”, OroojlooyJadid & Hajinezhad 2019

“Pluribus: Superhuman AI for Multiplayer Poker”, Brown & Sandholm 2019

“Evolving the Hearthstone Meta”, Silva et al 2019

“Evolutionary Implementation of Bayesian Computations”, Czégel et al 2019

“Finding Friend and Foe in Multi-Agent Games”, Serrino et al 2019

“Hierarchical Decision Making by Generating and Following Natural Language Instructions”, Hu et al 2019

“ICML 2019 Notes”, Abel 2019

“Human-Level Performance in 3D Multiplayer Games With Population-Based Reinforcement Learning”, Jaderberg et al 2019

“AI-GAs: AI-Generating Algorithms, an Alternate Paradigm for Producing General Artificial Intelligence”, Clune 2019

“Adversarial Policies: Attacking Deep Reinforcement Learning”, Gleave et al 2019

“LIGHT: Learning to Speak and Act in a Fantasy Text Adventure Game”, Urbanek et al 2019

“α-Rank: Multi-Agent Evaluation by Evolution”, Omidshafiei et al 2019

“Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research”, Leibo et al 2019

“Distilling Policy Distillation”, Czarnecki et al 2019

“Hierarchical Reinforcement Learning for Multi-Agent MOBA Game”, Zhang et al 2019

“Open-Ended Learning in Symmetric Zero-Sum Games”, Balduzzi et al 2019

“Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions”, Wang et al 2019

“Hierarchical Macro Strategy Model for MOBA Game AI”, Wu et al 2018

“Continual Match Based Training in Pommerman: Technical Report”, Peng et al 2018

“Malthusian Reinforcement Learning”, Leibo et al 2018

“Stable Opponent Shaping in Differentiable Games”, Letcher et al 2018

“Deep Counterfactual Regret Minimization”, Brown et al 2018

“TarMAC: Targeted Multi-Agent Communication”, Das et al 2018

“Graph Convolutional Reinforcement Learning”, Jiang et al 2018

“Social Influence As Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning”, Jaques et al 2018

“Deep Reinforcement Learning”, Li 2018

“A Survey and Critique of Multiagent Deep Reinforcement Learning”, Hernandez-Leal et al 2018

“Learning to Coordinate Multiple Reinforcement Learning Agents for Diverse Query Reformulation”, Nogueira et al 2018

“Pommerman: A Multi-Agent Playground”, Resnick et al 2018

“Fully Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Safe and Efficient Navigation in Complex Scenarios”, Fan et al 2018

“Human-Level Performance in First-Person Multiplayer Games With Population-Based Deep Reinforcement Learning”, Jaderberg et al 2018

“Construction of Arbitrarily Strong Amplifiers of Natural Selection Using Evolutionary Graph Theory”, Pavlogiannis et al 2018

“Adaptive Mechanism Design: Learning to Promote Cooperation”, Baumann et al 2018

“Mix&Match—Agent Curricula for Reinforcement Learning”, Czarnecki et al 2018

“Kickstarting Deep Reinforcement Learning”, Schmitt et al 2018

“Machine Theory of Mind”, Rabinowitz et al 2018

“Sim-To-Real Optimization of Complex Real World Mobile Network With Imperfect Information via Deep Reinforcement Learning from Self-Play”, Tan et al 2018

“Trust-Aware Decision Making for Human-Robot Collaboration: Model Learning and Planning”, Chen et al 2018

“Emergent Complexity via Multi-Agent Competition”, Bansal et al 2017

“Learning With Opponent-Learning Awareness”, Foerster et al 2017

“LADDER: A Human-Level Bidding Agent for Large-Scale Real-Time Online Auctions”, Wang et al 2017

“CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms”, Elgammal et al 2017

“On Convergence and Stability of GANs”, Kodali et al 2017

“Learning Cooperative Visual Dialog Agents With Deep Reinforcement Learning”, Das et al 2017

“Supervision via Competition: Robot Adversaries for Learning Tasks”, Pinto et al 2016

“Cooperative Inverse Reinforcement Learning”, Hadfield-Menell et al 2016

“Policy Distillation”, Rusu et al 2015

“Reflective Oracles: A Foundation for Classical Game Theory”, Fallenstein et al 2015

“Homo Moralis-Preference Evolution Under Incomplete Information and Assortative Matching”, Alger & Weibull 2013

“A Self-Coordinating Bus Route to Resist Bus Bunching”, Bartholdi & Eisenstein 2012

“Language Evolution in the Laboratory”, Scott-Phillips & Kirby 2010

“If Multi-Agent Learning Is the Answer, What Is the Question?”, Shoham et al 2007

“Market-Based Reinforcement Learning in Partially Observable Worlds”, Kwee et al 2001

“Properties of the Bucket Brigade Algorithm”, Holland 1985

“Computer-Aided Gas Pipeline Operation Using Genetic Algorithms And Rule Learning”, Goldberg 1983

“Collaborating With Humans Requires Understanding Them”

https://bair.berkeley.edu/blog/2019/10/21/coordination/

“The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games”

https://bair.berkeley.edu/blog/2021/07/14/mappo/

“Generally Capable Agents Emerge from Open-Ended Play”

/doc/www/deepmind.google/6c572f51d49224648a52a8421933f0db04170ce1.html

“Learning to Ground Multi-Agent Communication With Autoencoders [Code]”

/doc/www/github.com/d49d5bc015234d96eee19bfbc0032358e9fa7770.html

“`elimination_game`: A Multi-Player Tournament Benchmark That Tests LLMs in Social Reasoning, Strategy, and Deception. Players Engage in Public and Private Conversations, Form Alliances, and Vote to Eliminate Each Other”, Mazur 2025

“One Writer Enters International Competition to Play the World-Conquering Game That Redefines What It Means to Be a Geek (And a Person)”

“Mimicking Evolution With Reinforcement Learning”

“LLM Powered Autonomous Agents”

“Thore Graepel”, Graepel 2025

/doc/www/thoregraepel.github.io/6ae01c8317a3db0d16a6516bd43e331758434a03.html

“Learning to Ground Multi-Agent Communication With Autoencoders [Homepage]”

https://toruowo.github.io/marl-ae-comm/

“Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning”

https://web.archive.org/web/20201107225524/https://blog.acolyer.org/2019/03/04/efficient-large-scale-fleet-management-via-multi-agent-deep-reinforcement-learning/

“The Pommerman Team Competition Or: How We Learned to Stop Worrying and Love the Battle”

“New Winning Strategies for the Iterated Prisoner’s Dilemma”

/doc/www/www.jasss.org/24c2e00384aa8b0d150e0cc7d1d732d345df6de3.html

“How DeepMind’s Generally Capable Agents Were Trained”

https://www.lesswrong.com/posts/DreKBuMvK7fdESmSJ/how-deepmind-s-generally-capable-agents-were-trained

“How Much Compute Was Used to Train DeepMind’s Generally Capable Agents?”

https://www.lesswrong.com/posts/KaPaTdpLggdMqzdyo/how-much-compute-was-used-to-train-deepmind-s-generally

“DeepMind: Generally Capable Agents Emerge from Open-Ended Play”

https://www.lesswrong.com/posts/mTGrrX8SZJ2tQDuqz/deepmind-generally-capable-agents-emerge-from-open-ended

“So Has AI Conquered Bridge?”

https://www.lesswrong.com/posts/yHxmJch8dJoH6dwwz/so-has-ai-conquered-bridge

“The Steely, Headless King of Texas Hold’Em”

“Artificial Intelligence Beats Eight World Champions at Bridge”

“Learning to Ground Multi-Agent Communication With Autoencoders [Video]”

https://www.youtube.com/watch?v=0Fyf5Xca94c

“Open-Ended Learning Leads to Generally Capable Agents [Video]”

https://www.youtube.com/watch?v=lTmL7jwFfdw#deepmind

Bibliography

https://arxiv.org/abs/2312.08926: “PRER: Modeling Complex Mathematical Reasoning via Large Language Model Based MathAgent ”⁠, Haoran Liao, Qinyi Du, Shaohua Hu …, Hao He, Yanyan Xu, Jidong Tian, Yaohui Jin
link-bibliography⁠
https://www.nature.com/articles/s41467-023-42875-2#deepmind: “Learning Few-Shot Imitation As Cultural Transmission”, Avishkar Bhoopchand, Bethanie Brownfield, Adrian Collister …, Agustin Dal Lago, Ashley Edwards, Richard Everett, Alexandre Fréchette, Yanko Gitahy Oliveira, Edward Hughes, Kory W. Mathewson, Piermaria Mendolicchio, Julia Pawar, Miruna Pîslar, Alex Platonov, Evan Senter, Sukhdeep Singh, Alexander Zacherl, Lei M. Zhang
https://arxiv.org/abs/2311.10090: “JaxMARL: Multi-Agent RL Environments in JAX”, Alexander Rutherford, Benjamin Ellis, Matteo Gallici …, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktäschel, Chris Lu, Jakob Nicolaus Foerster
https://arxiv.org/abs/2311.03736: “Neural MMO 2.0: A Massively Multi-Task Addition to Massively Multi-Agent Learning”, Joseph Suárez, Phillip Isola, Kyoung Whan Choe …, David Bloomin, Hao Xiang Li, Nikhil Pinnaparaju, Nishaanth Kanna, Daniel Scott, Ryan Sullivan, Rose S. Shuman, Lucas de Alcântara, Herbie Bradley, Louis Castricato, Kirsty You, Yuhao Jiang, Qimai Li, Jiaxin Chen, Xiaolong Zhu
https://arxiv.org/abs/2308.09175#deepmind: “Diversifying AI: Towards Creative Chess With AlphaZero (AZ_db)”, Tom Zahavy, Vivek Veeriah, Shaobo Hou …, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh
https://arxiv.org/abs/2308.01404: “Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models”, Aidan O’Gara
https://www.nber.org/papers/w31422: “Combining Human Expertise With Artificial Intelligence: Experimental Evidence from Radiology”, Nikhil Agarwal, Alex Moehring, Pranav Rajpurkar, Tobias Salz
https://arxiv.org/abs/2304.13653#deepmind: “Learning Agile Soccer Skills for a Bipedal Robot With Deep Reinforcement Learning”, Tuomas Haarnoja, Ben Moran, Guy Lever …, Sandy H. Huang, Dhruva Tirumala, Markus Wulfmeier, Jan Humplik, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, Francesco Nori, Raia Hadsell, Nicolas Heess
2022-bakhtin.pdf: “CICERO: Human-Level Play in the Game of Diplomacy by Combining Language Models With Strategic Reasoning”, Anton Bakhtin, Noam Brown, Emily Dinan …, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, Athul Paul Jacob, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer, Mike Lewis, Alexander H. Miller, Sasha Mitts, Adithya Renduchintala, Stephen Roller, Dirk Rowe, Weiyan Shi, Joe Spisak, Alexander Wei, David Wu, Hugh Zhang, Markus Zijlstra
https://openreview.net/forum?id=DY1pMrmDkm: “Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally Inattentive Reinforcement Learning”, Anonymous
https://arxiv.org/abs/2208.04024: “Social Simulacra: Creating Populated Prototypes for Social Computing Systems”, Joon Sung Park, Lindsay Popowski, Carrie J. Cai …, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein
https://arxiv.org/abs/2206.15378#deepmind: “DeepNash: Mastering the Game of Stratego With Model-Free Multiagent Reinforcement Learning”, Julien Perolat, Bart de Vylder, Daniel Hennes …, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls
https://arxiv.org/abs/2206.14349: “Fleet-DAgger: Interactive Robot Fleet Learning With Scalable Human Supervision”, Ryan Hoque, Lawrence Yunliang Chen, Satvik Sharma …, Karthik Dharmarajan, Brijen Thananjeyan, Pieter Abbeel, Ken Goldberg
https://arxiv.org/abs/2206.07505: “Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning”, Wei Fu, Chao Yu, Zelai Xu …, Jiaqi Yang, Yi Wu
https://arxiv.org/abs/2205.14953: “MAT: Multi-Agent Reinforcement Learning Is a Sequence Modeling Problem”, Muning Wen, Jakub Grudzien Kuba, Runji Lin …, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang
https://arxiv.org/abs/2202.07415#deepmind: “NeuPL: Neural Population Learning”, Siqi Liu, Luke Marris, Daniel Hennes …, Josh Merel, Nicolas Heess, Thore Graepel
https://arxiv.org/abs/2112.11701#tencent: “Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination”, Rui Zhao, Jinming Song, Hu Haifeng …, Yang Gao, Yi Wu, Zhongqian Sun, Yang Wei
https://arxiv.org/abs/2112.03178#deepmind: “Player of Games”, Martin Schmid, Matej Moravcik, Neil Burch …, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, Zach Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling
https://arxiv.org/abs/2110.15349: “Learning to Ground Multi-Agent Communication With Autoencoders”, Toru Lin, Minyoung Huh, Chris Stauffer …, Ser-Nam Lim, Phillip Isola
https://arxiv.org/abs/2105.12196#deepmind: “From Motor Control to Team Play in Simulated Humanoid Football”, Siqi Liu, Guy Lever, Zhe Wang …, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess
https://arxiv.org/abs/2104.11980: “Baller2vec++: A Look-Ahead Multi-Entity Transformer for Modeling Coordinated Agents”, Michael A. Alcorn, Anh Nguyen
https://arxiv.org/abs/2012.05672#deepmind: “Imitating Interactive Intelligence”, Josh Abramson, Arun Ahuja, Arthur Brussee …, Federico Carnevale, Mary Cassin, Stephen Clark, Andrew Dudzik, Petko Georgiev, Aurelia Guy, Tim Harley, Felix Hill, Alden Hung, Zachary Kenton, Jessica Landon, Timothy Lillicrap, Kory Mathewson, Alistair Muldal, Adam Santoro, Nikolay Savinov, Vikrant Varma, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu
https://arxiv.org/abs/2011.12692#tencent: “Towards Playing Full MOBA Games With Deep Reinforcement Learning”, Deheng Ye, Guibin Chen, Wen Zhang …, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu, Yinyuting Yin, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu
https://arxiv.org/abs/2011.12895#tencent: “TLeague: A Framework for Competitive Self-Play Based Distributed Multi-Agent Reinforcement Learning”, Peng Sun, Jiechao Xiong, Lei Han …, Xinghai Sun, Shuxing Li, Jiawei Xu, Meng Fang, Zhengyou Zhang
https://bair.berkeley.edu/blog/2020/07/11/auction/: “Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions [Blog]”, Michael Chang, Sidhant Kaushik
2019-vinyals.pdf#deepmind: “Grandmaster Level in StarCraft II Using Multi-Agent Reinforcement Learning”, Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki …, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, David Silver
https://openai.com/research/emergent-tool-use#surprisingbehaviors: “Emergent Tool Use from Multi-Agent Interaction § Surprising Behavior”, Bowen Baker, Ingmar Kanitscheider, Todor Markov …, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch
https://david-abel.github.io/notes/icml_2019.pdf: “ICML 2019 Notes”, David Abel
2019-jaderberg.pdf#deepmind: “Human-Level Performance in 3D Multiplayer Games With Population-Based Reinforcement Learning”, Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning …, Luke Marris, Guy Lever, Antonio Garcia Castañeda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel
https://arxiv.org/abs/1902.02186#deepmind: “Distilling Policy Distillation”, Wojciech Marian Czarnecki, Razvan Pascanu, Simon Osindero …, Siddhant M. Jayakumar, Grzegorz Swirszcz, Max Jaderberg
https://www.nature.com/articles/s42003-018-0078-7: “Construction of Arbitrarily Strong Amplifiers of Natural Selection Using Evolutionary Graph Theory”, Andreas Pavlogiannis, Josef Tkadlec, Krishnendu Chatterjee, Martin A. Nowak
2013-alger.pdf: “Homo Moralis-Preference Evolution Under Incomplete Information and Assortative Matching”, Ingela Alger, Jörgen W. Weibull
2007-shoham.pdf: “If Multi-Agent Learning Is the Answer, What Is the Question?”, Yoav Shoham, Rob Powers, Trond Grenager