Bibliography:

  1. ‘RL’ tag

  2. ‘GAN’ tag

  3. ‘Sydney (AI)’ tag

  4. ‘AI scaling’ tag

  5. ‘mechanism design’ tag

  6. ‘RL exploration’ tag

  7. ‘Hanabi AI’ tag

  8. ‘hidden-information game’ tag

  9. ‘AlphaStar’ tag

  10. ‘OA5’ tag

  11. ‘AlphaGo’ tag

  12. ‘offline RL’ tag

  13. ‘robotics’ tag

  14. Evolution as Backstop for Reinforcement Learning

  15. Fashion Cycles

  16. On scalable oversight with weak LLMs judging strong LLMs

  17. Foundational Challenges in Assuring Alignment and Safety of Large Language Models

  18. Algorithmic Collusion by Large Language Models

  19. From reinforcement learning to agency: Frameworks for understanding basal cognition

  20. Classical Sorting Algorithms as a Model of Morphogenesis: self-sorting arrays reveal unexpected competencies in a minimal model of basal intelligence

  21. PRER: Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent

  22. Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia

  23. Learning few-shot imitation as cultural transmission

  24. JaxMARL: Multi-Agent RL Environments in JAX

  25. Large Language Models can Strategically Deceive their Users when Put Under Pressure

  26. Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning

  27. Let Models Speak Ciphers: Multiagent Debate through Embeddings

  28. AI Deception: A Survey of Examples, Risks, and Potential Solutions

  29. Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)

  30. Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models

  31. Combining Human Expertise with Artificial Intelligence: Experimental Evidence from Radiology

  32. Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information

  33. Reinforcement Learning in Newcomb-like Environments

  34. Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

  35. Multi-Party Chat (MultiLIGHT): Conversational Agents in Group Settings with Humans and Models

  36. Off-the-Grid MARL (OG-MARL): Datasets with Baselines for Offline Multi-Agent Reinforcement Learning

  37. Learning to Control and Coordinate Mixed Traffic Through Robot Vehicles at Complex and Unsignalized Intersections

  38. Melting Pot 2.0

  39. CICERO: Human-level play in the game of Diplomacy by combining language models with strategic reasoning

  40. Over-communicate no more: Situated RL agents learn concise communication protocols

  41. Human-AI Coordination via Human-Regularized Search and Learning

  42. Game Theoretic Rating in N-player general-sum games with Equilibria

  43. Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally Inattentive Reinforcement Learning

  44. Neural Payoff Machines: Predicting Fair and Stable Payoff Allocations Among Team Members

  45. Social Simulacra: Creating Populated Prototypes for Social Computing Systems

  46. DeepNash: Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

  47. Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human Supervision

  48. Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

  49. MAT: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

  50. First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization

  51. Emergent Bartering Behavior in Multi-Agent Reinforcement Learning

  52. NeuPL: Neural Population Learning

  53. Uncalibrated Models Can Improve Human-AI Collaboration

  54. Human-centered mechanism design with Democratic AI

  55. Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria

  56. Finding General Equilibria in Many-Agent Economic Simulations Using Deep Reinforcement Learning

  57. Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination

  58. Modeling Strong and Human-Like Gameplay with KL-Regularized Search

  59. Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

  60. Player of Games

  61. Collective Intelligence for Deep Learning: A Survey of Recent Developments

  62. Learning to Ground Multi-Agent Communication with Autoencoders

  63. Meta-learning, social cognition and consciousness in brains and machines

  64. Collaborating with Humans without Human Data

  65. The Neural MMO Platform for Massively Multiagent Research

  66. Replay-Guided Adversarial Environment Design

  67. DORA: No-Press Diplomacy from Scratch

  68. Embodied intelligence via learning and evolution

  69. Trust Region Policy Optimization in Multi-Agent Reinforcement Learning

  70. WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU

  71. The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning

  72. Open-Ended Learning Leads to Generally Capable Agents

  73. Megaverse: Simulating Embodied Agents at One Million Experiences per Second

  74. Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

  75. From Motor Control to Team Play in Simulated Humanoid Football

  76. Cooperative AI Foundation (CAIF)

  77. baller2vec++: A Look-Ahead Multi-Entity Transformer For Modeling Coordinated Agents

  78. Neural Tree Expansion for Multi-Robot Planning in Non-Cooperative Environments

  79. Multitasking Inhibits Semantic Drift

  80. Asymmetric self-play for automatic goal discovery in robotic manipulation

  81. Reinforcement Learning for Datacenter Congestion Control

  82. baller2vec: A Multi-Entity Transformer For Multi-Agent Spatiotemporal Modeling

  83. UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers

  84. Imitating Interactive Intelligence

  85. Towards Playing Full MOBA Games with Deep Reinforcement Learning

  86. TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning

  87. Emergent Road Rules In Multi-Agent Driving Environments

  88. Reinforcement Learning for Optimization of COVID-19 Mitigation policies

  89. Human-Level Performance in No-Press Diplomacy via Equilibrium Search

  90. Emergent Social Learning via Multi-agent Reinforcement Learning

  91. Grounded Language Learning Fast and Slow

  92. ReBeL: Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

  93. Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions [blog]

  94. One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control

  95. Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

  96. Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks

  97. Learning to Play No-Press Diplomacy with Best Response Policy Iteration

  98. Real World Games Look Like Spinning Tops

  99. Approximate exploitability: Learning a best response in large games

  100. Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

  101. Social diversity and social preferences in mixed-motive reinforcement learning

  102. Effective Diversity in Population Based Reinforcement Learning

  103. Towards Learning Multi-agent Negotiations via Self-Play

  104. Smooth markets: A basic mechanism for organizing gradient-based learners

  105. microbatchGAN: Stimulating Diversity with Multi-Adversarial Discrimination

  106. Learning by Cheating

  107. Increasing Generality in Machine Learning through Procedural Content Generation

  108. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

  109. Grandmaster level in StarCraft II using multi-agent reinforcement learning

  110. Multiplayer AlphaZero

  111. Stabilizing Generative Adversarial Networks: A Survey

  112. Emergent Tool Use From Multi-Agent Autocurricula

  113. Emergent Tool Use from Multi-Agent Interaction § Surprising behavior

  114. No Press Diplomacy: Modeling Multi-Agent Gameplay

  115. A Review of Cooperative Multi-Agent Deep Reinforcement Learning

  116. Pluribus: Superhuman AI for multiplayer poker

  117. Evolving the Hearthstone Meta

  118. Evolutionary implementation of Bayesian computations

  119. Finding Friend and Foe in Multi-Agent Games

  120. Hierarchical Decision Making by Generating and Following Natural Language Instructions

  121. ICML 2019 Notes

  122. Human-level performance in 3D multiplayer games with population-based reinforcement learning

  123. AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

  124. Adversarial Policies: Attacking Deep Reinforcement Learning

  125. LIGHT: Learning to Speak and Act in a Fantasy Text Adventure Game

  126. α-Rank: Multi-Agent Evaluation by Evolution

  127. Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research

  128. Distilling Policy Distillation

  129. Hierarchical Reinforcement Learning for Multi-agent MOBA Game

  130. Open-ended Learning in Symmetric Zero-sum Games

  131. Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions

  132. Hierarchical Macro Strategy Model for MOBA Game AI

  133. Continual Match Based Training in Pommerman: Technical Report

  134. Malthusian Reinforcement Learning

  135. Stable Opponent Shaping in Differentiable Games

  136. Deep Counterfactual Regret Minimization

  137. TarMAC: Targeted Multi-Agent Communication

  138. Graph Convolutional Reinforcement Learning

  139. Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

  140. Deep Reinforcement Learning

  141. A Survey and Critique of Multiagent Deep Reinforcement Learning

  142. Learning to Coordinate Multiple Reinforcement Learning Agents for Diverse Query Reformulation

  143. Pommerman: A Multi-Agent Playground

  144. Fully Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Safe and Efficient Navigation in Complex Scenarios

  145. Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

  146. Construction of arbitrarily strong amplifiers of natural selection using evolutionary graph theory

  147. Adaptive Mechanism Design: Learning to Promote Cooperation

  148. Mix&Match—Agent Curricula for Reinforcement Learning

  149. Kickstarting Deep Reinforcement Learning

  150. Machine Theory of Mind

  151. Sim-to-Real Optimization of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play

  152. Trust-Aware Decision Making for Human-Robot Collaboration: Model Learning and Planning

  153. Emergent Complexity via Multi-Agent Competition

  154. Learning with Opponent-Learning Awareness

  155. LADDER: A Human-Level Bidding Agent for Large-Scale Real-Time Online Auctions

  156. CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms

  157. On Convergence and Stability of GANs

  158. Supervision via Competition: Robot Adversaries for Learning Tasks

  159. Policy Distillation

  160. Reflective Oracles: A Foundation for Classical Game Theory

  161. Homo Moralis-Preference Evolution Under Incomplete Information and Assortative Matching

  162. A self-coordinating bus route to resist bus bunching

  163. Language evolution in the laboratory

  164. If multi-agent learning is the answer, what is the question?

  165. Market-Based Reinforcement Learning in Partially Observable Worlds

  166. Properties of the Bucket Brigade Algorithm

  167. Computer-Aided Gas Pipeline Operation Using Genetic Algorithms And Rule Learning

  168. Collaborating With Humans Requires Understanding Them

  169. The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games

  170. Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning

  171. Generally Capable Agents Emerge from Open-Ended Play

  173. One Writer Enters International Competition to Play the World-Conquering Game That Redefines What It Means to Be a Geek (and a Person)

  174. Mimicking Evolution With Reinforcement Learning

  175. LLM Powered Autonomous Agents

  176. The Pommerman Team Competition Or: How We Learned to Stop Worrying and Love the Battle

  177. New Winning Strategies for the Iterated Prisoner’s Dilemma

  178. How DeepMind's Generally Capable Agents Were Trained

  179. How Much Compute Was Used to Train DeepMind's Generally Capable Agents?

  180. DeepMind: Generally Capable Agents Emerge from Open-Ended Play

  181. So Has AI Conquered Bridge?

  182. The Steely, Headless King of Texas Hold ’Em

  183. Artificial Intelligence Beats Eight World Champions at Bridge

  184. Open-Ended Learning Leads to Generally Capable Agents [Video]

  185. 2019-jaderberg-supplement-movie-1-aau6249s1.mp4

  186. 2019-jaderberg-supplement-movie-2-aau6249s2.mp4

  187. 2019-jaderberg-supplement-movie-3-aau6249s3.mp4

  188. 2019-jaderberg-supplement-movie-4-aau6249s4.mp4

  189. https://blog.otoro.net/2022/10/01/collectiveintelligence/

  190. https://deepmind.google/discover/blog/learning-robust-real-time-cultural-transmission-without-human-data/

  191. https://github.com/deepmind/meltingpot

  192. https://github.com/deepmind/open_spiel

  193. https://people.idsia.ch/~juergen/directsearch/node15.html

  195. https://research.facebook.com/publications/control-strategies-for-physically-simulated-characters-performing-two-player-competitive-sports/

  196. https://research.google/blog/introducing-google-research-football-a-novel-reinforcement-learning-environment/

  197. https://research.google/blog/leveraging-machine-learning-for-game-development/

  198. https://simulationlabs.ai/

  199. https://www.chocolatehammer.org/?p=5773

  200. https://www.fry-ai.com/p/social-media-no-humans-allowed

  202. https://www.lesswrong.com/posts/65qmEJHDw3vw69tKm/proposal-scaling-laws-for-rl-generalization

  203. https://www.lesswrong.com/posts/FbSAuJfCxizZGpcHc/interpreting-the-learning-of-deceit

  204. https://www.lesswrong.com/posts/bwyKCQD7PFWKhELMr/by-default-gpts-think-in-plain-sight#zfzHshctWZYo8JkLe

  205. https://www.nature.com/articles/s41467-020-19244-4#deepmind

  206. https://www.nature.com/articles/s41598-019-45619-9#deepmind

  207. https://www.pnas.org/doi/full/10.1073/pnas.2317967121

  208. https://www.quantamagazine.org/computers-evolve-a-new-path-toward-human-intelligence-20191106/

  209. https://www.reddit.com/r/reinforcementlearning/comments/cdwzp3/pluribus_superhuman_ai_for_multiplayer_poker/etwu82u/

  211. https://x.com/evanthebouncy/status/1642918859866009600

  212. https://x.com/mayfer/status/1637767003078533122

  213. https://x.com/md_rumpf/status/1647911393796956162

  214. https://x.com/repligate/status/1827900674325045375


  295. %252Fdoc%252Freinforcement-learning%252Fmulti-agent%252F2007-shoham.pdf.html