“The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games”, Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu (2021-03-02):

Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is substantially less used than off-policy learning algorithms in multi-agent settings. This is often due to the belief that on-policy methods are substantially less sample efficient than their off-policy counterparts in multi-agent problems.
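For reference (this equation is from the original PPO paper, Schulman et al 2017, and is not restated in the abstract), PPO is on-policy because it maximizes a clipped surrogate objective over freshly collected trajectories:

$$L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$$

where $\hat{A}_t$ is an advantage estimate and $\epsilon$ is the clipping radius. Each batch of experience supports only a few gradient epochs before being discarded, which is the root of the sample-efficiency concern.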

In this work, we investigate Multi-Agent PPO (MAPPO), a variant of PPO specialized for multi-agent settings. Using a single desktop machine with one GPU, we show that MAPPO achieves surprisingly strong performance in 3 popular multi-agent testbeds: the particle-world environments, the StarCraft Multi-Agent Challenge, and the Hanabi challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures.
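A minimal sketch of the setup MAPPO builds on, centralized training with decentralized execution: each agent acts from a policy conditioned on its own local observation, while a centralized value function sees the global state during training only. All names, shapes, and hyperparameters below are illustrative assumptions, not the authors’ actual implementation:

```python
# Illustrative sketch of MAPPO's structure: decentralized actors, a
# centralized critic, and clipped-PPO updates. Network sizes and the
# clip value are assumptions for illustration, not the paper's settings.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: maps an agent's local observation to action logits."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, act_dim))
    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class CentralCritic(nn.Module):
    """Centralized value function: sees the global state, but only in training."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))
    def forward(self, state):
        return self.net(state).squeeze(-1)

def ppo_update(actor, critic, opt, obs, state, act, old_logp, adv, ret, clip=0.2):
    """One clipped-PPO gradient step on a batch of multi-agent experience."""
    # opt should cover both networks, e.g.:
    # opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()))
    dist = actor(obs)
    ratio = torch.exp(dist.log_prob(act) - old_logp)       # importance ratio r_t
    surr1 = ratio * adv
    surr2 = torch.clamp(ratio, 1 - clip, 1 + clip) * adv   # PPO clipping
    policy_loss = -torch.min(surr1, surr2).mean()
    value_loss = (critic(state) - ret).pow(2).mean()       # centralized value loss
    loss = policy_loss + 0.5 * value_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At execution time only the actors are needed, so the learned policies remain decentralized; the centralized critic exists purely to compute lower-variance advantage estimates during training.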

In the majority of environments, we find that MAPPO achieves strong results relative to off-policy baselines while exhibiting comparable sample efficiency.

Finally, through ablation studies, we present the implementation and algorithmic factors that most influence MAPPO’s practical performance.
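As one concrete example of such an implementation factor, the paper’s ablations examine value normalization: normalizing the critic’s regression targets with running statistics. The sketch below is a simple running-mean/variance version; it is an illustrative assumption, not necessarily the authors’ exact normalizer:

```python
# Sketch of running value normalization, one of the implementation factors
# studied in the paper's ablations. This running-mean/std version is an
# illustrative assumption; the authors' exact normalizer may differ.
import torch

class RunningValueNorm:
    """Normalize the critic's regression targets with running statistics."""
    def __init__(self, eps=1e-5):
        self.mean, self.var, self.count = 0.0, 1.0, eps

    def update(self, x: torch.Tensor):
        # Welford-style merge of batch statistics into the running estimate.
        batch_mean = x.mean().item()
        batch_var = x.var(unbiased=False).item()
        n = x.numel()
        delta = batch_mean - self.mean
        total = self.count + n
        self.mean += delta * n / total
        self.var = (self.var * self.count + batch_var * n
                    + delta ** 2 * self.count * n / total) / total
        self.count = total

    def normalize(self, x):      # applied to value targets before the loss
        return (x - self.mean) / (self.var ** 0.5 + 1e-8)

    def denormalize(self, x):    # applied to critic outputs for advantages
        return x * (self.var ** 0.5 + 1e-8) + self.mean
```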