“Evaluating the Rainbow DQN Agent in Hanabi With Unseen Partners”, Rodrigo Canaan, Xianbo Gao, Youjin Chung, Julian Togelius, Andy Nealen, Stefan Menzel, 2020-04-28:

Hanabi is a cooperative game that challenges existing AI techniques due to its focus on modeling the mental states of other players to interpret and predict their behavior. While there are agents that can achieve near-perfect scores in the game by agreeing on some shared strategy, comparatively little progress has been made in ad-hoc cooperation settings, where partners and strategies are not known in advance.

In this paper, we show that agents trained through self-play using the popular Rainbow DQN architecture fail to cooperate well with simple rule-based agents that were not seen during training. Conversely, when Rainbow agents are trained to play with any individual rule-based agent, or even a mix of these agents, they fail to achieve good self-play scores.