“Diversifying AI: Towards Creative Chess With AlphaZero (AZdb)”, Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh (2023-08-17):

[media; cf. jury learning, RLHF mode collapse] In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. This work explores whether AI can benefit from creative decision-making mechanisms when pushed to the limits of its computational rationality.

In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones. We study this question in the game of chess, the so-called Drosophila of AI.

We build on AlphaZero (AZ) and extend it to represent a league of agents via a latent-conditioned architecture, which we call AZdb [but still a single model]. We train AZdb to generate a wider range of ideas using behavioral diversity techniques and select the most promising ones with sub-additive planning.
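The key architectural idea is that the “league” is a single network conditioned on a latent identifier, so each latent value behaves as a distinct player. A minimal sketch of this conditioning, with made-up names and shapes (not the paper's actual architecture), might look like:

```python
import numpy as np

# Hypothetical sketch of a latent-conditioned "league" in one model:
# a single shared network whose input is augmented with a one-hot
# latent id, so each latent acts as a distinct player. All names,
# shapes, and weights here are illustrative only.

rng = np.random.default_rng(0)
N_LATENTS = 4   # team size (number of "players")
OBS_DIM = 8     # toy board-feature dimension
N_MOVES = 5     # toy action space

# One weight matrix shared by every player; the latent one-hot
# selects which behaviour the shared weights express.
W = rng.normal(size=(OBS_DIM + N_LATENTS, N_MOVES))

def policy(obs, latent_id):
    """Move probabilities for player `latent_id` on observation `obs`."""
    z = np.zeros(N_LATENTS)
    z[latent_id] = 1.0
    logits = np.concatenate([obs, z]) @ W
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

obs = rng.normal(size=OBS_DIM)
# The same shared network yields a different move distribution per latent:
probs = [policy(obs, k) for k in range(N_LATENTS)]
```

Behavioral-diversity training would then push these per-latent policies apart while keeping each one strong; here the differences come only from the random weights.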

Our experiments suggest that AZdb plays chess in diverse ways, solves more puzzles as a group, and outperforms a more homogeneous team. Notably, AZdb solves twice as many challenging puzzles as AZ, including the notoriously difficult Penrose positions.

When playing chess from different openings, we notice that players in AZdb specialize in different openings, and that selecting a player for each opening using sub-additive planning results in a 50 Elo improvement over AZ.

Our findings suggest that diversity bonuses emerge in teams of AI agents, just as they do in teams of humans, and that diversity is a valuable asset in solving computationally hard problems.

Figure 7: Scaling laws with AZdb. Top: max over trials; bottom: sub-additive planning. Left to right: (1) Solve rate in % on Lichess puzzles for different numbers of simulations and latents. (2) Relative gains in % from increasing the number of latents, for each simulation budget. (3) Solve rate as a function of the number of latents, for different simulation budgets. (4) Relative gains for different numbers of trials (latents in solid lines, seeds in dashed lines) at different simulation budgets.

Analysis of diversity bonuses in AZdb: In the puzzle evaluation section, we observed that diversity bonuses emerge at the computational boundaries of AZdb. In this section, we analyse which components of AZdb matter most for diversity bonuses and whether diversity bonuses emerge at other compute budgets. We focus our evaluation on the Lichess dataset, using the fast configuration. In Figure 7 we study how AZdb scales with different simulation budgets and team sizes. The top row presents results for max-over-latents and the bottom row shows results for sub-additive planning based on the LCB (Equation 9). Most importantly, we observe diversity bonuses for all compute budgets and team sizes. The leftmost table presents the absolute solve rate in %: AZdb’s performance improves monotonically with both the number of simulations and the number of trials, implying that larger teams solve more puzzles together. The third sub-figure presents the same data in a different manner: for each column of the first table (a simulation budget), we draw a line giving the solve rate in % as a function of the number of trials (x-axis). The diversity bonuses keep increasing as we increase the number of trials, at every simulation budget.
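The two team-aggregation schemes compared in Figure 7 can be sketched as follows. This is a hedged illustration, not the paper's implementation: the LCB form `mean - c*std` is an assumption, since Equation 9 is not reproduced in this excerpt.

```python
import numpy as np

# Two ways to aggregate a team of players on a puzzle:
#  - "max over trials": run every player; the puzzle counts as solved
#    if any one player solves it.
#  - "sub-additive planning": score each player with a lower confidence
#    bound (LCB) on its value estimate and commit the remaining search
#    budget to the best-scoring player. The LCB form mean - c*std is an
#    assumption standing in for the paper's Equation 9.

def max_over_trials(solved_by_player):
    """Solved if any player in the team solves the puzzle."""
    return any(solved_by_player)

def sub_additive_select(value_means, value_stds, c=1.0):
    """Index of the player with the highest LCB (mean - c*std)."""
    lcb = np.asarray(value_means) - c * np.asarray(value_stds)
    return int(np.argmax(lcb))

# Toy example: four players' value estimates for one position.
means = [0.2, 0.6, 0.5, 0.1]
stds  = [0.05, 0.30, 0.05, 0.02]
pick = sub_additive_select(means, stds)  # player 2: good mean, low variance
```

Penalising uncertainty is what makes the selection robust: player 1 has the highest mean but also the highest variance, so the LCB prefers the more reliable player 2.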

The second table from the right shows AZdb’s relative gains (in %) from having more players in the team. Interestingly, the highest relative gains are achieved when the number of simulations is 5^4 = 625, the simulation budget closest to the one used in training (see the discussion section). In the rightmost sub-figure, we compare a diverse team of agents with a more homogeneous team. The diverse team uses different latents as before (solid lines); the homogeneous team uses the best latent in the group (latent 0) and is allowed multiple trials of search under different seeds (dashed lines). Across different group sizes and simulation budgets, and for both max-over-latents and sub-additive planning, the diverse team outperforms the homogeneous one.
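The “relative gains” quantity in Figure 7 is just the percentage improvement in solve rate over a single player at the same simulation budget. A sketch with made-up numbers (the paper's actual solve rates are not reproduced here):

```python
# Relative gain of each team size over the single-player baseline,
# as plotted (per simulation budget) in Figure 7. Numbers are made up.

def relative_gain(solve_rates):
    """% gain of each team size over the team-of-one solve rate.

    solve_rates[i] is the solve rate (in %) for a team of i+1 players.
    """
    base = solve_rates[0]
    return [100.0 * (r - base) / base for r in solve_rates]

# Toy solve rates (in %) for team sizes 1..4 at one simulation budget:
rates = [80.0, 84.0, 86.0, 87.0]
gains = relative_gain(rates)  # monotonically increasing, diminishing returns
```

The diminishing increments per added player are what makes the aggregation “sub-additive”: each extra player helps, but by less than the one before.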