What is being learned by superhuman neural network agents such as AlphaZero? This question is of both scientific and practical interest. If the representations of strong neural networks bear no resemblance to human concepts, our ability to understand faithful explanations of their decisions will be restricted, ultimately limiting what we can achieve with neural network interpretability. In this work we provide evidence that human knowledge is acquired by the AlphaZero neural network as it trains on the game of chess.
By probing for a broad range of human chess concepts we show when and where these concepts are represented in the AlphaZero network.
Finally, we carry out a preliminary investigation into the low-level details of AlphaZero’s representations, and make the resulting behavioral and representational analyses available online.
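To make the probing methodology concrete, the following minimal sketch (not the paper’s actual pipeline) trains a per-layer linear probe from a checkpoint’s intermediate activations to a human-defined concept score; the data arrays and the `activations_by_layer` mapping are hypothetical placeholders.

```python
# Minimal sketch of concept probing, assuming that for one checkpoint we
# already have per-layer intermediate activations (positions x units) and a
# human-defined concept score per position (e.g. computed with a chess
# engine). `activations_by_layer` and the arrays are hypothetical.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def probe_layer(activations: np.ndarray, concept_values: np.ndarray) -> float:
    """Fit a linear probe; held-out R^2 scores how linearly decodable the
    concept is from this layer's activations."""
    X_train, X_test, y_train, y_test = train_test_split(
        activations, concept_values, test_size=0.2, random_state=0)
    probe = Ridge(alpha=1.0).fit(X_train, y_train)
    return probe.score(X_test, y_test)

# Probing every layer of one checkpoint shows *where* a concept is
# represented; repeating across training checkpoints shows *when* it emerges.
# scores = {layer: probe_layer(acts, concept_values)
#           for layer, acts in activations_by_layer.items()}
```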
…Sequential knowledge acquisition: Figures 4, 7 and 6 suggest a sequence: piece value is learned before basic opening knowledge; once basic opening moves are discovered, there is an explosion of opening knowledge within a short temporal window; and the network’s opening theory is then slowly refined over hundreds of thousands of training steps.
Figure 4: Value regression from human-defined concepts over time. (a) Value regression methodology: for each network checkpoint, we train a generalized linear model on concept values to predict the output of AlphaZero’s value head. (b) Piece-value weights converge to values close to those predicted by conventional chess theory. (c) Material predicts value early in training, with more subtle concepts such as mobility and king safety emerging later.
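A minimal sketch of the value-regression setup in panel (a), under assumed data: `concept_features` holds per-position concept values (including material counts) and `value_head_out` holds the value head’s output for the same positions; the arrays and the concept list are hypothetical.

```python
# Sketch of the panel (a) value regression, under assumed data: for one
# checkpoint, `concept_features` is a (positions x concepts) matrix of
# human-defined concept values and `value_head_out` is AlphaZero's
# value-head output per position. All names here are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical concept ordering, matching the columns of concept_features.
CONCEPT_NAMES = ["pawn", "knight", "bishop", "rook", "queen",
                 "mobility", "king_safety"]

def fit_value_regression(concept_features: np.ndarray,
                         value_head_out: np.ndarray):
    """Fit a linear model from concepts to value; return weights and R^2."""
    model = LinearRegression().fit(concept_features, value_head_out)
    weights = dict(zip(CONCEPT_NAMES, model.coef_))
    r2 = model.score(concept_features, value_head_out)
    # Comparing the material weights with conventional piece values gives
    # panel (b); tracking R^2 and per-concept weights across checkpoints
    # gives the emergence-over-time picture in panel (c).
    return weights, r2
```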
Figure 5: A comparison between AlphaZero’s and human first-move preferences over training steps and time. (a) The evolution of first-move preferences for White over the course of human history, dating back to the earliest recorded games of modern chess in the ChessBase database. The early popularity of 1. e4 gives way to a more balanced exploration of different opening systems and an increasing adoption of more flexible systems in modern times. (b) The AlphaZero policy head’s preferences over opening moves as a function of training steps; here AlphaZero was trained three times from different random seeds. AlphaZero’s opening evolution starts by weighting all moves roughly equally, no matter how bad, and then narrows down the options. This stands in contrast to the progression of human knowledge, which gradually expanded outward from 1. e4.
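The sketch below illustrates how such first-move preferences could be read off a checkpoint’s policy head; the `policy_logits` accessor and the `checkpoints` dictionary are hypothetical stand-ins for AlphaZero’s actual interface.

```python
# Sketch of extracting first-move priors (panel b), assuming a hypothetical
# accessor `policy_logits(checkpoint, position)` that returns the policy
# head's logits over the legal moves in `position`.
import numpy as np

def first_move_prior(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: logits over legal moves -> prior."""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

# Evaluating this at every saved checkpoint (and for each seed) traces how
# probability mass collapses from near-uniform onto a handful of openings:
#
#   priors = {step: first_move_prior(policy_logits(ckpt, START_POSITION))
#             for step, ckpt in checkpoints.items()}
```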
Figure 7: Rapid discovery of basic openings. The randomly initialized AlphaZero network assigns a roughly uniform prior over all moves. The distribution stays roughly uniform for the first 25k training steps, after which popular opening moves quickly gain prominence. In particular, 1. e4 is fully adopted as a sensible move within a window of 10k training steps, roughly 1% of AlphaZero’s training time. (a) After 25k training steps, e4 and d4 are discovered to be good opening moves and are rapidly adopted within a short period of around 30k training steps. (b) Rapid discovery of options given 1. e4 e5. Within a short space of time, 2. ♘f3 is settled on as the standard reply, whereas 2. d4 is considered and discarded.
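Given per-checkpoint first-move priors such as those produced by the previous sketch, one simple way to quantify an adoption window like the one described above is to record when a move’s prior crosses a low and a high threshold; the thresholds here are illustrative, not taken from the paper.

```python
# Sketch: quantifying the adoption window for a move (e.g. 1. e4), given a
# dict mapping training step -> prior probability of that move, as produced
# by the previous sketch. Thresholds are illustrative, not from the paper.
def adoption_window(prob_by_step: dict[int, float],
                    low: float = 0.1, high: float = 0.5):
    """Return (first step with prob > low, first step with prob > high),
    i.e. the span over which the move goes from barely considered to
    adopted."""
    steps = sorted(prob_by_step)
    start = next((s for s in steps if prob_by_step[s] > low), None)
    end = next((s for s in steps if prob_by_step[s] > high), None)
    return start, end

# Example with made-up numbers: adoption between steps 25k and 35k.
print(adoption_window({20_000: 0.05, 25_000: 0.12, 30_000: 0.3,
                       35_000: 0.6, 40_000: 0.7}))  # -> (25000, 35000)
```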