“Bayesian Optimization in AlphaGo”, Yutian Chen, Aja Huang, Ziyu Wang, Ioannis Antonoglou, Julian Schrittwieser, David Silver, Nando de Freitas (2018-12-17):

During the development of AlphaGo, its many hyper-parameters were tuned with Bayesian optimization multiple times.

This automatic tuning process resulted in substantial improvements in playing strength. For example, prior to the match with Lee Sedol, we tuned the latest AlphaGo agent, improving its win rate in self-play games from 50% to 66.5%. This tuned version was deployed in the final match. Of course, since we tuned AlphaGo many times during its development cycle, the compounded contribution was even higher than this single improvement.
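A minimal sketch of the kind of tuning loop described above: Bayesian optimization of a single hyper-parameter against a noisy "win rate", using a Gaussian-process surrogate and expected improvement. The objective, kernel length-scale, and all numbers here are made-up stand-ins, not AlphaGo's actual evaluation or settings.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def win_rate(x):
    """Simulated expensive self-play evaluation: peaks near x = 0.7 (invented)."""
    return 0.5 + 0.15 * math.exp(-((x - 0.7) ** 2) / 0.02) + rng.normal(0.0, 0.01)

def rbf(a, b, length_scale=0.15):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

def gp_posterior(X, y, Xs, noise=1e-3):
    """GP posterior mean and std at query points Xs given observations (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    alpha = np.linalg.solve(K, y)
    mu = Ks.T @ alpha
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization under a Gaussian posterior."""
    z = (mu - best) / sigma
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return sigma * (z * Phi + phi)

# Start from a few random settings, then let EI pick each next evaluation.
X = rng.uniform(0.0, 1.0, 4)
y = np.array([win_rate(x) for x in X])
grid = np.linspace(0.0, 1.0, 201)
for _ in range(12):
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.append(X, x_next)
    y = np.append(y, win_rate(x_next))

best_x = X[np.argmax(y)]
print(f"best hyper-parameter ~ {best_x:.2f}, best observed win rate ~ {y.max():.3f}")
```

The key property the paper exploits is that each evaluation (a batch of self-play games) is expensive, so the surrogate model decides where to spend the next evaluation rather than sweeping a grid.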

It is our hope that this brief case study will be of interest to Go fans, and also provide Bayesian optimization practitioners with some insights and inspiration.

…Interestingly, the automatically found hyper-parameter values were very different from the default values found by previous hand tuning efforts. Moreover, the hyper-parameters were often correlated, and hence the values found by Bayesian optimization were not reachable with element-wise hand-tuning, or even by tuning pairs of parameters in some cases.
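A toy illustration of the correlation point above (the objective and parameters are invented, not AlphaGo's): when the objective rewards only the *product* of two parameters, sweeping one parameter at a time from the default (0, 0) sees a flat landscape and never moves, while a joint search over both parameters at once (here exhaustive grid search standing in for Bayesian optimization) finds the optimum.

```python
import numpy as np

def objective(x, y):
    """Best (value 0) on the curve x*y = 1; identically -1 along each axis."""
    return -(x * y - 1.0) ** 2

grid = np.linspace(0.0, 2.0, 21)

# Element-wise "hand tuning" from the default (0, 0): sweep one parameter
# while holding the other fixed at its current value, keep the best, repeat.
x0, y0 = 0.0, 0.0
x1 = grid[np.argmax([objective(x, y0) for x in grid])]  # flat slice: no move
y1 = grid[np.argmax([objective(x1, y) for y in grid])]  # still flat: no move
coordwise_best = objective(x1, y1)

# Joint search over both parameters at once.
joint_best = max(objective(x, y) for x in grid for y in grid)

print(coordwise_best, joint_best)
```

Element-wise tuning reports that neither parameter "matters", while the joint search reaches the optimum; this is the sense in which correlated settings can be unreachable by hand-tuning one dimension at a time.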

By tuning the mixing ratio between roll-out estimates and value network estimates, we found that Bayesian optimization gave increasing preference to value network estimates as the design cycle progressed. This eventually led the team to abandon roll-out estimates in future versions of AlphaGo and in AlphaGo Zero.
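For context, in the original AlphaGo paper (Silver et al. 2016) leaf positions in the search tree are evaluated by linearly mixing the value network's prediction with the roll-out outcome via a mixing parameter λ. A sketch of that combination (variable names mine):

```python
def leaf_value(v_net, rollout_z, lam):
    """Mixed leaf evaluation as in Silver et al. 2016:
    V = (1 - lam) * v_net + lam * rollout_z.
    lam = 1 trusts only roll-outs; lam = 0 trusts only the value network."""
    return (1.0 - lam) * v_net + lam * rollout_z

# The text above reports that repeated Bayesian optimization kept pushing
# this mixing parameter toward the value-network end over development,
# foreshadowing the removal of roll-outs entirely.
print(leaf_value(0.5, 1.0, 0.5))
```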