“Exploring Bayesian Optimization: Breaking Bayesian Optimization into Small, Sizeable Chunks”, 2020-05-05 (; similar):
[Discussion of Bayesian optimization (BO), a decision-theoretic application of Bayesian statistics (typically using Gaussian processes for flexibility) which tries to model a set of variables to find the maximum or best in the fewest number of collected data points possible. This differs from normal experiment design which tries to simply maximize the overall information about all points given a fixed number of samples, not just the best point, or “active learning”, which tries to select data points which make the model as predictive as possible while collecting samples. The difference can be visualized by watching posterior distributions for simple 2D problems evolve as data is collected according to different BO or active learning or simple grid-search/random baseline strategies. The optimal strategy is usually infeasible to calculate, so various heuristics like “expected improvement” or “Thompson sampling” are used, and their different behavior can be visualized and compared. BO is heavily used in machine learning to find the best combinations of settings for machine learning models.]
In this article, we looked at Bayesian Optimization for optimizing a black-box function. Bayesian Optimization is well suited when the function evaluations are expensive, making grid or exhaustive search impractical. We looked at the key components of Bayesian Optimization. First, we looked at the notion of using a surrogate function (with a prior over the space of objective functions) to model our black-box function. Next, we looked at the “Bayes” in Bayesian Optimization — the function evaluations are used as data to obtain the surrogate posterior. We look at acquisition functions, which are functions of the surrogate posterior and are optimized sequentially. This new sequential optimization is inexpensive and thus of utility of us. We also looked at a few acquisition functions and showed how these different functions balance exploration and exploitation. Finally, we looked at some practical examples of Bayesian Optimization for optimizing hyper-parameters for machine learning models.