âTuring-Universal Learners With Optimal Scaling Lawsâ, 2021-11-09 (; similar)â :
For a given distribution, learning algorithm, and performance metric, the rate of convergence (or data-scaling law) is the asymptotic behavior of the algorithmâs test performance as a function of number of train samples. Many learning methods in both theory and practice have power-law rates, i.e. performance scales as nâα for some α > 0. Moreover, both theoreticians and practitioners are concerned with improving the rates of their learning algorithms under settings of interest.
We observe the existence of a âuniversal learnerâ, which achieves the best possible distribution-dependent asymptotic rate among all learning algorithms within a specified runtime (eg. đȘ(n2)), while incurring only poly-logarithmic slowdown over this runtime. This algorithm is uniform, and does not depend on the distribution, and yet achieves best-possible rates for all distributions. The construction itself is a simple extension of Levinâs universal search (1973).
And much like universal search, the universal learner is not at all practical, and is primarily of theoretical and philosophical interest.