“PILCO: A Model-Based and Data-Efficient Approach to Policy Search”, Marc Peter Deisenroth & Carl Edward Rasmussen, 2011-06-01:

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way.

By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement.

We report unprecedented learning efficiency on challenging and high-dimensional control tasks.

[Remarkably, PILCO can learn the standard “Cartpole” swing-up task within just a few trials by carefully building a Bayesian Gaussian process model of the dynamics and planning through the model’s uncertainty rather than a single point estimate. Cartpole is quite difficult for a human, incidentally: there’s an installation of one in the SF Exploratorium, and I just had to try it out once I recognized it. (My sample-efficiency was not better than PILCO’s.)]
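[The core loop—fit a GP to observed transitions, then evaluate a policy by rolling the uncertain model forward—can be sketched in a few lines. This is only a toy illustration under assumed simplifications: scalar linear dynamics invented for the example, scikit-learn’s `GaussianProcessRegressor` in place of PILCO’s custom GP machinery, and Monte Carlo particle rollouts in place of PILCO’s closed-form moment matching; the linear policy and quadratic cost are likewise stand-ins.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy 1-D dynamics, unknown to the learner: x' = x + 0.1*u + noise.
def true_dynamics(x, u):
    return x + 0.1 * u + rng.normal(0, 0.01)

# One short batch of random-interaction data (PILCO's first trial).
X = rng.uniform(-1, 1, size=(30, 2))                    # columns: state x, action u
y = np.array([true_dynamics(x, u) - x for x, u in X])   # model the state *difference*

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

# Evaluate a linear policy u = -k*x by rolling the learned model forward.
# PILCO propagates full Gaussian state distributions analytically (moment
# matching); here we approximate that with Monte Carlo particles, sampling
# the GP's predictive uncertainty at every step so model error is "felt".
def expected_cost(k, x0=0.5, horizon=20, n_particles=50):
    xs = np.full(n_particles, x0)
    cost = 0.0
    for _ in range(horizon):
        us = -k * xs
        mu, std = gp.predict(np.column_stack([xs, us]), return_std=True)
        xs = xs + mu + rng.normal(size=n_particles) * std  # sample model uncertainty
        cost += np.mean(xs ** 2)                           # quadratic cost: drive x to 0
    return cost

# A stabilizing gain should predict lower long-term cost than doing nothing.
print(expected_cost(2.0) < expected_cost(0.0))
```

In real PILCO the policy parameters (here the single gain `k`) would then be improved with analytic gradients of the expected cost, and the optimized policy would be run on the system to collect one more trial of data.]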