ā€œQ✱ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparisonā€, Tengyang Xie, Nan Jiang2020-03-09 (, )⁠:

We prove performance guarantees of two algorithms for approximating Q✱ in batch reinforcement learning.

Compared to classical iterative methods such as Fitted Q-Iteration—whose performance loss incurs quadratic dependence on horizon—these methods estimate (some forms of) the Bellman error and enjoy linear-in-horizon error propagation, a property established for the first time for algorithms that rely solely on batch data and output stationary policies.

One of the algorithms uses a novel and explicit importance-weighting correction to overcome the infamous ā€œdouble samplingā€ difficulty in Bellman error estimation, and does not use any squared losses.

Our analyses reveal its distinct characteristics and potential advantages compared to classical algorithms.