“Posterior Sampling for Large Scale Reinforcement Learning”, Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis, 2017-11-21:

We propose a practical non-episodic posterior sampling for reinforcement learning (PSRL) algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode-switching schedule.
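To make "deterministic, model-independent" concrete, here is a minimal sketch of one such schedule. The doubling rule (episode k lasts `first_len * 2**k` steps) and the function name `switch_times` are assumptions chosen for illustration, not necessarily the schedule used in the paper. The key property is that switch times depend only on the time index, never on observed data or the learned model, unlike schedules that trigger on data-dependent quantities such as visit counts.

```python
def switch_times(horizon, first_len=1):
    """Return the time steps at which a new posterior sample would be drawn.

    Hypothetical doubling schedule: episode k lasts first_len * 2**k steps.
    Nothing here looks at rewards, transitions, or the posterior, so the
    whole schedule can be computed before the agent takes a single step.
    """
    times = []
    t, length = 0, first_len
    while t < horizon:
        times.append(t)
        t += length
        length *= 2  # each episode is twice as long as the previous one
    return times


print(switch_times(20))  # → [0, 1, 3, 7, 15]
```

Under this assumed rule the number of switches grows only logarithmically with the horizon, so a posterior sample is needed rarely.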

Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result applies more generally, extending to multiple parameters and continuous state-action problems.
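The general pattern of posterior sampling on a fixed schedule can be illustrated in the simplest possible setting. The sketch below is not the authors' DS-PSRL; it assumes a degenerate single-state problem (a Bernoulli bandit), independent Beta(1, 1) priors per action, and the same hypothetical doubling schedule. A model is sampled only at switch times and its greedy action is held fixed for the episode, while the posterior counts keep updating every step. The function name `ds_psrl_bandit` is hypothetical.

```python
import random


def ds_psrl_bandit(true_means, horizon, seed=0):
    """Sketch of posterior sampling with a deterministic switching schedule.

    Hypothetical toy setting: one state, Bernoulli rewards, Beta(1, 1) priors.
    Returns (total reward, number of posterior samples drawn).
    """
    rng = random.Random(seed)
    n = len(true_means)
    alpha = [1] * n  # 1 + observed successes per action
    beta = [1] * n   # 1 + observed failures per action
    next_switch, episode_len = 0, 1  # assumed doubling schedule
    num_samples, action, total_reward = 0, 0, 0
    for t in range(horizon):
        if t == next_switch:
            # Draw one model from the posterior and fix its greedy action
            # for the whole episode; the switch time ignores the data.
            sample = [rng.betavariate(alpha[a], beta[a]) for a in range(n)]
            action = max(range(n), key=lambda a: sample[a])
            num_samples += 1
            next_switch += episode_len
            episode_len *= 2
        reward = 1 if rng.random() < true_means[action] else 0
        # The posterior is updated every step, even though it is only
        # sampled at switch times.
        alpha[action] += reward
        beta[action] += 1 - reward
        total_reward += reward
    return total_reward, num_samples


reward, n_samples = ds_psrl_bandit([0.2, 0.8], horizon=1000)
print(n_samples)  # → 10
```

Storage here is just two counts per action, and only 10 posterior samples are drawn over 1000 steps; in richer models, where sampling and planning dominate the cost, drawing samples this rarely is where the savings would come from.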

We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature.

Finally, we show that a sensible parametrization for a large class of sequential recommendation problems satisfies the assumptions of our algorithm.