“Controlling Overestimation Bias With Truncated Mixture of Continuous Distributional Quantile Critics (TQC)”, Arsenii Kuznetsov, Pavel Shvechikov, Alexander Grishin, Dmitry Vetrov, 2020-05-08:

The overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting.

Our method, Truncated Quantile Critics (TQC), blends 3 ideas: a distributional representation of the critic, truncation of the critics' predictions, and ensembling of multiple critics. The distributional representation and truncation allow arbitrarily granular control of overestimation, while ensembling provides additional score improvements.
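The interplay of truncation and ensembling can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that each critic outputs a fixed number of atoms (quantile locations) for the return distribution, that the atoms of all critics are pooled and sorted, and that a fixed number of the largest atoms per critic is discarded. The names `truncated_mixture`, `critic_atoms`, and `drop_per_critic` are hypothetical, and the reward, discounting, and entropy terms of a full training target are left out.

```python
def truncated_mixture(critic_atoms, drop_per_critic):
    """Pool atoms from all critics, sort them, and drop the largest ones.

    critic_atoms: list of lists, one list of atom values per critic
                  (hypothetical layout chosen for this sketch).
    drop_per_critic: number of top atoms to discard per critic; the total
                     number dropped is drop_per_critic * number of critics.
    """
    n_critics = len(critic_atoms)
    # Ensembling: treat all critics' atoms as one mixture distribution.
    pooled = sorted(atom for atoms in critic_atoms for atom in atoms)
    n_drop = drop_per_critic * n_critics
    if n_drop >= len(pooled):
        raise ValueError("truncation would remove every atom")
    # Truncation: keep only the smallest atoms of the pooled mixture.
    return pooled[:len(pooled) - n_drop]


def mean(values):
    return sum(values) / len(values)


# Two toy critics with four atoms each; the first has one optimistic outlier.
critic_atoms = [
    [1.0, 2.0, 3.0, 10.0],
    [0.5, 2.5, 3.5, 4.0],
]

untruncated = truncated_mixture(critic_atoms, drop_per_critic=0)
kept = truncated_mixture(critic_atoms, drop_per_critic=1)

print(mean(untruncated), mean(kept))
```

In this toy setup, raising `drop_per_critic` by one removes a small, fixed share of the pooled atoms, which is one way to read the "arbitrarily granular" control the abstract mentions: the more atoms each critic predicts, the finer the adjustment.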

TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating a 25% improvement on the most challenging Humanoid environment.