“Hyperbolic Deep Reinforcement Learning”, Edoardo Cetin, Benjamin Chamberlain, Michael Bronstein, Jonathan J. Hunt, 2022-10-04:

We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space.

Sequential decision-making requires reasoning about the possible future consequences of current behavior. Consequently, capturing the relationship between key evolving features for a given task is conducive to recovering effective policies. To this end, hyperbolic geometry provides deep RL models with a natural basis to precisely encode this inherently hierarchical information. However, applying existing methodologies from the hyperbolic deep learning literature leads to fatal optimization instabilities due to the non-stationarity and variance characterizing RL gradient estimators.

Hence, we design a new general method that counteracts such optimization challenges and enables stable end-to-end learning with deep hyperbolic representations.

We empirically validate our framework by applying it to popular on-policy & off-policy RL algorithms on the Procgen & Atari 100K benchmarks, attaining near-universal performance and generalization benefits.

Given its natural fit, we hope future RL research will consider hyperbolic representations as a standard tool.
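
As a rough illustration of what a hyperbolic latent space means in practice, the sketch below uses the Poincaré ball model with curvature -1, one common choice in the hyperbolic deep learning literature. It is not the paper's method, and the abstract does not say which model of hyperbolic space the authors use. The function names `expmap0` and `poincare_distance` are hypothetical, chosen only for this example. The code maps a Euclidean feature vector into the ball and computes geodesic distances, which grow rapidly near the boundary; that growth is what lets tree-like hierarchies embed with low distortion.

```python
import math

def expmap0(v):
    """Exponential map at the origin of the unit Poincare ball (curvature -1).

    Takes a Euclidean feature vector (assumed here to be an encoder output)
    and returns a point strictly inside the unit ball.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(norm) / norm
    return [scale * x for x in v]

def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_norm = lambda u: sum(a * a for a in u)
    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - sq_norm(x)) * (1.0 - sq_norm(y))
    return math.acosh(1.0 + 2.0 * diff / denom)

# Two pairs with the same Euclidean gap (0.1) but different radii.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
```

Here `near_boundary` exceeds `near_origin` even though both pairs are 0.1 apart in Euclidean terms, so points pushed toward the boundary have far more "room" around them, much like the leaves of a tree.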