“Learning Accurate Long-Term Dynamics for Model-Based Reinforcement Learning”, Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra (2020-12-16):

Accurately predicting the dynamics of robotic systems is crucial for model-based control and reinforcement learning. The most common way to estimate dynamics is to fit a one-step-ahead prediction model and use it to recursively propagate the predicted state distribution over long horizons. Unfortunately, this approach is known to compound even small prediction errors, making long-term predictions inaccurate.
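A minimal sketch (not the authors' code) of this standard recursive rollout, where a learned one-step model `f(state, action) -> next_state` is applied to its own output, so any per-step error re-enters the model and compounds over the horizon:

```python
import numpy as np

def rollout_one_step(f, s0, actions):
    """Propagate an initial state through a one-step dynamics model.

    f       : callable, f(state, action) -> predicted next state
    s0      : np.ndarray, initial state
    actions : sequence of actions, one per step of the horizon
    """
    states = [s0]
    s = s0
    for a in actions:
        s = f(s, a)  # prediction error made here feeds the next call
        states.append(s)
    return np.stack(states)
```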

In this paper, we propose a new parametrization for supervised learning on state-action data that predicts stably at longer horizons, which we call a trajectory-based model. The trajectory-based model takes an initial state, a future time index, and control parameters as inputs, and directly predicts the state at that future time index.
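For contrast, a hedged sketch of this trajectory-based parametrization under the assumptions above: the model `g(s0, t, theta)` maps the initial state, a time index, and the controller parameters directly to the state at time `t`, so each long-horizon query is a single forward pass with no recursion (the names `g` and `rollout_trajectory` are illustrative, not from the paper):

```python
import numpy as np

def rollout_trajectory(g, s0, controller_params, horizon):
    """Query a trajectory-based model g(s0, t, theta) -> s_t
    independently at each future time index; errors do not compound."""
    return np.stack([g(s0, t, controller_params)
                     for t in range(1, horizon + 1)])
```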

Experimental results on simulated and real-world robotic tasks show that trajectory-based models yield statistically significantly more accurate long-term predictions, improved sample efficiency, and the ability to predict task reward.

Given these improved prediction properties, we conclude by demonstrating methods for using the trajectory-based model for control.