“Algorithmic and Human Teaching of Sequential Decision Tasks”, Maya Cakmak, Manuel Lopes, 2012:

A helpful teacher can substantially improve the learning rate of a learning agent. Teaching algorithms have been formally studied within the field of Algorithmic Teaching. These give important insights into how a teacher can select the most informative examples while teaching a new concept. However, the field has so far focused purely on classification tasks.

In this paper we introduce a novel method for optimally teaching sequential decision tasks. We present an algorithm that automatically selects the set of most informative demonstrations and evaluate it on several navigation tasks.
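The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way "most informative demonstrations" could be chosen, not the authors' implementation. It assumes, purely for illustration, that each candidate demonstration can be summarized as a list of linear constraints `c` on a reward weight vector `w` (satisfied when `w · c >= 0`), and that uncertainty is measured as the fraction of randomly sampled weight vectors still consistent with all constraints seen so far. The names `consistent`, `uncertainty`, `greedy_select`, and the `demo_*` candidates are hypothetical.

```python
import random


def consistent(w, constraints):
    """True if weight vector w satisfies every constraint, i.e. w . c >= 0."""
    return all(sum(wi * ci for wi, ci in zip(w, c)) >= 0 for c in constraints)


def uncertainty(samples, constraints):
    """Fraction of sampled weight vectors still consistent with the constraints."""
    return sum(consistent(w, constraints) for w in samples) / len(samples)


def greedy_select(candidates, samples, budget):
    """Greedily pick up to `budget` demonstrations, each time taking the one
    that leaves the smallest fraction of consistent weight samples."""
    chosen, constraints = [], []
    remaining = dict(candidates)  # copy so the caller's dict is untouched
    for _ in range(budget):
        if not remaining:
            break
        best = min(
            remaining,
            key=lambda name: uncertainty(samples, constraints + remaining[name]),
        )
        constraints = constraints + remaining.pop(best)
        chosen.append(best)
    return chosen, uncertainty(samples, constraints)


# Hypothetical toy setup: 2-D reward weights sampled uniformly from [-1, 1]^2.
rng = random.Random(0)
samples = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]

# Hypothetical candidate demonstrations and the constraints each one implies.
candidates = {
    "demo_a": [(1, 0)],          # implies w0 >= 0
    "demo_b": [(1, 0), (0, 1)],  # implies w0 >= 0 and w1 >= 0
    "demo_c": [(1, -1)],         # implies w0 >= w1
}

chosen, remaining_uncertainty = greedy_select(candidates, samples, budget=2)
```

In this toy setup the greedy pass prefers `demo_b` first because it rules out the most weight samples on its own, then `demo_c` because `demo_a` adds nothing new once `demo_b` is included. A random selection would often waste a slot on such a redundant demonstration.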

Next, we explore the idea of using this algorithm to produce instructions for humans on how to choose examples when teaching sequential decision tasks. We present a user study that demonstrates the utility of such instructions.

Figure 5: Comparison of learning from optimally-selected demonstrations and randomly-selected demonstrations. The top row shows the decrease in the uncertainty on the rewards, and the bottom row shows the change in the percentage of correctly chosen actions with the policy obtained from the estimated rewards. The x-axes are the increasing number of demonstrations.

Natural teaching is sub-optimal but spontaneous optimality is possible: Only 5⁄40 people in this condition spontaneously produced optimal demonstrations. The participants’ descriptions of how the demonstration was chosen reveal that these were not chosen by chance, but were indeed insightful. For instance, two participants describe their teaching strategy as: “[I tried] to demonstrate a route that contains several different elements the robot may encounter”, and “I tried to involve as many crossroads as possible and to use both green and blue tiles in the path.” As a result of such intuitions being rare, the performance of the trained learners averaged across participants is far from the optimal values.