“Reinforcement Learning for Recommender Systems: A Case Study on Youtube”, 2019-03-28:
While reinforcement learning (RL) has achieved impressive advances in games and robotics, it has not been widely adopted in recommender systems. Framing recommendation as an RL problem offers new perspectives, but it also faces substantial challenges in practice. Industrial recommender systems deal with extremely large action spaces (many millions of items to recommend) and complex user state spaces (billions of users, each of whom is unique at any point in time).
In this talk, I will discuss our work on scaling up a policy-gradient-based algorithm, i.e., REINFORCE, to a production recommender system at YouTube. We proposed algorithms to address data biases when deriving policy updates from logged implicit feedback.
I will also discuss some follow-up work and outstanding research questions in applying RL, in particular off-policy optimization, to recommender systems. [33m:16s; with slides]
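To make the idea of deriving policy updates from logged feedback concrete, here is a minimal sketch of an importance-weighted REINFORCE gradient. It is an illustration under strong simplifying assumptions, not the production method described in the talk: a single fixed user state, a softmax policy over a handful of items, and a known logging (behavior) policy. The names `off_policy_reinforce_grad`, `behavior_probs`, and `weight_cap` are hypothetical, and capping the importance weight is assumed here only as a common variance-reduction choice.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def off_policy_reinforce_grad(logits, logged, behavior_probs, weight_cap=10.0):
    """Importance-weighted REINFORCE gradient with respect to the logits.

    logits:         current policy parameters, one per item (toy setting)
    logged:         list of (action, reward) pairs gathered by the logging policy
    behavior_probs: probability the logging policy assigned to each item
    weight_cap:     upper bound on the importance weight (an assumption)
    """
    pi = softmax(logits)
    grad = [0.0] * len(logits)
    for action, reward in logged:
        # Reweight each logged sample by how much more (or less) likely
        # the current policy is to pick it than the logging policy was.
        w = min(pi[action] / behavior_probs[action], weight_cap)
        for k in range(len(logits)):
            # d log pi(action) / d logit_k = 1[k == action] - pi[k]
            indicator = 1.0 if k == action else 0.0
            grad[k] += w * reward * (indicator - pi[k])
    return [g / len(logged) for g in grad]


# Toy logged data: the logging policy strongly favored item 0,
# but the only rewarded interaction was with item 1.
logits = [0.0, 0.0, 0.0]
behavior_probs = [0.8, 0.1, 0.1]
logged = [(0, 0.0), (0, 0.0), (1, 1.0), (0, 0.0)]
grad = off_policy_reinforce_grad(logits, logged, behavior_probs)
```

In this toy run the gradient is positive for item 1 and negative for the other items, so a gradient-ascent step shifts probability toward the rewarded item, with the rare logged action up-weighted by the ratio of the two policies.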