Bibliography (3):

  1. Video Transformer Network

  2. ViS4mer: Long Movie Clip Classification with State-Space Video Models

  3. Attention Is All You Need