Bibliography:
Video Transformer Network
ViS4mer: Long Movie Clip Classification with State-Space Video Models
Attention Is All You Need