Bibliography (4):

  1. Attention Is All You Need

  2. MAE: Masked Autoencoders Are Scalable Vision Learners

  3. Contrastive Representation Learning: A Framework and Review

  4. https://github.com/zinengtang/TVLT