Bibliography (7):

  1. Contrastive Representation Learning: A Framework and Review

  2. CoCa: Contrastive Captioners are Image-Text Foundation Models

  3. A Short Note about Kinetics-600

  4. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

  5. HMDB: A Large Video Database for Human Motion Recognition (https://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/Kuehne_etal_iccv11.pdf)

  6. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

  7. Dense-Captioning Events in Videos