- Contrastive Representation Learning: A Framework and Review
- CoCa: Contrastive Captioners are Image-Text Foundation Models
- A Short Note about Kinetics-600
- UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
- https://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/Kuehne_etal_iccv11.pdf
- MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
- Dense-Captioning Events in Videos