Bibliography (7):

MAE: Masked Autoencoders Are Scalable Vision Learners
Contrastive Representation Learning: A Framework and Review
ImageNet Large Scale Visual Recognition Challenge
ADE20K dataset: https://paperswithcode.com/dataset/ade20k
Wikipedia Bibliography:
1. Latent and observable variables
2. Image segmentation
3. Object detection