Bibliography (31):

  1. Contrastive Representation Learning: A Framework and Review

  2. CLIP: Connecting Text and Images: We're introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the 'zero-shot' capabilities of GPT-2 and GPT-3

  3. LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

  4. An Open Source Implementation of CLIP

  5. 'end-to-end' directory

  6. https://github.com/LAION-AI/scaling-laws-openclip

  7. ImageNet: A Large-Scale Hierarchical Image Database

  8. ImageNet Large Scale Visual Recognition Challenge

  9. Do ImageNet Classifiers Generalize to ImageNet?

  10. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

  11. ImageNet-Sketch: Learning Robust Global Representations by Penalizing Local Predictive Power

  12. ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models

  13. ImageNet-A: Natural Adversarial Examples

  14. WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning

  15. Microsoft COCO: Common Objects in Context

  16. https://paperswithcode.com/dataset/flickr30k

  17. https://arxiv.org/pdf/2212.07143.pdf#page=27

  18. https://pytorch.org/docs/stable/notes/ddp.html

  19. https://apps.fz-juelich.de/jsc/hps/juwels/configuration.html#hardware-configuration-of-the-system-name-booster-module

  20. https://hpc.stability.ai/

  21. Scaling Laws for Neural Language Models

  22. Scaling Vision Transformers

  23. Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

  24. https://arxiv.org/pdf/2212.07143.pdf#page=22