Bibliography (4):

CLIP: Connecting Text and Images: We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the ‘zero-shot’ capabilities of GPT-2 and GPT-3.
LAION-400M Open Dataset: https://laion.ai/laion-400-open-dataset/
Clustering-Laion400m: Script and models for clustering LAION-400M CLIP embeddings. The models were fit on roughly the first million image embeddings.
Wikipedia Bibliography:
1. k-nearest neighbors algorithm