I just added weights for the first CLIP model trained from scratch on LAION-2B (the English subset of LAION-5B) in OpenCLIP (github.com/mlfoundations/ope…). A ViT-B/32 w/ an ImageNet-1k top-1 eval of 65.62%. Compute provided by @StabilityAI and a h/t to @rom1504 for help with LAION-5B.
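For anyone who wants to try the weights, a minimal usage sketch (the 'laion2b_e16' pretrained tag and the get_tokenizer helper are assumptions based on the current open_clip API; check the repo for the exact names):

```python
import torch
from PIL import Image
import open_clip

# Assumed pretrained tag for this release; see the OpenCLIP README for the exact name.
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained='laion2b_e16')
tokenizer = open_clip.get_tokenizer('ViT-B-32')

image = preprocess(Image.open('cat.png')).unsqueeze(0)       # any RGB image
text = tokenizer(['a photo of a cat', 'a photo of a dog'])

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)
print(probs)  # similarity over the two captions
```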

May 20, 2022 · 8:45 PM UTC

The 2B model was trained for 16 epochs instead of 32, so 2.5x the sample count for the same % of the LR schedule. Looking at the graph (green) in the previous tweet, the progress of the 2B B/32 was different from the other 400M models...
... after jumping up fairly high at the start, the eval progress was quite slow until accelerating at epoch 10 (20 on a 32-epoch sched) and ultimately passing the 400M (or OpenAI) results by ~2.4%.
For anyone who's trained contrastive image-text models at this scale, is this indicative of a poor LR choice or the difference in samples seen? I used a fairly large global batch size here, 46592 (112 * 416). @giffmana @_jongwook_kim ?
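For readers following the numbers, a quick back-of-envelope sketch of the sample-count and batch-size arithmetic above (all values are from the thread, dataset sizes rounded):

```python
# Sample-count comparison: 2B x 16 epochs vs 400M x 32 epochs.
laion_400m, laion_2b = 400_000_000, 2_000_000_000   # approx. dataset sizes
samples_400m = laion_400m * 32                      # 12.8B samples seen
samples_2b   = laion_2b * 16                        # 32B samples seen
print(samples_2b / samples_400m)                    # ~2.5x for the same % of the LR schedule

# Global batch size: per-GPU batch 112 across 416 GPUs.
global_batch = 112 * 416
print(global_batch)                                 # 46592
print(laion_2b // global_batch)                     # ~42.9k optimizer steps per 2B epoch
```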
An L/14 (yellow) on 400M is also in the works on the JUWELS cluster.
Out of curiosity what are you training this on, and how long does it roughly take for an epoch?
The LAION-400M runs have (so far) all been on a German supercomputer (fz-juelich.de/ias/jsc/EN/Exp…) via a research grant. This 2B run was on AWS EFA A100 nodes provided by stability.ai ... ran into NCCL issues with EFA so ran under spec and it was ~14 hours per 2B epoch
This is totally awesome. And I also assume this is true (text-based) zero-shot, with no finetuning on ImageNet? Would be nice to highlight this in the repo description.
Yup, it's text-based zero-shot, using prompt templates as per the CLIP paper. @mehdidc has been working on an eval codebase for a much larger set of datasets. Pulling in prompts, datasets, ideas from the CLIP, SLIP, WiSE-FT, LiT papers and code. github.com/LAION-AI/CLIP_ben…
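For context, a sketch of CLIP-paper-style zero-shot classification: text embeddings are averaged over prompt templates per class to build a classifier, with no ImageNet finetuning. The two classnames/templates here are illustrative stand-ins (the real evals use the full ImageNet class and template sets), and the open_clip calls are assumed as in the sketch above:

```python
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained='laion2b_e16')
tokenizer = open_clip.get_tokenizer('ViT-B-32')

classnames = ['tabby cat', 'golden retriever']           # stand-ins for the 1k ImageNet classes
templates = ['a photo of a {}.', 'a blurry photo of a {}.']

with torch.no_grad():
    weights = []
    for name in classnames:
        texts = tokenizer([t.format(name) for t in templates])
        emb = model.encode_text(texts)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        emb = emb.mean(dim=0)                            # average over templates
        weights.append(emb / emb.norm())
    classifier = torch.stack(weights, dim=1)             # (embed_dim, num_classes)

# For a batch of normalized image features: logits = image_features @ classifier
```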
Wow. Is it Distributed Data Parallel training? And how do you manage such huge data? I hope it's not all loaded into memory for training?