I just added weights for the first CLIP model trained from scratch on LAION-2B (English subset of LAION-5B) in OpenCLIP (github.com/mlfoundations/ope…). A ViT-B/32 w/ an ImageNet-1k top-1 eval of 65.62%. Compute provided by @StabilityAI and a h/t to @rom1504 for help with LAION-5B.
May 20, 2022 · 8:45 PM UTC
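A minimal sketch of loading these weights through open_clip, for anyone who wants to try them; the pretrained tag ('laion2b_e16') and the image path are assumptions for illustration, so check the repo for the exact names.

```python
# Sketch: load the ViT-B/32 LAION-2B weights with open_clip and run a
# zero-shot comparison. The pretrained tag and image path are assumed,
# not confirmed from the thread.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', pretrained='laion2b_e16')

image = preprocess(Image.open('example.jpg')).unsqueeze(0)  # hypothetical image
text = open_clip.tokenize(['a photo of a dog', 'a photo of a cat'])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # zero-shot similarity over the candidate captions
```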
The 2B model was trained over 16 epochs instead of 32, so 2.5x the sample count for the same % of the LR schedule. Looking at the graph (green) in the previous tweet, the progress of the 2B B/32 was different from the 400M models...
... after jumping up fairly high at the start, eval progress was quite slow until accelerating at epoch 10 (epoch 20 on a 32-epoch schedule), ultimately passing the 400M (or OpenAI) results by ~2.4%.
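A back-of-envelope check of that 2.5x figure (a sketch, using nominal dataset sizes):

```python
# Samples seen: 2B(en) over 16 epochs vs 400M over 32 epochs.
samples_2b = 2_000_000_000 * 16    # LAION-2B(en), 16 epochs
samples_400m = 400_000_000 * 32    # LAION-400M runs, 32 epochs
print(samples_2b / samples_400m)   # -> 2.5, at the same % of the LR schedule
```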
For anyone who's trained contrastive image-text models at this scale, is this indicative of a poor LR choice or just the difference in samples seen? I used a fairly large global batch size here: 46592 (112 * 416). @giffmana @_jongwook_kim ?
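For reference, the batch-size and step-count arithmetic, sketched with a nominal 2B samples per epoch (the exact dataset size is an assumption):

```python
# Global batch = per-GPU batch * number of GPUs; rough step counts follow.
per_gpu_batch, num_gpus = 112, 416
global_batch = per_gpu_batch * num_gpus          # 46592
steps_per_epoch = 2_000_000_000 // global_batch  # ~42,925 optimizer steps
total_steps = steps_per_epoch * 16               # ~686,800 steps over 16 epochs
print(global_batch, steps_per_epoch, total_steps)
```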