“Training Stable Diffusion from Scratch Costs <$160k”, 2023-12-24:
We wanted to know how much time (and money) it would take to train a Stable Diffusion model from scratch using our Streaming datasets, Composer, and MosaicML Cloud Platform. Our results: it would take us 79,000 A100-hours over 13 days, for a total training cost of less than $160,000. Our tooling not only cuts time and cost by 2.5× relative to the figures reported in Stability AI's model card, but is also extensible and simple to use.
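As a quick consistency check on these headline numbers, the sketch below multiplies the 256-GPU, 12.83-day estimate given in the post by 24 hours per day to recover the A100-hour total, then divides the $160,000 budget by it. The resulting price per A100-hour is back-solved from the published figures; it is an assumption derived here, not a rate the post states.

```python
# Consistency check of the headline figures. NUM_GPUS, TRAIN_DAYS and
# TOTAL_COST_USD are the numbers quoted in the post; the hourly rate is
# back-solved from them and is NOT a price stated by MosaicML.
NUM_GPUS = 256
TRAIN_DAYS = 12.83
TOTAL_COST_USD = 160_000

gpu_hours = NUM_GPUS * TRAIN_DAYS * 24        # total A100-hours
implied_rate = TOTAL_COST_USD / gpu_hours     # USD per A100-hour (derived)

print(f"{gpu_hours:,.0f} A100-hours")         # 78,828 A100-hours
print(f"${implied_rate:.2f} per A100-hour")   # $2.03 per A100-hour
```

The roughly 78,800 A100-hours rounds to the 79,000 quoted above, and the budget works out to about $2 per A100-hour.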
…Time and Cost Estimates: Table 1 & Figure 1 below illustrate how the training time and cost estimates for the Stable Diffusion V2 base model vary with the number of GPUs used. Our final estimate for 256 A100s is 12.83 days of training at a cost of $160,000, a 2.5× reduction in the time and cost reported in the Stable Diffusion model card. These estimates were calculated from measured throughput, assuming training on 2.9 billion samples. Throughput was measured by training on 512×512 resolution images with captions of maximum tokenized length 77. We scaled from 8 to 128 NVIDIA 40GB A100 GPUs, then extrapolated throughput to 256 A100s based on these measurements.
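The estimate described above reduces to dividing a fixed sample budget by aggregate throughput. The sketch below shows one way to do that under stated assumptions: the post does not say how it extrapolated from 8–128 GPUs to 256, so the helpers `fit_linear` and `extrapolate` (hypothetical names) swap in a plain least-squares line, which assumes throughput scales linearly with GPU count. No measured throughputs are reproduced here; instead, the example works backwards from the published 12.83-day figure to the aggregate throughput it implies.

```python
SECONDS_PER_DAY = 86_400
TOTAL_SAMPLES = 2.9e9  # sample budget assumed in the post

def fit_linear(gpus, throughputs):
    """Ordinary least-squares fit: throughput ~= slope * gpus + intercept.

    Assumption: linear scaling with GPU count; the post does not state
    which extrapolation method was actually used.
    """
    n = len(gpus)
    mean_x = sum(gpus) / n
    mean_y = sum(throughputs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(gpus, throughputs))
    var = sum((x - mean_x) ** 2 for x in gpus)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def extrapolate(gpus, throughputs, target_gpus):
    """Predict aggregate samples/s at target_gpus from the linear fit."""
    slope, intercept = fit_linear(gpus, throughputs)
    return slope * target_gpus + intercept

def training_days(samples_per_second, total_samples=TOTAL_SAMPLES):
    """Wall-clock days to process total_samples at a constant rate."""
    return total_samples / samples_per_second / SECONDS_PER_DAY

# Work backwards from the published 256-GPU estimate (12.83 days) to the
# aggregate throughput it implies. This number is derived, not measured.
implied_rate = TOTAL_SAMPLES / (12.83 * SECONDS_PER_DAY)
print(f"{implied_rate:,.0f} samples/s total, "
      f"{implied_rate / 256:.1f} samples/s per A100")
# prints: 2,616 samples/s total, 10.2 samples/s per A100
```

With real measurements, one would pass the (GPU count, samples/s) pairs for 8–128 GPUs to `extrapolate(..., 256)` and feed the result to `training_days`. A linear fit is the simplest choice, but it will overestimate throughput if communication overhead grows with cluster size.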