all 9 comments

[–]StartledWatermelon 16 points

Andrej has generously put the value of his time working on this at 0 dollars per hour. But I doubt I can hire him at this rate, even if I asked super nicely.

Training GPT-2 (1.5B) on 10B tokens in 2019 cost $50,000. I think it is pretty evident that the so-called "soft costs" (the talent cost of developing the model) were at least an order of magnitude higher. And, unfortunately, we haven't seen a comparable cost reduction in this area over the past 5 years.

Another important thing to consider is that Andrej has reproduced the model, not the research effort that was needed to bring this model to the frontier of knowledge, which involves a lot of exploration and a lot of experiments. For instance, I'm not certain the community knew the optimal learning rates and batch sizes for training language models on large-scale corpora back then.

Anyway, the pace of progress in ML is such that a frontier model in 2019 is a toy problem in 2024 (or at least a toy problem for a brilliant researcher with few resources). Hope we'll keep up the pace. GPT-4o for twenty bucks in 2029 doesn't sound bad.

[–]ResidentPositive4122 6 points

> GPT-4o for twenty bucks in 2029 doesn't sound bad.

Ha, exactly! And it might be even closer than that: I saw a post today about an L3-8b + visual model trained for ~$500, claiming pretty good results against the other VLMs out there.

[–]gwern (gwern.net) 2 points

> I saw a post today about L3-8b + visual model for ~500bucks, claiming pretty good results over the other VLMs out there.

I believe that one turned out to be fraudulent: they plagiarized MiniCPM (and the author blamed by the co-authors turns out to have a history)?

[–]furrypony2718[S] 4 points

This is not to demonstrate the cost of technology as it is first developed, but the *eventual* cost. It's the learning curve for technology.

https://en.wikipedia.org/wiki/File:Learning_curve_example_from_WWII_production_in_the_US_airframe_industry.jpg

[–]az226 2 points

And this is even GPT-2.

We have made about 400-1000x improvement in training efficiency since what was known/done for GPT-3.

I’m experimenting with some infrastructure and think the training cost could go down another 15x. So the training of GPT-2 1.5B could be done for $150 in 15 hours.
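Back-of-envelope on those numbers (taking the claimed 15x at face value; the "implied current cost" below is just arithmetic on the comment's own figures, not a measured number):

```python
# Figures from the comment above: a further 15x cost reduction that
# lands at $150 implies a current cost of 150 * 15.
further_speedup = 15
projected_cost_usd = 150
projected_hours = 15

implied_current_cost = projected_cost_usd * further_speedup  # $2,250
implied_hourly_rate = projected_cost_usd / projected_hours   # $10/hour of compute
print(implied_current_cost, implied_hourly_rate)
```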

[–]TenshiS 1 point

Not to mention it's easy to train small models using instruction input from the big models. RLHF for frontier models required armies of people giving feedback.

[–]damhack 0 points

Sure, if you want a model that you can’t legally distribute for commercial use and are happy with a higher incidence of mode collapse.

[–]KallistiTMP 1 point

Yeah, also any model that is small enough to be trained on a single host is going to be absurdly faster and easier to train. It's not a linear equation. Once you go past a certain point, the GPUs aren't even the bottleneck anymore; the bottleneck becomes the inter-node communication.
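To make the communication bottleneck concrete, here's a back-of-envelope sketch. The ring all-reduce cost formula is standard; the model size, gradient precision, and GPU count below are illustrative assumptions, not figures from any particular setup:

```python
def ring_allreduce_bytes_per_gpu(num_params, bytes_per_param, num_gpus):
    """Bytes each GPU must send (and receive) to all-reduce one gradient
    buffer with a ring all-reduce: 2 * (N - 1) / N times the buffer size."""
    buffer_bytes = num_params * bytes_per_param
    return 2 * (num_gpus - 1) / num_gpus * buffer_bytes

# Illustrative: a 1.5B-parameter model with fp16 gradients across 64 GPUs.
per_step = ring_allreduce_bytes_per_gpu(1.5e9, 2, 64)
print(per_step / 1e9)  # ~5.9 GB of gradient traffic per GPU per step
```

On a single host that traffic rides NVLink; split across nodes it has to cross the network every optimizer step, which is why scaling past one host changes the whole problem.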

That's not even getting into the automation required to actually keep the damn thing running. A100 and H100 GPUs are notoriously prone to hardware failures. And at hero-job scale, manual intervention is not feasible: you have to have automated remediation and frequent checkpointing to minimize the impact whenever a GPU or any other hardware component fails. And that's assuming it fails loudly; a bad GPU that fails silently can silently corrupt your training run results. So now you need a burn-in process and comprehensive validation testing. Also storage. Also all the bottlenecks you run into trying to simultaneously spin up thousands of workers for anything. Also the default slow progressive rollout strategies used by cloud providers, designed to minimize disruption to generic web apps, are murder for large training clusters. Etc, etc, etc.
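The checkpoint-and-restart pattern described above can be sketched in plain Python. This is purely illustrative (a counter stands in for model state, a random raise stands in for a GPU fault, and real remediation involves swapping hardware, not just retrying), but the shape is the same: resume from the last checkpoint, write checkpoints atomically, and bound the work lost per failure to one checkpoint interval:

```python
import os
import pickle
import random
import tempfile

def run_job(state_path, total_steps, ckpt_every, fail_prob, rng):
    """One attempt at the job: resume from the last checkpoint if it
    exists, checkpoint every `ckpt_every` steps, raise on a fault."""
    step = 0
    if os.path.exists(state_path):
        with open(state_path, "rb") as f:
            step = pickle.load(f)
    while step < total_steps:
        if rng.random() < fail_prob:
            raise RuntimeError("simulated GPU failure")
        step += 1  # stand-in for one real training step
        if step % ckpt_every == 0 or step == total_steps:
            # Write to a temp file and rename, so a crash mid-write
            # can never leave a corrupt checkpoint behind.
            tmp = state_path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump(step, f)
            os.replace(tmp, state_path)
    return step

def train_with_remediation(total_steps=1000, ckpt_every=50,
                           fail_prob=0.01, seed=0):
    """Automated remediation loop: on failure, restart the job; at most
    `ckpt_every` steps of progress are lost per failure."""
    rng = random.Random(seed)
    restarts = 0
    with tempfile.TemporaryDirectory() as d:
        state_path = os.path.join(d, "ckpt.pkl")
        while True:
            try:
                done = run_job(state_path, total_steps, ckpt_every,
                               fail_prob, rng)
                return done, restarts
            except RuntimeError:
                restarts += 1

steps_done, restarts = train_with_remediation()
print(steps_done, restarts)
```

The atomic rename is the detail that matters most here: without it, a failure during checkpointing corrupts the very file you need to recover.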

Hero job training clusters are a whoooole different ballgame.