"Replicating GPT-2-1.5B", 2019-06-06:
In this post, I want to quickly cover the technical and organizational questions around my recent replication of GPT-2-1.5B. Please read my main post for the full story; I will try to keep this one brief.
The important facts
Code: https://github.com/ConnorJL/GPT2; samples: https://github.com/ConnorJL/GPT2/tree/master/samples.
The code should run out of the box on GPUs and TPUs (and CPUs, if you're really desperate). I used the parameters specified in 1.5B.json and trained the model on a preemptible v3-512 TPU pod (which is actually more powerful than the machine OpenAI used) for around a week, with interruptions. Code and instructions for generating the dataset are also included in the repo. You can download my models with the script in the repo. Currently I have a weaker version of 117M, and a model I call PrettyBig which is slightly larger than OpenAI's 345M, which makes it technically the largest publicly available GPT-2 model.
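As a sanity check on the scale involved, the "1.5B" figure can be reproduced from OpenAI's published GPT-2 hyperparameters. Here is a minimal sketch; the dictionary key names are mine for illustration and are not necessarily the keys used in 1.5B.json:

```python
# Rough parameter-count check for the GPT-2-1.5B configuration.
# Values are OpenAI's published GPT-2-1.5B hyperparameters; the key
# names here are illustrative, not necessarily those in 1.5B.json.
config = {
    "n_layer": 48,     # transformer blocks
    "n_embd": 1600,    # hidden size
    "n_head": 25,      # attention heads
    "n_ctx": 1024,     # context length
    "n_vocab": 50257,  # BPE vocabulary size
}

def approx_params(cfg):
    d = cfg["n_embd"]
    # token + position embeddings
    embed = cfg["n_vocab"] * d + cfg["n_ctx"] * d
    # per block: ~4*d^2 for attention (QKV + output projection)
    # plus ~8*d^2 for the 4x-wide feed-forward MLP
    blocks = cfg["n_layer"] * 12 * d * d
    return embed + blocks

total = approx_params(config)
print(f"~{total / 1e9:.2f}B parameters")  # → ~1.56B parameters
```

The estimate ignores layer norms and biases and assumes tied input/output embeddings, which is why it lands slightly above the commonly quoted 1.542B.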
I will be releasing 1.5B to the public on July 1st, if, and only if, no one shows me a convincing reason not to. When I do, it will be downloadable just like my other models.