“OpenGPT-2: We Replicated GPT-2-1.5b Because You Can Too”, Aaron Gokaslan, Vanya Cohen, 2019-08-22:

Recently, large language models such as BERT, XLNet, GPT-2, and GROVER have demonstrated impressive results in generating new content and on multiple other tasks. Since OpenAI has not released its largest model [GPT-2-1.5b] at this time, we seek to replicate it to allow others to build on our pretrained model and further improve it. You can access the model and generate text using our Google Colab.

…We demonstrate that many of the results of the paper can be replicated by two master’s students…Because our replication efforts are not unique, and large language models are currently the most effective means of countering generated text, we believe releasing our model is a reasonable first step towards countering the potential future abuse of these kinds of models.

We base our implementation on the GROVER model and modify their codebase to match the language-modeling training objective of GPT-2. Since their model was trained on a similarly large corpus, much of the code and hyperparameters proved readily reusable. We did not substantially change the hyperparameters from GROVER.
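The GPT-2 objective referred to here is standard next-token (causal) language modeling: minimize the average negative log-probability of each token given its left context. As a minimal sketch of that objective (not the actual Grover/TensorFlow code, whose details are not shown in the post), the per-token cross-entropy over a sequence can be written as:

```python
import math

def lm_loss(logits, targets):
    """Average next-token cross-entropy, the GPT-2-style causal LM objective.

    logits: one list of vocabulary scores per sequence position.
    targets: the next-token id at each position (inputs shifted by one).
    """
    total = 0.0
    for scores, t in zip(logits, targets):
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[t]  # -log p(target | context)
    return total / len(targets)
```

For example, a model that is perfectly uninformative over a 2-token vocabulary (all-zero logits) incurs a loss of log 2 per token.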

From start to finish, we estimate that we used under $500,000 in cloud compute for all of our experiments, including searching for hyperparameters and testing various cleaning methods on our datasets. The cost of training the model from scratch using our code is about $50,000.

…Despite the differences in our training distribution, we do report similar perplexities over most datasets.
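The perplexities being compared here are the exponentiated average per-token negative log-likelihood on each held-out dataset; lower means the model assigns the text higher probability. A minimal illustration (dataset-specific tokenization details are not given in the post):

```python
import math

def perplexity(token_nlls):
    """Perplexity of held-out text: exp of the mean per-token
    negative log-likelihood, in nats."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

A model that assigns every token probability 1/10 (per-token NLL of log 10) has perplexity exactly 10, so perplexity can be read as the effective branching factor the model faces per token.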