“GPT-2 Neural Network Poetry”, Gwern & Shawn Presser, 2019-03-03:

Demonstration tutorial of retraining OpenAI’s GPT-2 (a text-generating Transformer neural network) on large poetry corpuses to generate high-quality English verse.

In February 2019, following up on my 2015–2016 text-generation experiments with char-RNNs, I experiment with the cutting-edge Transformer NN architecture for language modeling & text generation.

Using OpenAI’s GPT-2-117M model pre-trained on a large Internet corpus and nshepperd’s finetuning code, I retrain GPT-2-117M on a large (117MB) Project Gutenberg poetry corpus. I demonstrate how to train 2 variants: “GPT-2-poetry”, trained on the poems as a continuous stream of text, and “GPT-2-poetry-prefix”, with each line prefixed with the metadata of the PG book it came from. In May 2019, I trained the next-largest GPT-2, GPT-2-345M, similarly, for a further quality boost in generated poems. In October 2019, I & Shawn Presser retrained GPT-2-117M on a Project Gutenberg corpus with improved formatting, and combined it with a contemporary poem dataset based on Poetry Foundation’s website; finally, we retrained the newly-released GPT-2-1.5b, which did not fit on our GPUs, so we used TRC-supplied TPUs in a “swarm” to slowly finetune it.
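To make the two data formats concrete, here is a minimal Python sketch of building both training files from a directory of per-book plain-text files. The directory layout, the choice of the PG book ID as the metadata tag, and the pipe delimiter are illustrative assumptions, not necessarily the exact preprocessing used; both output files would then be BPE-encoded and fed to the finetuning code.

```python
# Sketch of preparing the two corpus variants described above.
# Assumptions (illustrative, not from the article): poems live in plain-text
# files named by their Project Gutenberg ID, e.g. corpus/10031.txt, and the
# "prefix" variant tags each line with that ID followed by a pipe.
from pathlib import Path

corpus_dir = Path("corpus")            # one cleaned .txt file per PG book (assumed layout)
stream_out = Path("poetry.txt")         # input for "GPT-2-poetry"
prefix_out = Path("poetry-prefix.txt")  # input for "GPT-2-poetry-prefix"

with stream_out.open("w", encoding="utf-8") as stream, \
     prefix_out.open("w", encoding="utf-8") as prefixed:
    for book in sorted(corpus_dir.glob("*.txt")):
        book_id = book.stem                        # e.g. "10031"
        for line in book.read_text(encoding="utf-8").splitlines():
            line = line.rstrip()
            if not line:
                continue
            stream.write(line + "\n")              # continuous stream of verse
            prefixed.write(f"{book_id}|{line}\n")  # line tagged with its source book
```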

With just a few GPU-days on Nvidia 1080ti GPUs, GPT-2-117M finetuning can produce high-quality poetry which is more thematically consistent than my char-RNN poems, capable of modeling subtle features like rhyming, and sometimes even a pleasure to read.
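Sampling from a finetuned checkpoint was done here with nshepperd’s TensorFlow fork; purely as an illustrative stand-in, the sketch below uses the later HuggingFace transformers API, with a hypothetical checkpoint directory, prompt, and decoding parameters.

```python
# Minimal sampling sketch using HuggingFace transformers as a stand-in for the
# TensorFlow sampling scripts actually used; checkpoint path, prompt, and
# decoding parameters are illustrative assumptions.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_dir = "gpt-2-poetry-checkpoint"  # hypothetical finetuned checkpoint directory
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained(model_dir)

prompt = "Thou fair-haired angel of the evening,"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(
    input_ids,
    do_sample=True,   # stochastic sampling rather than greedy decoding
    max_length=200,   # total length in tokens, prompt included
    temperature=0.9,  # sampling temperature (<1 = more conservative)
    top_k=40,         # restrict sampling to the 40 most likely tokens
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```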

I list some of the many possible ways to improve poem generation and further approach human-level poems. For the highest-quality AI poetry to date, see my followup page, “GPT-3 Creative Writing”.

See Also: For anime plot summaries, see TWDNE; for generating ABC-formatted folk music, see “GPT-2 Folk Music” & “GPT-2 Preference Learning for Music and Poetry Generation”; for playing chess, see “A Very Unlikely Chess Game”; for the Reddit comment generator, see SubSimulatorGPT-2; for fanfiction, the Ao3; and for video games, the walkthrough model. For OpenAI’s GPT-3 followup, see “GPT-3: Language Models are Few-Shot Learners”.