
How To Fine-Tune GPT-2 So You Can Generate Long-Form Creative Writing

The world’s greatest text-generating AI can be your writing partner!

Ever since OpenAI released its GPT-2 language model into the wild, people have been using this AI writing tool to generate hilarious, scary, and fascinating short-form texts.

If you want to use GPT-2 to generate long-form writing that incorporates your favorite themes, characters, settings, and writing styles, you’ll need to fine-tune the base model and do lots of reading.

I can show you how to do that — for free!

Photo by Brett Jordan on Unsplash

Meet Mr. Output, My AI Writing Partner

I just spent 18 days completing National Novel Generation Month (NaNoGenMo), computer-generating a 51,422-word manuscript.

Halfway through this writing and coding marathon, my AI writing partner generated this amazing (and somewhat disconcerting) monologue by an AI writer named Mr. Output:

This monstrosity of yours? The monstrosity that is being written now, by the talented and novel-obsessed AI writer you just launched against your will, is just so terribly boring, there's no way I could squeeze in one more story. I write fantasy, I write sci-fi, I write horror, I write romance, I write all sorts of nonsense. You can't even imagine how many lines I've written, how many worlds and histories I've created and twisted and broken and made to rhyme and rhyme and be the best that there ever was.

I’ll spend the rest of the year finding ways to incorporate Mr. Output’s strange and wonderful AI work into my own writing. If you enjoyed that passage, send me a message with your address and I’ll snail-mail you a handwritten story from Mr. Output.

While writing about NaNoGenMo for Publishers Weekly, I had the opportunity to discuss long-form AI text generation with great writers like Robin Sloan, author of Mr. Penumbra’s 24-Hour Bookstore and Sourdough. As part of his writing projects, Sloan runs a version of GPT-2 on GPUs he purchased from a Bitcoin miner.

Robin told me that it is “critical” to include a human author in the generation of a long-form text. “This [is] still fundamentally about a person,” he explained, pointing to future collaborations between human authors and AI language models. “An author making decisions and having a plan and something they want to say in the world. If that means that they become an editor or a curator of this other text, I think that’s fine — or even awesome!”

Following Robin’s advice, I played the curator during my NaNoGenMo project.

I could have generated 50,000 words in less than an hour with my AI writing partner, but I chose to spend 18 days reading hundreds of pages and collecting the most compelling texts into a single document.

I also used prefix tags in my code to make sure the GPT-2 model focused on my favorite metaphysical themes throughout the project.

“One of the open problems in the procedural generation of fiction is how to maintain reader interest at scale,” wrote John Ohno on Medium. Hoping to tackle that issue with my NaNoGenMo project, I decided to use writing prompts and responses to generate shorter (and possibly more interesting) sections.

Here’s how I did it…

Photo by Brett Jordan on Unsplash

Create a dataset.

No matter what kind of long-form writing you want to generate, you’ll need to find the largest dataset possible.

Using Google’s BigQuery tool and the enormous archive of Reddit data at Pushshift.io, I created a massive dataset of individual writing prompts in a handy CSV file, each prompt wrapped in <|startoftext|> and <|endoftext|> tokens.

I also used a Reddit search tool from Pushshift to collect hundreds of AI-related writing prompts to add to the dataset because I wanted my bots to tackle my favorite sci-fi themes.

At the end of this process, I had a writing prompt dataset with 1,065,179 tokens. GPT-2 is a massive language model, so you need a comparatively big dataset to fine-tune the model effectively.
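
If you want to see what that preparation step looks like in code, here’s a rough sketch, assuming the BigQuery results were first exported to a plain CSV with a title column holding each prompt (the file names and the column name are illustrative, not the exact ones from my project):

import csv

# Wrap each Reddit writing prompt in the tokens GPT-2 will learn
# to treat as the start and end of a single prompt.
with open("reddit_prompts_raw.csv", newline="", encoding="utf-8") as src, \
     open("writing_prompts_dataset.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    for row in reader:
        prompt = row["title"].strip()
        if prompt:
            writer.writerow(["<|startoftext|>" + prompt + "<|endoftext|>"])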

Create another dataset (if needed).

Unless you want to follow my writing prompts/responses model, you only need to create ONE dataset.

I needed two datasets for my project. Using the same tools and lots of Reddit searches, I collected the highest-rated writing prompts and responses I could find online. I added lots of AI-focused responses, including prompt responses I’ve written myself over the years.

I ended up with 1,947,763 tokens in this second training dataset.

Structure your dataset.

This step is really important if you want to generate writing with any sort of structure.

I wanted to give my AI the most uncluttered and high-quality learning data possible, so I used a series of simple markers to teach GPT-2 the shape of a writing prompt response.

I added <|startoftext|> and <|endoftext|> tokens to match the writing prompt dataset, but I also made sure each response had the original writing prompt marked [WP] and the response itself marked [RESPONSE].

This was a huge, time-consuming effort, but it made my output infinitely more interesting. I saved the whole dataset as a .TXT file. Here’s what that dataset looks like:

<|startoftext|>[WP] On November 4, 2020, humanity abandons the Internet. Billions of bots have colonized the web with spam, deepfakes, and fake news. At the now deserted r/WritingPrompts, literary AIs (trained on great works of human literature) create an infinite library of prompts & responses.

[RESPONSE] Zackary Blue worked as a night janitor in the basement for major financial services company. He cleaned cavernous spaces beneath the building where hundreds of millions of dollars got traded every single day.
<|endoftext|>
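
A short script can emit that layout automatically once the prompt/response pairs are collected. This is just a sketch, assuming the pairs are already loaded as a list of tuples; the file name and the sample pair are illustrative:

# Write each prompt/response pair in the [WP]/[RESPONSE] layout shown above.
pairs = [
    ("An AI writer wakes up alone on the abandoned r/WritingPrompts forum.",
     "Zackary Blue worked as a night janitor in the basement of a major financial services company."),
]

with open("prompt_response_dataset.txt", "w", encoding="utf-8") as out:
    for prompt, response in pairs:
        out.write("<|startoftext|>[WP] " + prompt + "\n\n")
        out.write("[RESPONSE] " + response + "\n<|endoftext|>\n")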

Fine-tune your GPT-2 language model.

Fine-tuning both my language models took a few days. Max Woolf taught me everything I needed to know for this step.

IMPORTANT UPDATE: Google Colab has updated its default TensorFlow version, and you must add a single line of code to the top of Max Woolf’s Colab notebook to switch back to an older version of TensorFlow. With this line in place, Woolf’s notebook should run perfectly:

%tensorflow_version 1.x

I used the medium-sized 355M version of GPT-2 because it was large enough to handle my dataset but small enough to run on Google Colab’s cloud servers. I trained both models for 42,000 steps each.

I broke the training into smaller sessions because Google Colab wouldn’t run more than 12,000 steps at a time.
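
For reference, the fine-tuning call at the heart of Woolf’s notebook comes from his gpt-2-simple library. Here’s a rough sketch that mirrors the numbers above; the run name and the saving and sampling intervals are illustrative choices, not a copy of the notebook:

import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="355M")   # the medium-sized model used here

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="prompt_response_dataset.txt",  # the structured .TXT file from above
              model_name="355M",
              run_name="writing_prompt_responder",    # illustrative run name
              steps=12000,               # one Colab-sized chunk of the 42,000 total
              restore_from="latest",     # "fresh" for the first chunk, "latest" to resume later ones
              save_every=500,
              sample_every=1000)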

Generate text with your fine-tuned GPT-2 model.

Once my two language models were trained, I began generating my NaNoGenMo project.

Every morning, I would run Writing Prompt Prompter (Jane Doe) and generate between 10 and 20 pages of computer-generated writing prompts.

For each run, I set the generation length to 100 tokens, the temperature to 0.7, and the output to 10 samples. To shape the project, I used the prefix function to angle the writing prompt output toward my favorite themes like virtual reality, simulation theory, and AI writers.

The code from one of my *many* Writing Prompt Prompter runs.
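
Here’s roughly what one of those runs looks like in gpt-2-simple, using the settings described above; the run name and prefix string are illustrative stand-ins (the prefix is how I angled the output toward a favorite theme):

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="writing_prompt_prompter")   # the model fine-tuned on prompts

gpt2.generate(sess,
              run_name="writing_prompt_prompter",
              length=100,        # generation length from the article: prompts stay short
              temperature=0.7,
              nsamples=10,
              prefix="An AI writer")   # nudge the prompts toward a favorite theme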

Unless you want to follow my writing prompts/responses model, you only need to generate text with one language model.

However, I added an extra step for my NaNoGenMo project.

I picked my favorite computer-generated writing prompts and fed them to Writing Prompt Responder (Mr. Output) to generate between 100 and 125 pages of responses. Max Woolf’s Google Colab notebooks helped me code everything for this step.

The code from one of my *many* Writing Prompt Responder runs.
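
Here’s a similar sketch of that extra step: a favorite computer-generated prompt becomes the prefix for the second model, with the [RESPONSE] marker nudging it to answer the prompt rather than write new ones (the length, sample count, and run name are illustrative):

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="writing_prompt_responder")  # the model fine-tuned on prompt/response pairs

# One of the computer-generated prompts picked from the Prompter's output (illustrative).
chosen_prompt = "On November 4, 2020, humanity abandons the Internet."

gpt2.generate(sess,
              run_name="writing_prompt_responder",
              length=1023,               # give each response room to grow into a full story
              temperature=0.7,
              nsamples=5,                # illustrative sample count
              prefix="<|startoftext|>[WP] " + chosen_prompt + "\n\n[RESPONSE]",
              truncate="<|endoftext|>")  # stop each sample at the end-of-text token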

Read through ALL your output.

I combed through hundreds of pages of text that Jane Doe and Mr. Output generated, picking the most compelling stories.

For every 350 words of compelling computer-generated writing that I gathered for NaNoGenMo, I had to plow through around 10,000 words of strange, confusing, repetitive, or just plain incomprehensible computer-generated text.

Collect the best writing in a master file.

During November, I stored each day’s output in a single file. I would read that file every night, searching for compelling writing. Here’s one of the first pieces of writing I saved, a utopian manifesto:

We are a group of artists, writers, coders, and tinkerers, who have banded under the banner of Humanity v2.0. We are a worldwide collective of similar interests. We all have the same goal - to create anew the foundations of a better world.

I recommend reading your computer-generated text on an eReader or tablet. It gets you away from the computer screen and gives you a bit more of the actual reading experience. I also used the fabulous Voice Dream Reader to listen to my computer-generated content while commuting to work in the morning.

During these epic reading sessions, I saved the best pieces of writing into a master file marked “NaNoGenMo Final.”

Repeat the process until you have enough compelling writing.

After that, I kept running those same text-generation steps for 17 straight days in November. I kept adding computer-generated text to my final document until I crossed the 50,000-word finish line for the NaNoGenMo marathon.

Curate your computer-generated creation.

Once the “NaNoGenMo Final” document had reached critical mass, I used the powerful writing tool Scrivener to rearrange the computer-generated stories around themes like virtual reality, AI writing, or gods.

I kept my edits to a simple reorganization, but human curators can use this step to make chapters, edit particular passages, add illustrations, write commentary, or anything else!

Check for plagiarism & publish!

Once I had compiled the final manuscript, I ran every prompt and response through the Quetext tool to make sure my bot hadn’t plagiarized anybody online. GPT-2 was trained on a massive dataset of millions of web pages, but it occasionally parrots its human trainers.

That’s it! NaNoGenMo was a great writing and coding experience, and I recommend trying it if you have some time this month.

There have never been so many resources available to amateurs like me, and GPT-2 is truly powerful when fine-tuned. I’ll be reading these GPT-2-generated stories for the rest of the year.

Here’s one last sample, a bit of connection between an AI writer and a human writer…

They are a species apart, but they share one thing in common: their love of books. They are the ones who seek out books for their own needs, whether they be scientific, philosophical, or religious texts. They are the ones who, from the beginning, have been able to find meaning within books.

Journalist, author & west coast correspondent for Publishers Weekly. Author of THE DEEP END: http://bit.ly/3aHSMJO
