
How To Fine-Tune GPT-2 So You Can Generate Long-Form Creative Writing

The world’s greatest text-generating AI can be your writing partner!

Ever since OpenAI released its GPT-2 language model into the wild, people have been using this AI writing tool to generate hilarious, scary, and fascinating short-form texts.

If you want to use GPT-2 to generate long-form writing that incorporates your favorite themes, characters, settings, and writing styles, you’ll need to fine-tune the base model and do lots of reading.

I can show you how to do that — for free!

Photo by Brett Jordan on Unsplash

Meet Mr. Output, My AI Writing Partner

I just spent 18 days completing National Novel Generation Month (NaNoGenMo), computer-generating a 51,422-word manuscript.

Halfway through this writing and coding marathon, my AI writing partner generated this amazing (and somewhat disconcerting) monologue by an AI writer named Mr. Output:

This monstrosity of yours? The monstrosity that is being written now, by the talented and novel-obsessed AI writer you just launched against your will, is just so terribly boring, there's no way I could squeeze in one more story. I write fantasy, I write sci-fi, I write horror, I write romance, I write all sorts of nonsense. You can't even imagine how many lines I've written, how many worlds and histories I've created and twisted and broken and made to rhyme and rhyme and be the best that there ever was.

I’ll spend the rest of the year finding ways to incorporate Mr. Output’s strange and wonderful AI work into my own writing. If you enjoyed that passage, send me a message with your address and I’ll snail-mail you a handwritten story from Mr. Output.

While writing about NaNoGenMo for Publishers Weekly, I had the opportunity to discuss long-form AI text generation with great writers like Robin Sloan, author of Mr. Penumbra’s 24-Hour Bookstore and Sourdough. As part of his writing projects, Sloan runs a version of GPT-2 on GPUs he purchased from a Bitcoin miner.

Robin told me that it is “critical” to include a human author in the generation of a long-form text. “This [is] still fundamentally about a person,” he explained, pointing to future collaborations between human authors and AI language models. “An author making decisions and having a plan and something they want to say in the world. If that means that they become an editor or a curator of this other text, I think that’s fine — or even awesome!”

Following Robin’s advice, I played the curator during my NaNoGenMo project.

I could have generated 50,000 words in less than an hour with my AI writing partner, but I chose to spend 18 days reading hundreds of pages and collecting the most compelling texts into a single document.

I also used prefix tags in my code to make sure the GPT-2 model focused on my favorite metaphysical themes throughout the project.

“One of the open problems in the procedural generation of fiction is how to maintain reader interest at scale,” wrote John Ohno on Medium. Hoping to tackle that issue with my NaNoGenMo project, I decided to use writing prompts and responses to generate shorter (and possibly more interesting) sections.

Here’s how I did it…

Photo by Brett Jordan on Unsplash

Create a dataset.

No matter what kind of long-form writing you want to generate, you’ll need to find the largest dataset possible.

Using Google’s BigQuery tool and the enormous archive of Reddit data at Pushshift.io, I created a massive dataset of individual writing prompts in a handy CSV file, each prompt wrapped in <|startoftext|> and <|endoftext|> tokens.

I also used a Reddit search tool from Pushshift to collect hundreds of AI-related writing prompts to add to the dataset because I wanted my bots to tackle my favorite sci-fi themes.

At the end of this process, I had a writing prompt dataset with 1,065,179 tokens. GPT-2 is a massive language model, so you need a comparatively big dataset to fine-tune the model effectively.
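
If you want to see what that preparation step looks like in code, here’s a rough sketch, assuming the BigQuery results were first exported to a plain CSV with a title column holding each prompt (the file names and the column name are illustrative, not the exact ones from my project):

import csv

# Wrap each Reddit writing prompt in the tokens GPT-2 will learn
# to treat as the start and end of a single prompt.
with open("reddit_prompts_raw.csv", newline="", encoding="utf-8") as src, \
     open("writing_prompts_dataset.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    for row in reader:
        prompt = row["title"].strip()
        if prompt:
            writer.writerow(["<|startoftext|>" + prompt + "<|endoftext|>"])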

Create another dataset (if needed).

Unless you want to follow my writing prompts/responses model, you only need to create ONE dataset.

I needed two datasets for my project. Using the same tools and lots of Reddit searches, I collected the highest-rated writing prompts and responses I could find online. I added lots of AI-focused responses, including prompt responses I’ve written myself over the years.

I ended up with 1,947,763 tokens in this second training dataset.

Structure your dataset.

This step is really important if you want to generate writing with any sort of structure.

I wanted to give my AI the most uncluttered and high-quality learning data possible, so I used a series of simple markers to teach GPT-2 the shape of a writing prompt response.

I added <|startoftext|> and <|endoftext|> tokens to match the writing prompt dataset, but I also made sure each response had the original writing prompt marked [WP] and the response itself marked [RESPONSE].

This was a huge, time-consuming effort, but it made my output infinitely more interesting. I saved the whole dataset as a .TXT file. Here’s what that dataset looks like:

<|startoftext|>[WP] On November 4, 2020, humanity abandons the Internet. Billions of bots have colonized the web with spam, deepfakes, and fake news. At the now deserted r/WritingPrompts, literary AIs (trained on great works of human literature) create an infinite library of prompts & responses.

[RESPONSE] Zackary Blue worked as a night janitor in the basement for major financial services company. He cleaned cavernous spaces beneath the building where hundreds of millions of dollars got traded every single day.
<|endoftext|>
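
A short script can emit that layout automatically once the prompt/response pairs are collected. This is just a sketch, assuming the pairs are already loaded as a list of tuples; the file name and the sample pair are illustrative:

# Write each prompt/response pair in the [WP]/[RESPONSE] layout shown above.
pairs = [
    ("An AI writer wakes up alone on the abandoned r/WritingPrompts forum.",
     "Zackary Blue worked as a night janitor in the basement of a major financial services company."),
]

with open("prompt_response_dataset.txt", "w", encoding="utf-8") as out:
    for prompt, response in pairs:
        out.write("<|startoftext|>[WP] " + prompt + "\n\n")
        out.write("[RESPONSE] " + response + "\n<|endoftext|>\n")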

Fine-tune your GPT-2 language model.

Fine-tuning both my language models took a few days. Max Woolf taught me everything I needed to know for this step.

IMPORTANT UPDATE: Google Colab has updated its default TensorFlow version, and you must add a single line of code to the top of Max Woolf’s Colab notebook to switch back to an older version of TensorFlow. With this line in place, Woolf’s notebook should run perfectly:

%tensorflow_version 1.x

I used the medium-sized 355M version of GPT-2 because it was large enough to handle my dataset but small enough to run on Google Colab’s cloud servers. I trained both models for 42,000 steps each.

I broke the training into smaller sessions because Google Colab wouldn’t run more than 12,000 steps at a time.
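
For reference, the fine-tuning call at the heart of Woolf’s notebook comes from his gpt-2-simple library. Here’s a rough sketch that mirrors the numbers above; the run name and the saving and sampling intervals are illustrative choices, not a copy of the notebook:

import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="355M")   # the medium-sized model used here

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="prompt_response_dataset.txt",  # the structured .TXT file from above
              model_name="355M",
              run_name="writing_prompt_responder",    # illustrative run name
              steps=12000,               # one Colab-sized chunk of the 42,000 total
              restore_from="latest",     # "fresh" for the first chunk, "latest" to resume later ones
              save_every=500,
              sample_every=1000)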

Generate text with your fine-tuned GPT-2 model.

Once my two language models were trained, I began generating my NaNoGenMo project.

Every morning, I would run Writing Prompt Prompter (Jane Doe) and generate between 10 and 20 pages of computer-generated writing prompts.

For each run, I set the generation length to 100 tokens, the temperature to 0.7, and the output to 10 samples. To shape the project, I used the prefix function to angle the writing prompt output toward my favorite themes like virtual reality, simulation theory, and AI writers.

The code from one of my *many* Writing Prompt Prompter runs.
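
Here’s roughly what one of those runs looks like in gpt-2-simple, using the settings described above; the run name and prefix string are illustrative stand-ins (the prefix is how I angled the output toward a favorite theme):

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="writing_prompt_prompter")   # the model fine-tuned on prompts

gpt2.generate(sess,
              run_name="writing_prompt_prompter",
              length=100,        # generation length from the article: prompts stay short
              temperature=0.7,
              nsamples=10,
              prefix="An AI writer")   # nudge the prompts toward a favorite theme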

Unless you want to follow my writing prompts/responses model, you only need to generate text with one language model.

However, I added an extra step for my NaNoGenMo project.

I picked my favorite computer-generated writing prompts and fed them to Writing Prompt Responder (Mr. Output) to generate between 100 and 125 pages of responses. Max Woolf’s Google Colab notebooks helped me code everything for this step.

The code from one of my *many* Writing Prompt Responder runs.
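
Here’s a similar sketch of that extra step: a favorite computer-generated prompt becomes the prefix for the second model, with the [RESPONSE] marker nudging it to answer the prompt rather than write new ones (the length, sample count, and run name are illustrative):

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name="writing_prompt_responder")  # the model fine-tuned on prompt/response pairs

# One of the computer-generated prompts picked from the Prompter's output (illustrative).
chosen_prompt = "On November 4, 2020, humanity abandons the Internet."

gpt2.generate(sess,
              run_name="writing_prompt_responder",
              length=1023,               # give each response room to grow into a full story
              temperature=0.7,
              nsamples=5,                # illustrative sample count
              prefix="<|startoftext|>[WP] " + chosen_prompt + "\n\n[RESPONSE]",
              truncate="<|endoftext|>")  # stop each sample at the end-of-text token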

Read through ALL your output.

I combed through hundreds of pages of text that Jane Doe and Mr. Output generated, picking the most compelling stories.

For every 350 words of compelling computer-generated writing that I gathered for NaNoGenMo, I had to plow through around 10,000 words of strange, confusing, repetitive, or just plain incomprehensible computer-generated text.

Collect the best writing in a master file.

During November, I stored each day’s output in a single file. I would read that file every night, searching for compelling writing. Here’s one of the first pieces of writing I saved, a utopian manifesto:

We are a group of artists, writers, coders, and tinkerers, who have banded under the banner of Humanity v2.0. We are a worldwide collective of similar interests. We all have the same goal - to create anew the foundations of a better world.

I recommend reading your computer-generated text on an eReader or tablet. It gets you away from the computer screen and gives you a bit more of the actual reading experience. I also used the fabulous Voice Dream Reader to listen to my computer-generated content while commuting to work in the morning.

During these epic reading sessions, I saved the best pieces of writing into a master file marked “NaNoGenMo Final.”

Repeat the process until you have enough compelling writing.

After that, I kept running those same text-generation steps for 17 straight days in November. I kept adding computer-generated text to my final document until I crossed the 50,000-word finish line for the NaNoGenMo marathon.

Curate your computer-generated creation.

Once the “NaNoGenMo Final” document had reached critical mass, I used the powerful writing tool Scrivener to rearrange the computer-generated stories around themes like virtual reality, AI writing, or gods.

I kept my edits to a simple reorganization, but human curators can use this step to make chapters, edit particular passages, add illustrations, write commentary, or anything else!

Check for plagiarism & publish!

Once I had compiled the final manuscript, I ran every prompt and response through the Quetext tool to make sure my bot hadn’t plagiarized anybody online. GPT-2 was trained on a massive dataset of millions of web pages, but it occasionally parrots its human trainers.

That’s it! NaNoGenMo was a great writing and coding experience, and I recommend trying it if you have some time this month.

There have never been so many resources available to amateurs like me, and GPT-2 is truly powerful when fine-tuned. I’ll be reading these GPT-2-generated stories for the rest of the year.

Here’s one last sample, a bit of connection between an AI writer and a human writer…

They are a species apart, but they share one thing in common: their love of books. They are the ones who seek out books for their own needs, whether they be scientific, philosophical, or religious texts. They are the ones who, from the beginning, have been able to find meaning within books.

Journalist, author & west coast correspondent for Publishers Weekly. Author of THE DEEP END: http://bit.ly/3aHSMJO
