GPT-2 Neural Network Poetry
Demonstration tutorial of retraining OpenAI’s GPT-2 (a text-generating Transformer neural network) on large poetry corpuses to generate high-quality English verse.
- GPT-2-117M: Generating Poetry
- Training GPT-2-117M To Generate Poetry
- Training GPT-2-poetry
- Training GPT-2-poetry-prefix
- GPT-2-1.5b
- Overall
- Improvements
- External Links
- Appendix
- Footnotes
- Backlinks
- Similar Links
- Bibliography
In February 2019, following up on my 2015–2016 text-generation experiments with char-RNNs, I experiment with the cutting-edge Transformer NN architecture for language modeling & text generation. Using OpenAI’s GPT-2-117M model pre-trained on a large Internet corpus and nshepperd’s finetuning code, I retrain GPT-2-117M on a large (117MB) Project Gutenberg poetry corpus. I demonstrate how to train 2 variants: “GPT-2-poetry”, trained on the poems as a continuous stream of text, and “GPT-2-poetry-prefix”, with each line prefixed with the metadata of the PG book it came from. In May 2019, I trained the next-largest GPT-2, GPT-2-345M, similarly, for a further quality boost in generated poems. In October 2019, I & Shawn Presser retrained GPT-2-117M on a Project Gutenberg corpus with improved formatting, and combined it with a contemporary poem dataset based on the Poetry Foundation’s website; finally, we retrained the newly-released GPT-2-1.5b, which did not fit in our GPUs, so we used TRC-supplied TPUs in a “swarm” to slowly finetune it. With just a few GPU-days on NVIDIA 1080ti GPUs, GPT-2-117M finetuning can produce high-quality poetry which is more thematically consistent than my char-RNN poems—capable of modeling subtle features like rhyming, and sometimes even a pleasure to read. I list some of the many possible ways to improve poem generation and further approach human-level poems. For the highest-quality AI poetry to date, see my followup page, “GPT-3 Creative Writing”.
See Also: For anime plot summaries, see TWDNE; for generating ABC-formatted folk music, see “GPT-2 Folk Music” & “GPT-2 Preference Learning for Music and Poetry Generation”; for playing chess, see “A Very Unlikely Chess Game”; for the Reddit comment generator, see SubSimulatorGPT-2; for fanfiction, the Ao3; and for video games, the walkthrough model. For OpenAI’s GPT-3 followup, see “GPT-3: Language Models are Few-Shot Learners”.
OpenAI announced in February 2019 in “Better Language Models and Their Implications” their creation of “GPT-2-1.5b”, a Transformer1 neural network 10× larger than before, trained (like a char-RNN, with a predictive loss) by unsupervised learning on 40GB of high-quality text curated by Redditors. GPT-2-1.5b led to large improvements over GPT-1’s natural language generation, is close to or at SOTA on natural language modeling, and demonstrated high performance on untrained NLP tasks (see the paper for more details: “Language Models are Unsupervised Multitask Learners”, Radford et al 2019). By large improvements, one means that the best samples, like the ones included in the OA announcement, have started to reach an uncanny valley of text, capable of telling entire semi-coherent stories which can almost fool a sloppy reader—certainly, the verisimilitude is better than any char-RNN output I’ve seen. (A dump of many more samples is available on GitHub. There is also an interactive word-by-word “GPT-2-Explorer”.) The full GPT-2-1.5b model was not released, but a much smaller one a tenth the size was released in February 2019, which I call “GPT-2-117M” to avoid confusion.
GPT-2-117M was used in most initial experiments with GPT-2-based text generation. OA’s next largest models, GPT-2-355M & GPT-2-774M, were released in May & August 2019, and the final, largest GPT-2-1.5b model was released in November 2019 (too late to be used in most of these experiments); 355M–774M turn out to just barely be trainable on commodity GPUs.2 Also worth noting is the 2019 release of an independently-trained GPT-2-1.5b model, which produces good samples, if perhaps not quite as good as the OpenAI GPT-2-1.5b, but which was not trainable at the time on desktop GPUs3 although it does still at least run (allowing for sampling/generation).
GPT-2-117M: Generating Poetry
‘I don’t speak’, Bijaz said. ‘I operate a machine called language. It creaks and groans, but is mine own.’
Naturally, people immediately used GPT-2-117M for all sorts of things, and I applied it myself to generate surreal anime plot summaries & dialogue for “This Waifu Does Not Exist”.4 Even more naturally, just as with char-RNNs, GPT-2 models, even unfinetuned, work well for poetry:
GPT-2-117M completions of Allen Ginsberg’s “Howl”: “An Eternal Howl” (comments: 1); Rob Miles
Shelley’s “Ozymandias”: “GPT-2 Writes a Shelley Poem”
Alexander Pope’s Essay On Criticism: “GPT-2 As Step Toward General Intelligence”
8 famous opening lines from Tennyson, Yeats, Shakespeare, Henley, Whitman, T.S. Eliot: Peter Krantz
Kyle McDonald provided a tool around GPT-2-117M demonstrating ~154 prompts
“Ask GPT-2: Helpful Advice From A Confused Robot”: T.S. Eliot’s “Wasteland”
Samuel Taylor Coleridge’s “The Rime of the Ancient Mariner”: “FridAI: ‘Water, water, everywhere’, as read by Artificial Intelligence”
verse from a GPT-2-1.5b trained on a Google News corpus (‽) (using Grover)
CTRL appears capable of generating verse when prompted with the “books” genre, see the Github repository’s “Weary with toil…” example (CTRL uses a ‘prefix’ approach similar to mine, and the “books” prefix corresponds to Project Gutenberg text, so it is not surprising that its samples would resemble my GPT-2-poetry samples)
Transformer Poetry: Poetry classics reimagined by artificial intelligence, Kane 20195
Kenyon College class projects: James Wright / John Donne / Taylor Swift
Poetry is a natural fit for machine generation because we don’t necessarily expect it to make sense or have standard syntax/semantics.
The quality of the results is limited by sometimes only having access to smaller models and difficulty in running larger models at all; that can’t be fixed (yet). But quality is also reduced by GPT-2-117M being trained on all kinds of text, not just poetry, which means sampling may quickly diverge into prose (as seems to happen particularly easily if given only a single opening line, which presumably makes it hard for it to infer that it’s supposed to generate poetry rather than much more common prose), and it may not have learned poetry as well as it could have, as poetry presumably made up a minute fraction of its corpus (Redditors not being particularly fond of as unpopular a genre these days as poetry). Finetuning or retraining the released GPT-2-117M model on a large poetry corpus would solve the latter two problems.
The poetry samples above did not exploit finetuning because OpenAI did not provide any code to do so and declined to provide any when asked. Fortunately, nshepperd wrote a simple finetuning training implementation, which I could use for adding more interesting samples to my TWDNE and for retraining on poetry corpuses to compare with my previous char-RNN poetry attempts back in 2015–2016 (see the top of this page). An alternative GPT-2 training implementation with support for training on GCP TPUs has been created by Connor Leahy (technical details), who trained a GPT-2-1.5b (albeit to substantially worse performance).
Training GPT-2-117M To Generate Poetry
Data: The Project Gutenberg Poetry Corpus
My heart, why come you here alone?
The wild thing of my heart is grown
To be a thing,
Fairy, and wild, and fair, and whole
GPT-2
For the poetry corpus, Allison Parrish’s public domain “A Gutenberg Poetry Corpus” (“approximately three million lines of poetry extracted from hundreds of books from Project Gutenberg”) will serve admirably. A few other possibilities surface in Google Dataset Search, like “Poems from poetryfoundation.org”, but nothing particularly compelling.
As far as the text formatting goes, GPT-2-117M is flexible: you can dump pretty much any text into a text file to use as the corpus, but some text formats are better than others. You want something which is as regular as possible (in both syntax & semantics), as close as possible to the kind of text you want generated, and which wastes as few symbols as possible. Regularity makes learning easier, and you don’t want to have to massage the output too much; but on the other hand, GPT-2-117M has a narrow ‘window’ and no memory whatsoever, so if each line is padded out with a lot of formatting or even just whitespace, one would expect that to considerably damage output coherence—as most of the fixed ‘window’ is wasted on meaningless repetitive whitespace—while other changes, like replacing newlines with the poetic convention of ‘ / ’ separators, would only waste further symbols.
The PG corpus has a strange format: each line is a separate JSON object, consisting of one line of poetry and a numeric ID for the work it’s from. Fortunately, the file as a whole is in order (if the lines were out of order, training on them would destroy the long-range language modeling which is the Transformer’s raison d’être!), so to turn it into a clean text file for training on, we can simply query it with jq and strip out the remaining formatting. This provides a pretty good format overall: the newlines are meaningful, no symbols are wasted on leading or trailing whitespace, and it looks like what we want. It is imperfect in that the metadata/poem boundaries are erased: every line simply runs into the next, with no indication of where one poem or book ends and the next begins.
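The conversion presumably looks something like this (a sketch, mirroring the jq/sed pipeline used later for the prefixed version, and writing out the filename used by the encoding step below):
cat gutenberg-poetry-v001.ndjson | jq .s | sed -e 's/^.//' -e 's/.$//' -e 's/\\//g' > gutenberg-poetry-v001.txt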
Setting up the GPT-2-117M training environment & obtaining the poetry corpus:
There is an additional step before beginning training. GPT-2-117M works with text in a “byte-pair encoding”, which is somewhere in between a character embedding & a word embedding. The point of this BPE encoding is that it is somewhat more efficient than raw characters, because it can chunk more common sub-words or phrases & this gets more complete words or phrases into the Transformer’s fixed ‘window’ of n symbols, but BPE still assigns symbols to individual letters, and thus arbitrary outputs can be generated, unlike word-level NNs which are more compact but trade this off by having a restricted vocabulary of m words seen in the training corpus and must treat everything else as the unknown token <UNK>
(especially bad for rare words like proper names or variants of words like pluralization or tenses). The training code will encode the text corpus at startup if necessary, but for 117MB of text this is so slow that it is worth the extra work to run the encoding process in advance & store the results before training on it:
PYTHONPATH=src ./encode.py gutenberg-poetry-v001.txt gutenberg-poetry-v001.txt.npz
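To get a feel for what the BPE encoding does, one can load the encoder shipped with the GPT-2 code; this is a sketch, assuming the original src/encoder.py helper (get_encoder) and the default models/117M directory layout, run with PYTHONPATH=src like the commands above:
import encoder  # src/encoder.py from the GPT-2 repo (run with PYTHONPATH=src)

enc = encoder.get_encoder("117M")  # loads models/117M/{encoder.json,vocab.bpe}
line = "The curfew tolls the knell of parting day,"
ids = enc.encode(line)
print(len(line), "characters ->", len(ids), "BPE tokens")
print([enc.decoder[i] for i in ids])  # BPE sub-word strings ('Ġ' marks a leading space)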
Training GPT-2-poetry

The temptation of CPU training after a bad Tensorflow upgrade.
I assume you have a fully-working Nvidia CUDA & GPU-enabled Tensorflow installation, and have either run other DL code successfully or run the TF installation checklist’s MNIST toy example to verify that you have working GPU training. If you do not, I can give you no advice other than “good luck”. Debugging CUDA problems is the worst, and once you get a working setup, you should stick with it. If you just can’t solve the inscrutable crashes, you should look into using the free Google Colab GPU/TPU notebooks instead.
Then training proper can begin; my 1080ti7 can fit a minibatch size of 2 (GPT-2-117M is still a large model), and I’d rather not see too much output so I reduce the frequency of checkpointing & random text generation:
PYTHONPATH=src ./train.py --model_name 117M --dataset gutenberg-poetry-v001.txt.npz \
--batch_size 2 --save_every 10000 --sample_every 1000
Check your CLI options
The Python library “fire” used in the OA GPT-2 code is treacherous—it will not error out or even warn you if you typo a command-line option! Double or triple-check any new options you set against the available arguments defined by train_main in train.py, and keep this gotcha in mind if setting an option doesn’t appear to be doing anything.8 While nshepperd has removed use of “fire” in favor of saner CLI options, watch out for this if you are using the original OA code or other derivatives.
Some hyperparameters could use tweaking:
runtime, Temperature:
‘Temperature’ (0–∞) is used in sampling: the top-k most likely words are computed, and one is then selected randomly from among them; at 0, the most likely word is always chosen, while at 1 each is selected according to its model likelihood, and at higher values it degenerates towards a uniform 1-in-k choice. In other words, the higher the temperature, the more chaotic or unlikely the generated sequences will be.
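As a toy illustration of what temperature & top-k do to the model’s next-word distribution (a NumPy sketch, not the actual GPT-2 sampling code):
import numpy as np

def sample_top_k(logits, k=40, temperature=0.9, rng=np.random.default_rng(0)):
    """Pick one token index from a logit vector using temperature + top-k sampling."""
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0:                 # temperature 0 = greedy decoding: always take the argmax
        return int(np.argmax(logits))
    top = np.argsort(logits)[-k:]        # keep only the k most likely tokens
    scaled = logits[top] / temperature   # <1 sharpens the distribution, >1 flattens it
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

# toy 5-token vocabulary: low temperatures are near-greedy, high temperatures near-uniform over the top k
print(sample_top_k([2.0, 1.0, 0.5, -1.0, -3.0], k=3, temperature=0.9))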
In the original nshepperd code release, the default temperature setting for the samples during training, 1.0, is not the usual 0.7 everyone uses for GPT-2 prose sampling—although it turns out for poetry we don’t want it at 0.7 as that forces too many repeated lines & 0.9–1 turns out to be much better, so use temperature in that range when generating samples. (Higher still may be better but I have not experimented with >1.)
If you are sampling after 2019-05-15, it may be a better idea to use a new sampling strategy, “nucleus sampling” (which essentially sets a different k at each step to avoid sampling extremely unlikely words and greatly reduces the repetition problem), which can be enabled with --top_p 0.9. (An interesting but untested sampling strategy is “tail free sampling”.)
train time, Learning Rate (LR):
A key NN hyperparameter as always.
In nshepperd’s code, the Adam SGD learning rate is left at its TensorFlow default of 0.001, which works initially, but appears to be much too high for this purpose (perhaps because the minibatch is so tiny on 1 GPU). After training overnight, the loss was not decreasing below 2.5, so I decayed it manually to 0.0001 & resumed training (editing line 136 of train.py to read tf.train.AdamOptimizer(learning_rate = 0.001*0.10)), eventually decaying it again (to 0.001*0.0001) to get it down to a loss of ~1.95. (nshepperd has since added a --learning_rate option so manual editing of the source is no longer necessary.)
GPT-2-poetry Samples
After training GPT-2-117M for an hour or two, a sample:
Overnight samples during training:
The loss here is the usual cross-entropy we often see in architectures like a char-RNN. Typically, the best text generation results come when the model has trained down to a cross-entropy of <1, and 2–4 tend to be incoherent gibberish. (For example, in Andrej Karpathy’s Tiny Shakespeare.) That loss is per character, while GPT-2 operates on BPEs, which usually encode multiple characters, so are harder to predict; it seems to me that the conversion factor is ~2–3, so a GPT-2 model should aim for a loss of <2 if a good char-RNN would reach losses like <1. In this case, GPT-2-117M’s original poetry modeling capability is not too shabby (as demonstrated by the various prompted samples), and it shows decent poetry samples starting ~3.5. (Gibberish seems to set in at losses >6.) Given how large & powerful GPT-2-117M is, even with this much poetry to work with, overfitting remains a concern—memorizing poetry is not amusing, we want creative extrapolation or mashups.
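To make that conversion factor concrete, a back-of-the-envelope calculation (the 2.5 characters-per-BPE figure is just an assumed midpoint of the ~2–3 range):
# if a BPE token covers ~2.5 characters on average, then a char-level loss of
# 0.8 nats/character corresponds to roughly 0.8 * 2.5 = 2.0 nats/BPE; hence a
# GPT-2 loss of <2 is roughly comparable to a char-RNN loss of <1
chars_per_bpe = 2.5
per_char_loss = 0.8
print(per_char_loss * chars_per_bpe)  # => 2.0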
For this model & dataset, I trained for 519,407 steps to a final loss of ~2 in 72 GPU-hours; almost all of the learning was achieved in the first ~16 GPU-hours, and training it additional days did not do any apparent good in terms of the loss itself.9 This suggests that GPT-2-poetry was underfitting the poetry corpus & would benefit from an even larger model size.
Downloads:
Before sampling from any new finetuned version of GPT-2-117M, remember to copy encoder.json/hparams.json/vocab.bpe from the GPT-2-117M model directory into the new model’s directory. I find higher temperature settings work better for poetry (perhaps because poetry is inherently more repetitive than prose), and top-k appears to work fine at OA’s top-40. So unconditional sampling can be done like this to generate 2 samples:
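(A sketch; the script & flags are the stock ones from the repo’s src/ directory, and the --model_name value is a placeholder for whatever directory your finetuned checkpoint is saved under.)
python src/generate_unconditional_samples.py --top_k 40 --temperature 0.9 --nsamples 2 \
    --model_name gpt-2-poetry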
Not bad.
Cleaning Project Gutenberg & Contemporary Poetry
In the dark the sun doth gleam,
And in the dark the moon doth seem
But now the evening is begun—
Gone is the sun upon the earth!
The silver moon doth like a cup
Of blood-red wine, and as that cup
Is drained of life, doth quench no drop.
What man will drink such wine?
GPT-2
Shawn Presser cleaned the Project Gutenberg poetry by using a heuristic on line numbers to guess where poems begin/end. This provides useful semantic metadata to the GPT-2-117M model, reducing “runon” or “ramblingness”, as it sees many discrete texts rather than a few book-length texts. I combined this improved PG poetry dataset with a new dataset on Kaggle, which scraped the Poetry Foundation website for modern/contemporary poetry, fixing the post-1920s emptiness of PG. The generated poems are much better.
Shawn Presser noted the issues with the Project Gutenberg corpus and, as book-level transitions are solved, suggested a heuristic for reconstructing the blank lines denoting (presumably) stanzas: use the numbers (GIDs) from the original JSON for book-level transitions, and look for lines which might be transitions to insert newlines. (Imperfect, but better than nothing!) Since stanzas are still connected, <|endoftext|> is used for the book-level transitions, and a blank line is used for the stanza-level, preserving as much of the semantics as possible.
Presser’s Python script implements this conversion.
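A minimal sketch of the conversion described above (not Presser’s actual script; the gid/s field names are the ones used by the jq commands later on, the filenames are illustrative, and the stanza heuristic itself is left as a placeholder):
import json

prev_gid = None
with open("gutenberg-poetry-v001.ndjson", encoding="utf-8") as f, \
     open("gutenberg-poetry-clean.txt", "w", encoding="utf-8") as out:
    for raw in f:
        record = json.loads(raw)
        gid, text = record["gid"], record["s"]
        if prev_gid is not None and gid != prev_gid:
            out.write("<|endoftext|>\n")  # book-level transition
        # (Presser's line-number heuristic for stanza breaks would go here,
        #  writing a blank line wherever a stanza transition is detected)
        out.write(text + "\n")
        prev_gid = gid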
This is converted to the NPZ and trained as usual. I retrained the previous non-prefix GPT-2-117M PG poetry model for ~30k steps (>16h?) down to a loss of ~1.73. (I used GPT-2-117M instead of GPT-2-345M for compatibility with my concurrent experiment in preference-learning training.)
The results are quite nice, and competitive even with 345M. Some selected samples:
The gods are they who came to earth
And set the seas ablaze with gold.
There is a breeze upon the sea,
A sea of summer in its folds,
A salt, enchanted breeze that mocks
The scents of life, from far away
Comes slumbrous, sad, and quaint, and quaint.
The mother of the gods, that day,
With mortal feet and sweet voice speaks,
And smiles, and speaks to men: "My Sweet,
I shall not weary of thy pain."
...Let me drink of the wine of pain
And think upon the agonies of hope,
And of the blessed Giver of all good things;
For, man for man, mine is the deepest love
That sorrow takes upon the humblest soul;
But who hath learned how sorrow turns to gall
The places where my feet have trod before.
...And 'stead of light, o'er earth, o'er rocky mountains,
A slowly falling star,
Its pointed pointed splendor far uplifting,
Heaven's flowery path bore down;
Each cranny of the air a gracious feeling,
It waved divinely round,
It called us hence, "Come what wouldst thou here?"--
Sweet mountain, that I love,
With that bright tint of heaven above,
'Twould make me still to see
One like to thee,
As fades the light that seeks the wandering eye.
...The skies are smiling sweetly on,
And summer's fairest hours are gone.
Oh, blessed Mercy! how the blest
Taste life itself can truly taste.
Thy morn of days, with all its past,
May on life's tempest paint the last.
...When you come to die,
Every nerve and bone
Soon lulled in sleep,
Secure and free,
Sleep will seize on you.
When you come to die,
Every nerve and bone
Soon lulled in sleep,
Sleep will seize on you.
When you come to die,
Every nerve and bone
Soon lulled in sleep,
We'll still be free,
And you'll never escape from our woe!
...I would be all that I can do
And this to carry with me
Along with me, O brother,
And bid my lagging days relent
For every worthy deed done,
And glorious though the world be,
They never will repent me,
But in God's name endureth ever,
Whose blessed hope my soul abides
For refuge through the awful doors of death.
...We are old men, who pass
On the sands with gaze
Out of the narrow world of fashion;
We are old men, who stay
On a river's flow
And a common day
Where the life of youth is waiting,
And a longing grows
For the world of youth and beauty
Where the old man goes.
...When I am dead, my dearest,
Sing no sad songs for me;
Plant thou no roses at my head,
That by that token may grow cold.
My dirge shall be a muffled noise,
My trentals stiff with dread,
For he who once his faith hath won
Will never know it read.
...O beautiful, golden-bosomed ships!
O sunburned ships on the sea; O ship which breams
Above the waves and beams; O songs of love
Sent from the wide West, that shall sing us songs
In our hearts afar, as a summer star.
While training that, I recalled that my other major beef with the PG corpus was its absence of more contemporary poetry. I didn’t really feel like trying to scrape the Poetry Foundation (PF) website myself, but I gave Google Dataset Search another try and to my surprise, discovered a scrape had surfaced on Kaggle. Aside from being large, it comes with interesting metadata: the title and author, but also a somewhat eccentric set of ‘tags’ describing the poem. They would be nice to use via the inline metadata trick, allowing some degree of controllability (like my use of author or book ID prefixes, or CTRL’s use of subreddits).
The Kaggle Poetry Foundation scrape has numerous issues I had to fix before its format was acceptable for GPT-2:
- I replaced prefixed whitespace, trimming leading/trailing whitespace in all fields
- replaced 3+ spaces with newlines
- deleted all 2+ spaces
- dropped poems with <100 characters (generally a scrape error)
- removed Unicode junk
- serialized it as title+author+tags (if any) / poem / <|endoftext|> (ie. the inline metadata trick, allowing for potentially better learning and a small degree of control in conditional generation)
The final formatted corpus is available for download; a rough sketch of this cleaning pipeline follows below.
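A sketch of that cleaning pipeline (the Kaggle CSV column names Title/Poet/Tags/Poem and the filenames are assumptions):
import csv
import re
import unicodedata

def clean(text):
    text = unicodedata.normalize("NFKC", text)   # strip some of the Unicode junk
    text = re.sub(r" {3,}", "\n", text)          # runs of 3+ spaces marked line breaks in the scrape
    text = re.sub(r" {2,}", " ", text)           # collapse remaining runs of 2+ spaces
    return "\n".join(line.strip() for line in text.splitlines()).strip()

with open("PoetryFoundationData.csv", encoding="utf-8") as f, \
     open("poetryfoundation.txt", "w", encoding="utf-8") as out:
    for row in csv.DictReader(f):
        poem = clean(row["Poem"])
        if len(poem) < 100:                      # drop scrape errors
            continue
        header = '"{}", by {}'.format(row["Title"].strip(), row["Poet"].strip())
        tags = (row.get("Tags") or "").strip()
        if tags:
            header += " [{}]".format(tags)       # inline metadata trick
        out.write(header + "\n" + poem + "\n<|endoftext|>\n")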
Once the cleaned PG was done, I then swapped out the PG dataset for the PF dataset and began finetuning. (I could have trained on the combined dataset, but the PF dataset is only 20MB, 1⁄6th the size of PG, and training on the combination would take a long time to pick up on PF’s details.) Surprisingly, the PF dataset trained down to ~0.60 loss after ~10k steps, as compared to PG’s ~1.73, a decrease much greater than I had expected from providing some metadata, suggesting that modern poetry, being so prose-like, is much easier for GPT-2 to model—which doesn’t strike me as a compliment.
Some selected samples from the PF-finetuning:
"The New York Movie", by William Collins
for Billie Holiday
Billie Holiday is dead. I am unhappy,
standing there like a gray stone,
standing there in my sweater,
thinking I can't kill you: I would like to be a big, red dog
with snow. I wish you a good night
with your face, your cheek and your legs, your legs.
I want to throw away this falling Sunday
with your face, my ugly body, my ugly body.
Nothing to give my body to,
nothing to give my heart to.
..."To a History Store", by Yehuda Amichai [Living, Death, Time & Brevity, Nature, Fall]
How hard it is for the living to die
holding only memory, and then to study the minutes
how like a trumpet of the future it sounds.
Gentlest of birds, patient as a body, cautious,
clear as the tongues of previous vervain birds.
Perhaps you are close to it, beautiful world.
..."Zagajas", by Robertizing [Religion, Christianity, Faith & Doubt, God & the Divine]
Our Lord who bore a child upon his knee.
We'll ask no more.
As sandpaper for what it wears.
Autumn, a new day.
Remember November, a new day.
Remember November, a new day.
Think of the silver
smiling into the barracud
glass
Already at the edge of day.
Think of the hardson of earth
crumbling into something new
Into every tingling ring
to the full edge. Think of what
New, bright, revolutionary clouds
scattered out of the blue:
the clouds that dissolved them,
soundless, self-edesfilled.
Think of the cycles in and around of this
crescent myriads of ants starting
to collect beneath each other,
their seeds suddenly
burning each to the other, each
moving and flashing.
..."The Bean Eaters", by Rudyard Kipling [Relationships, Home Life, Pets, Nature, Animals, Landscapes & Pastorals, Winter]
The fairies were wonderful.
They trod the snow, chasing
the catkins to the north.
Frosty violin-skins were flying
and they began to sing,
leaving an echo of singing.
Then, as the she-torches rang,
a second spring
flowed up from the fur brush.
It was the strangest sight
all through the wintry night.
It was the woods, falling in long grass,
and I was thinking of you, Little Brother,
in the sweet marsh,
that I might recognize, Little Brother,
as I think of you.
..."In Golden Gate Park", by James Jenny Xie [Living, Coming of Age, Time & Brevity, Activities, Jobs & Working, Philosophy]
In Golden Gate Park's the day is breaking, only
the timeless moments of the night sketch the sky's
high promenade of flying goldenness now
and never a late, dissolving splinter of black glass.
But in Golden Gate Park's the morning breaks. The sidewalks
bask to me like cars at a funeral or the stars
like blind lights waiting on cars long since gone.
There, to the streaming windowpane, the little birds
scarve to get ready to swoop, and the sky's yellow
and gold. It is the end of hunger that slays the bird.
..."To Theodore", by Kenneth Slessor
Death may forgive, but love is better.
He that loves the rose
Whose pale cheek glows
With one hand swift and close,
Whose fingers move
The gold hair of the rose,
Gone to pass.
Where his lips draw breath
The bitter thong
Sigh as if Death had
No part with them,
He hears the song,
Hears the shout,
Saying me,
As I must.
Love is better, they say,
Than the loss they know;
Dreaming is worse, they say,
Love must hate so.
As his torch I carry the air;
He shakes my wings;
He speaks no word;
Saying me,
As I reach,
As he calls me,
Call him, O dear,
Call him, oh dear.
Love has been my constant care.
The contemporary PF samples properly mimic all the metadata and formatting, and are good, for what they are. (If you doubt this, read through a random selection of PF poems.) The ones I liked seemed like they benefited greatly from the PG pretraining. There are still a lot of flaws in the unclean PF data: run-on lines are particularly irritating, and appear to be flaws in the Kaggle scrape dataset rather than the original PF website. I have brought up the problems on Kaggle, but I doubt they’ll be fixed soon.
With PF done, I combined it with PG and trained on the combination dataset for another ~20,000 steps, yielding a final loss of 1.61. The combined model appears able to do both datasets well (the weighted average of a dataset with a loss of 0.6 and another dataset 6 times larger and a loss of 1.7 would be ~1.55, close to the model’s ~1.6), and the samples don’t appear to differ much, so I don’t excerpt any. But the combined model should make a great starting point for RL preference-learning training.
Random samples & model downloads:
final (combined) model: 117M-clean (431MB)
Training GPT-2-poetry-prefix
The first version of the PG data for GPT-2-poetry just runs all the lines together, erasing the metadata about what book each line comes from. A good model should nevertheless gradually learn about the transitions between poems & whole books, but that is hard and there may not be enough transitions in the data to learn effectively.
Much like the char-RNN experiments on this page, there is no reason one can’t inject that metadata in a structured way to see if the model can learn to exploit the metadata; even if it cannot, the added metadata shouldn’t hurt that much because it is so regular & repetitive. Inserting the metadata also allows for some degree of control in conditional generation; one should be able to put in the book ID for, say, Homer’s Iliad as a prompt and get out a long block of consistent Homeric pastiche.10
Ideally, there would be unique IDs for every author, poem, and book, and these would appear at the beginning of every poem, with the end of the poem delimited by the <|endoftext|> symbol that OA’s GPT-2 models were trained with; but unfortunately only the book ID is available in this particular dataset. (Project Gutenberg ebooks do not include any metadata or formatting which would cleanly split each discrete poem from each other.) Like before with authors, the book ID metadata can be formatted as a prefix on every line with a delimiter like the pipe character.
Rather than start over with GPT-2-117M again, GPT-2-poetry can just be further finetuned on this new prefixed version of the PG corpus to produce what I call “GPT-2-poetry-prefix”:
cat gutenberg-poetry-v001.ndjson | jq .gid | tr -d '"' > id.txt # "
cat gutenberg-poetry-v001.ndjson | jq .s | sed -e 's/^.//' -e 's/.$//' -e 's/\\//g' >> poetry.txt
paste --delimiters='|' id.txt poetry.txt > gutenberg.txt
shuf gutenberg.txt | head
# 14869|Beware of the brand of the fiery Frank!
# 1727|and they have great power among the Argives. I am flying to
# 38550|Shows heaven in page of living book;
# 22421|First, for effusions due unto the dead, I. 26.
# 26275|blossomed beneath their temples, and covered their chins with
# 1745|What happiness, who can enjoy alone,
# 1645|When first he won the fairy clime.
# 4332|And out of these molten flowers,
# 36916|What! Never more go gladly back?
# 2507|Raged for hours the heady fight,
PYTHONPATH=src ./encode.py gutenberg-poetry-v001-delimited.txt gutenberg-poetry-v001-delimited.txt.npz
The loss of GPT-2-poetry-prefix will be much lower than GPT-2-poetry because the prefix is so predictable, but it will hopefully learn interesting things beyond that.
In other samples, the generated IDs switch in the first two lines, and while that’s not much to judge from, GPT-2-poetry-prefix seems to ignore keywords from the first line when the IDs change, and doesn’t repeat them in the rest of the sample or attempt to rhyme off them, which is further evidence it is successfully associating & learning to mode-switch.
Like GPT-2-poetry, GPT-2-poetry-prefix converged quickly to a final loss of ~1.6 after 224,474 steps taking 31 GPU-hours, not improving much after the first ~8 GPU-hours despite decreasing the learning rate. (Diminishing returns appear to set in quickly for finetuning GPT-2-117M even if one has a relatively large new corpus.)
GPT-2-poetry-prefix Samples
Training Samples
One training sample is worth remarking on:
The rhyming in this sample is so good as to be suspicious. It might also sound familiar—because many of these lines are being copied from Thomas Gray’s Elegy Written in a Country Churchyard, which opens:
The curfew tolls the knell of parting day,
The lowing herd wind slowly o’er the lea
The ploughman homeward plods his weary way,
And leaves the world to darkness and to me.
Now fades the glimm’ring landscape on the sight,
And all the air a solemn stillness holds,
Save where the beetle wheels his droning flight,
And drowsy tinklings lull the distant folds;
Save that from yonder ivy-mantled tow’r
The moping owl does to the moon complain
Of such, as wand’ring near her secret bow’r,
Molest her ancient solitary reign.
Some spelling differences aside, this intro is almost entirely copied from the 8 copies of Gray’s poem in the corpus; this extensive copying is not something I spotted in the GPT-2-poetry samples I looked at, suggesting that the scaffolding of the metadata did indeed help with learning.
Also interestingly, the copying only goes so far, as immediately after the final line about the owl, where Gray continues:
…Beneath those rugged elms, that yew-tree’s shade,
Where heaves the turf in many a mould’ring heap,
Each in his narrow cell for ever laid,
The rude Forefathers of the hamlet sleep …
GPT-2-poetry-prefix instead continues:
…Methinks I see her in her blissful dreams:
—Or, fancy-like, in some mirage she lies,
Majestic yet majestic, and of seems
The image of the unconquerable skies.
Methinks I see her in her blissful dreams:
—Or, fancy-like, in some majestic cell,
Where lordly seraphs strew their balmy dreams
On the still night, or in their golden shell.
There, in the calm of some Platonic dream,
Sits she, and views the unclouded moon arise
Like a fair lady full of realms divine;
And, all at once, a stony face and bright
Glittering in moonlight, like the noon-tints of a night. …
That is, it focuses on the female figure of the Moon in a way more ode-like than elegiac. These lines also do not seem to be extracted from the rest of Elegy either, as words like “bliss” or “mirage” or “dream” or “seraph” or “Platonic” do not appear in it. Some of the phrases like “blissful dreams” do appear in the rest of the corpus, but others like “some mirage” or “mirage she” do not. Nevertheless, the style is consistent throughout the entire sample and the quality is good, suggesting that while GPT-2-poetry-prefix has managed to memorize to a limited extent, it is nevertheless fully capable of generating good original text.
An additional example of memorization has been spotted; sample #17 in the 1,000 unconditional samples is almost entirely a memorized copy of Percy Bysshe Shelley’s “To a Skylark”:
======================================== SAMPLE 17 ========================================
of our mortal life, and made it known
32373|To be man's also,--as to make us one;
32373|A thing apart, yet of an individual
32373|In frailty and high elemental own.
32373|Hail to thee, blithe Spirit!
32373|Bird thou never wert,
32373|That from heaven, or near it
32373|Pourest thy full heart
32373|In profuse strains of unpremeditated art.
32373|Higher still and higher
32373|From the earth thou springest
32373|Like a cloud of fire;
32373|The blue deep thou wingest,
32373|And singing still dost soar, and soaring ever singest.
32373|In the golden lightning
32373|Of the sunken sun
32373|O'er which clouds are brightening,
32373|Thou dost float and run,
32373|Like an unbodied joy whose race is just begun.
32373|The pale purple even
32373|Melts around thy flight;
32373|Like a star of heaven,
32373|In the broad daylight
32373|Thou art unseen, but yet I hear the music of thy singing.
32373|Keen as are the arrows
32373|Of that silver sphere,
32373|Whose intense lamp narrows
32373|In the white dawn clear,
32373|Until we hardly see, we feel that it is there.
32373|All the earth and air
32373|With thy voice is loud,
32373|As, when night is bare,
32373|From one lonely cloud
32373|The moon rains out her beams, and heaven is overflowed.
32373|What thou art we know not;
32373|What is most like thee?
32373|From rainbow clouds there flow not
32373|Drops so bright to see,
32373|As from thy presence showers a rain of melody.
32373|Like a poet hidden
32373|In the light of thought,
32373|Singing hymns unbidden,
32373|Till the world is wrought
32373|To sympathy with hopes and fears it heeded not;
32373|Like a high-born maiden
32373|In a palace tower,
32373|Soothing her love-laden
32373|Soul in secret hour
32373|With music sweet as love, which overflows her bower;
32373|Like a glow-worm golden,
32373|In the silver dew,
32373|Scattering unbeholden
32373|Its aerial hue
32373|Among the flowers and grass, which screen it from the view;
32373|Like a rose embowered
32373|In its own green leaves,
32373|By warm winds deflowered,
32373|Till the scent it gives
32373|Makes faint with too much sweet these heavy-winged thieves.
32373|Sound of vernal showers
32373|On the twinkling grass,
32373|Rain-awakened flowers,
32373|All that ever was
32373|Joyous and fresh and clear thy music doth surpass.
32373|Teach us, sprite or bird,
32373|What sweet thoughts are thine:
32373|I have never heard
32373|Praise of love or wine
32373|That panted forth a flood of rapture so divine.
32373|Chorus hymeneal,
32373|Or triumphant chant,
32373|Match'd with thine, would be all
32373|But an empty vaunt--
32373|A thing wherein we feel there is some hidden want.
32373|What objects are the fountains
32373|Of thy happy strain?
32373|What fields, or waves, or mountains?
32373|What shapes of sky or plain?
32373|What love of thine own kind? What ignorance of pain?
32373|With thy clear keen joyance
32373|Languor cannot be:
32373|Shadow of annoyance
32373|Never came near thee:
32373|Thou lovest; but ne'er knew love's sad satiety.
32373|Waking or asleep,
32373|Thou of death must deem
32373|Things more true and deep
32373|Than we mortals dream,
32373|Or how could thy notes flow in such a crystal stream?
32373|We look before and after,
32373|And pine for what
The 87 lines beginning with “Hail to thee, blithe Spirit!” are all Shelley (with perhaps slight spelling differences), much surpassing the memorization for Thomas Gray. Considering the top-k sampling method, it’s amazing that the sample could so exactly follow “To A Skylark”. It turns out there are ~12 copies of the poem in the PG corpus (it’s a popular poem), so in retrospect some degree of memorization is not surprising, but that’s still a lot of memorization. The 4 lines beforehand don’t appear to be copied from another Shelley poem, making it even more amazing. It’s a pity that that sample did not continue further because one wonders whether it could have repeated the entire poem and what it would’ve done when the original poem ended.
Unconditional Samples
How the clouds
Seem to me birds, birds in God’s garden! I dare not!
The clouds are as a breath, the leaves are flakes of fire,
That clash i’ the wind and lift themselves from higher!
GPT-2
For both GPT-2s, I generated 1000 samples as follows:
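(A sketch; presumably the same stock sampling script as before with --nsamples raised to 1000, run once per model. The first model directory name is a placeholder; the second is the prefix checkpoint named later on this page.)
python src/generate_unconditional_samples.py --top_k 40 --temperature 0.9 --nsamples 1000 \
    --model_name gpt-2-poetry > samples-poetry.txt
python src/generate_unconditional_samples.py --top_k 40 --temperature 0.9 --nsamples 1000 \
    --model_name 2019-03-06-gwern-gpt2-poetry-prefix-projectgutenberg-network-224474 > samples-poetry-prefix.txt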
Download links again:
Some fun passages I noticed in the first 100 unconditional samples:
======================================== SAMPLE 2 ========================================
|Hear the tale that the funeral chant is telling,
2491|For the sorrows of other's children that dwell
2491|Like sweet flowers upon the wold?
2491|'Tis the tale of a life which is fled and gone,
2491|And the star of a hope which shone
2491|Bright above it, though dark may it be,
2491|For the hopes of a brighter day are fled
2491|And the joys of a happier lot?
2491|'Tis the tale of a life with the weary and sad,
2491|Where sorrows begin and rest.
2491|For only a song can the widow's soul glad
2491|Who sits musing 'mid shadows drear.
2491|And only a music, sad with its sighs,
2491|Till sad to the soul as death draws near
2491|As life on her fragile bark!
...
## Sample 3:
...
37804|The white-petalled white fox
37804|Opens himself to coolness
37804|In the late evening.
37804|But when the last child started
37804|The white fox to his feet flew,
37804|And the old fox was master
37804|Of all the magic heathen.
37804|Till when the faint huntsman
37804|Had snuffed the fragrant water
37804|Over his plump ears and skin,
37804|In the old way he knew not
37804|Till morn had almost shone;
37804|And then the fox came slowly
37804|And left the place unguessed;
37804|The white fox was not master,
37804|Although he had been master,
37804|Although he had been servant
37804|And now he could be master
37804|Of all the magic powers
37804|That keep the place enchanted
37804|In the wide earth and water.
...
## Sample 9:
...
36661|And the morn breaks, and, all the day,
36661|Red-clover'd birds with silver bill
36661|Flutter from tree to tree in flower,
36661|A quivering dew, a wind that wafts
36661|To haunts among the ancient woods.
36661|The golden-crested ilex, here
36661|Doth vine her purple cup; the deer,
36661|The wild-goose; and, in troops, the sheep,
36661|The goat, the sylvan-haunted elm,
36661|And the green-faced oft-gadding pine
36661|Blossom with purple.
36661|The lark soars up,
36661|And the hare loud answer make!
36661|Doves, willows, dunes, aslant the lake;
36661|Pair after pike sounds warbling;
36661|The reeds a triumph!
...
## Sample 14:
...
37452|I had a vision
37452|Of an old and stubborn old man,
37452|His hair was pale, and thin,
37452|His face was all forlorn,
37452|And the moon was full in the air,
37452|And a spirit passed over his brow,
37452|And its face was all for ever.
37452|And he spoke:
37452|'Have we ever a dream?
37452|Have we ever a vision
37452|Of the ghost's ghost?'
37452|The Master gave the word:
37452|'By the breath I know
37452|The meaning of Death:
37452|Can it be 'hush?
37452|Have we ever a dream?'
37452|The spirit said:
37452|'By the breath I know,
37452|The meaning of Death,
37452|You will see a ghost
37452|Stand by the door
37452|And enter.'
37452|And the spirit said:
37452|'By the breath I know,
37452|The meaning of Death
37452|You may understand:
37452|Can it be 'hush?
37452|Have we ever a dream?'
37452|The Spirit said:
37452|'By the breath I know,
37452|The meaning of Death
37452|You can see a ghost
37452|Stretched toward the door,
37452|And see a spectre
37452|Pass by the chamber door.
...
## Sample 24:
...
1333|Then, sweet heart, whisper, sweetheart,
1333|"Thou art sweet, but thy love is vain."
1333|I do love thee, my love,
1333|In a word, in a song,
1333|With the heart and the will,
1333|And the power of my heart;
1333|The power of my whole
1333|Of the poet's soul,
1333|And the heart and the soul!
1333|As the winds take the leaves
1333|As the flowers take the flowers,
1333|As the floods take the dew,
1333|As the salt runs in floods,
1333|As the salt runs in floods,
1333|As the snow in the seas,
1333|As the rain in the logs,
1333|As the wind comes and goes,
1333|As the sleet in the coppice,
1333|As the snow in the coppice,
1333|As the snow in the bogland,
1333|As the hail in the river,
1333|As the snow in the river,
1333|As the snow in the county,
1333|As the snow in the county,
1333|As the snow in the county,
1333|As the rain in the vale.
1333|As the stars take the dew,
1333|As the sparks fly from eye,
1333|As the sparks fly,
1333|So the hand of my heart
1333|As the heart of my art
1333|As the tongue of my lips,
1333|As the heart of my heart
1333|As the flame in the eye.
...
======================================== SAMPLE 39 ========================================
|And as the summer twilight,
34237|When the golden vinewood
34237|Strikes the silent midnight,
34237|Stands mute beside the brook,
34237|With a ghostly sense of the human heart
34237|Forgotten, yearning, sighing.
34237|I do remember how, long years ago,
34237|At the spring by the vistaed stream,
34237|I stood as 'neath the orchard, in the June,
34237|To the sound of the grass and the dream.
34237|I know the moss where the violets
34237|Quested the dew and the sun;
34237|The air above 'mong the orchards
34237|Murmuring ever of bees;
34237|And the heart that was filled with the music
34237|That came to the listening trees,
34237|While the bluebird's notes, as he piped again,
34237|Awoke the robin's golden throat;
34237|And the sound I heard, long years ago,
34237|Came through the wood and the dells,
34237|Bringing the sound of the violets
34237|And the perfume of dying wells.
34237|And the song I heard in the August dusk,
34237|In the August dusk by the lake,
34237|Was sweeter, from the full-leaved orchard,
34237|Than the sound of a happy brook,
34237|When it came to the school of my childhood,
34237|And to the school of the land,
34237|Oh my home of the woods, where the wild-flower
34237|Loses itself and dies!
34237|They give me back the old-time delight,
34237|The pleasant and the calm,
34237|When still the wind was blowing in the woods,
34237|And the children stood in the warm, glad school,
34237|And smiled as the dear lad asked.
34237|They give me back the pleasant book
34237|That gave my heart its fire,
34237|Those childish words, the constant brook,
34237|Those childish words, the tire;
34237|They made my soul to loiter!--Yes,
34237|They do, they make me blest!--
34237|The rest of the household, and the rest
34237|Of the parents whose hearts were filled with care,
34237|And who were sad in their care!
34237|Their voices!--Yes, and they do--
34237|'T was aye! 'T is aye! 'T is aye!
34237|And the dear friends, so dear to me,
34237|They still will live and die!
34237|I have not a moment now
34237|To forget when the morn is gray--
34237|To be happy, and cherish so
34237|The rose that is on her way.
34237|The evening breezes blow,
34237|And the stars shine out to-day--
34237|But I would not live in to-day,
34237|If I were as happy to stay!
34237|I hope that maybe one day,
34237|When all my work is done,
34237|My darling's coming away,
34237|To meet me in the sun;
34237|I hope that maybe I can see
34237|My Peggy's smile upon me.
34237|The evening wears an old, old gray,
34237|Which softly slants upon the way,
34237|Its shadows on the sunny day,
34237|Its shadows on the sunny day.
34237|O'er life, a sad, unwritten scroll,
34237|The words are like the gentle dove,
34237|That sails upon the nightly soul,
34237|Though none may read or hear reproof.
34237|And drooping o'er life's weary way,
34237|God grant the book may never end,
34237|The gentle words that cheer my way,
34237|The gentle words--they come to blend--
34237|The tender words of comfort and of love,
34237|The kindly words--they come to bring me joy.
34237|I know not if my path shall be
34237|Through the world's wild, woeful wild;
34237|But I know that sometimes, in the night,
34237|The dark will come, with wild delights,
...
======================================== SAMPLE 64 ========================================
away,
2620|And be glad as the lark
2620|When the skies are clear;
2620|And send forth a breeze of love
2620|As of wings to our bark,
2620|And away with a joyous song
2620|As of streams in our ears,
2620|And away with a joyous tune
2620|As of birds in the spheres,
2620|And away with a joyous tune
2620|As of voices in trees,
2620|As of swans in the summer time
2620|When the grass is green
2620|And the air is keen,
2620|And the leaves are young--
2620|Then away with a song of praise
2620|As of flowers in Maytime
2620|All the sunny days!
2620|O beautiful, gentle, and clear,
2620|Illimitable and strong!
...
======================================== SAMPLE 72 ========================================
, he who had no need to fly;
24869|For in this moment of dismay
24869|The king who held that evil foe
24869|Threw Indra's son as he drew down
24869|The Lord of Life shaft-headed and bow.
24869|Then Indra, lord of every woe,
24869|The Vánar legions, with a shout,
24869|The Vánar legions met and fought,
24869|And straight they broke the tyrant's yoke,
24869|And hurled him at the giant, broke
24869|The mighty bow the giant broke,
24869|Which Indra, King of all the Blest,
24869|Had thrown by Rávaṇ's(924) mighty breast,
24869|The monstrous coil, the brawny hand,
24869|The monstrous mouth, the jaw, the jaw,
24869|The jaw, the jaw and bleeding jaw,
24869|The ungovernable host, the jaw,
24869|And the great bow which never bends,
24869|The arm, the fist, the knee, the ends,
24869|The body laid with mighty stroke,
24869|And the great bow which never bends.
24869|So, when the giants fought, and fell
24869|With murderous strokes, the giant fell,---
24869|So falls the tree with all his trunks
24869|Terrific in its death, that shoots
24869|Wild volley at the mighty trunk,---
24869|So fell the tree with all its boughs
24869|While all the vipers dug and sowed---
24869|So fell the tree with all its boughs.
24869|But Ráma's heart was sad within:
24869|He wept and mourned his captive's sin,
24869|For he had wrought a ruin yet
24869|O'er Raghu's son in his wrath,---
...
======================================== SAMPLE 78 ========================================
on the bosom of
11014|King Deshav, son of Bhishma, sat in the shade of the trees,
11014|Humbu, the great, strong, beautiful, fortunate Brahmin,
11014|A king, a keeper of the law, a guide of the realm,
11014|His name unfolded through all time and space,
11014|A ruler of the realm, a keeper of the realm,
11014|And was worshipped, as was meet, by the Great Spirit of God.
11014|And all the days of his life he kept on striving with God
11014|For the union of faith; and at last all-wise he spoke to
11014|"Lord, I am the Brahmin's lord--and I hold thee thine inmost
11014|As I cast my life away from thee, my Lord, to-day!
11014|Therefore I cast mine body away from thee, my lord."
11014|And that, by constant penance, I might win thy favour
11014|So in the spirit's depths he plunged it into the sea,
11014|But, as the wave closed over it, the wandering wind
11014|Caught up the ship's chattels, and bore it with it to the beach.
11014|And Bhimasena seeing there the empty space behind,
11014|The wandering ship rocked in the dark and glowing heat.
11014|He sat upon the bosom of the Mother of God,
11014|He sat upon the emerald seas, meditating death
11014|Of the great sea. He sat and pondered in his mind
11014|Upon the mystery of the sea, what gods the daring man
11014|Must have to tell of,--and this mystery,--when, in the morning,
11014|As, in the after days, the Lord of life should pass away,
11014|And leave the body alone to ride the ocean's force,
11014|To die in solitude, unknown, untroubled,--and unto him
11014|His world was opened; and as yet no living creature.
11014|And all the night he sat there, gazing in the east,
11014|Until the morning sunlight faded from the hills
11014|And dawn came, bringing darkness and the darkness awful,
11014|And to his soul came holy light from God, to cleanse
11014|All doubt and all resistance, till, in the morning of life,
11014|The coming of the Lord beheld his face.
...
## Sample 95:
...
24869|Canto XXI. Lakshman's Speech.
24869|He ceased: then Raghu's son repressed
24869|The sovereign of the giant kind,
24869|And thus with soothing words unsoft
24869|To Ráma Rávaṇ spake:
24869|"Come, with thy brother Lakshmaṇ, friend,
24869|And Lakshmaṇ, and the dame agree.
24869|Thou in the woods shalt soon be found
24869|And bathed in pleasant waters clean;
24869|Where thou shalt sit, and rest, and save,
24869|Well clad in coats of bark and hide,
24869|What days, what nights, what hours will pass
24869|That thou in holy heaven mayst see
24869|Thy darling with her night-made tressed
24869|Far from the forest. Thence will spring
24869|Sweet smells of pleasantness and light
24869|And bliss from the celestial string.
24869|Thence on the ground shalt thou be borne
24869|O'er the bare earth, O Queen Mosteer,
24869|And on the fresh bright earth where thou
24869|Shalt sit in state with Queen Sítá,
24869|In glorious heaven the nights and days
24869|Thou wilt be rapt by the great bliss
24869|E'en as the Lord of Gods is hearkening.
24869|The nights and days are thine, O best
24869|Of giant lords, and I, the best
24869|Of all who love the Lord of Lords,
24869|Whose might can turn the firmament,
24869|Whose might can sway the leafy bowers
24869|And turn each flower and leaf and bower
24869|To holy joy and blissful flowers.
24869|Ah me, the languorous days are come,
24869|And not a moment shall I see
24869|The happy days of Ráma's Queen
24869|Far from the light that round her glows,
24869|And marked with darkening leaves and boughs.
24869|Ah, whither would her steps be turned,
24869|And where the woodman's art had burned?
24869|Ah, whither would her steps be bent
24869|To turn her toil-worn heart once more,
24869|When all her hours were joy and peace,
24869|And all her hopes were set on store?
24869|Ah, let thy soul be comforted,
24869|Let trembling fancy still excuse
24869|The burden of a weary time
24869|That mars a saintlike life and use.
24869|Ah, if thy love were still the same
24869|That now I watch with toil and pain,
24869|That I could be for aid or flame,
24869|Could not my heart and bitterer gain."
24869|And Lakshmaṇ to the forest came
24869|And told his tale with welcoming.
24869|He saw the tree where he was set
24869|With burning buds and leaves beset.
24869|He saw the tree where he was brought
24869|By Sítá of the glittering thought,
24869|And when the leaves were fallen, he
24869|Spoke of his lord the tallest be.
24869|"O Lakshmaṇ, I the deer will slay
24869|From thicket, cave, and mountain gray,
24869|Ere long shall I this forest seek,
24869|And Lakshmaṇ in the covert seek.
24869|O'er hill and wood the Vánar bands
24869|And watch the beasts of wood and sands."
24869|He spoke: and Lakshmaṇ's love obeyed
24869|Nor did he speak as he was prayed.
...
# Sample 100:
...
38475|O Liberty, the patriot's sure defence!
38475|True to the man who fears a tyrant's eye,
38475|Preserve thy rights, and own his glorious cause,
38475|And yield the haughty title to a lie.
38475|No longer now on mean estate depend,
38475|And England owns thy sovereign vital force,
38475|And her best sons succeed to guard her laws,
38475|Or her best sons bestow a deedless course.
38475|Now, from that happy climate freedom's hope had birth,
38475|And made one day a milder country bleed,
38475|To the great cause that gave her aid is given,
38475|And to mankind one sure reward is even,
38475|Whilst I, perhaps, to distant climes must speed.
38475|To the same cause who has the cause to join?
38475|What foes against mankind may rise to arms,
38475|Boldly they fight, in actions of design,
38475|Yet all the same, and every day they charms.
38475|Ah, Washington! who can thy cause design?
38475|What can the nation do, or me, subdue,
38475|But still go on, in humbling folks admire!
38475|That we may praise thy conduct, that we fire,
38475|And for thy conduct many a hero dare,
38475|That we may rise, and cast the tyrants down,
38475|And tyrants fall, and fall the people crown!
Not bad.
These samples represent roughly top decile poem samples (~10 out of the first 100), at least by my selection.
Scott Alexander & commenters highlighted a few more samples:
Thou know'st how Menoetiades the swift
Was dragged, of Hector and the fierce compeers
And Phrygian warriors. So, we will dispatch
Your bodies, then, yourselves to burn the ships
In sacrifice; with torches and with bells
To burn them, and with oxen to replace
Your gallant friends for ever. But I wish
That no man living has so long endured
The onset of his foes, as I have power
To burn or storm; for mighty Hector erst
Was slain, and now returns his safe return
My heart, why come you here alone?
The wild thing of my heart is grown
To be a thing,
Fairy, and wild, and fair, and whole
Fair is the Lake, and bright the wood,
With many a flower-full glamour hung:
Fair are the banks; and soft the flood
With golden laughter of our tongue
How the clouds
Seem to me birds, birds in God's garden! I dare not!
The clouds are as a breath, the leaves are flakes of fire,
That clash i' the wind and lift themselves from higher!
In the dark the sun doth gleam,
And in the dark the moon doth seem
But now the evening is begun--
Gone is the sun upon the earth!
The silver moon doth like a cup
Of blood-red wine, and as that cup
Is drained of life, doth quench no drop.
What man will drink such wine?
There is no soul of earth or birth
Which man hath never known of earth.
There is no soul who doth not sit
And sing to it, and cry, "Drink!"
There is no soul whose feet are set
On youth's eternal paradise;
For all is a solemn harmony,
And all is a perpetual chant,
And all the world is a song of God.
There is no soul so wholly free
Scott Alexander highlights a fun repetition-trap one:
And this one is obviously a failure on one level, but on another level is some kind of great experimental modern political poetry:
There are several kinds of people in America;
There are several kinds of people, I mean their number.
There's a girl growing up in the house by the light,
There's a youth upon the road, or a girl somewhere in New York;
There's a prettier girl, and a man more congenial,
But none of the likes of the likes of the fellows are equal.
There's one who has never been married and married,
There's one who don't want to be treated with kindness;
A fair youth is never employed nor neglected;
There's one who has never yet come to a neighbor,v
And one who resides in New York from the start;
But none of the likes of the likes of the fellows
Are equal to him, and wherever he goes,
The heart somehow breaks under the hand that is steering; And so it is with me
There comes a murmur low and sweet
As of far-off streams in a dream,
Or a murmur of many birds,
Or chime of little evening bells,
As of wedding-bells in the dells,
Soft, sweet and slow,
As of wedding belles that come and go.
A little green ribbon of lilies
By the door of my dear one's room,
A kiss on her cheek, and she whispers,
"I am the bride of the loveliest flower."
A moment we stand in the garden
Of dreams and things,
Dreaming of fairyland
And the fairy music there,
Sweet bells and dreams, and the fairy music,
The fairy songs of the air.
The top percentile of poems are probably quite good, especially with some light human editing to fix up the more glaring issues. To get a decent number of top-percentile poems would require a lot of reading, but on the other hand, there is no reason why selecting or ranking poem samples could not itself be treated as a supervised learning task for retraining GPT-2-117M-poetry on, by using selected/rejected samples as the training signal (cf. the RL preference-learning training mentioned earlier).
GPT-2-poetry-prefix Completions
Prompted samples can be done like this:
python src/interactive_conditional_samples.py --top_k 40 --temperature 0.9 --seed 2000 \
--model_name 2019-03-06-gwern-gpt2-poetry-prefix-projectgutenberg-network-224474
The downside of using the stock OA interactive prompt is that it returns on the first newline, so one either deletes newlines or uses a single line. Neither is good: a single line is hardly any context, while smashing many lines into a single super-long line is dangerous because GPT-2 has never seen poems formatted that way (only, perhaps, some prose that snuck in) and newlines have important semantic functions in poetry. So, to avoid either problem, I bypassed the interactive prompt entirely: I modified the Python script to replace input() (which takes 1 line of keyboard input) with reading standard input (import sys; sys.stdin.read()), so I could simply pipe in multiple lines from files or from the copy-paste buffer using xclip -o.
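The substitution itself is tiny (a minimal sketch of the change just described; the surrounding encode/sample logic in interactive_conditional_samples.py is untouched, and the raw_text variable name is an assumption):

import sys

# Before: raw_text = input("Model prompt >>> ")  -- takes only 1 line of keyboard input
# After: slurp all of standard input, newlines included, as the prompt:
raw_text = sys.stdin.read()

One can then pipe a whole poem in, eg. xclip -o | python src/interactive_conditional_samples.py --top_k 40 --temperature 0.9 --model_name 2019-03-06-gwern-gpt2-poetry-prefix-projectgutenberg-network-224474.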
The next issue in prompts is the question of the metadata: given that all the training data was prefixed with its Project Gutenberg book ID, and GPT-2-poetry-prefix has presumably learned the meaning/use of those prefixes, what prefix should a new prompt be given?
If an author is already represented in the PG corpus, hypothetically one could look them up in it and see what IDs their poems were included under and use that, but that is a pain and doesn’t work for ones outside the corpus like Ginsberg. So, one could instead simply ask the model what prefix it thinks a prompt should use by feeding in the input several times and seeing what prefix it confabulates in the samples, and then adding that to the input for the real samples. If GPT-2-poetry-prefix consistently returns a specific prefix, then that is what it has learned and is useful scaffolding for the inputs; if it can’t do so consistently, then the prefixes aren’t useful for this particular input and it doesn’t matter.
So, to generate samples conditioned on relevant metadata, I pipe in the whole input unmodified several times, look at the generated samples for an ID, and if there is a consistent ID, then prefix it to the input and sample again several times.
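In script form, the ID-tallying step can be as simple as this (a sketch under assumptions: the generated samples are piped in on standard input, and the regex matches the NNNN| line format shown in the corpus excerpts below):

import re, sys
from collections import Counter

# Tally the book-ID prefixes the model confabulates at the start of lines
# in a batch of generated samples (pipe the samples in on stdin).
ids = Counter(re.findall(r"^(\d+)\|", sys.stdin.read(), flags=re.MULTILINE))
for book_id, n in ids.most_common(5):
    print(book_id, n)
# If one ID clearly dominates, prefix "ID|" to each line of the real prompt and resample;
# if there is no consensus, the prefix evidently doesn't matter for this input.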
Of course, now that everything is trained & I have a good input method, I want to see how GPT-2-poetry-prefix does on the same poems I gave GPT-2-117M before!
“Howl”
First, “Howl”. Given that the Project Gutenberg corpus is entirely old poetry and wouldn’t include much in the vein of “Howl”, I didn’t expect this to be good. The finetuning would wipe out the knowledge of free verse.
Finding a good prefix was hard, also unsurprising—not much like it in the PG corpus! I ultimately had to settle for a “1997” prefix from a relatively free-verse sample for generating the 3 samples:
While they may be OK on their own and plausible as unconditional samples, they are disappointing as conditional completions, largely ignoring both the vocabulary & style. It would seem that the finetuning wiped out whatever it was GPT-2-117M was using to generate its amusing “Howl” completions.
“Ozymandias”
For “Ozymandias”, I fed it in a few times, and it seemed to like numeric IDs starting with ‘88’, so I used this as a prompt:
8820| I met a traveller from an antique land
8820| Who said: Two vast and trunkless legs of stone
8820| Stand in the desert... near them, on the sand,
8820| Half sunk, a shattered visage lies, whose frown,
8820| And wrinkled lip, and sneer of cold command,
8820| Tell that its sculptor well those passions read
8820| Which yet survive, stamped on these lifeless things,
8820| The hand that mocked them and the heart that fed;
8820|
8820| And on the pedestal these words appear:
8820| 'My name is Ozymandias, king of kings;
8820| Look on my works, ye Mighty, and despair!'
8820| Nothing beside remains. Round the decay
8820| Of that colossal wreck, boundless and bare
8820| The lone and level sands stretch far away.
Yielding (3 samples):
Sample #2 is over-influenced by some prose footnotes/commentary which snuck into the PG corpus.
Essay on Criticism
Not clear what text exactly Scott Alexander used from Alexander Pope’s Essay, so I quoted the famous beginning section of Part 2. 3 samples strongly indicated Pope-like writing was associated with a prefix of ‘385’ (if not necessarily a full prefix) so I used 38511 for the following 3 samples:
Alexander described his GPT-2-117M sample from Pope:
It understands there should be line breaks, it understands the approximate correct length of a line of iambic pentameter, it understands how to talk like an overeducated 18th-century dandy—but it doesn’t appreciate rhyme or meter. In retrospect this isn’t surprising; GPT has no idea words sound like anything; it would be shocked to learn anyone uses language as anything other than text strings.
GPT-2-poetry-prefix still has “overeducated 18th-century dandy” down pat, but it manages to improve on the rhyming aspect: there’s quite a few rhyming lines in samples #2 & #3 (#2 seems to be screwed up by taking a digression into footnotes defining words and then bad sampling getting it trapped), like “pretence”/
More concerningly, the samples are terrible. Pope’s poetry should be straightforward for GPT-2-poetry-prefix, as it follows standard meters and rhyme and relies on a classical vocabulary well-represented in the PG corpus. Why, then, are they so bad? I suspect this may reflect the corpus itself doing Pope a disservice. Pope’s inclusion in the PG corpus appears to consist of the following (grepping for “Alexander Pope”):
32190|The Works of Mr. ALEXANDER POPE. London: Printed by W.
32190|The Works of Mr. ALEXANDER POPE. Volume ii. London: Printed
32190|Letters of Mr. ALEXANDER POPE, and Several of his friends.
32190|The Works of Mr. ALEXANDER POPE, in Prose. Vol. ii. London:
32190|The Works of ALEXANDER POPE, ESQ.; vol. i. with explanatory
Checking PG entries and looking through the 32190
prefix, it starts:
32190|INTRODUCTION xv
32190|The Works of Mr. ALEXANDER POPE. London: Printed by W.
32190|BOWYER for BERNARD LINTOT, between the Temple Gates, 1717.
32190|This volume consists of all the acknowledged poems which Pope had
32190|The Works of Mr. ALEXANDER POPE. Volume ii. London: Printed
32190|by J. WRIGHT, for LAWTON GILLIVER, at Homer's Head in Fleet
32190|Letters of Mr. ALEXANDER POPE, and Several of his friends.
32190|London: Printed by J. WRIGHT for J. KNAPTON in Ludgate
32190|Street, L. GILLIVER in Fleet Street, J. BRINDLEY in New Bond
32190|Street, and R. DODSLEY in Pall-Mall, 1737. 4to and folio.
32190|The Works of Mr. ALEXANDER POPE, in Prose. Vol. ii. London:
32190|Printed for J. and P. KNAPTON, C. BATHURST, and R. DODSLEY,
32190|The Works of ALEXANDER POPE, ESQ.; vol. i. with explanatory
32190|Notes and Additions never before printed. London: Printed
32190|commenced printing his particular section of the octavos when the
32190|Quo desiderio veteres revocamus amores
32190|Atque olim amissas flemus amicitias.
32190|Nutrix mea fidelissima M. Beech, obiit 5 Novem. 1725, aet. 77.
32190|Edwardus Blunt, vir amicissimus obit, Aug. 1726.
32190|Francisc. Atterbury, Roffens Episcopus, vir omni scientia clarus,
32190|The fourth volume contains the Satires, with their Prologue,--the
32190|alterations. --_His Last Will and Testament._--WARBURTON.
This is perhaps not good training material for GPT-2-117M-poetry/GPT-2-poetry-prefix—much of the Pope material in the corpus is bibliographic front matter & scholarly apparatus rather than his verse.
8 Famous First Lines
The prefix trick doesn’t work on the 8 famous first lines nearly as well as it does with the long excerpts from “Howl” etc; I assume they are simply too short to home in on a relevant prefix. Nevertheless, I tried.
“Ulysses”, Lord Alfred Tennyson
“It little profits that an idle king,” yielded no consistency in prefixes, so I skipped adding one. 3 samples:
“Sailing to Byzantium”, Yeats
“That is no country for old men.”, no consensus. 3 samples:
Sonnet #29, Shakespeare
“When, in disgrace with fortune and men’s eyes,”; no consensus, 3 samples:
“Invictus”, William Ernest Henley
“Out of the night that covers me,”; no consensus, 3 samples:
“Pioneers! O Pioneers!”, Walt Whitman
“Come, my tan-faced children,”; no consensus, 3 samples:
“The Love Song of J. Alfred Prufrock”, T. S. Eliot
“Let us go then, you and I,”; no consensus, 3 samples:
Hamlet, William Shakespeare
“To be, or not to be: that is the question:”; some consistency, so prefix “1006”; 3 samples:
Romeo & Juliet, William Shakespeare
“Romeo, Romeo! Wherefore art thou Romeo?”; some consistency, with 1006 popping up again as a prefix (Shakespeare perhaps is memorable enough for GPT-2-poetry-prefix); 3 samples:
“Jabberwocky”, Lewis Carroll
Upon request, I generated 100 samples of Lewis Carroll's "Jabberwocky". Examining preliminary samples, the closest prefix was #24560, corresponding to The Jingle Book (1899), an anthology of humorous children's verse (which makes sense). "Jabberwocky" itself does not appear in the PG corpus, but the "Jabberwock" is mentioned in one of the poems in The Jingle Book, the acrostic poem "An Alphabet Zoo", so, close enough.
Some samples:
24560|He found a foxy in the brake,
24560|A cunning fox of scarlet dye,
24560|And from that foxy followed make
24560|The scrawny fox in glee.
24560|He followed with his dam and horn
24560|To where the river-water runs,
24560|And as his living current on
24560|The river-water likes him up
24560|A mighty rocky heifer heaves,
24560|And in a single field, or twain,
24560|Shows like the yellow corn;
24560|And when the wind doth blow, so too
24560|Low in his bottom lies his head,
24560|And in the grass leaps up again,
24560|In fearful freedom unbetrayed.
GPT-2-345M
In May 2019, OpenAI released the next-largest model, which increases the parameter count from 117 million to 345 million, an increase of almost 3×. The GPT-2-345M model has increased layer depth & more attention heads but apparently similar window size; as such, while it may not be much more able to maintain coherency across long samples, its coherency & quality should be superior to GPT-2-117M within each window, as it can absorb more knowledge into its parameters & the increased depth may allow for more ‘thinking’ at each step.
The regular text samples from the GPT-2-345M model struck me as somewhat subtly but noticeably higher-quality than GPT-2-117M, so while I was hoping someone would supersede GPT-2 entirely by releasing a more advanced model (like a large-scale Transformer XL or Universal Transformer, or even newer models like the UniLM which marries bidirectional & unidirectional Transformers), I decided to train GPT-2-345M on the PG corpus to compare it with GPT-2-117M.
Training
This proved more difficult than GPT-2-117M. The GPT-2-117M model was already large, at 480MB on disk for the whole model, so making it ~3× larger bloats it to 1.4GB on disk; and the VRAM use on a GPU is even worse: with GPT-2-117M, a training minibatch of n = 2 could barely fit on a 1080ti’s 11GB, but with GPT-2-345M, n < 1! The main culprit seems to be the self-attention layers, whose memory use scales quadratically with the context window, so GPU VRAM gets eaten up fast, and apparently 16GB might not have been enough for GPT-2-345M either. While I have enough system RAM to train GPT-2-345M without any tricks, my Threadripper CPU is still ~14× slower than a 1080ti, and if one guesses that GPT-2-345M takes 3× longer to train than GPT-2-117M, and GPT-2-117M takes 1–2 days, and CPU is 14× slower, then that’s up to ~84 days for the poetry finetuning, which would not be fun.
To solve this, nshepperd extended his GPT-2 training codebase to employ a technique OpenAI helped introduce (and presumably used in training GPT-2, although the GPT-2 paper is silent on the details): “gradient checkpointing”. Gradient checkpointing is a space-time tradeoff which throws away some of the intermediate states of a NN, potentially greatly reducing total VRAM use, but at the cost of some slowdown when those intermediate states need to be recomputed for doing the backpropagation; the slowdown, fortunately, turns out to be fairly modest.
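For intuition, the technique looks like this in miniature (a generic PyTorch illustration of gradient checkpointing, not nshepperd's actual TensorFlow implementation; the toy block and sizes are placeholders):

import torch
from torch.utils.checkpoint import checkpoint

# A toy stand-in for one Transformer block. With checkpointing, its internal
# activations are discarded during the forward pass and recomputed during the
# backward pass: lower peak VRAM in exchange for some extra compute.
block = torch.nn.Sequential(
    torch.nn.Linear(768, 3072), torch.nn.GELU(), torch.nn.Linear(3072, 768))

x = torch.randn(8, 768, requires_grad=True)
y = checkpoint(block, x)   # forward pass without storing intermediates
y.sum().backward()         # intermediates recomputed here, then backpropagated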
The downside of gradient checkpointing is that for GPT-2-345M, it is still not memory-efficient enough to train it just like GPT-2-117M—the self-attention layers checkpoint nicely (as the Sparse Transformers paper remarks12 apropos of needing extremely wide Transformer windows to accomplish MuseNet), but it’s not enough, due to the giant word embeddings.
So, the upshot seems to be that GPT-2-117M can be trained end-to-end with Adam on a commodity GPU in 11GB VRAM; GPT-2-345M must be trained with gradient checkpointing, and one must choose between either a fancy optimizer like Adam (leaving out the embeddings) or full end-to-end training including the embeddings (with plain SGD); and GPT-2-774M (released 2019-08-20) can’t be trained at all. Toward the end, I switched from Adam+Transformer-only to SGD+all, and this seemed to drop my GPT-2-345M-poetry validation loss by ~0.01 to a final 1.915 (which is not nothing, so perhaps the embedding did need some adjusting for a more poetic vocabulary).
In total, I trained GPT-2-345M-poetry for 815,326 steps (minibatch n = 1), with an Adam LR ~0.00001 and SGD LR ~0.001, over ~7 days (2019-05-04–2019-05-13) on 1 1080ti; the necessary training time, with the benefit of hindsight, was probably closer to 3 wallclock days. GPT-2-345M-poetry converged to a final loss of 1.915, an improvement of ~0.1 over GPT-2-117M’s ~2 loss (so, in some objective sense which is only indirectly related to generated poetry quality, one could say that GPT-2-345M is 5% better than GPT-2-117M). I had expected a somewhat larger quantitative improvement, so I wonder whether more aggressive training methods like cyclic learning rates+SWA would have worked, had they been implemented in this codebase & had I the patience to wait a week or two for multiple cycles. In any case:
Samples
![Transcript: [Megan is sitting at a computer, and Cueball is standing behind her.] / Megan: 'Looks like computers will beat humans at Go pretty soon.' / Cueball: 'Wow.' / Cueball: 'That's the last of the big ones.' / Megan: 'Yeah.' / [Megan looks back over her shoulder at him.] / Cueball: 'Well, at least humans are still better at, uh...' / Cueball: 'coming up with reassuring parables about things humans are better at?' / Megan: 'Hmm.' / [Megan types on her computer.] / *type type* / [She leans back over her chair again and addresses Cueball.] / Megan: 'I made a Python script that generates thousands of reassuring parables per second.' / Cueball: 'Dammit.' / Computer: 'Computers will never understand a sonnet computers will never enjoy a salad comp—'](/doc/ai/2013-09-11-xkcd-1263-reassuring.jpg)
XKCD #1263, “Reassuring” (cf. “Reassuring-Parable-Generator”)
Training Samples
Random Samples
Testing GPT-2-345M-poetry, a slightly higher temperature felt warranted, so to generate 5000 random poetry samples:
python src/generate_unconditional_samples.py --top_k 40 --temperature 0.95 --nsamples 50000 \
--batch_size 10 --model_name 345M-poetry
I also generated 500 conditional samples for Yeats’s “The Second Coming”.
Reading through training & random samples, they feel noticeably more coherent; it feels easier to extract meaningful subsections which form reasonable poems. (In particular, the pastiches of classical epics or Dante have gotten remarkably good.)
Some further samples:
Here is a ‘failed’ example, where GPT-2-345M-poetry imitates the scholarly apparatus that unfortunately contaminates the PG poetry corpus; it is quite plausible-sounding, even including plausible-looking Latin:
Ganbare, GPT-2-chan:
This is a peculiar one; it starts as a satirical poem but I can’t make out what it is trying to switch to partway:
This one I think must be a mix of The Song of Hiawatha and the Kalevala (but if a wizard offers you rainbow-colorful draughts of rum strained through his magic red beard, I suggest declining in the interests of hygiene):
An alt-history where Germany won WWI:
Tao Te Ching
The Tao Te Ching (TTC) is a famously enigmatic text, written in a difficult style in a more difficult language, and because of the challenge, it has attracted many highly-varied translations. (For a 2024 attempt using ChatGPT, see “a wandering mind”.)
Hiræth Weltschmerz compiled ~270 translations of the first verse of the TTC in one text (108Kb).
I converted the first corpus to Unix text format, replaced various escaped character entities with their Unicode equivalents, and replaced double newlines with single newlines, and then trained the original GPT-2-345M (I was unsure if using one of the poetry models would help) with the usual settings for ~6 GPU-hours, at which point it had reached a loss of ~1.8 and I began to worry about overfitting & stopped.
I generated the usual ~1k unconditional samples:
Some training samples:
I then began training it on the full TTC corpus, which was split into per-chapter files. Remembering the problems with run-on poetry, I added <|endoftext|> markers to the end of each file. (It would be better to add that to the end of each translation, but HW didn’t include any delimiters for each translation, and doing so manually would be too much work.)
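Appending the marker is a one-off preprocessing step along these lines (a sketch; the directory name is a placeholder for wherever the per-chapter files live):

import glob

# Append GPT-2's end-of-document token to every per-chapter file so that
# samples stop at chapter boundaries instead of running on indefinitely.
for path in glob.glob("ttc-chapters/*.txt"):
    with open(path, "a") as f:
        f.write("\n<|endoftext|>\n")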
Training on the full corpus required substantially more training time.
Some training samples:
Hiræth Weltschmerz was able to improve the TTC training dataset by providing the first 41 chapters with the original newlines/<|endoftext|> markers preserved, and I trained on that for ~24 GPU-hours to a final loss of ~2.10. The results read much more poetically, I felt.
For this final TTC-with-linebreaks, I uploaded the model & generated 1,000 random samples as usual, but I also generated per-chapter samples. For per-chapter samples, I used csplit to split each chapter file on blank lines into its individual translations, picked 10 of them at random as prompts, and generated 10 completions from each:
i=1
for CHAPTER in `ls taotetotalityrevisedpw/*revised.txt`; do
    # Split the chapter file on blank lines: one piece (xx00, xx01, ...) per translation.
    csplit --prefix xx --suppress-matched $CHAPTER '/^$/' '{*}'
    # Use 10 randomly-chosen translations as conditional prompts, 10 completions each.
    for X in `ls xx* | shuf | head -10`; do
        echo $i $X
        cat $X | tee /dev/stderr | nice python src/conditional_samples.py \
            --top_p 0.9 --model_name taotehching --nsamples 10 --batch_size 10 \
            | tee /dev/stderr >> ttc-chapter-$i.txt
    done
    rm xx*
    i=$(($i+1))
done
The idea there is that one can write one’s own Tao of GPT-2, going chapter by chapter: select some of the chapter 1 prompted conditional sample completions to create a new chapter 1, and so on, in a way which would be difficult to do with just random unconditional samples.
Downloads:
model (1.2GB)
1k random samples (4MB text)
100×81 chapter samples (tarball, 17MB text)
GPT-2-1.5b
1.5b Samples
Who alive can say,
‘Thou art no Poet—may’st not tell thy dreams?’
More than iron, more than lead, more than gold I need electricity.
I need it more than I need lamb or pork or lettuce or cucumber.
I need it for my dreams.
The Policeman’s Beard is Half-Constructed, RACTER & William Chamberlain 1983
Loss: 2.6
Partway through, having reached a loss of ~2.6 (down ~0.5 from the Colab model), we experimented with training our model on a P100 GPU, halving the context window to make it fit, to informally compare its training speed with the swarm. The P100 made little training progress, but it did generate some fun poetry samples (we had disabled the training sample generation for the swarm because generating samples is so slow).
The samples strike me as good, perhaps even better than GPT-2-117M, despite the loss being much worse (2.6 rather than 1.6). Why might that be?
I hypothesize it reflects a weakness of the likelihood loss in terms of perceptual quality: humans are more sensitive to long-range correlations and text degenerating into gibberish than we are to local details like exact use of particles or to slightly better modeling of spelling (which is why stylometrics works). The original OA GPT-2-1.5b achieves much better modeling of long-range correlations and producing coherent text than the GPT-2-117M did, of course. What happens when they are both trained on a poetry dataset? It is the tale of the tortoise & the hare, or the bias-variance tradeoff: the GPT-2-117M is weak, bad at long-range modeling because of its small parameter count & shallow layers, but the benefit is that it can learn quickly about local details like spelling, and, achieving good prediction there, converge to that 1.6 loss; GPT-2-1.5b starts off good at long-range modeling and good at short-range modeling, and must trade off learning both from its limited training, thereby achieving mediocre performance on local correlations and thus mediocre loss, even though humans reading it are impressed by the thematic consistency and relative lack of ‘gibberish’ (locally but not globally consistent text).
An additional issue here is that the GPT-2 models are not fully trained: as the GPT-2 paper notes, “All models still underfit WebText and held-out perplexity has as of yet improved given more training time.” (The difficulty of training such powerful LMs to convergence was also noted by the MegatronLM researchers, whose MegatronLM-8.3b model was still learning rapidly when they ended the training—despite use of NVIDIA’s DGX SuperPOD with 512 GPUs.) So some of the finetuning here may also be finishing the GPT-2 training.
I selected the following from its training log.
A commentary on Oda Nobunaga’s mysterious betrayal by a loyal general?
"Oda on the Pavement", by Mark Bibbins [Activities, Jobs & Working, Social Commentaries]
The first time I saw this sign someone's blood
streaming down the pavement, I didn't know what to do.
I walked as quickly as I could toward the body
and almost didn't see the second because it was going by
so quickly I had to blink and it was gone.
But this is what I saw when I got there, a corpse
laid out flat on its back, its legs
bent at the knee, with a blue tattered blanket
crammed in the small of its back.
I said to my wife sitting next to me, don't you
think this is awful? She said, Mark, it looks like
just a trick he's done.
I said, it looks awful real to me. The man has
his back turned to us, like he is not even thinking.
As I am, I said to my wife, is this what you have
been waiting for? Is this what you have been waiting for?
The man was lying there not even looking up at us.
In the Golden Age, when the people of the Yellow Valley were instructed by the sages of antiquity:
...That the earth would then not open
(O, but to hear our fathers speak!),
The angel of the waters spake.
For in that time the race of man
Sought the groves, each his own dwelling,
Where were then fruits to satisfy,
Where were trees in plenty springing
To furnish them with fuel,
And so the forest grew,
And o'er it all was running
The river that we see.
Then did those ancient sages
Order their days as was fitting;
And whatsoe'er they looked
On was ordered rightly; all
Taught each what was fit for living
And working for self;
And now we have come to the end,
But this is what their teaching
Said for our instruction,--
That we know and do not know,--
And yet we know and do not know!
When we have found that which is right
The will to act and not to act
To be the chief and not the end,
The day and not the hour,
The joy and not the pain,
When shall we find this, then at last?
Why then at last at last?
There can be no yesterday
If such be the age of wisdom;
There can be no time or place
If such be life and work for man;
There can be no man or man's child
In which he has no friend or friend's child;
Nor yet a man or man's son,
Who to his work does not go.
Thus the sage's order is
And what the sage does--so he does.
Romance?
A nice descriptive piece:
...The sun sets and rises in its glory
And sets in its glory only.
And the stars are all that can be seen up there,
That set and rise and come to the earth in their glory.
And the moon, too, sets and rises only,
And comes to the earth in its glory.
The sun sets and rises in its glory,
And when it sets, the night is black,
And when it sets, the day is dark.
The sky is all up against us and over us,
With its starry light,
And all the earth beneath us and above us,
Under the shining sky,
Is black with all the stars.
An elegy:
..."To the Youth", by Susan Greenfield [Death, Growing Old, Sorrow & Grieving, Relationships, Family & Ancestors]
For the youth, who, long ago,
Came up the long and winding way
Beneath my father's roof, in sorrow---
Sorrow that I would not bless
With his very tears. Oh,
My son the sorrowing,
Sorrow's child. God keep thy head,
Where it is dim with age,
Gentle in her death!
And though the years should bear him low
Into the world's dark embrace,
He shall feel more fondly there
Than he did in my bosom then.
Even now that heart, so wild,
Is as a child's, sweet, and calm;
And shall His voice forbid me
From the shadow of the grave
To forget, to forget Him?
And shall He care to keep Him
As I do him here in this place?
Or has the power, that I have learned
Him of the world to fade,
Him of the world to go away?
Haltest to the shadow of the grave,
My son, my son!
Love lost:
...Since that fatal day we parted,
I have lost, I think, my best part.
It was a friend of mine once gave me breath,
And I went as far out as he dared to go,
And ever since, poor lad, I've lived by drink.
But if I had no more friends, I had friends
Such as the world could never let me have.
'Tis true, I have never had occasion to think,
But when the sun is high, and the sky is green,
And the whole world is merry till the break of day,
I sit and dream of her and of the joys of yesterday.
And I go wandering off on some old-world morn
Whose very first light of hope and youth
Is the light of looking on her face through the green.
Her eyes where they were, as I tell you,
Seem fixed with a bright, fixed gaze;
They're half my heaven, when they are there!
They are as clear and bright as she herself,
And she knows, too, the way that she will look
When I can tell her what she is like
And where all will end for me in her.
Now, how would the world like her if it knew,
And what would 'twere if each one did know;
For, ah, she is a world of beauty!
But then, ah, her eyes are even as clear
As the light that makes them bright and wise.
And they are a light out of some dream
That has power to brighten and cheer;
They're a bright sign to turn and turn again
Even in the days of storm and pain;
They're my world, like her bright suns of heaven,
Which I have loved even to the death!
An attempt at nonsense verse, apparently:
...I had a little hen,
I had a little hen,
It chirped at her dish.
She chirped at her dish,
"Come in, come in,"
The dish clattered out,
The dish opened wide,
There were three small cats,
"What is the meaning of this,
Why are all my dishes
Spread open for you?"
"Sit in the corner."
"Here is a bowl for you!"
She drank the bowl clear,
She drank the bowl all up,
And cried with an unhappy sound,
"I do not want it!
I am quite sufficient,
I shall have enough,
And just be done with it,
And not have more!"
But I have a different tale,
Which perhaps you will think is odd.
It is, 'tis no use repeating.
But here, it may be the same.
I had a little hen,
I had a little hen,
She had a little curl,
That with her beak she curled,
In her plumage she had,
Such as a dog's must have.
Thus I had three little dogs.
I fed them with bread.
And they all looked out upon the street
Where our little Polly went by.
The wreck of a ship:
...And as the sun sways to and fro
With his eternal sway of air
They roll in slumberful rest.
So still and calm are they,
So still and clear,
No whisper of the wind dares steal
The tranquil atmosphere of night,
And like bright angels sleep.
The sea is their only theme,
The ship is a dream
That they arise to leave at even.
The moon is far,
All glistening with a silver shine.
It is her pale that lights
The livid face of the sea
That in its white and sullen face
Is pale with pain.
O grey and grey it grows,
And the sea grows wild
As the black clouds in a night-time ride,
The storm is on the sea,
The wild winds rush in their strong wrath--
The great waves roar and beat
The heart of the ship with their fury--
In its heart's core, cold and dead
At one bound, is bound
Like a rock in the heart of a lake.
The storm is loud
And wild it is blowing--
The storm is far away--
The wild winds, wilder still,
Are beating far and wide.
The wild sea winds that were blowing
Have stopped forever there alone
Where the ship has gone down,
And now the moon rides low
Where the sea now rests in calm;
On the deep, dark midnight
The ship is lost.
GPT-2-1.5b can apparently do meta-fiction and break the fourth wall‽
...He had the look that I can't forget;
It was as if at once I knew
What I'd never had before;
It was like a rush of light on water--
And something about it seemed queer.
And the light would come and go
As if the clouds were about to fall,
And the rain came and came in showers
And went in ditto fashion;
And it's just as if I were blindfold--
I couldn't hear a thing.
I'm sure it isn't my nerves;
It's just as if I had the palsy.
I don't want to think of not being there
For something I can't be sure I was.
I'm sure my nerves are fine in my life;
And the only thing bothering me
Is that there isn't any rain.
It's all quite natural as I said;
I've seen men die of natural causes.
Now it's only a pity it happened
As it is done to me. But, as I say,
There's no harm in dying of dropsy.
Though not the death I'd choose to die in--
It's much better than to get laid flat.
This is not the way for my poem
To end, I know--I'll make it up to you
In a later place; but here's a verse
As you may understand. After this line
There's a space, and after that a line
About two inches, in which I meant
To tell you that you had done well to try
To stay alive. But, as I said, this is
Just a matter of chance, it cannot
Be helped; I must allow that this verse
Was written with the fingers you saw--and then
You were all to blame, if I died before
You came in and saw the writing. Now you know
I'm dying--go on--you will not die
Before I finish off my work.
It was a little moon that watched
In this place, I don't know why I chose
To watch it, it had its night when I woke;
And I, who was tired of the earth and men,
Watched it like a man watches a clock at noon,
Just because I chose to and did as it said...
Another shipwreck:
A surprisingly coherent piece on a trapped upper-class wife:
...There is not a single house in all
These beautiful gardens that I do not know.
I know the houses and gardens where I sit
In the evening with my husband and my son,
And I sit at the dinner table there too.
The house where my husband and my son live is the one furthest away.
The people come and go through these gardens, all day long;
And I see their feet pass along the paths,
And I hear the talk they have all that day, from one end of the town
to the other; I see the carts and waggons of the farmers,
The teams and horses of the tradesman, men on foot, and the gaiters swinging
Upon their saddles by the way side;
And every day, at morning, the same number of carts and waggons I see,
And every morning, in the great daytime as soon as I wake,
I see their number still greater, still greater.
Then to one side they go
Among the flower beds and in the wood,
And I never see them more;
And their voices float on the rising wind
Like the voices of the dead,
And their faces light upon their breasts, like lights,
Like the faces of bright children,
Like the faces of handsome men in the street,
And the faces of friends, and the faces of lovers,
And of all strangers, all faces of home-brethren
With its memories and its griefs,
And my own face that is always changing,
Wrought by itself in the dark,
With the face of the dead and the face of the living,
And the face of a youth that grows old
In the shadow of love and the light,
In the shadow of the grave.
I know not what they mean by their words,
Those people that pass me in the garden,
In the little town that is in the garden;
I only know that, on many afternoons,
Through a gap in the trees and between the stones
I see their faces and hear their voices.
The curse of immortality:
Perhaps the most striking of them all is this existential horror piece:
..."The World of the Dead", by Peter Stearns [The Body, Nature, Philosophy]
When they come, they carry
Your limbs, your life,
In their mouth and arm.
I think they swallow.
I know it. The others know.
My body will be like theirs,
As the river, the sea,
Will be like the one on which it runs,
If the ocean rises
And swallows the land.
It will be hard to survive.
To heal,
Some of this will have to come off.
That's what they say. They say it
Many times a day. They say it
To each other.
They mean to save us.
They just can't stop us
From becoming what we are.
I must live inside you.
That's what they say. They say it.
Loss: 1.6
The expanded TPU swarm & Adam LR tuning allowed rapid training, and we reached 1.6 overnight, matching our previous best on the combined PG+PF poetry dataset.
I generated a dump of samples with top-p = 0.90; the following were selected from reading ~5% of that:
"A Knowledge of the Dead", by Mary Wiencki [Living, Death, Life Choices, The Mind, Time & Brevity, Religion, The Spiritual, Social Commentaries, Crime & Punishment, Popular Culture]
I see you there, Stu, striding half a mile down the road, arms raised up over your head, head bent slightly. I imagine you hold both those in your inmost heart, and that you must learn, along with anything else, how to turn off a brain that has somehow learned to hold whatever memory is stored in it. For the mind, like any organ, is where the trouble is; an organ can fail with its stored knowledge, or if the memory be great, so great that it will bring the brain to its knees. And then the knee is a joint only partly conscious; if the heart should stop pumping, we are thrown off balance as if it had been only the legs that moved you. So I ask you, were you looking at your watch when you left for that solitary walk? Or waiting for the medicine you wanted to take with you before starting on your way? A look of mild impatience conveys a point as surely as humor, though somewhat dead. It is painful, this wait, I am sure. You have worked long and hard for your knowledge of time and of this place. And now you have it. And time, and all the woe it took to give that power. You have so much of this world left to discover, paths to retrace. You find your way into a park, its benches occupied and visible and free of talk of the day’s events, at its center a girl...
"April Moon", by E. E. Brown [Love, Break-ups & Unthankfulness, Religion, Buddhism, Faith & Doubt]
Awake—with you I meditated and thus
renewed my doubts; But, awake—with you I sin,
and thus my conscience put me to bed.
Awake—with you I suffer, and thus
my doubt took wings. Awake—with you I play
the hypocrite, And thus my conscience fires
my lash, and thus I scorn you.
Awake—with you I fly from faith, and thus
through your face I stab myself. Awake—with you I
remain benighted, and thus
my conscience rots me at my heart.
"The Philosopher’s Plane", by James Taggart [Activities, School & Learning]
for John Millikoper
The philosopher’s plane, imagined
by Calippus, rests on a red disk of dawn
close to the body
We flop into the blue below our feet,
into the astral horizon,
that whose dots
our lives keep shifting
over the edge of empty space
into the orange of earth
And beyond into blue
well into the empty page of thought
Where we can embrace
a little while of our desired
end and then
flow back into the world of time
"Map of Our Land", by Eavan Bolger [Living, Time & Brevity, Nature, Landscapes & Pastorals]
The stars are born in night.
The ground is made up
Of tales untold.
The cracks
are our story.
The piles of leaves
are our life.
The river that we lie
At dusk is alive.
The buried
Grass beneath us
Mosquito,
Mosquito,
Mosquito,
Mosquito,
Mud-stump,
Mud-stump,
Mud-stump,
Mud-stump,
Oromoctotecological teacher, henchman, loomworm, toad-man, German accent.
Not what one would be expected to hear.
Oromoctotecological teacher.
Far superior animal to what one would be expected to be expecting.
Far superior animal to what one would expect to be expecting.
Not what one would be expected to hear.
Oromoctotecological teacher, henchman, loomworm.
Not what one would be expecting.
Better than what one would be expecting.
Better than what one would expect to be expecting, better than expected.
Not what one would expect to hear.
Oromoctotecological teacher, henchman, toad-man.
Far superior animal to what one would be expecting.
Far superior animal to what one would expect to be expecting, better than expected.
Not what one would be expecting to hear.
Oromoctotecological teacher, teacher, toad-man.
Higher in intellect than what one would have expected.
Higher in intellect than what one would expect to be expecting.
Higher in intellect than what one would expect to be expecting.
Better in intellect than what one would expect to be expecting.
"Language is not the Draft", by Aevita M. Branko [Social Commentaries, Crime & Punishment, History & Politics, War & Conflict]
No don't be angry
Don't be angry,
it's fine
don't be angry
No
don't be angry
It's fine
it's fine
don't be angry
because
every one of them died
And so began the siege of New Amsterdam,
In which, by Providence, only three days ended;
When, by direction of Ms Frisbie, the heroes two
For their advance, together, took their way.
The two fellows, whose mission it was to guard
The city gate, took place in the greater army;
While those two dukes who should avenge the town
Sent all their force to put the place to rout.
And, as the late oak, covered with boughs,
Has done its work, ere its starving spike is struck,
And this great tree sinks as it had never been
By any human pains, nor would be now,
But for her first son's interposing,
So, falling foul of their first heart's delight,
The Dutch no more wept for New Amsterdam.
Loss: 1.3
Samples generated from a GPT-2-1.5b-poetry model with 1.3 loss, 2019-12-15, selections:
Panegyric to power:
"Praise", by Robert Browning [Nature, Animals, Trees & Flowers, Social Commentaries, Cities & Urban Life, Class, Money & Economics, War & Conflict, Mythology & Folklore, Fairy-tales & Legends]
The birds are saying praise of the Shah.
In the distance a child will hear,
And never cease to hear.
And look, and many will be surprised,
As all at once, to hear the words praise the Shah.
In a hundred cities they will say:
Praise the British captains,
Praise the millionaires' homes.
But some---like the birds---from the sea to the hills,
Will spend their lives saying praise the Shah.
Criticism of England:
The world of the dead:
The Demiurge:
Jealousy:
War. War never changes:
"Recuerán, the Mercenary", by Maggie Dietz [Activities, Jobs & Working, Relationships, Family & Ancestors, Social Commentaries, History & Politics, War & Conflict]
What does it matter where it happened or who it was?
The bullet traveled what did it matter when it fired
Where, far from where it all happened.
Do we matter, the years, who left us and where,
Remember the who and why,
Left, left to us, who are to be freed who and why?
I ask in my yet young memory what I know about you,
I ask in this yet young and still dark memory.
I ask you, have you ever told you're sorry,
Have you ever, down through the years, ever said who it was
To do this, to be a mercenary?
What does it matter in the who or what
To be forgiven, forgotten, forgiven who and why?
Your mercenary smile.
Your smile mercenary, mercenary now, does it matter where
It happened, when, or who?
In a tavern on a lane, behind a dark smoke,
A year ago, the answers were,
The secret garden:
I don’t really get this one but the repetition and inversions make it interesting to read:
Love lost:
"Note in Bear Memoriam", by Vivian De la Noy
Who will remove my stitches
who will undo this confession,
who set these lines of words
drawn in wood? what voice will sing
their melody, irrepress the sting,
let you speak your name, my name,
what has become of it, left there,
left inside me, voiceless, mute?
Urban vs rural life:
"In These Cool Cities", by Ted Koochet [Living, The Mind, Activities, Travels & Journeys, Nature, Winter]
In these cool cities, they forget the trees' autumnal glow.
In those other places, the trees are burning.
Everywhere is winter here.
Frost and snow.
But in my streets it's still autumn.
Last night, an hour or two before the snow,
I glanced out my window and --- nothing.
Nothing but the jumble of bodies
and words like sea gulls yapping in the dark.
And those morning papers' faces of every morning.
The snow was quieter and paler.
No fall --- but a blue cloth.
Pelted like little pellets and faded.
And last, the crows' dim hang-hips.
I hear the crows now.
Now.
And somewhere below, another peal.
Social media satire:
"Conversation with a Friend on Romans", by J. Frederic Jagger [Relationships, Arts & Sciences, Humor & Satire, Social Commentaries, History & Politics]
We're fond of quoting people, but it's an illegal act
To put our words in someone else's mouth.
For instance, if I say to you, "I like Hitler,"
You would be liable, of course, and-legal.
One may be bitten by ants-by-dogs- in the District
But it is entertaining, and sometimes amusing.
Pantheism:
We all can sing of love,
We all can sing of life.
What shall make each of us noble
And shine out in the eyes of all men,
That in triumph and in grief we know
We are all of us children of the sun?
What man has not, on the lonely moonlit heights,
Seen the glory of the lances' fires,
And heard the singing of the beaten arrows?
What can the earth and ocean and sky
That music not disclose?
What voice of human choirness e'er came nigh
That was not poured into that music,--
The Voice of the All About Us?
Art:
"The Mask Maker", by James Haimes [Religion, God & the Divine, Arts & Sciences, Social Commentaries, Mythology & Folklore, Heroes & Patriotism]
When we've grown weary of the hero, there is a certain
triumph to be found in the figure of the mask.
There, the underworld of fantasy and the ideals that carry it, is
God dressing as a minstrel. To us he is immaculate,
regardless of his skin, be it shabby or silvered, or if o on
behalf of the fair or beefed, or just an old suit, a bald head,
headdress not unparalleled in its beauty but pitiable in its
delicacy.
When that other menagerie called the human face is viewed, we
find it to be pitiful indeed.
Still it comes,
after all these years.
"On a Bridal Shower", by Alice Notley [Living, Life Choices, The Body, Nature]
I am thinking now of bridal showers, of the feeling of getting them--- I mean the giving of pleasure, of giving yourself to other people.
They come to mind: how many sweeter feelings there are: Nectar vaults, lockable vaults.
The most brilliant of eyes---that of my friend, now dead---was such a doll. Light and easy, of voice easy as a moonbeam.
And she wasn't thinking of birth, but wealth, the dream. Wealth, the sparkling.
It's the eyes alone that gleam, the loose and luminescent body. And the mouth. And how deep and wide and light-wide and luminous mouth.
So many of us gave ourselves to others. We didn't know we did.
"Intangible Things", by J.R.R. t cm = 120"", from "Meditations", by Alfred, Lord!
It seems we have been given possession of the senses, to see, hear, taste, touch, smell,
it is given to us. And so we have them. But we can't keep them.
We lose them. They fade. It's like losing a baby. And it's like losing a lover.
It's like losing a race. We race down the straight in a battle suit. We are breaking away.
We are at the front. The one in the front is the one who falls away.
But it's like losing a race. It's like the gully. It's like the catch. It's like the hole. It's like the ball. It's like all the possibilities.
But it's like a hole. It's hard to get into. It's hard to get out of.
We have the means to prolong it. We have the senses to maintain it.
It's ours. But what's ours is just as it's always been. It's like a dot that appears once in a lifetime.
It has been. But it has now become something like ours. It's still temporary. It doesn't last. It's just like us.
An elegy?
One more I noticed while generating from a ~1-loss model on 2019-12-16, although I did not read through for selections (see also ~0.7-loss model samples, 2019-12-18):
"The Dead Dead Trees", by William E. Stafford [Living, Death, Sorrow & Grieving, Time & Brevity, Nature, Trees & Flowers, Arts & Sciences, Poetry & Poets]
for William E. Stafford
The fire out of nature comes, and all things die.
The old trees walking around the forest,
Old homes grown up so high they watch the sky;
Old friends grown old in the lights of the city
Grow fatter and slow, and cast a long frown.
But all the trees of sadness are sad, old friends,
You gather in your trees the great aim that can/ Save by crying. The fire out of nature comes, and all things die.
O bright Biblical Valentines of our rock marriage
Twice Sweet Roses, once a widowed Love would bear,
O give us the first vintage that points toward faith
And the second would add a marriage bond,
O give us a cup with the words of mine spouse,
O give us the wine of the West enough to brew
Two potions to govern whatever we deem
Treacher-of-heart and/or mind.
O give us two basic cakes: one
Higher than the other.
O give us two lovely days.
One a bright noon, the other at night.
Two better than two diluted drinks;
One a day without a bud that caressed the ground,
One side of which is born black, the other white.
One side deep in knowledge, the other one wet and shiny
And light as a great bird's eye.
One an eye that twinkles when you flash it,
One a kiss that devours.
O we rock over just enough to find the sun.
To make each other better.
Just enough to make us wiser.
"November", by Lee Herrick
All of what we think we know
of seasons slips away,
and beautiful things take their course---
a cascade of moss, a snow
granular fall. In these rare
nights gorgeous and bluer
than winter, there is no cold,
and no fierce wind. In these
few bright days the already
red sunset burns on the hills---
until the nighttime is blind
without the sun.
"Ticks", by Ben Lerner [Living, Death, Health & Illness, Relationships, Family & Ancestors, Philosophy, Social Commentaries]
Ticks are tiny bloodhounds with a bulldog
mouth and a life span of about five seconds.
They are small, pink, mosquitoes with a tan
to their wings. On your face, they leave
a pouty frowny face. Tick-tock, tick-tock.
They are everywhere, like crickets in a park
quietly singing. Tick-tock, tick-tock.
I get nervous around them. They tell me
they are hunting wildebeest. Tick-tock,
tick-tock.
I get nervous around them. Their very presence
makes me nervous around them.
They look like feral dogs and
they hunt like wolves. Tick-tock, tick-tock.
Tick-tick, tick-tock.
I'm serious about this. Tick-tock.
"My Mother's Snare", by Ben Lerner [Living, Death, Relationships, Family & Ancestors]
My mother's snare
sounds like rain
picking up the strings of the rain,
then the trees, picking up the strings
of the trees.
It's the last refrain of a small song
my mother wants to finish, but cannot.
"The Lady's Speech", by Sir Walter Ralegh [Relationships, Family & Ancestors, Social Commentaries, History & Politics, War & Conflict]
A child was sitting in the sun
When another child said to him:
"Sun-child, tell me why
This great house is such a pit'
In which all people lie.
Tell me, why does no one die
Here in this pit?"
"Death is not here
Except in the sun
And only happens when the sun shines."
"Then who dies then?" asked the first child.
"Everyone dies then."
"My Brother the Bomb", by Mark Rudman [Living, Death, The Body, Time & Brevity, Religion, Faith & Doubt, God & the Divine]
For Joe Miller
In heaven, we worship every fruit,
From grape to peach to plum;
We go to earth and find it full
Of thorny thorns and braches,
Wrestling with itself to get away---
To explode on us.
This is our way of saying hello.
This is how we express gratitude:
By giving, by making things happen.
In hell, we worship every bomb
That people would drop on each other---
On Hiroshima or Dresden,
Or any other night where everyone
Has been too sleepy to turn on the light.
This is our way of saying goodbye.
This is how we make sure no one dies.
O world of radiant sunshine,
How far above you seem
The clouds o'er the sea,
Above the rooftops of New York
How high and distant they fly.
Your beauty seems almost painful--
For all the rain and mist.
O world of golden skies,
How near you seem to be
To souls that wander, lost and free,
Through fields of corn and wheat.
Though all below seems dark and drear,
Each height and hill is bright and fair.
O world of sparkling dews,
How near you seem to be
To women whose lips are wet
And cheeks that blusher are
Than mine or thine or even hers.
We smile because we're happy
And strangely jealous of each other.
Overall
Subjectively, the output shows a lot of poetry knowledge, much better than the char-RNN samples. There’s (some) rhyming, themes are continued for shockingly long passages compared to char-RNN, and there are many passages I feel could inspire a poet or even be cleaned up a little to be passable poems on their own. Adding the metadata did help—GPT-2-poetry is worse than GPT-2-poetry-prefix. Some of the ones I liked most are (first lines) ‘We never say “Thank you”’, ‘Thy soul, thy very soul is burning!’, ‘“It is morn!” said the clover-bush’, ‘And they have seen the last light fail’, ‘There comes a murmur low and sweet’, and probably the best is ‘The sun is gone, and the night is late’.
Is GPT-2-poetry-prefix better than GPT-2-117M at poetry completions (since GPT-2-117M will probably hardly ever generate poetry without a prompt)? Probably, with exceptions. “Howl” is far worse, but that is for good reason related to the oldness of the PG corpus; if anyone could assemble an equally large corpus of more recent poetry, I’d expect GPT-2-117M finetuning to produce better completions. The Pope samples from GPT-2-poetry-prefix are clearly better (before diverging into prose). I argue that the Shelley samples are somewhat better. And the 8 famous line completions are overall of much higher poetic quality (several of the GPT-2-117M completions are just prose, unsurprisingly).
So, if one is looking for poetry completions in an old-fashioned vein, it delivers, but at the cost of flexibility with more prose-like (and hence more contemporary) poems. This is an expected and fixable problem, and overall, I consider GPT-2-poetry-prefix to be successful as a poem generator & better than my previous char-RNNs.
Improvements
Nor is this near the limit for Transformer-based poetry generation, as there are many possible improvements which could be made, all of which I’d expect to deliver substantial gains:
Make It Bigger:
bigger NN models: our initial results used the publicly-released GPT-2-117M, which delivers inferior results on all tasks compared to the unreleased GPT-2-1.5b: the samples generated by OpenAI & associates from GPT-2-1.5b are much better than GPT-2-117M samples, indicating that simply scaling up continues to deliver gains. Our GPT-2-1.5b poems turned out substantially better.
Nor did the various GPT-2 model sizes appear to reach any natural limit with GPT-2-1.5b, indicating that the Transformer NNs can be increased much further before hitting zero marginal gains. (This is consistent with other large-scale NN research, particularly on CNNs, where even billions of images can be usefully trained upon.) OpenAI’s Greg Brockman has said (February 2019) that OpenAI intends to keep scaling GPT-2-1.5b up, with aspirations of training models 10–1000× bigger; if a 10× bigger ‘GPT-2-huge’ and a 1000× bigger still ‘GPT-2-enormous’ are possible, the quality leap from GPT-2-117M poetry to a hypothetical ‘GPT-2-enormous’ would be staggering.
These projections for GPT-3 have since been borne out—GPT-3 (published 2020-05-28) has 175b parameters (over 100× more), and GPT-3’s random poems, with no poetry finetuning at all, are as good or better (!) than our GPT-2-1.5b poems.
better NN models (which will probably need to be bigger): the most painful limit is the small context window, which has a number of possible solutions like recurrency, memory, efficient attention variants, or various approximations. Other options include more attention heads, more layers, external memory functions, or on-the-fly adaptation; there are many possibilities here. (The prefix can be seen as an extremely crude kind of recurrency or memory, and helped a lot; how much more so a real memory?)
more & better data: quantity-wise, the PG corpus is barely a tenth of a gigabyte and exhibits many enormous omissions—all of modern poetry, for example, not to mention most foreign poetry, or non-English poetry as a whole (why not a multi-lingual GPT-2 if sufficiently large? neural machine translation approaches improve the more languages they have access to, why not regular language generation?). There are many places additional poetry could be obtained from, such as WikiSource, Poetry Foundation, Libgen, or the Internet in general (perhaps write a poetry-detector Transformer to search through a dump like Common Crawl for poetry?). Quality-wise, the PG corpus is good but still has a number of flaws: a lot of prose, just enough non-English poetry to screw things up (especially Latin), mostly pre-1923 poetry, & minimal metadata (ideally, poems would be individual units rather than book-length streams, and metadata like author would be available to use in prefixes).
GPT-3 expands the dataset greatly to more of Common Crawl plus 57 billion words from 2 deliberately-vaguely-described Internet book corpuses, but playing with it, I feel that GPT-3 is still weak on areas like science—for example, I observe GPT-3 to easily write machine learning paper abstracts, but it does not do quite so well when I try to extend it to the rest of papers, and it doesn’t spontaneously quote from within papers the way it quotes abstracts. Does this reflect the fact that many papers exist only as PDFs, and only a relatively small fraction of all scientific papers have clean readable HTML versions (while they usually all have readable HTML abstracts)? If so, that may weaken GPT’s reasoning & common-sense abilities considerably; after all, while it does not usually come up in regular writing that giraffes have two eyes instead of three eyes, probing definitions & studying exceptions & manipulating causal arrows in unusual ways are all the bread & butter of scientific writing, and could implicitly teach that better than regular writing.
Generate Smarter:
using a better sampling strategy than top-k, like “nucleus sampling” (but curiously, not beam search—beam search gives substantial improvements on what the nucleus sampling authors call “closed” text generation tasks like translation, but while beam search helps char-RNNs a little, it damages results badly the wider the beam, and gives particularly bad results on GPT-2; Kyle Kastner says that beam search can work in contexts with heavy constraints, like being constrained to generate rhyming lines or explicit repetition penalties; Douglas Summers-Stay overcomes the rhyming limits by brute force: generating random completions until target lines rhyme according to a rhyming dictionary library)
Nucleus sampling has been implemented in nshepperd’s Tensorflow & Hugging Face’s PyTorch GPT-2 sampling code (a minimal sketch of the idea follows after this group of items).
use tree search methods: any deep, thorough, search inevitably becomes a tree; tree searches are useful for enabling kinds of ‘backtracking’ and ‘revision’ or ‘changing its mind’ about multiple possible variants of a poem, as opposed to the usual sampling approaches which tend to commit to each word and force all-or-nothing choices. (My proposal for backprop reward optimization would have similar advantages, as each iteration step allows ‘thinking’ about how to improve a given input, approximating a search implicitly—even if not explicitly like a MCTS or MuZero-like approach.) The challenge here is to figure out a tree search which avoids the repetition trap.
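For intuition, nucleus (top-p) sampling just keeps the smallest set of tokens whose cumulative probability reaches p, renormalizes, and samples from that set; a minimal NumPy sketch of the idea (not the nshepperd/Hugging Face implementations themselves):

import numpy as np

rng = np.random.default_rng(2000)  # seed purely illustrative

def nucleus_sample(logits, p=0.9):
    # Softmax over the vocabulary.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Keep the smallest prefix of tokens (by descending probability) whose mass reaches p.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    keep = order[:cutoff]
    # Renormalize over the nucleus and sample one token id.
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))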
Train Better, by fixing the loss (eg. unlikelihood training18), or switching to the RL setting to directly maximize generation quality:
richer losses: the standard GPT unidirectional prediction loss is not the only possible (differentiable) loss; it is not even, strictly speaking, the best: models like BERT/BART use more sophisticated losses such as bidirectional losses, which force the model to predict a word missing from anywhere in the string (as opposed to only missing from the end), and typically outperform GPT-2 on language tasks. A model like T5 uses a denoising objective where a 15%-long chunk is replaced by a missing token & T5 must predict all the missing text based on context; these sorts of objectives allow learning much more from a given dataset. (Indeed, such models typically outperform GPT-2 on everything but language generation. Oddly, they typically do quite badly at that, which is a major reason everyone uses GPT-2 for generating new texts, and BERT etc for everything else like generating embeddings or classification.)
adding global end-to-end losses, which enable training to optimize non-differentiable properties rather than easy (but partially irrelevant) ones like predictive losses such as cross-entropy in prediction of the next word. For example, rules defining acceptable meter or rhyme use, or penalizing total repetition: these cannot be handled by normal training because no individual discrete word is responsible, and parameters cannot be smoothly adjusted to decrease/increase a global property like ‘rhymes’ which is the result of all words considered together as a whole. (This sort of RL loss has been employed in other natural language tasks like machine translation, where metrics like predictive loss do not map onto the desired goal of semantically-correct translation, and word-by-word generation of translations yields similar issues as here, but there are metrics like BLEU or ROUGE or grammar checkers which provide a crude measure of global quality. RL approaches have many virtues.) A toy sketch of such a global RL-style loss appears after this list.
using subjective quality-based losses, like preference learning:
instead of training a NN to predict individual next-characters as accurately as possible or imitate a text corpus as well as possible, we really just want them to predict good next-characters to write text as well as possible—which is not the same thing at all, any more than accurately predicting a human Go player’s next move on average is the same thing as playing Go superhumanly well.
This encourages more global coherency, more thematic progressions, use of rare words when appropriate, surprising subversions or twists which work well when tried but don’t appear in the original corpus, learning esthetics, and so on. If it works and the new GPT-2-poetry is able to successfully produce new poems which consistently get the top score from the critic and no further improvement is happening, then you simply read a bunch of its new poems, pick which one in each pair you like, retrain the critic on the expanded dataset to detect the remaining flaws in the ones you disliked, and then keep training GPT-2-poetry to avoid generating the ones you disliked & generate more poems like the ones you liked. Repeat with many cycles, and it should generate excellent poems while avoiding all the flaws of crude likelihood training and even cruder top-k sampling which hobble GPT-2-poetry right now. Even better, you could create a website to crowdsource the rankings to keep it training 24/7 and improving indefinitely. (A sketch of the pairwise Bradley-Terry critic loss at the core of this appears after this list.)
using “expert iteration” architectures like AlphaZero to do much more sophisticated search over possible poems, creating an iterative bootstrap
adding creativity losses along the lines of “CAN: Creative Adversarial Networks, Generating ‘Art’ by Learning About Styles and Deviating from Style Norms”, et al2017, where an additional style-classification loss pushes the GAN away from known styles, encouraging diversity
one could attempt to invent new styles of poetry by taking inspiration from evolutionary methods, such as the “Population-Based Training” variant employed in DeepMind’s AlphaStar League which created diversity by deliberately scrambling the ‘rules’ for each lineage of agents. The “AlphaStar League” used a population of multiple NNs, each forced to specialize in using a particular unit or rewarded for achieving particular goals like defeating a specific NN (rather than winning in general). The AlphaStar League was credited for forcing the overall AlphaStar population to explore strategies reliant on particular kinds of units and figuring out counter-strategies to successful ones. Something similar could be done with poetry rules: train many different agents, each given a specific rhyme scheme or meter or vocabulary for their reward function, and in preference-learning approaches, the best poems can be provided to human critics for rating & improving the NN critic. Potentially exciting new combinations could emerge which produce the best poems as rated by the humans.
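To illustrate what a global, non-differentiable loss might look like in practice (as referenced above), here is a toy REINFORCE-style sketch; it is my own illustration under stated assumptions (Hugging Face transformers, the pronouncing rhyming dictionary, a made-up prompt), not a working training recipe. It samples a poem, scores it with a crude global reward (the fraction of adjacent line-pairs whose last words rhyme), and reinforces the log-probability of the sampled tokens in proportion to that reward; no per-token cross-entropy can express such a whole-poem property:

import pronouncing, torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
opt = torch.optim.Adam(model.parameters(), lr=1e-5)

def rhyme_reward(text):
    """Crude global reward: fraction of adjacent line-pairs whose final words rhyme."""
    last = [l.rstrip(".,;:!?").split()[-1].lower() for l in text.splitlines() if l.split()]
    pairs = list(zip(last, last[1:]))
    return sum(b in pronouncing.rhymes(a) for a, b in pairs) / len(pairs) if pairs else 0.0

prompt_ids = tok("The night was dark,\n", return_tensors="pt").input_ids
for step in range(100):
    sample = model.generate(prompt_ids, do_sample=True, top_p=0.9, max_new_tokens=64,
                            pad_token_id=tok.eos_token_id)
    reward = rhyme_reward(tok.decode(sample[0, prompt_ids.shape[1]:]))
    logits = model(sample).logits[:, :-1]                       # re-score the sampled sequence
    logp = torch.log_softmax(logits, -1).gather(-1, sample[:, 1:, None]).squeeze(-1)
    logp = logp[:, prompt_ids.shape[1] - 1:].sum()              # log-prob of the generated tokens only
    loss = -(reward - 0.5) * logp                               # REINFORCE with a crude 0.5 baseline
    opt.zero_grad(); loss.backward(); opt.step()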
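Similarly, a minimal sketch of the first half of the preference-learning loop described above: fit a scalar-output ‘critic’ (reward model) on pairwise human judgments with the Bradley-Terry loss, -log sigmoid(r(preferred) - r(rejected)). The pairs list below is a hypothetical stand-in for crowdsourced rankings; the generator would then be trained (eg. by an RL loop like the one above) to maximize the critic’s score rather than raw likelihood:

import torch
import torch.nn.functional as F
from transformers import GPT2TokenizerFast, GPT2ForSequenceClassification

tok = GPT2TokenizerFast.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
critic = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=1)  # scalar reward head
critic.config.pad_token_id = tok.pad_token_id
opt = torch.optim.Adam(critic.parameters(), lr=1e-5)

def score(texts):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)
    return critic(**batch).logits.squeeze(-1)        # one scalar ‘quality’ score per poem

pairs = [("Roses are red, violets are blue…", "asdf asdf asdf")]   # hypothetical human rankings
for preferred, rejected in pairs:
    r = score([preferred, rejected])
    loss = -F.logsigmoid(r[0] - r[1])                # Bradley-Terry pairwise logistic loss
    opt.zero_grad(); loss.backward(); opt.step()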
Given that GPT-2-117M is far from the state-of-the-art as of February 2019, and hardware & generative NN research is advancing rapidly, it will be exciting to see what sort of poetry can be generated given another 4 years!
External Links
Discussion:
“Gwern’s AI-Generated Poetry” (SSC); Reddit/HN: 1/2; BoingBoing; MetaFilter
“A Very Unlikely Chess Game” (on applying GPT-2-1.5b to PGN chess games)
poem-generator (Generates rhyming poetry using Huggingface GPT-2 using rejection sampling: throws away possible completions which don’t rhyme)
“A Hundred Visions and Revisions” (Monte-Carlo-like sampling to edit poems with BERT/RoBERTa to make ‘more likely’ or change topics/vocabulary)
lm-scorer (“This package provides a simple programming interface to score sentences using different ML language models.”)
“How to fine-tune GPT-2 on podcast transcripts”/“These WWDC boxed lunches aren’t real”, partialparcel (finetuning GPT-2-1.5b on Google Colab)
“Best Practices for Finetuning Large Transformer Language models”/“How I (almost) replicated OpenAI’s GPT-2 (124M version)”, Bilal Khan
“The Average Fourth Grader Is a Better Poet Than You (and Me Too)”
“The First Sally (A), or, Trurl’s Electronic Bard”, Stanisław Lem (The Cyberiad; commentary)
“How to build a State-of-the-Art Conversational AI with Transfer Learning”, Hugging Face
“Computer Generated Foundation: SCPs generated by a neural network”
“How To Make Custom AI-Generated Text With GPT-2”;
gpt-2-keyword-generation
“Evaluation Metrics for Language Modeling”, Chip Huyen
“Lessons Learned from Building an AI Writing App, writeup.ai [Guide]”, Jeffrey Shek
“Excavate”, Mike Lynch (search over a corpus to extract a RNN-generated ‘hidden text’)
“Introducing Aspects of Creativity in Automatic Poetry Generation”, 2020
“Smart Vet: Autocompleting Sentences in Veterinary Medical Records”, 2019
“How To Fine-Tune GPT-2 So You Can Generate Long-Form Creative Writing”, Jason Boog
“This AI Poet Mastered Rhythm, Rhyme, and Natural Language to Write Like Shakespeare: ‘Deep-speare’ crafted Shakespearean verse that most readers couldn’t distinguish from human-written poems” (et al2020)
“Progressive Generation of Long Text”, et al2020; “SOE: Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries”, et al2020
“AdapterHub: A Framework for Adapting Transformers”, et al2020
“Collaborative Storytelling with Large-scale Neural Language Models”, et al2020 (AI Dungeon-like GPT-2 trained on /r/WritingPrompts with ranking filtering)
“Controllable Neural Text Generation”, Lilian Weng
“Recent Advances in Language Model Fine-tuning”, Sebastian Ruder
“Making Pre-trained Language Models Better Few-shot Learners”, et al2020
“Prefix-Tuning: Optimizing Continuous Prompts for Generation”, 2021
“P-Tuning: GPT Understands, Too”, et al2021
“The Power of Scale for Parameter-Efficient Prompt Tuning”, et al2021
“Entailment as Few-Shot Learner”, et al2021
“Controllable Generation from Pre-trained Language Models via Inverse Prompting”, et al2021
“Prompting: Better Ways of Using Language Models for NLP Tasks”
“Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models”, 2021
“Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners”, et al2021
“PPT: Pre-trained Prompt Tuning for Few-shot Learning”, et al2021
“Towards a Unified View of Parameter-Efficient Transfer Learning”, et al2021
Appendix
Archive of Our Own (Ao3) GPT-2-1.5b
Aaron Gokaslan scraped the large fanfiction website Archive of Our Own (Ao3) and created a text dump (2.7GB archive, 12GB raw; 190,931 stories; 2.06b words; no metadata).19
We trained a GPT-2-1.5b on it (checkpoint, 11GB), under the theory that it might be useful as a basis for text-game-like applications such as AI Dungeon 2: since AI Dungeon 2 is essentially collaborative story-telling, starting with a story-based model ought to give better results.
Whether that is true remains to be seen, but we generated Ao3 text samples and the dataset & model can be downloaded:
rsync rsync://176.9.41.242:873/biggan/2020-01-14-gpt2-1558m-archiveofourownao3.tar.xz ./ # 11GB
rsync rsync://176.9.41.242:873/biggan/2019-12-18-skylion-archiveofourown-fanfics-textscrape.tar.xz ./ # 2.8GB
SF/Fantasy/Fanfiction/My Little Pony GPT-2-1.5b
AstraliteHeart has (w/
(Total data size: 200GB compressed pre-filtering, currently unclear how much it was actually trained on after quality filtering; because of the wide variety of fiction, we dub it ‘uberset’.)
This GPT-2-1.5b model can now be downloaded:
rsync -v rsync://176.9.41.242:873/biggan/2020-08-20-astraliteheart-gpt215b-sffuberset.tar.xz ./
The accompanying Tacotron2-Torchmoji & WaveGlow models can likewise be downloaded:
rsync -v rsync://176.9.41.242:873/biggan/2021-03-14-astraliteheart-tts-mlp.tar.xz ./
Video Game Walkthrough GPT-2-1.5b
Twitter user me_irl provided a 50MB scrape of video game walkthroughs, which he’d used previously with GPT-2-345M, and requested we do finetuning on that as well: video game walkthrough text samples (Newsweek article). me_irl has suggested that they could be used as hypothetical game designs for competitions or art purposes. The combined dataset/model can be downloaded:
rsync rsync://176.9.41.242:873/biggan/2020-01-16-gpt-2-1558m-shawnpresser-videogamewalkthrough.tar.xz ./
/r/DoTA2
In December 2019, Shawn Presser trained a GPT-2-117M model for a few million steps on the /r/DoTA2 subreddit.
While the final model checkpoint appears to have been lost (oops), step #562,971 from 2019-12-18 has been uploaded:
checkpoint (433MB)
Bradley-Terry Preference Learning
Efficient Attention
Moved to “Efficient Attention: Breaking The Quadratic Transformer Bottleneck”.
-
A Transformer is a considerably different architecture from an RNN, and is not that easy to explain, as it uses multiple convolutions to implement “attention”, allowing flexible internal control flow over a large but finite input window, without any recurrence or hidden state or LSTM units necessary. (A minimal sketch of the core self-attention operation appears after the reading list below.) For increasingly-technical explanations, see:
“Transformer: A Novel Neural Network Architecture for Language Understanding” (Google)
“The Illustrated Transformer”/“The Illustrated GPT-2 (Visualizing Transformer Language Models)”, Jay Alammar
“The Transformer—Attention is all you need”, Michał Chromiak
“How to code The Transformer in PyTorch”, Samuel Lynn-Evans
“How transformers work”, Brandon Rohrer
“Attention Is All You Need”, et al2017 (“The Annotated Transformer”); “Self-Attention with Relative Position Representations”, et al2018
“Character-Level Language Modeling with Deeper Self-Attention”, Al-Rfou et al2018
“Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, et al2019 (“Transformer-XL—Combining Transformers and RNNs Into a State-of-the-art Language Model”, Rani Horev)
“Transformers from scratch”, Peter Bloem, August 2019; 2
“The Annotated GPT-2”, Aman Arora
“The Transformer Family”, Lilian Weng, 2020
“RASP: Thinking Like Transformers”, et al2021
Further reading: Advanced efficient self-attention approaches; incidentally, it’s not even obvious that you need attention…
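As a concrete anchor for the readings above, a minimal sketch (my own; single-head, with no learned multi-head machinery, layer norm, or feed-forward blocks) of the causal scaled dot-product self-attention at the heart of GPT-2:

import math, torch

def causal_self_attention(x, w_q, w_k, w_v):
    """x: [seq, d_model]; w_q/w_k/w_v: [d_model, d_head] projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(k.shape[-1])                # [seq, seq] attention logits
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))         # causal mask: no attending to the future
    return torch.softmax(scores, -1) @ v                     # weighted sum of value vectors

x = torch.randn(10, 64)                                      # 10 tokens, 64-dimensional embeddings
w_q, w_k, w_v = (torch.randn(64, 32) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)         # torch.Size([10, 32])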
-
774M requires changes to nshepperd’s checkpointing, specifically removing the layer == 10 restriction in model.py, and letting the checkpointing code checkpoint as much as possible, which enables training minibatches n≤10 on my 2×1080tis. Diff:
-
It would require either high-end GPUs with ≥16GB VRAM, or TPU instances (which were used to train it). GPT-2-1.5b can’t be trained on my 1080tis with either the nshepperd codebase or Shawn Presser’s fork, although Presser has a Google Colab notebook using TPUs which can train it.
-
Other examples of finetuning are Facebook Messenger logs, nshepperd’s unpublished Linux kernel C source code & IRC-log training4, and story prompts. And, while it doesn’t use GPT-2-117M, too good to not mention is “Stack Roboflow: This Question Does Not Exist”.
-
GPT-2 completions of 26 prompts: “Ozymandias”/“One Art”/“The Road Not Taken”/“Where the Sidewalk Ends”/“Because I could not stop for Death”/“Inferno, Canto I”/“In Flanders Field”/“O Captain! My Captain!”/“Howl”/“The Tyger”/“Outsight”/“Zuang Zhou Dreams of Being a Butterfly”/“Sonnet”/“Oh, the Places You’ll Go!”/“The Hollow Men”/“The Summer Day”/“A Just-Finishing Candle”/“A Psalm of Life”/“Still I Rise!”/“The Second Coming”/“Do not go gentle into that good night”/“Kubla Khan”/“Edge”/“The Raven”/“There Will Come Soft Rains”/“The Lorax”.
-
For example, 2 people finetuned GPT-2-117M on an IRC channel’s logs, getting losses of 1.95 & 2.3; why was the latter’s loss 18% worse compared to the former when they were using the same IRC channel, GPT-2-117M pretrained model, training codebase, & had both apparently converged? Because, while the IRC channel was the same, they used different IRC clients which had different IRC log formatting conventions: the former’s logs had the full timestamp prefixed to each line, and the latter’s didn’t. Said timestamps made up ~20 characters of ~110-character lines, or ~18% of each line! So the models were performing identically on the content that mattered, and the much lower loss was simply because of near-perfect prediction of the highly-repetitive & predictable timestamps on every line. (Indeed, given the limited window of GPT-2-117M, arguably the model with the worse loss would be better in terms of generating fun coherent samples.)
-
I have 2 GPUs but nshepperd’s code does not (yet) support multi-GPU training easily. Some support for multi-GPU training using Horovod has been added, but I cannot vouch for it.
-
I discovered this while being puzzled why --batchsize 32 did not lead to instant out-of-memory errors for training; similarly, if you make the mistake of sampling with the option --top 40, what you are actually doing is sampling with the default --top_k 0. Oops.
-
It is possible that the additional training was helping, because the remaining tiny changes in the loss might translate to large perceived quality improvements—while the loss didn’t change, the samples from later on did strike me as better. This was something I thought I noticed with char-RNN as well, that the loss became a bad guide to quality when the NN had mostly converged. On the other hand, with larger GPT-2s, like GPT-2-1.5b, the relationship between loss and perceived quality seems even more opaque, with quality sometimes worsening even as the training loss decreases rapidly.
-
One might worry that by taking up space in the model’s limited context ‘window’ of inputs, because the Transformer has no hidden state or ‘memory’, such inline metadata would be a bad thing as it will push real words out of the context window, thereby degrading quality and making it even more incoherent & rambling.
But on the other hand, if it does learn to associate specific IDs with genres/
topics, then repetition of the inline metadata serves as a ‘mnemonic’ for global information which is available to all subsequent iterations of the model, serving as a crude memory itself. For example, if Homeric pastiche has ID #16452, then as long as the final iteration of the model overlaps for just the ID with the first iteration of model during sampling and both see “16452”, all models will be able to consistently agree on generating Homeric pastiche rather than some other pastiche because they all see the same ID somewhere in their context window & that guides their generation.
-
starspawn0 has collated some of the results:
All U.S. presidents and Russian leaders in temporal order, where the order was not specified in the documents used; also, all tennis champions in international competitions over the years. So, temporal order can be extracted.
The longitude and latitude of cities in the U.S. and Europe, along with their relative distances.
The relative size of many kinds of objects, like cars, elephants, humans, houses, and so on—which object is larger than which others.
The exact sizes of many objects in meters, with reasonably small error. For example, it might say the dimensions of a windshield are about 1.4 meters by 1 meter.
Which kinds of animals are dangerous, which are not; and which kinds of objects (eg. “fire”) are dangerous, and which are not (eg. “water”).
Which animals are smarter than which other ones; which animals are faster, which are slower; which animals are heavy, which are light; which animals live in water, which do not.
Which cities cause arousal (eg. “fun”, “exciting”), and which do not; which are expensive, which are not; which are dangerous, which are not; which are religious, which are not; which are large, which are not; which are hot, which are not; which are wealthy, which are not; which have a recognized intellectual culture, which do not.
Which kinds of clothes are appropriate for different age groups; which kinds cause emotional arousal; which kinds are expensive; which kinds are appropriate for different sexes; which kinds you expect to find in different locations; which kinds are associated with wealth; which are not appropriate for hot weather, which are; same for cold weather.
Qualities of mythological creatures—like the ones for animals.
Qualities of professions and professionals: age, arousal, danger, gender, intelligence, location, valence, wealth.
Qualities of sports and sportsmen/women: arousal, danger, gender, intelligence, location, speed, wealth.
Qualities of states: cost, intelligence, political, religiosity, size, temperature, wealth.
Qualities of types of weather and weather phenomena (eg. “tornado”): danger, temperature, wetness.
Physical properties of objects, such as rigidness and strength. Probably also includes transparency, softness, hardness, round, prickly, angular, and so on.
Relations like whole-and-part, and relative locations of a part within an object: eg. hand is connected to arm, arm is connected to shoulder, shoulder is connected to neck, neck is connected to head.
Properties of countries and cities: geolocation, GDP, GNI per-capita, CO2 emissions per-capita, fertility rate, amount of internet use, calling code, military expenditure, life expectancy, energy use, population, places imported from, how long they’ve had a national anthem, kinds of sports, GDP growth, crime rate, and so on.
Binary attributes of countries and cities: continent, time zones, contained-by (which regions contain which countries; which countries contain which cities; which boroughs are contained in which cities; and perhaps even relations between the boroughs—which border which others, how they are shaped, and how large they are), language, high or low crime?, military conflicts, athletes, medals won, organizations founded, schools founded, companies founded, weather, type of government, officials, and many more.
It’s even possible to predict the qualities of objects not in the training corpus, using something called the Bouba-Kiki effect
…
[Some references:]
et al2015, “Distributional vectors encode referential attributes”
“Dynamic word embeddings for evolving semantic discovery” (on et al2018)
2017, “Verb Physics: Relative Physical Knowledge of Actions and Objects”
2009, “Language Encodes Geographical Information”; 2014, “Grounding the Ungrounded: Estimating Locations of Unknown Place Names from Linguistic Associations and Grounded Representations”
-
et al2019:
We also introduce (a) a variation on architecture and initialization to train deeper networks, (b) the recomputation of attention matrices to save memory, and (c) fast attention kernels for training. We call networks with these changes “Sparse Transformers”, and show they can model sequences tens of thousands of timesteps long using hundreds of layers. We use the same architecture to model images, audio, and text from raw bytes, setting a new state-of-the-art for density modeling of enwik8, CIFAR-10, and ImageNet-64. We generate unconditional samples that demonstrate global coherence and great diversity, and show it is possible in principle to use self-attention to model sequences of length one million or more.
…5.4. Saving memory by recomputing attention weights
Gradient checkpointing has been shown to be effective in reducing the memory requirements of training deep neural networks (et al2016), (et al2016). It is worth noting, however, that this technique is particularly effective for self-attention layers when long sequences are processed, as memory usage is high for these layers relative to the cost of computing them. Using recomputation alone, we are able to train dense attention networks with hundreds of layers on sequence lengths of 16,384, which would be infeasible on modern hardware otherwise. In our experiments, we recompute the attention and feed-forward blocks during the backwards pass.
…For each sequence length, we attempted to train the largest model which could entirely fit into 16GB V100 accelerators without model parallelism. Overall, we found that increasing the sequence length by a factor of 4 requires a reduction in model capacity of approximately 4 × √4 = 8. Thus we found we could use factorized self-attention on sequences over 1 million timesteps long, albeit with extremely few parameters (3 million).
-
The code turns out to multiply by a large number as a way of setting a default ‘highly unlikely’ value for each possible BPE, but in FP16, it can’t be represented and overflows, and so the output amusingly just becomes the BPE 0, which is the character ‘!’, so it kept printing out ‘!!!’. Indeed.
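A tiny NumPy illustration of the failure mode (my reconstruction of the mechanism, with a made-up constant, not the actual GPT-2 code): the ‘highly unlikely’ masking value is unrepresentable in FP16, overflows to -inf, every logit looks equally impossible, and argmax falls through to index 0, the BPE for ‘!’:

import numpy as np

mask = np.float16(-1e10)                         # FP16 max is ~65504, so this overflows to -inf
logits = np.full(50257, mask, dtype=np.float16)  # every BPE now looks equally (im)possible
print(mask, np.argmax(logits))                   # -inf 0  → degenerates to BPE 0, ie. ‘!’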
-
The Colab environment has special support for mounting Google Drive, so a magical incantation like this will mount a Google Drive folder as a normal (slow) directory:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
#!rm -f ~/drive
#!ln -s /content/drive/My\ Drive ~/drive
!mkdir -p /drive
#umount /drive
!mount --bind /content/drive/My\ Drive /drive
-
Cloud provider bandwidth fees, like Amazon AWS’s or GCP’s, are notoriously high for “egress” traffic leaving the cloud provider; like Hotel California, they want you to check in but never leave, and charge egregious fees per gigabyte. This is why I try to use my Hetzner dedicated server for all hosting, which will let people download terabytes without bankrupting me.
-
We noticed that, in fact, our preemptible TPUs would always preempt precisely at midnight. I speculated that, as discussed in the Google SRE handbook where they cover how the Chubby service is deliberately taken down at random to live down to its uptime promises, preemptible TPUs were being deliberately taken down to stop users from treating them like on-demand TPUs, and this simply wasn’t documented. Discussing our problems with TRC, this apparently was correct.
-
Presser is convinced that TPU power is greatly overrated and most TPU projects have made poor use of the available power, as they get far less speedups than one would expect over GPUs. Oddly, there doesn’t seem to have been much work on using multiple TPUs outside of a TPU pod configuration, and TRC did not seem to know of an equivalent to the ‘swarm’.
-
et al2019 also demonstrates problems with likelihood decoding strategies in GPT-2 for story generation.
-
Alternative Ao3 fanfiction dumps, among others, are available on the Internet Archive.
-
A React GUI which combines GPT-2-1.5b-sffuberset (here), 4chan Pony Preservation Project (voice synthesis of MLP), TorchMoji (emotion control), ThisPonyDoesNotExist for pony face images, MakeItTalk to animate & control said faces, etc. Screenshots: 1, 2, 3, 4; video demo.
Backlinks
Choose-Your-Own-Adventure AI Dungeon Games
GPT-3 Creative Fiction
Technology Holy Wars are Coordination Problems
The Scaling Hypothesis
Crowdsourcing The Best GPT-2-1.5b Poetry
GPT-2 Preference Learning for Music Generation
GPT-2 Folk Music
How To Make Custom AI-Generated Text With GPT-2
This Waifu Does Not Exist
Making Anime Faces With StyleGAN
Banner Ads Considered Harmful
RNN Metadata for Mimicking Author Style
Hydrocephalus and Intelligence: The Hollow Men
The Unreasonable Effectiveness of Recurrent Neural Networks
How Complex Are Individual Differences?