“Excavate”, Mike Lynch, 2019-11-22:

After skipping it last year (I did NaNoWriMo instead), I decided that I missed doing National Novel Generating Month and thought I’d do something relatively simple, based on Tom Phillips’ A Humument, which I recently read for the first time. Phillips’ project was created by drawing over the pages of the forgotten Victorian novel A Human Document, leaving behind a handful of words on each page which form their own narrative, revealing a latent story in the original text. I wanted to simulate this process by taking a neural net trained on one text and using it to excavate a slice from a second text, one which would somehow preserve the style of the RNN. To get to the target length of 50,000 words, the second text would have to be very long, so I picked Robert Burton’s The Anatomy of Melancholy, which is over half a million words, and one of my favorite books.

The next step was to use this to implement the excavate algorithm, which works like this:

  1. read a vocab of the next L words from the primary text (Burton), where L is the lookahead parameter

  2. take the first letter of every word in the vocab and turn it into a constraint

  3. run the RNN with that constraint to get the next character C

  4. prune the vocab to those words with the first letter C, with that letter removed

  5. turn the new vocab into a new constraint and go back to 3

  6. once we’ve finished a word, add it to the results

  7. skip ahead to the word we picked, and read more words from the text until we have L words

  8. go back to 2 unless we’ve run out of original text, or reached the target word count
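The steps above can be sketched in Python roughly as follows. This is a minimal sketch under several assumptions, not the code used for the project: the RNN is replaced by a hypothetical `pick_char(prefix, allowed)` callback that returns one of the allowed characters, words are lowercased and stripped of surrounding punctuation, and a word can only end once a candidate has been spelled out in full (the real weight table is more permissive). The names `excavate`, `excavate_word`, `make_random_picker` and `WORD_END` are all made up for illustration.

```python
import random
import string

WORD_END = " "  # assumption: a single stand-in for "space or punctuation"
STRIP_CHARS = string.punctuation + "“”‘’"


def excavate_word(window, pick_char):
    """Steps 2-6: spell one word letter by letter from the words in `window`.

    Returns (offset, word), where offset is the chosen word's index in `window`.
    """
    # Each candidate is (index in window, letters not yet consumed).
    candidates = [(i, w) for i, w in enumerate(window)]
    prefix = ""
    while True:
        finished = [i for i, rest in candidates if rest == ""]
        allowed = {rest[0] for _, rest in candidates if rest}
        if finished:
            allowed.add(WORD_END)  # a complete word is on the table
        c = pick_char(prefix, allowed)
        if c == WORD_END:
            return finished[0], prefix
        prefix += c
        # Step 4: keep words starting with c, with that letter removed.
        candidates = [(i, rest[1:]) for i, rest in candidates if rest.startswith(c)]


def excavate(text, pick_char, lookahead=100, target=50_000):
    """Steps 1, 7 and 8: slide an L-word window over the primary text."""
    words = [w.strip(STRIP_CHARS).lower() for w in text.split()]
    words = [w for w in words if w]
    result, pos = [], 0
    while pos < len(words) and len(result) < target:
        offset, word = excavate_word(words[pos:pos + lookahead], pick_char)
        result.append(word)
        pos += offset + 1  # skip ahead to just past the word we picked
    return result


def make_random_picker(seed=0):
    """Stand-in for the RNN: picks uniformly among the allowed characters."""
    rng = random.Random(seed)
    return lambda prefix, allowed: rng.choice(sorted(allowed))
```

Calling `excavate(burton_text, make_random_picker())` would return a list of words that appear in the source in their original order. Swapping the random picker for a function that samples from a trained character model is where the style of the second text would come in.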

Here’s an example of how the RNN generates a single word with L set to 100:

Vocab 1: “prime cause of my disease. Or as he did, of whom Felix Plater speaks, that thought he had some of Aristophanes’ frogs in his belly, still crying Breec, okex, coax, coax, oop, oop, and for that cause studied physic seven years, and travelled over most part of Europe to ease himself. To do myself good I turned over such physicians as our libraries would afford, or my private friends impart, and have taken this pains. And why not? Cardan professeth he wrote his book, De Consolatione after his son’s death, to comfort himself; so did Tully”

RNN: s

Vocab 2: “peaks ome till tudied even uch on’s o”

RNN: t

Vocab 3: “ill udied”

RNN: u

Final result: studied

The algorithm then restarts with a new 100-word vocabulary starting at “physic seven years”.
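The pruning in this example can be reproduced with a few lines of Python. This is only an illustrative sketch: `prune` is a hypothetical helper name, and the starting list is just the eight words from Vocab 1 that begin with “s”, typed in by hand rather than extracted from the full text.

```python
def prune(vocab, c):
    """Keep the entries that start with character c, with that character removed."""
    return [w[1:] for w in vocab if w.startswith(c)]


# The words in Vocab 1 that begin with "s", in order of appearance.
s_words = ["speaks", "some", "still", "studied", "seven", "such", "son’s", "so"]

vocab2 = prune(s_words, "s")  # after the RNN emits "s"
vocab3 = prune(vocab2, "t")   # after the RNN emits "t"
vocab4 = prune(vocab3, "u")   # after the RNN emits "u"

print(" ".join(vocab2))  # peaks ome till tudied even uch on’s o
print(" ".join(vocab3))  # ill udied
print(" ".join(vocab4))  # died
```

After the “u”, only one candidate is left, so the remaining letters of “studied” are the only letters the constraint still offers.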

It works pretty well with a high enough lookahead value, although I’m not happy with how the algorithm decides when to end a word. The weight table always gets a list of all the punctuation symbols and a space, which means that the RNN can always bail out of a word half-way if it decides to. I tried constraining it so that it always finished a word once it had narrowed down the options to a single-word vocab, but when I did this, it somehow removed the patterns of punctuation and line-breaks—for example, the way the Three Musketeers RNN emits dialogue in quotation marks—and this was a quality of the RNN I wanted to preserve. I think a little more work could improve this.
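To make the trade-off concrete, here is a rough sketch of the two ways of building the constraint described above. It assumes the constraint is applied by masking the model’s next-character probabilities and renormalising, which this excerpt does not confirm. `allowed_chars`, `mask_and_renormalise`, `WORD_BREAKS` and the `force_finish` flag are hypothetical names, and the word-break set is assumed to be ASCII punctuation plus a space.

```python
import string

# Assumption: the characters that can end a word are ASCII punctuation plus a space.
WORD_BREAKS = set(string.punctuation) | {" "}


def allowed_chars(vocab, force_finish=False):
    """Characters the sampler may emit, given the remaining word suffixes."""
    letters = {w[0] for w in vocab if w}
    if force_finish and len(vocab) == 1 and vocab[0]:
        # Variant: with a single unfinished candidate, only its next letter is allowed.
        return letters
    # Default: word-break characters are always available, so a word can be cut short.
    return letters | WORD_BREAKS


def mask_and_renormalise(probs, allowed):
    """Drop disallowed characters and rescale the rest so they sum to 1."""
    kept = {c: p for c, p in probs.items() if c in allowed and p > 0}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("model gives no probability to any allowed character")
    return {c: p / total for c, p in kept.items()}
```

With the default setting a space stays in the allowed set even when only “died” remains, which is how a word can end half-way. With `force_finish=True` the space is removed at that point, which also removes the model’s chance to emit punctuation there.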

…This kind of hybridisation can be applied to any RNN and base text, so there’s a lot of scope for exploration here, of grafting the grammar and style of one text onto the words from another. And the alliteration and lipogram experiments above are just two simple examples of more general ways in which I’ll be able to tamper with the output of RNNs.