ā€œDeepTingleā€, Ahmed Khalifa, Gabriella A. B. Barros, Julian Togelius2017-05-09 (, , , )⁠:

DeepTingle is a text prediction and classification system trained on the collected works of the renowned fantastic gay erotica author Chuck Tingle. Whereas the writing assistance tools you use every day (in the form of predictive text, translation, grammar checking and so on) are trained on generic, purportedly “neutral” datasets, DeepTingle is trained on a very specific, internally consistent but externally arguably eccentric dataset. This allows us to foreground and confront the norms embedded in data-driven creativity and productivity assistance tools. As such tools effectively function as extensions of our cognition into technology, it is important to identify the norms they embed within themselves and, by extension, us.

DeepTingle is realized as a web application based on LSTM networks and the GloVe word embedding, implemented in JavaScript with Keras-JS.

…Our training set includes all Chuck Tingle books released up to November 2016: a total of 109 short stories and 2 novels (with 11 chapters each), forming a corpus of 3,044,178 characters.

…After initial testing, we opted to switch to a word representation instead of a character representation…The network consists of 6 layers. The first layer is an embedding layer that converts an input word into its 100-dimensional representation. It is followed by 2 LSTM layers of size 1,000, which in turn are followed by 2 fully connected layers of the same size. Finally, there is a softmax layer of size 12,444 (the total number of unique words in all Tingle’s books).
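As a rough check on the scale of this architecture, the per-layer parameter counts implied by the description (100-d embedding, two 1,000-unit LSTM layers, two 1,000-unit dense layers, a 12,444-way softmax) can be tallied. This is a back-of-the-envelope sketch assuming the standard LSTM parameterization (4 gates, each with input weights, recurrent weights, and a bias), not the authors’ exact code:

```python
# Rough parameter counts for the network described above.
VOCAB = 12_444   # unique words in the Tingle corpus
EMB   = 100      # embedding dimension
HID   = 1_000    # LSTM / dense layer width

def lstm_params(n_in, n_units):
    # 4 gates x (input weights + recurrent weights + bias)
    return 4 * (n_in * n_units + n_units * n_units + n_units)

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

layers = {
    "embedding": VOCAB * EMB,
    "lstm_1":    lstm_params(EMB, HID),
    "lstm_2":    lstm_params(HID, HID),
    "dense_1":   dense_params(HID, HID),
    "dense_2":   dense_params(HID, HID),
    "softmax":   dense_params(HID, VOCAB),
}

total = sum(layers.values())
for name, n in layers.items():
    print(f"{name:10s} {n:>12,}")
print(f"{'total':10s} {total:>12,}")  # ~28M parameters
```

Under these assumptions the model is ~28M parameters, and the 12,444-way softmax alone accounts for nearly half of them, which illustrates why large-vocabulary word-level RNNs were so much more memory-hungry than char-RNNs.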

…We experimented with various numbers of time steps for the LSTM and settled on 6 time steps, as this generated sentences that were more grammatically correct and more coherent than the other settings. Input data is designed to predict the next word based on the previous 6 words [!].
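The windowing itself is simple: slide a 6-word context over the tokenized corpus and pair each window with the word that follows it. A minimal sketch (using a naive whitespace split rather than the authors’ actual preprocessing):

```python
def make_windows(text, context=6):
    """Slide a fixed-size context over a token stream, pairing each
    window of `context` words with the word that follows it."""
    words = text.split()  # naive whitespace tokenization
    return [(words[i:i + context], words[i + context])
            for i in range(len(words) - context)]

pairs = make_windows("call me ishmael some years ago never mind how long")
print(pairs[0])
# (['call', 'me', 'ishmael', 'some', 'years', 'ago'], 'never')
```

Each context window is then mapped through the embedding layer, and the softmax output is trained against the one-hot index of the target word.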

…Results show that using neural networks for text prediction produces more coherent and grammatically correct text than Markov chains, but less so than the original text, which is reasonable considering the latter is written and reviewed by a human.
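For context, the Markov-chain baseline being compared against is essentially a lookup table from recent words to a sampled successor, with no generalization across similar contexts. A minimal word-level sketch (order 2, plain Python; an illustration of the general technique, not the authors’ implementation):

```python
import random
from collections import defaultdict

def train_markov(text, order=2):
    """Map each `order`-word context to the list of observed successors."""
    words = text.split()
    table = defaultdict(list)
    for i in range(len(words) - order):
        table[tuple(words[i:i + order])].append(words[i + order])
    return table

def generate(table, seed, n_words=20, rng=random):
    out = list(seed)
    for _ in range(n_words):
        successors = table.get(tuple(out[-len(seed):]))
        if not successors:   # dead end: context never seen in training
            break
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = ("call me ishmael some years ago never mind how long precisely "
          "having little or no money in my purse")
table = train_markov(corpus)
print(generate(table, ("call", "me")))
```

Because the table only ever reproduces exact contexts seen in training, a Markov chain either parrots the corpus verbatim (as here, where every bigram is unique) or stitches fragments at context boundaries, which is where its grammaticality breaks down relative to the LSTM.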

Example 3: 150 words generated from the line ā€œCall me Ishmaelā€, without word substitution.

[This really emphasizes the extreme quality leap in text generation from word-RNNs to Transformers; although to be fair, char-RNNs usually worked better than this DeepTingle RNN did.]