“GPT-2 Neural Network Poetry § Cleaning Project Gutenberg & Contemporary Poetry”, 2019-03-03:
Demonstration tutorial of retraining OpenAI’s GPT-2 (a text-generating Transformer neural network) on large poetry corpora to generate high-quality English verse.
Shawn Presser cleaned the Project Gutenberg poetry corpus using a heuristic on line numbers to guess where poems begin and end. This boundary metadata helps the GPT-2-117M model, reducing run-on or rambling generations, since it sees many discrete texts rather than a few book-length ones. I combined this improved PG poetry dataset with a new Kaggle dataset scraped from the Poetry Foundation website for modern/contemporary poetry, filling the post-1920s gap in PG. The generated poems are much better.
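The exact boundary heuristic is not specified above; as an illustration only, a minimal sketch of one plausible rule (a run of blank lines marks a poem break, and very short fragments are discarded) might look like this, with `split_poems` and its parameters being hypothetical names, not Presser's actual code:

```python
import re

def split_poems(text, min_lines=4):
    """Split a plain-text poetry dump into discrete poems.

    Hypothetical heuristic: treat a run of 2+ consecutive blank lines
    as a poem boundary, then drop fragments with fewer than min_lines
    non-blank lines (likely headings or stray text, not poems).
    """
    chunks = re.split(r"\n\s*\n\s*\n+", text)
    poems = []
    for chunk in chunks:
        lines = [l for l in chunk.splitlines() if l.strip()]
        if len(lines) >= min_lines:
            poems.append("\n".join(lines))
    return poems

corpus = (
    "Roses are red\nViolets are blue\nSugar is sweet\nAnd so are you\n"
    "\n\n\n"
    "BOOK II\n"          # stray heading, filtered out by min_lines
    "\n\n\n"
    "A second poem here\nWith four lines total\nSo it clears the bar\nAnd it is kept\n"
)
poems = split_poems(corpus)
print(len(poems))  # the one-line heading is not counted as a poem
```

Emitting each resulting poem as its own training document (or separating them with a delimiter token) is what gives the model the "many discrete texts" signal described above.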
- GPT-2-117M: Generating Poetry
- Training GPT-2-117M To Generate Poetry
- Training GPT-2-poetry
- Training GPT-2-poetry-prefix
- GPT-2-1.5b
- Overall
- Improvements
- External Links
- Appendix