“GPT-2 Folk Music § Spaceless Model”, Gwern & Shawn Presser, 2019-11-01:

Generating Irish/folk/classical music in ABC format using GPT-2-117M, with good results.

While training GPT-2-117M on a folk-music corpus written in ABC format, an otherwise high-quality model persistently generated syntax errors: random spaces appeared in the output, rendering a piece either invalid or lower-quality. Why? The problem appears to lie in how the GPT-2 BPE encoder handles spaces, which makes it difficult for the model to emit the correct space-separated characters. Since ABC notation does not actually require spaces, we simply removed all spaces from the corpus, which noticeably improved the quality of the generated pieces.
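The preprocessing step itself is trivial. A minimal sketch (the function name and sample tune are illustrative, not taken from the original pipeline): strip every space character from each ABC tune before training, since ABC parsers treat spaces as optional whitespace.

```python
def strip_spaces(abc_text: str) -> str:
    """Remove all spaces from an ABC tune.

    ABC notation does not require spaces, so deleting them
    sidesteps the BPE encoder's space-handling problems
    without changing the music the text encodes.
    """
    return abc_text.replace(" ", "")

# Illustrative ABC fragment (not from the actual corpus):
tune = "X: 1\nT: Example Reel\nK: D\n|: D E F G | A B c d :|\n"
print(strip_spaces(tune))
# The header fields and bars survive intact, just without spaces,
# e.g. the body line becomes "|:DEFG|ABcd:|".
```

Applying this over the whole corpus yields "spaceless" training data; generated pieces are then valid ABC as-is, with no spurious-space failures to filter out.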