“Tradformer: A Transformer Model of Traditional Music Transcriptions”, 2022-05-20:
We explore the GPT-2 transformer architecture for modeling music, specifically Irish and Swedish traditional dance music. Given the repetitive structure of these kinds of music, the transformer should be able to match the hitherto most successful model, a vanilla LSTM RNN [folk-RNN], with fewer parameters and less complexity.
We find that achieving good performance with the transformer is not straightforward: careful consideration is needed for the sampling strategy, for evaluating intermediate outputs in relation to engineering choices, and finally for analyzing what the model learns. We discuss these points with several illustrations, providing reusable insights for engineering other music generation systems.
We also report the high performance of our final transformer model Tradformer in a competition of music generation systems focused on a type of Swedish dance.
…We trained for 50 epochs in total. These parameters, however, did not seem to have a major impact on model convergence. When Tradformer showed little to no improvement in validation loss during training, we could still detect improvement in the quality of the generated tunes. One explanation is that, from the perspective of the loss function, a wrong token is equally wrong whether or not it is musically plausible. As training progresses, the average number of mistakes could stay the same while the quality of those mistakes improves from a music-theory standpoint.
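This point can be made concrete with a toy calculation (not from the paper; token indices and probabilities are invented for illustration): cross-entropy depends only on the probability assigned to the ground-truth token, so redistributing mass between a musically plausible wrong token and an implausible one leaves the loss unchanged.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target token under a
    next-token probability distribution."""
    return -math.log(probs[target])

# Toy distribution over four hypothetical tokens; suppose the
# ground-truth token is index 0, token 1 is a musically plausible
# alternative, and token 2 is an implausible one.
early = [0.25, 0.30, 0.30, 0.15]   # mistakes spread over both
late  = [0.25, 0.55, 0.05, 0.15]   # mistakes now mostly plausible

# The loss only sees probs[0], so both distributions score identically,
# even though samples from `late` would sound more musical.
assert cross_entropy(early, 0) == cross_entropy(late, 0)
```

A plateauing validation loss can therefore mask exactly this kind of qualitative improvement in the generated output.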
…For sampling, Tradformer employs a combination of beam search and nucleus sampling… Our early models used a naive sampling approach; we found that the biggest improvement in the quality of the generated music came from replacing it with a more sophisticated strategy based on beam search.
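The paper does not reproduce its sampling code here, but the nucleus (top-p) half of the strategy can be sketched as follows. The function name `nucleus_sample` and the NumPy formulation are my own; the idea is standard: restrict sampling to the smallest set of tokens whose cumulative probability exceeds p, then renormalize.

```python
import numpy as np

def nucleus_sample(logits, p=0.9, rng=None):
    """Sample a token id from the smallest set of tokens whose
    cumulative probability mass exceeds p (top-p / nucleus sampling)."""
    rng = rng or np.random.default_rng()
    # Softmax with a max-shift for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]        # token ids, most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1   # keep just enough mass to exceed p
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum() # renormalize within the nucleus
    return int(rng.choice(keep, p=kept))
```

In a combined scheme like the one described, each beam would be extended with nucleus-sampled continuations rather than a purely greedy argmax, trading some of beam search's determinism for the diversity that nucleus sampling provides.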