“Music Transformer: Generating Music With Long-Term Structure”, Cheng-Zhi Anna Huang, Ian Simon, Monica Dinculescu (2018-12-13):

Play with Music Transformer in an interactive Colab!

[Visualization of Transformer attention pattern over the input history]

Generating long pieces of music is a challenging problem because music contains structure at multiple timescales, from millisecond timings to motifs to phrases to repetition of entire sections. We present Music Transformer, an attention-based neural network that can generate music with improved long-term coherence. [Audio: three piano performances generated by the model.]

Similar to Performance RNN, we use an event-based representation that allows us to generate expressive performances directly (i.e., without first generating a score). In contrast to an LSTM-based model like Performance RNN, which compresses earlier events into a fixed-size hidden state, we use a Transformer-based model that has direct access to all earlier events.
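To make the event-based representation concrete, here is a minimal sketch of a Performance-RNN-style encoder. The vocabulary layout is an assumption drawn from the Performance RNN design, not from this post: 128 NOTE_ON + 128 NOTE_OFF + 100 TIME_SHIFT (10 ms steps, up to 1 s per token) + 32 VELOCITY bins, for 388 tokens total.

```python
# Hypothetical event vocabulary, assumed from the Performance RNN design:
# tokens 0-127 NOTE_ON, 128-255 NOTE_OFF, 256-355 TIME_SHIFT, 356-387 VELOCITY.
NOTE_ON, NOTE_OFF, TIME_SHIFT, VELOCITY = 0, 128, 256, 356

def encode(notes):
    """Encode a list of (start_sec, end_sec, pitch, velocity) notes as tokens."""
    events = []  # (time, tie-break order, token); velocity precedes note-on
    for start, end, pitch, vel in notes:
        events.append((start, 0, VELOCITY + vel * 32 // 128))  # 32 velocity bins
        events.append((start, 1, NOTE_ON + pitch))
        events.append((end, 2, NOTE_OFF + pitch))
    events.sort()
    tokens, now = [], 0.0
    for t, _, tok in events:
        # Advance the clock with TIME_SHIFT tokens: 10 ms units, max 1 s each.
        gap = round((t - now) * 100)
        while gap > 0:
            step = min(gap, 100)
            tokens.append(TIME_SHIFT + step - 1)
            gap -= step
        now = t
        tokens.append(tok)
    return tokens
```

A single middle-C note held for half a second becomes four tokens: set velocity, note on, shift time by 500 ms, note off. The model is then trained to predict the next token in such a sequence, so expressive timing and dynamics are generated directly rather than derived from a score.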

Our recent Wave2Midi2Wave project also uses Music Transformer as its language model. [Replication]