“Generating Structured Music through Self-Attention”, 2018:
[samples] Music relies heavily on self-reference to build structure and meaning. We explore the TRANSFORMER architecture (et al., 2017) as a generative model for music, as self-attention has shown compelling results on tasks that require long-term structure, such as Wikipedia summary generation (et al., 2018). However, timing information is critical for polyphonic music, and TRANSFORMER does not explicitly represent absolute or relative timing in its structure.
To address this challenge, et al. (2018) introduced relative position representations into self-attention to improve machine translation. However, their formulation does not scale to longer sequences.
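To make the memory issue concrete, here is a minimal numpy sketch of relative-position attention logits in the style described above: for every query/key pair (i, j), a learned embedding for the distance j − i is gathered into an explicit (L, L, d) tensor before being dotted with the queries. The function name and the layout of `e_rel` are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def relative_logits_explicit(q, e_rel):
    """Relative-position attention logits via an explicit per-pair tensor.

    q:     (L, d) query vectors.
    e_rel: (2L-1, d) distance embeddings, row r holds distance r - (L - 1),
           i.e. distances -(L-1) .. (L-1).  (Illustrative layout.)
    Returns the (L, L) relative logits S[i, j] = q_i . e_{j-i}.
    """
    L, d = q.shape
    # Index matrix: idx[i, j] = j - i + (L - 1), mapping each pair to its distance row.
    idx = np.arange(L)[None, :] - np.arange(L)[:, None] + (L - 1)
    # Materialise R[i, j] = e_rel[j - i + L - 1]: an O(L^2 d) intermediate tensor.
    R = e_rel[idx]                         # shape (L, L, d)
    return np.einsum('id,ijd->ij', q, R)  # shape (L, L)
```

The `R` tensor is the bottleneck: its L² × d memory footprint is what prevents this formulation from scaling to the long sequences needed for music.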
We propose an improved formulation that reduces the memory requirement from 𝒪(l²d) to 𝒪(ld), making it possible to train on much longer sequences and to converge faster.
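One way to realise this reduction, sketched below under the assumption of causal (decoder-style) attention, is to keep only one distance embedding per offset (an (L, d) table instead of a per-pair tensor) and then "skew" the (L, L) product into relative-to-absolute alignment with a pad-and-reshape, so no L² × d intermediate is ever materialised. The function names are illustrative.

```python
import numpy as np

def relative_logits_skewed(q, e_rel):
    """Memory-efficient relative attention logits via skewing.

    q:     (L, d) query vectors.
    e_rel: (L, d) distance embeddings, row r holds distance r - (L - 1),
           i.e. distances -(L-1) .. 0 (past positions only; causal masking
           discards the rest).  (Illustrative layout.)
    Returns (L, L) logits with S[i, j] = q_i . e_{j-i} for j <= i.
    """
    L, _ = q.shape
    qe = q @ e_rel.T                         # (L, L): only O(Ld) extra parameters
    padded = np.pad(qe, ((0, 0), (1, 0)))    # prepend a dummy column -> (L, L+1)
    # Reshaping to (L+1, L) and dropping the first row shifts row i left by i,
    # converting "query vs. relative distance" into "query vs. absolute position".
    srel = padded.reshape(L + 1, L)[1:]      # (L, L)
    return srel  # entries with j > i are junk but are removed by the causal mask
```

The pad-reshape trick replaces the gather-and-einsum over an (L, L, d) tensor with plain index arithmetic on an (L, L) matrix, which is where the 𝒪(l²d) → 𝒪(ld) saving comes from.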
In experiments with symbolic music generation, we find that relative self-attention substantially improves sample quality. When primed, the model generates continuations that develop the prime in a coherent fashion and exhibit long-term structure.