Transformer: Attention Is All You Need
S4: Efficiently Modeling Long Sequences with Structured State Spaces
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Aaren: Attention as an RNN
QRNN: Quasi-Recurrent Neural Networks
xLSTM: Extended Long Short-Term Memory
GRU: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
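For orientation, the list spans two poles of sequence modeling: fully parallel attention (the Transformer) and step-by-step gated recurrence (GRU, QRNN, xLSTM), with the state-space models (S4, Mamba) and Attention as an RNN sitting between them. Below is a minimal, illustrative NumPy sketch of the two basic building blocks, scaled dot-product attention and a single GRU update; the function names, shapes, and parameter layout are assumptions for illustration, not code taken from any of the papers.

```python
# Minimal, illustrative sketch (not code from any of the listed papers).
# It contrasts the two basic building blocks the list spans:
#   1) scaled dot-product attention, computed in parallel over the whole sequence
#   2) a single GRU update, applied sequentially one timestep at a time
# Function names, shapes, and the parameter layout here are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d). Returns (seq_len, d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # all pairwise similarities at once
    return softmax(scores, axis=-1) @ V  # weighted average of the values

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU update: input x (d,), previous hidden state h (d,) -> new h (d,)."""
    Wz, Uz, Wr, Ur, W, U = params
    z = sigmoid(Wz @ x + Uz @ h)             # update gate
    r = sigmoid(Wr @ x + Ur @ h)             # reset gate
    h_tilde = np.tanh(W @ x + U @ (r * h))   # candidate state
    return z * h + (1.0 - z) * h_tilde       # convention from Cho et al. (2014)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 8, 16
    X = rng.normal(size=(T, d))
    print(scaled_dot_product_attention(X, X, X).shape)  # (8, 16), fully parallel
    params = [0.1 * rng.normal(size=(d, d)) for _ in range(6)]
    h = np.zeros(d)
    for x in X:                                         # inherently sequential
        h = gru_step(x, h, params)
    print(h.shape)                                      # (16,)
```

The attention call touches every pair of positions in one parallel pass, while the GRU loop carries a fixed-size state forward one step at a time; the S4, Mamba, QRNN, and Aaren papers in the list explore ways to get attention-like parallel training together with recurrent, linear-time inference.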