Bibliography (4):
Attention Is All You Need
Long Range Arena (LRA): A Benchmark for Efficient Transformers
S4: Efficiently Modeling Long Sequences with Structured State Spaces
Wikipedia Bibliography:
Prior probability