Attention Is All You Need (Vaswani et al., 2017)
Transformers Learn Shortcuts to Automata (Liu et al., 2023)
Neural Networks and the Chomsky Hierarchy (Delétang et al., 2023)
Masked Hard-Attention Transformers and Boolean RASP Recognize Exactly the Star-Free Languages (Angluin, Chiang & Yang, 2023)
Show Your Work: Scratchpads for Intermediate Computation with Language Models (Nye et al., 2021)
Exploring Length Generalization in Large Language Models (Anil et al., 2022)
Transformers Can Achieve Length Generalization But Not Robustly (Zhou et al., 2024):
https://arxiv.org/pdf/2402.09963.pdf#page=2
https://arxiv.org/pdf/2402.09963.pdf#page=27
https://arxiv.org/pdf/2402.09963.pdf#page=34
Sensitivity as a Complexity Measure for Sequence Classification Tasks (Hahn, Jurafsky & Futrell, 2021)