Bibliography (10):

  1. Attention Is All You Need

  2. Transformers Learn Shortcuts to Automata

  3. Neural Networks and the Chomsky Hierarchy

  4. Masked Hard-Attention Transformers and Boolean RASP Recognize Exactly the Star-Free Languages

  5. Show Your Work: Scratchpads for Intermediate Computation with Language Models

  6. Exploring Length Generalization in Large Language Models

  7. arXiv:2402.09963, p. 2 (https://arxiv.org/pdf/2402.09963.pdf#page=2)

  8. arXiv:2402.09963, p. 27 (https://arxiv.org/pdf/2402.09963.pdf#page=27)

  9. arXiv:2402.09963, p. 34 (https://arxiv.org/pdf/2402.09963.pdf#page=34)

  10. Sensitivity as a Complexity Measure for Sequence Classification Tasks