Bibliography (4):

  1. https://x.com/guy__dar/status/1567445086320852993

  2. Attention Is All You Need

  3. A Mathematical Framework for Transformer Circuits

  4. Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space