Bibliography (4):
https://x.com/guy__dar/status/1567445086320852993
Attention Is All You Need
A Mathematical Framework for Transformer Circuits
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space