Bibliography (4):

  1. https://x.com/MrCatid/status/1740461942697779397

  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  3. Fast Feedforward Networks

  4. https://github.com/pbelcak/UltraFastBERT