Bibliography (4):
https://x.com/MrCatid/status/1740461942697779397
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Fast Feedforward Networks
https://github.com/pbelcak/UltraFastBERT