Bibliography (4):
Contrastive Representation Learning: A Framework and Review
https://github.com/facebookresearch/NPM
ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models
Wikipedia Bibliography:
Softmax function