Bibliography (6):
Attention Is All You Need (Vaswani et al., 2017): https://arxiv.org/abs/1706.03762
A domain-specific supercomputer for training deep neural networks (Jouppi et al., Communications of the ACM, 2020)
NVIDIA FasterTransformer: https://github.com/NVIDIA/FasterTransformer
PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022): https://arxiv.org/abs/2204.02311
Wikipedia Bibliography:
Pareto front: https://en.wikipedia.org/wiki/Pareto_front
Bfloat16 floating-point format: https://en.wikipedia.org/wiki/Bfloat16_floating-point_format