Bibliography (6):
Attention Is All You Need (Vaswani et al., 2017): https://arxiv.org/abs/1706.03762
A domain-specific supercomputer for training deep neural networks (Jouppi et al., Communications of the ACM, 2020)
NVIDIA FasterTransformer: https://github.com/NVIDIA/FasterTransformer
PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022): https://arxiv.org/abs/2204.02311
Wikipedia Bibliography:
Pareto front: https://en.wikipedia.org/wiki/Pareto_front
Bfloat16 floating-point format: https://en.wikipedia.org/wiki/Bfloat16_floating-point_format