Bibliography (6):

  1. A. Vaswani et al. Attention Is All You Need. In Advances in Neural Information Processing Systems (NeurIPS), 2017.

  2. N. P. Jouppi et al. A Domain-Specific Supercomputer for Training Deep Neural Networks. Communications of the ACM, 2020.

  3. NVIDIA FasterTransformer. https://github.com/NVIDIA/FasterTransformer

  4. A. Chowdhery et al. PaLM: Scaling Language Modeling with Pathways. arXiv:2204.02311, 2022.