Bibliography (5):

  1. Attention Is All You Need

  2. Deep Residual Learning for Image Recognition

  3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  4. Language Models are Few-Shot Learners (GPT-3)

  5. https://github.com/microsoft/mup