Bibliography (5):
Attention Is All You Need
Deep Residual Learning for Image Recognition
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Language Models are Few-Shot Learners (GPT-3)
https://github.com/microsoft/mup