AI and Compute (OpenAI blog). https://openai.com/blog/ai-and-compute/
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. https://arxiv.org/abs/1910.10683
Towards a Human-like Open-Domain Chatbot. https://arxiv.org/abs/2001.09977
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. https://arxiv.org/abs/2006.16668
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. https://arxiv.org/abs/2101.03961
GPT-3: Language Models are Few-Shot Learners. https://arxiv.org/abs/2005.14165
The Evolved Transformer. https://arxiv.org/abs/1901.11117
MLCommons. https://mlcommons.org/
AWS Upgrades Its GPU-Backed AI Inference Platform (HPCwire, March 2019). https://www.hpcwire.com/2019/03/19/aws-upgrades-its-gpu-backed-ai-inference-platform/
Amazon EC2 Update: Inf1 Instances with AWS Inferentia Chips for High-Performance, Cost-Effective Inferencing (AWS News Blog). https://aws.amazon.com/blogs/aws/amazon-ec2-update-inf1-instances-with-aws-inferentia-chips-for-high-performance-cost-effective-inferencing/
Carbon Emissions and Large Neural Network Training, p. 6. https://arxiv.org/pdf/2104.10350.pdf#page=6
Attention Is All You Need. https://arxiv.org/abs/1706.03762
Energy and Policy Considerations for Deep Learning in NLP. https://arxiv.org/abs/1906.02243
Carbon Emissions and Large Neural Network Training, p. 21. https://arxiv.org/pdf/2104.10350.pdf#page=21
Carbon Emissions and Large Neural Network Training, p. 9. https://arxiv.org/pdf/2104.10350.pdf#page=9
Carbon Emissions and Large Neural Network Training, p. 3. https://arxiv.org/pdf/2104.10350.pdf#page=3
Google 2020 Environmental Report. https://www.gstatic.com/gumdrop/sustainability/google-2020-environmental-report.pdf
Carbon Emissions and Large Neural Network Training, p. 14. https://arxiv.org/pdf/2104.10350.pdf#page=14