Bibliography (5):
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
ImageNet Large Scale Visual Recognition Challenge
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Wikipedia Bibliography:
Stochastic gradient descent
Variance