Bibliography (7):
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
ImageNet Large Scale Visual Recognition Challenge
"end-to-end" directory
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Wikipedia Bibliography:
Stochastic gradient descent
https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam
Variance