Gradient descent
Deep learning
https://en.wikipedia.org/wiki/Robust_optimization
https://en.wikipedia.org/wiki/F-divergence
Stochastic gradient descent
https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam
BERT (language model)
Vision transformer
ImageNet