Bibliography (6):
Averaging Weights Leads to Wider Optima and Better Generalization
Deep Residual Learning for Image Recognition
DenseNet: Densely Connected Convolutional Networks
ImageNet Large Scale Visual Recognition Challenge
Wikipedia Bibliography:
Loss function
Stochastic gradient descent