Bibliography (6):

  1. Averaging Weights Leads to Wider Optima and Better Generalization

  2. Deep Residual Learning for Image Recognition

  3. DenseNet: Densely Connected Convolutional Networks

  4. ImageNet Large Scale Visual Recognition Challenge

  5. Wikipedia Bibliography:

    1. Loss function

    2. Stochastic gradient descent