Bibliography (12):

  1. Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

  2. Scaling Vision Transformers

  3. ImageNet: A Large-Scale Hierarchical Image Database

  4. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

  5. https://arxiv.org/pdf/2205.04596.pdf#page=19&org=google

  6. Evaluating Machine Accuracy on ImageNet

  7. BASIC: Combined Scaling for Open-Vocabulary Image Classification

  8. ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

  9. ImageNet Large Scale Visual Recognition Challenge

  10. CoCa: Contrastive Captioners are Image-Text Foundation Models

  11. Deep Residual Learning for Image Recognition

  12. Wikipedia Bibliography:

    1. AlexNet