Bibliography (19):

  1. An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

  2. PaLI: A Jointly-Scaled Multilingual Language-Image Model

  3. ImageNet: A Large-Scale Hierarchical Image Database

  4. LiT: Zero-Shot Transfer with Locked-image Text Tuning

  5. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

  6. Scaling Vision Transformers

  7. A domain-specific supercomputer for training deep neural networks

  8. The Efficiency Misnomer

  9. PaLM: Scaling Language Modeling with Pathways

  10. Efficiently Scaling Transformer Inference

  11. Partial success in closing the gap between human and machine vision

  12. https://arxiv.org/pdf/2302.05442.pdf#page=40&org=google

  13. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness

  14. Revisiting the Calibration of Modern Neural Networks

  15. https://arxiv.org/pdf/2302.05442.pdf#page=14&org=google

  16. https://arxiv.org/pdf/2302.05442.pdf#page=37&org=google

  17. Distilling the Knowledge in a Neural Network