Bibliography (25):

  1. CLIP: Connecting Text and Images

  2. CLIP: Learning Transferable Visual Models From Natural Language Supervision

  3. Very Deep Convolutional Networks for Large-Scale Image Recognition

  4. Deep Residual Learning for Image Recognition

  5. Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

  6. ImageNet Classification with Deep Convolutional Neural Networks

  7. Reassessing hierarchical correspondences between brain and deep networks through direct interface

  8. Deep image reconstruction from human brain activity

  9. Incorporating natural language into vision models improves prediction and understanding of higher visual cortex

  10. A Neurobiological Evaluation Metric for Neural Network Model Search

  11. End-to-end deep image reconstruction from human brain activity

  12. Brain-like functional specialization emerges spontaneously in deep neural networks

  13. Partial success in closing the gap between human and machine vision

  14. The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning

  15. Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex

  16. Correspondence between the layered structure of deep language models and temporal structure of natural language processing in the human brain

  17. A massive 7T fMRI dataset to bridge cognitive and computational neuroscience

  18. Semantic scene descriptions as an objective of human vision

  19. Deep learning models of cognitive processes constrained by human brain connectomes

  20. Brains and algorithms partially converge in natural language processing