Bibliography (47):

  1. https://github.com/gregorbachmann/scaling_mlps

  2. Pay Attention to MLPs

  3. MLP Architectures for Vision-and-Language Modeling: An Empirical Study

  4. https://www.kaggle.com/c/tiny-imagenet

  5. ‘MLP NN’ directory

  6. ImageNet Large Scale Visual Recognition Challenge

  7. Attention Is All You Need

  8. MLP-Mixer: An all-MLP Architecture for Vision

  9. scaling-hypothesis#blessings-of-scale

  10. Layer Normalization

  11. How far can we go without convolution: Improving fully-connected networks

  12. A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets

  13. Symbolic Discovery of Optimization Algorithms

  14. https://arxiv.org/pdf/2306.13575.pdf#page=5

  15. mixup: Beyond Empirical Risk Minimization

  16. Deep Residual Learning for Image Recognition

  17. https://arxiv.org/pdf/2306.13575.pdf#page=16

  18. https://arxiv.org/pdf/2306.13575.pdf#page=6

  19. Scaling Laws for Neural Language Models

  20. LLaMA-1: Open and Efficient Foundation Language Models

  21. https://arxiv.org/pdf/2306.13575.pdf#page=17

  22. index#convolution-learning

  23. https://arxiv.org/pdf/2306.13575.pdf#page=7

  24. 2023-bachmann-figure1-mlpcomputescalingoncifar100.jpg

  25. 2023-bachmann-figure5-scalingofmlpsoncifar10andimagenet1k.png

  26. https://arxiv.org/pdf/2306.13575.pdf#page=8

  27. Chinchilla: Training Compute-Optimal Large Language Models

  28. https://arxiv.org/abs/2306.12517

  29. Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

  30. Faster SGD training by minibatch persistency

  31. ConvNeXt: A ConvNet for the 2020s

  32. ImageNet: A Large-Scale Hierarchical Image Database