Bibliography (92):

  1. A Neural Probabilistic Language Model (https://jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)

  2. Revisiting Simple Neural Probabilistic Language Models

  3. PairConnect: A Compute-Efficient MLP Alternative to Attention

  4. Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis

  5. Extraction de séquences numériques dans des documents manuscrits quelconques [Extraction of numerical sequences from arbitrary handwritten documents]

  6. Deep Big Multilayer Perceptrons for Digit Recognition

  7. Do Deep Nets Really Need to be Deep?

  8. Network In Network

  9. How far can we go without convolution: Improving fully-connected networks

  10. Deep Neural Networks for Large Vocabulary Handwritten Text Recognition

  11. Tensorizing Neural Networks

  12. Do Deep Convolutional Nets Really Need to be Deep and Convolutional?

  13. Do Deep Convolutional Nets Really Need to be Deep and Convolutional? (https://arxiv.org/pdf/1603.05691.pdf#page=7)

  14. The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers

  15. Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU

  16. face#sussman-attains-enlightenment

  17. The Shattered Gradients Problem: If resnets are the answer, then what is the question?

  18. NFNet: High-Performance Large-Scale Image Recognition Without Normalization

  19. Fixup Initialization: Residual Learning Without Normalization

  20. Improving Transformer Optimization Through Better Initialization

  21. Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping

  22. ZerO Initialization: Initializing Residual Networks with only Zeros and Ones

  23. Understanding the Covariance Structure of Convolutional Filters

  24. Mimetic Initialization of Self-Attention Layers

  25. Tweet by @hi_tysam (https://x.com/hi_tysam/status/1721764010159477161)

  26. The Goldilocks zone: Towards better understanding of neural network loss landscapes

  27. Skip Connections Eliminate Singularities

  28. Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

  29. NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations

  30. SwitchNet: a neural network model for forward and inverse scattering problems

  31. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science

  32. Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias

  33. ReZero is All You Need: Fast Convergence at Large Depth

  34. Towards Learning Convolutions from Scratch

  35. Sifting out the features by pruning: Are convolutional networks the winning lottery ticket of fully connected ones?

  36. Towards Biologically Plausible Convolutional Networks

  37. Adapting the Function Approximation Architecture in Online Reinforcement Learning

  38. Data-driven emergence of convolutional structure in neural networks

  39. Noise Transforms Feed-Forward Networks into Sparse Coding Networks

  40. A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

  41. Scaling MLPs: A Tale of Inductive Bias

  42. Gesticulator: A framework for semantically-aware speech-driven gesture generation

  43. RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

  44. Less is More: Pay Less Attention in Vision Transformers

  45. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

  46. Well-tuned Simple Nets Excel on Tabular Datasets

  47. How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers

  48. MLPs Learn In-Context

  49. MLP-Mixer: An all-MLP Architecture for Vision

  50. When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations

  51. MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

  52. S2-MLP: Spatial-Shift MLP Architecture for Vision

  53. S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision

  54. When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism (ShiftViT)

  55. Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet

  56. ResMLP: Feedforward networks for image classification with data-efficient training

  57. Pay Attention to MLPs

  58. MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition

  59. Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition

  60. Container: Context Aggregation Network

  61. CycleMLP: A MLP-like Architecture for Dense Prediction

  62. PointMixer: MLP-Mixer for Point Cloud Understanding

  63. RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?

  64. AS-MLP: An Axial Shifted MLP Architecture for Vision

  65. Hire-MLP: Vision MLP via Hierarchical Rearrangement

  66. Sparse-MLP: A Fully-MLP Architecture with Conditional Computation

  67. ConvMLP: Hierarchical Convolutional MLPs for Vision

  68. Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?

  69. ConvMixer: Patches Are All You Need?

  70. Exploring the Limits of Large Scale Pre-training

  71. MLP Architectures for Vision-and-Language Modeling: An Empirical Study

  72. pNLP-Mixer: an Efficient all-MLP Architecture for Language

  73. Masked Mixers for Language Generation and Retrieval

  74. MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video

  75. Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs

  76. ‘self-attention’ directory

  77. AFT: An Attention Free Transformer

  78. Synthesizer: Rethinking Self-Attention in Transformer Models

  79. Linformer: Self-Attention with Linear Complexity

  80. Luna: Linear Unified Nested Attention

  81. Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks (EAMLP)

  82. MetaFormer is Actually What You Need for Vision

  83. MoGlow: Probabilistic and controllable motion synthesis using normalizing flows

  84. A Style-Based Generator Architecture for Generative Adversarial Networks

  85. StyleGAN architecture diagram (Figure 1, Karras et al 2018): 2018-karras-stylegan-figure1-styleganarchitecture.png

  86. Image Generators with Conditionally-Independent Pixel Synthesis

  87. Fourier Neural Operator for Parametric Partial Differential Equations

  88. SIREN: Implicit Neural Representations with Periodic Activation Functions

  89. Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes

  90. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

  91. KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

  92. MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation