“ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training”, 2021-05-07:
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.
It is a simple residual network that alternates (1) a linear layer in which image patches interact, independently and identically across channels, and (2) a two-layer feed-forward network in which channels interact independently per patch.
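The alternation described above can be sketched numerically: a cross-patch linear layer mixes patches with weights shared across channels, then a per-patch two-layer MLP mixes channels with weights shared across patches, each wrapped in a residual connection. The snippet below is a minimal illustrative sketch, not the authors' implementation: weights are random, the paper's GELU is replaced by ReLU, and the paper's learned Affine normalization is shown with identity parameters.

```python
import numpy as np

def affine(x, alpha, beta):
    # ResMLP replaces LayerNorm with a simple per-channel affine transform;
    # identity parameters (alpha=1, beta=0) are used here for brevity
    return alpha * x + beta

def resmlp_block(x, rng):
    """One ResMLP block on x of shape (num_patches, dim).

    Hypothetical sketch: random weights, only to show the data flow
    (cross-patch linear mixing, then a per-patch two-layer MLP).
    """
    n, d = x.shape
    # (1) cross-patch sublayer: patches interact; the same (n, n) matrix
    # is applied identically to every channel
    A = rng.standard_normal((n, n)) / np.sqrt(n)
    z = x + A @ affine(x, 1.0, 0.0)                 # residual connection
    # (2) per-patch sublayer: channels interact; the same two-layer MLP
    # is applied independently to every patch
    W1 = rng.standard_normal((d, 4 * d)) / np.sqrt(d)
    W2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)
    h = np.maximum(affine(z, 1.0, 0.0) @ W1, 0.0)   # ReLU here; GELU in the paper
    return z + h @ W2                               # residual connection

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))   # 16 patches, 8 channels
y = resmlp_block(x, rng)
print(y.shape)                     # (16, 8): residual blocks preserve shape
```

Because both sublayers are residual and shape-preserving, blocks can be stacked to any depth before a final average-pool and linear classifier.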
When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, to further remove priors from employing a labeled dataset.
Finally, by adapting our model to machine translation, we achieve surprisingly good results.
We share pre-trained models and our code based on the Timm library.