- See Also
- Links
- “Organic Reaction Mechanism Classification Using Machine Learning”, 2023
- “DataMUX: Data Multiplexing for Neural Networks”, Et Al 2023
- “Merging Enzymatic and Synthetic Chemistry With Computational Synthesis Planning”, Et Al 2022
- “Magic3D: High-Resolution Text-to-3D Content Creation”, Et Al 2022
- “How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers”, Et Al 2022
- “The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes”, Et Al 2022
- “DreamFusion: Text-to-3D Using 2D Diffusion”, Et Al 2022
- “Random Initializations Performing above Chance and How to Find Them”, Et Al 2022
- “Scaling Laws vs Model Architectures: How Does Inductive Bias Influence Scaling?”, Et Al 2022
- “Why Do Tree-based Models Still Outperform Deep Learning on Tabular Data?”, Et Al 2022
- “Revisiting Pretraining Objectives for Tabular Deep Learning”, Et Al 2022
- “RHO-LOSS: Prioritized Training on Points That Are Learnable, Worth Learning, and Not Yet Learnt”, Et Al 2022
- “MLP-3D: A MLP-like 3D Architecture With Grouped Time Mixing”, Et Al 2022
- “Sparse Mixers: Combining MoE and Mixing to Build a More Efficient BERT”, Lee-Thorp & Ainslie 2022
- “Towards Understanding Grokking: An Effective Theory of Representation Learning”, Et Al 2022
- “Deep Learning Meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?”, 2022
- “MLP-ASR: Sequence-length Agnostic All-MLP Architectures for Speech Recognition”, Et Al 2022
- “Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs”, Et Al 2022
- “PNLP-Mixer: an Efficient All-MLP Architecture for Language”, Et Al 2022
- “Data-driven Emergence of Convolutional Structure in Neural Networks”, 2022
- “When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism (ShiftViT)”, Et Al 2022
- “ConvMixer: Patches Are All You Need?”, 2022
- “MAXIM: Multi-Axis MLP for Image Processing”, Et Al 2022
- “Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [paper]”, Et Al 2022
- “The GatedTabTransformer: An Enhanced Deep Learning Architecture for Tabular Modeling”, 2022
- “MLP Architectures for Vision-and-Language Modeling: An Empirical Study”, Et Al 2021
- “Noether Networks: Meta-Learning Useful Conserved Quantities”, Et Al 2021
- “Zero-Shot Text-Guided Object Generation With Dream Fields”, Et Al 2021
- “MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video”, Et Al 2021
- “MetaFormer Is Actually What You Need for Vision”, Et Al 2021
- “PointMixer: MLP-Mixer for Point Cloud Understanding”, Et Al 2021
- “Deep Learning without Shortcuts: Shaping the Kernel With Tailored Rectifiers”, Et Al 2021
- “ZerO Initialization: Initializing Residual Networks With Only Zeros and Ones”, Et Al 2021
- “ADOP: Approximate Differentiable One-Pixel Point Rendering”, Et Al 2021
- “Exploring the Limits of Large Scale Pre-training”, Et Al 2021
- “Rapid Training of Deep Neural Networks without Skip Connections or Normalization Layers Using Deep Kernel Shaping”, Et Al 2021
- “Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?”, Et Al 2021
- “ConvMLP: Hierarchical Convolutional MLPs for Vision”, Et Al 2021
- “Sparse-MLP: A Fully-MLP Architecture With Conditional Computation”, Et Al 2021
- “Hire-MLP: Vision MLP via Hierarchical Rearrangement”, Et Al 2021
- “RaftMLP: How Much Can Be Done Without Attention and With Less Spatial Locality?”, 2021
- “S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision”, Et Al 2021
- “CycleMLP: A MLP-like Architecture for Dense Prediction”, Et Al 2021
- “AS-MLP: An Axial Shifted MLP Architecture for Vision”, Et Al 2021
- “Real-time Neural Radiance Caching for Path Tracing”, Et Al 2021
- “Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition”, Et Al 2021
- “Towards Biologically Plausible Convolutional Networks”, Et Al 2021
- “Well-tuned Simple Nets Excel on Tabular Datasets”, Et Al 2021
- “PairConnect: A Compute-Efficient MLP Alternative to Attention”, Et Al 2021
- “MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis”, Et Al 2021
- “S2-MLP: Spatial-Shift MLP Architecture for Vision”, Et Al 2021
- “When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations”, Et Al 2021
- “Container: Context Aggregation Network”, Et Al 2021
- “MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation”, 2021
- “Pay Attention to MLPs”, Et Al 2021
- “FNet: Mixing Tokens With Fourier Transforms”, Lee-Thorp et al 2021
- “ResMLP: Feedforward Networks for Image Classification With Data-efficient Training”, Et Al 2021
- “Multi-scale Inference of Genetic Trait Architecture Using Biologically Annotated Neural Networks”, Et Al 2021
- “Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet”, Melas-Kyriazi 2021
- “RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition”, Et Al 2021
- “MLP-Mixer: An All-MLP Architecture for Vision”, Et Al 2021
- “Grokking: Generalization Beyond Overfitting On Small Algorithmic Datasets”, Et Al 2021
- “Sifting out the Features by Pruning: Are Convolutional Networks the Winning Lottery Ticket of Fully Connected Ones?”, 2021
- “Fully-Connected Neural Nets”, 2021
- “Revisiting Simple Neural Probabilistic Language Models”, 2021
- “KiloNeRF: Speeding up Neural Radiance Fields With Thousands of Tiny MLPs”, Et Al 2021
- “Neural Geometric Level of Detail: Real-time Rendering With Implicit 3D Shapes”, Et Al 2021
- “Is MLP-Mixer a CNN in Disguise?” (blog post examining the MLP-Mixer architecture in detail and why it is not considered convolution-free)
- “TabTransformer: Tabular Data Modeling Using Contextual Embeddings”, Et Al 2020
- “Scaling down Deep Learning”, 2020
- “Image Generators With Conditionally-Independent Pixel Synthesis”, Et Al 2020
- “Fourier Neural Operator for Parametric Partial Differential Equations”, Et Al 2020
- “AFT: An Attention Free Transformer”, 2020
- “Towards Learning Convolutions from Scratch”, 2020
- “Efficient Attention: Breaking The Quadratic Transformer Bottleneck”, 2020
- “SIREN: Implicit Neural Representations With Periodic Activation Functions”, Et Al 2020
- “Linformer: Self-Attention With Linear Complexity”, Et Al 2020
- “A Map of Object Space in Primate Inferotemporal Cortex”, Et Al 2020
- “Synthesizer: Rethinking Self-Attention in Transformer Models”, Et Al 2020
- “Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems”, Et Al 2020
- “NeRF: Representing Scenes As Neural Radiance Fields for View Synthesis”, Et Al 2020
- “ReZero Is All You Need: Fast Convergence at Large Depth”, Et Al 2020
- “Gesticulator: A Framework for Semantically-aware Speech-driven Gesture Generation”, Et Al 2020
- “Understanding the Generalization Of ‘Lottery Tickets’ In Neural Networks”, 2019
- “3D Human Pose Estimation via Human Structure-aware Fully Connected Network”, Et Al 2019
- “Finding the Needle in the Haystack With Convolutions: on the Benefits of Architectural Bias”, d’Ascoli et al 2019
- “MoGlow: Probabilistic and Controllable Motion Synthesis Using Normalizing Flows”, Et Al 2019
- “Fixup Initialization: Residual Learning Without Normalization”, Et Al 2019
- “SwitchNet: a Neural Network Model for Forward and Inverse Scattering Problems”, 2018
- “Scalable Training of Artificial Neural Networks With Adaptive Sparse Connectivity Inspired by Network Science”, Et Al 2018
- “Deep Learning Generalizes Because the Parameter-function Map Is Biased towards Simple Functions”, Valle-Pérez et al 2018
- “NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations”, Et Al 2018
- “Improving Palliative Care With Deep Learning”, Avati et al 2018
- “Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery”, Et Al 2018
- “Learning to Play Chess With Minimal Lookahead and Deep Value Neural Networks”, 2017 (page 3)
- “Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU”, 2017
- “The Shattered Gradients Problem: If Resnets Are the Answer, Then What Is the Question?”, Et Al 2017
- “Gender-From-Iris or Gender-From-Mascara?”, Et Al 2017
- “Skip Connections Eliminate Singularities”, 2017
- “Learning to Optimize”, 2016
- “Do Deep Convolutional Nets Really Need to Be Deep and Convolutional?”, Et Al 2016
- “Network Morphism”, Et Al 2016
- “Adding Gradient Noise Improves Learning for Very Deep Networks”, Et Al 2015
- “How Far Can We Go without Convolution: Improving Fully-connected Networks”, Et Al 2015
- “Tensorizing Neural Networks”, Et Al 2015
- “Deep Neural Networks for Large Vocabulary Handwritten Text Recognition”, 2015
- “The Loss Surfaces of Multilayer Networks”, Et Al 2014
- “One Weird Trick for Parallelizing Convolutional Neural Networks”, 2014
- “Do Deep Nets Really Need to Be Deep?”, 2013
- “Network In Network”, Et Al 2013
- “Deep Big Multilayer Perceptrons for Digit Recognition”, Cireşan Et Al 2012
- “Extraction De Séquences Numériques Dans Des Documents Manuscrits Quelconques” [Extraction of Numerical Sequences from Arbitrary Handwritten Documents], 2006
- “Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis”, Et Al 2003
- “NEAT: Evolving Neural Networks through Augmenting Topologies”, 2002
- “Quantitative Analysis of Multivariate Data Using Artificial Neural Networks: A Tutorial Review and Applications to the Deconvolution of Pyrolysis Mass Spectra”, Et Al 1996
- “On the Ability of the Optimal Perceptron to Generalize”, Et Al 1990
- “Learning To Tell Two Spirals Apart”, 1988
- Miscellaneous
- Link Bibliography
See Also
Links
“Organic Reaction Mechanism Classification Using Machine Learning”, 2023
“Organic reaction mechanism classification using machine learning”, 2023-01-25 (similar; bibliography)
“DataMUX: Data Multiplexing for Neural Networks”, Et Al 2023
“DataMUX: Data Multiplexing for Neural Networks”, 2023-01-13 (backlinks; similar)
“Merging Enzymatic and Synthetic Chemistry With Computational Synthesis Planning”, Et Al 2022
“Merging enzymatic and synthetic chemistry with computational synthesis planning”, 2022-12-14 (similar; bibliography)
“Magic3D: High-Resolution Text-to-3D Content Creation”, Et Al 2022
“Magic3D: High-Resolution Text-to-3D Content Creation”, 2022-11-18 (similar)
“How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers”, Et Al 2022
“How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers”, 2022-11-07 (backlinks; similar; bibliography)
“The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes”, Et Al 2022
“The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes”, 2022-10-11 (similar)
“DreamFusion: Text-to-3D Using 2D Diffusion”, Et Al 2022
“DreamFusion: Text-to-3D using 2D Diffusion”, 2022-09-29 (similar)
“Random Initializations Performing above Chance and How to Find Them”, Et Al 2022
“Random initializations performing above chance and how to find them”, 2022-09-15 (similar)
“Scaling Laws vs Model Architectures: How Does Inductive Bias Influence Scaling?”, Et Al 2022
“Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?”, 2022-07-21 (similar; bibliography)
“Why Do Tree-based Models Still Outperform Deep Learning on Tabular Data?”, Et Al 2022
“Why do tree-based models still outperform deep learning on tabular data?”, 2022-07-18 (similar)
“Revisiting Pretraining Objectives for Tabular Deep Learning”, Et Al 2022
“Revisiting Pretraining Objectives for Tabular Deep Learning”, 2022-07-07 (similar)
“RHO-LOSS: Prioritized Training on Points That Are Learnable, Worth Learning, and Not Yet Learnt”, Et Al 2022
“RHO-LOSS: Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt”, 2022-06-14 (similar; bibliography)
“MLP-3D: A MLP-like 3D Architecture With Grouped Time Mixing”, Et Al 2022
“MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing”, 2022-06-13 (similar)
“Sparse Mixers: Combining MoE and Mixing to Build a More Efficient BERT”, Lee-Thorp & Ainslie 2022
“Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT”, 2022-05-24 (similar; bibliography)
“Towards Understanding Grokking: An Effective Theory of Representation Learning”, Et Al 2022
“Towards Understanding Grokking: An Effective Theory of Representation Learning”, 2022-05-20 (backlinks; similar)
“Deep Learning Meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?”, 2022
“Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?”, 2022-04-20 (similar)
“MLP-ASR: Sequence-length Agnostic All-MLP Architectures for Speech Recognition”, Et Al 2022
“MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition”, 2022-02-17 (backlinks; similar)
“Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs”, Et Al 2022
“Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs”, 2022-02-14 (similar; bibliography)
“PNLP-Mixer: an Efficient All-MLP Architecture for Language”, Et Al 2022
“pNLP-Mixer: an Efficient all-MLP Architecture for Language”, 2022-02-09 (backlinks; similar)
“Data-driven Emergence of Convolutional Structure in Neural Networks”, 2022
“Data-driven emergence of convolutional structure in neural networks”, 2022-02-01 (backlinks; similar)
“When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism (ShiftViT)”, Et Al 2022
“When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism (ShiftViT)”, 2022-01-26 (backlinks; similar; bibliography)
“ConvMixer: Patches Are All You Need?”, 2022
“ConvMixer: Patches Are All You Need?”, 2022-01-24 (backlinks; similar; bibliography)
“MAXIM: Multi-Axis MLP for Image Processing”, Et Al 2022
“MAXIM: Multi-Axis MLP for Image Processing”, 2022-01-09 (similar)
“Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [paper]”, Et Al 2022
“Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [paper]”, 2022-01-06 (similar)
“The GatedTabTransformer: An Enhanced Deep Learning Architecture for Tabular Modeling”, 2022
“The GatedTabTransformer: An enhanced deep learning architecture for tabular modeling”, 2022 (similar)
“MLP Architectures for Vision-and-Language Modeling: An Empirical Study”, Et Al 2021
“MLP Architectures for Vision-and-Language Modeling: An Empirical Study”, 2021-12-08 (backlinks; similar)
“Noether Networks: Meta-Learning Useful Conserved Quantities”, Et Al 2021
“Noether Networks: Meta-Learning Useful Conserved Quantities”, 2021-12-06 (similar)
“Zero-Shot Text-Guided Object Generation With Dream Fields”, Et Al 2021
“Zero-Shot Text-Guided Object Generation with Dream Fields”, 2021-12-02 (similar)
“MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video”, Et Al 2021
“MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video”, 2021-11-24 (backlinks; similar)
“MetaFormer Is Actually What You Need for Vision”, Et Al 2021
“MetaFormer is Actually What You Need for Vision”, 2021-11-22 (backlinks; similar; bibliography)
“PointMixer: MLP-Mixer for Point Cloud Understanding”, Et Al 2021
“PointMixer: MLP-Mixer for Point Cloud Understanding”, 2021-11-22 (backlinks; similar)
“Deep Learning without Shortcuts: Shaping the Kernel With Tailored Rectifiers”, Et Al 2021
“Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers”, 2021-11-18 (similar)
“ZerO Initialization: Initializing Residual Networks With Only Zeros and Ones”, Et Al 2021
“ZerO Initialization: Initializing Residual Networks with only Zeros and Ones”, 2021-10-25 (backlinks; similar)
“ADOP: Approximate Differentiable One-Pixel Point Rendering”, Et Al 2021
“ADOP: Approximate Differentiable One-Pixel Point Rendering”, 2021-10-13 (similar)
“Exploring the Limits of Large Scale Pre-training”, Et Al 2021
“Exploring the Limits of Large Scale Pre-training”, 2021-10-05 (similar; bibliography)
“Rapid Training of Deep Neural Networks without Skip Connections or Normalization Layers Using Deep Kernel Shaping”, Et Al 2021
“Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping”, 2021-10-05 (similar)
“Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?”, Et Al 2021
“Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?”, 2021-09-12 (backlinks; similar; bibliography)
“ConvMLP: Hierarchical Convolutional MLPs for Vision”, Et Al 2021
“ConvMLP: Hierarchical Convolutional MLPs for Vision”, 2021-09-09 (backlinks; similar; bibliography)
“Sparse-MLP: A Fully-MLP Architecture With Conditional Computation”, Et Al 2021
“Sparse-MLP: A Fully-MLP Architecture with Conditional Computation”, 2021-09-05 (backlinks; similar)
“Hire-MLP: Vision MLP via Hierarchical Rearrangement”, Et Al 2021
“Hire-MLP: Vision MLP via Hierarchical Rearrangement”, 2021-08-30 (similar; bibliography)
“RaftMLP: How Much Can Be Done Without Attention and With Less Spatial Locality?”, 2021
“RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?”, 2021-08-09 (backlinks; similar; bibliography)
“S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision”, Et Al 2021
“S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision”, 2021-08-02 (similar; bibliography)
“CycleMLP: A MLP-like Architecture for Dense Prediction”, Et Al 2021
“CycleMLP: A MLP-like Architecture for Dense Prediction”, 2021-07-21 (backlinks; similar; bibliography)
“AS-MLP: An Axial Shifted MLP Architecture for Vision”, Et Al 2021
“AS-MLP: An Axial Shifted MLP Architecture for Vision”, 2021-07-18 (backlinks; similar; bibliography)
“Real-time Neural Radiance Caching for Path Tracing”, Et Al 2021
“Real-time Neural Radiance Caching for Path Tracing”, 2021-06-23 (similar; bibliography)
“Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition”, Et Al 2021
“Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition”, 2021-06-23 (backlinks; similar; bibliography)
“Towards Biologically Plausible Convolutional Networks”, Et Al 2021
“Towards Biologically Plausible Convolutional Networks”, 2021-06-22 (backlinks; similar)
“Well-tuned Simple Nets Excel on Tabular Datasets”, Et Al 2021
“Well-tuned Simple Nets Excel on Tabular Datasets”, 2021-06-21 (backlinks; similar)
“PairConnect: A Compute-Efficient MLP Alternative to Attention”, Et Al 2021
“PairConnect: A Compute-Efficient MLP Alternative to Attention”, 2021-06-15 (backlinks; similar)
“MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis”, Et Al 2021
“MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis”, 2021-06-15 (backlinks; similar)
“S2-MLP: Spatial-Shift MLP Architecture for Vision”, Et Al 2021
“S2-MLP: Spatial-Shift MLP Architecture for Vision”, 2021-06-14 (similar; bibliography)
“When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations”, Et Al 2021
“When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations”, 2021-06-03 (backlinks; similar; bibliography)
“Container: Context Aggregation Network”, Et Al 2021
“Container: Context Aggregation Network”, 2021-06-02 (backlinks; similar; bibliography)
“Pay Attention to MLPs”, Et Al 2021
“Pay Attention to MLPs”, 2021-05-17 (similar; bibliography)
“FNet: Mixing Tokens With Fourier Transforms”, Lee-Thorp et al 2021
“FNet: Mixing Tokens with Fourier Transforms”, 2021-05-09 (similar; bibliography)
“ResMLP: Feedforward Networks for Image Classification With Data-efficient Training”, Et Al 2021
“ResMLP: Feedforward networks for image classification with data-efficient training”, 2021-05-07 (similar)
“Multi-scale Inference of Genetic Trait Architecture Using Biologically Annotated Neural Networks”, Et Al 2021
“Multi-scale Inference of Genetic Trait Architecture using Biologically Annotated Neural Networks”, 2021-05-06 (backlinks; similar)
“Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet”, Melas-Kyriazi 2021
“Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet”, 2021-05-06 (backlinks; similar; bibliography)
“RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition”, Et Al 2021
“RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition”, 2021-05-05 (backlinks; similar; bibliography)
“MLP-Mixer: An All-MLP Architecture for Vision”, Et Al 2021
“MLP-Mixer: An all-MLP Architecture for Vision”, 2021-05-04 (similar; bibliography)
“Grokking: Generalization Beyond Overfitting On Small Algorithmic Datasets”, Et Al 2021
“Grokking: Generalization Beyond Overfitting On Small Algorithmic Datasets”, 2021-05-01 (similar; bibliography)
“Sifting out the Features by Pruning: Are Convolutional Networks the Winning Lottery Ticket of Fully Connected Ones?”, 2021
“Sifting out the features by pruning: Are convolutional networks the winning lottery ticket of fully connected ones?”, 2021-04-27 (backlinks; similar)
“Fully-Connected Neural Nets”, 2021
“Fully-Connected Neural Nets”, 2021-04-24 (backlinks; similar; bibliography)
“Revisiting Simple Neural Probabilistic Language Models”, 2021
“Revisiting Simple Neural Probabilistic Language Models”, 2021-04-08 (backlinks; similar)
“KiloNeRF: Speeding up Neural Radiance Fields With Thousands of Tiny MLPs”, Et Al 2021
“KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs”, 2021-03-25 (backlinks; similar)
“Neural Geometric Level of Detail: Real-time Rendering With Implicit 3D Shapes”, Et Al 2021
“Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes”, 2021-01-26 (backlinks; similar)
“Is MLP-Mixer a CNN in Disguise?” (blog post examining the MLP-Mixer architecture in detail and why it is not considered convolution-free)
“TabTransformer: Tabular Data Modeling Using Contextual Embeddings”, Et Al 2020
“TabTransformer: Tabular Data Modeling Using Contextual Embeddings”, 2020-12-11 (similar)
“Scaling down Deep Learning”, 2020
“Scaling down Deep Learning”, 2020-12-01 (backlinks; similar; bibliography)
“Image Generators With Conditionally-Independent Pixel Synthesis”, Et Al 2020
“Image Generators with Conditionally-Independent Pixel Synthesis”, 2020-11-27 (backlinks; similar)
“Fourier Neural Operator for Parametric Partial Differential Equations”, Et Al 2020
“Fourier Neural Operator for Parametric Partial Differential Equations”, 2020-10-18 (backlinks; similar)
“AFT: An Attention Free Transformer”, 2020
“AFT: An Attention Free Transformer”, 2020-09-28 (similar)
“Towards Learning Convolutions from Scratch”, 2020
“Towards Learning Convolutions from Scratch”, 2020-07-27 (backlinks; similar)
“Efficient Attention: Breaking The Quadratic Transformer Bottleneck”, 2020
“Efficient Attention: Breaking The Quadratic Transformer Bottleneck”, 2020-07-25 (backlinks; similar; bibliography)
“SIREN: Implicit Neural Representations With Periodic Activation Functions”, Et Al 2020
“SIREN: Implicit Neural Representations with Periodic Activation Functions”, 2020-06-17 (backlinks; similar)
“Linformer: Self-Attention With Linear Complexity”, Et Al 2020
“Linformer: Self-Attention with Linear Complexity”, 2020-06-08 (similar)
“A Map of Object Space in Primate Inferotemporal Cortex”, Et Al 2020
“A map of object space in primate inferotemporal cortex”, 2020-06-03 (similar)
“Synthesizer: Rethinking Self-Attention in Transformer Models”, Et Al 2020
“Synthesizer: Rethinking Self-Attention in Transformer Models”, 2020-05-02 (similar; bibliography)
“Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems”, Et Al 2020
“Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems”, 2020-03-20 (similar)
“NeRF: Representing Scenes As Neural Radiance Fields for View Synthesis”, Et Al 2020
“NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”, 2020-03-19 (backlinks; similar)
“ReZero Is All You Need: Fast Convergence at Large Depth”, Et Al 2020
“ReZero is All You Need: Fast Convergence at Large Depth”, 2020-03-10 (backlinks; similar)
“Gesticulator: A Framework for Semantically-aware Speech-driven Gesture Generation”, Et Al 2020
“Gesticulator: A framework for semantically-aware speech-driven gesture generation”, 2020-01-25 (backlinks; similar)
“Understanding the Generalization Of ‘Lottery Tickets’ In Neural Networks”, 2019
“Understanding the generalization of ‘lottery tickets’ in neural networks”, 2019-11-25 (backlinks; similar)
“3D Human Pose Estimation via Human Structure-aware Fully Connected Network”, Et Al 2019
“3D human pose estimation via human structure-aware fully connected network”, 2019-07-01 (similar)
“Finding the Needle in the Haystack With Convolutions: on the Benefits of Architectural Bias”, d’Ascoli et al 2019
“Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias”, 2019-06-16 (backlinks; similar)
“MoGlow: Probabilistic and Controllable Motion Synthesis Using Normalizing Flows”, Et Al 2019
“MoGlow: Probabilistic and controllable motion synthesis using normalizing flows”, 2019-05-16 (backlinks; similar)
“Fixup Initialization: Residual Learning Without Normalization”, Et Al 2019
“Fixup Initialization: Residual Learning Without Normalization”, 2019-01-27 (backlinks; similar)
“SwitchNet: a Neural Network Model for Forward and Inverse Scattering Problems”, 2018
“SwitchNet: a neural network model for forward and inverse scattering problems”, 2018-10-23 (backlinks; similar)
“Scalable Training of Artificial Neural Networks With Adaptive Sparse Connectivity Inspired by Network Science”, Et Al 2018
“Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science”, 2018-06-19 (backlinks; similar)
“Deep Learning Generalizes Because the Parameter-function Map Is Biased towards Simple Functions”, Valle-Pérez et al 2018
“Deep learning generalizes because the parameter-function map is biased towards simple functions”, 2018-05-22 (similar)
“NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations”, Et Al 2018
“NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations”, 2018-04-19 (backlinks; similar)
“Improving Palliative Care With Deep Learning”, Avati et al 2018
“Improving palliative care with deep learning”, 2018 (similar)
“Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery”, Et Al 2018
“Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery”, 2018 (similar)
“Learning to Play Chess With Minimal Lookahead and Deep Value Neural Networks”, 2017 (page 3)
“Learning to Play Chess with Minimal Lookahead and Deep Value Neural Networks”, 2017-10-30 (similar; bibliography)
“Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU”, 2017
“Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU”, 2017-05-04 (backlinks; similar)
“The Shattered Gradients Problem: If Resnets Are the Answer, Then What Is the Question?”, Et Al 2017
“The Shattered Gradients Problem: If resnets are the answer, then what is the question?”, 2017-02-28 (backlinks; similar)
“Gender-From-Iris or Gender-From-Mascara?”, Et Al 2017
“Gender-From-Iris or Gender-From-Mascara?”, 2017-02-04 (backlinks; similar)
“Skip Connections Eliminate Singularities”, 2017
“Skip Connections Eliminate Singularities”, 2017-01-31 (backlinks; similar)
“Learning to Optimize”, 2016
“Learning to Optimize”, 2016-06-06 (backlinks; similar)
“Do Deep Convolutional Nets Really Need to Be Deep and Convolutional?”, Et Al 2016
“Do Deep Convolutional Nets Really Need to be Deep and Convolutional?”, 2016-03-17 (backlinks; similar)
“Network Morphism”, Et Al 2016
“Network Morphism”, 2016-03-05 (similar)
“Adding Gradient Noise Improves Learning for Very Deep Networks”, Et Al 2015
“Adding Gradient Noise Improves Learning for Very Deep Networks”, 2015-11-21 (similar)
“How Far Can We Go without Convolution: Improving Fully-connected Networks”, Et Al 2015
“How far can we go without convolution: Improving fully-connected networks”, 2015-11-09 (backlinks; similar)
“Tensorizing Neural Networks”, Et Al 2015
“Tensorizing Neural Networks”, 2015-09-22 (backlinks; similar)
“Deep Neural Networks for Large Vocabulary Handwritten Text Recognition”, 2015
“Deep Neural Networks for Large Vocabulary Handwritten Text Recognition”, 2015-05-13 (backlinks; similar)
“The Loss Surfaces of Multilayer Networks”, Et Al 2014
“The Loss Surfaces of Multilayer Networks”, 2014-11-30 (similar)
“One Weird Trick for Parallelizing Convolutional Neural Networks”, 2014
“One weird trick for parallelizing convolutional neural networks”, 2014-04-23
“Do Deep Nets Really Need to Be Deep?”, 2013
“Do Deep Nets Really Need to be Deep?”, 2013-12-21 (backlinks; similar)
“Network In Network”, Et Al 2013
“Network In Network”, 2013-12-16 (backlinks; similar)
“Deep Big Multilayer Perceptrons for Digit Recognition”, Cireşan Et Al 2012
“Deep Big Multilayer Perceptrons for Digit Recognition”, 2012 (similar)
“Extraction De Séquences Numériques Dans Des Documents Manuscrits Quelconques” [Extraction of Numerical Sequences from Arbitrary Handwritten Documents], 2006
“Extraction de séquences numériques dans des documents manuscrits quelconques”, 2006-12-05 (backlinks; similar)
“Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis”, Et Al 2003
“Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis”, 2003 (similar)
“NEAT: Evolving Neural Networks through Augmenting Topologies”, 2002
“NEAT: Evolving Neural Networks through Augmenting Topologies”, 2002-06-01 (backlinks; similar)
“Quantitative Analysis of Multivariate Data Using Artificial Neural Networks: A Tutorial Review and Applications to the Deconvolution of Pyrolysis Mass Spectra”, Et Al 1996
“Quantitative Analysis of Multivariate Data Using Artificial Neural Networks: A Tutorial Review and Applications to the Deconvolution of Pyrolysis Mass Spectra”, 1996-08-01 (backlinks; similar)
“On the Ability of the Optimal Perceptron to Generalize”, Et Al 1990
“On the ability of the optimal perceptron to generalize”, 1990 (similar)
“Learning To Tell Two Spirals Apart”, 1988
“Learning To Tell Two Spirals Apart”, 1988-01 (backlinks; similar)
Miscellaneous
Link Bibliography
- 2023-bures.pdf: “Organic Reaction Mechanism Classification Using Machine Learning”, Jordi Burés, Igor Larrosa
- https://www.nature.com/articles/s41467-022-35422-y: “Merging Enzymatic and Synthetic Chemistry With Computational Synthesis Planning”, Itai Levin, Mengjie Liu, Christopher A. Voigt, Connor W. Coley
- https://arxiv.org/abs/2211.03495: “How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers”, Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith, Roy Schwartz
- https://arxiv.org/abs/2207.10551#google: “Scaling Laws vs Model Architectures: How Does Inductive Bias Influence Scaling?”
- https://arxiv.org/abs/2206.07137: “RHO-LOSS: Prioritized Training on Points That Are Learnable, Worth Learning, and Not Yet Learnt”
- https://arxiv.org/abs/2205.12399#google: “Sparse Mixers: Combining MoE and Mixing to Build a More Efficient BERT”, James Lee-Thorp, Joshua Ainslie
- https://arxiv.org/abs/2202.06510#microsoft: “Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs”, Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou
- https://arxiv.org/abs/2201.10801: “When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism (ShiftViT)”, Guangting Wang, Yucheng Zhao, Chuanxin Tang, Chong Luo, Wenjun Zeng
- https://arxiv.org/abs/2201.09792: “ConvMixer: Patches Are All You Need?”, Asher Trockman, J. Zico Kolter
- https://arxiv.org/abs/2111.11418: “MetaFormer Is Actually What You Need for Vision”, Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan
- https://arxiv.org/abs/2110.02095#google: “Exploring the Limits of Large Scale Pre-training”, Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi
- https://arxiv.org/abs/2109.05422: “Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?”, Chuanxin Tang, Yucheng Zhao, Guangting Wang, Chong Luo, Wenxuan Xie, Wenjun Zeng
- https://arxiv.org/abs/2109.04454: “ConvMLP: Hierarchical Convolutional MLPs for Vision”, Jiachen Li, Ali Hassani, Steven Walton, Humphrey Shi
- https://arxiv.org/abs/2108.13341#huawei: “Hire-MLP: Vision MLP via Hierarchical Rearrangement”, Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang
- https://arxiv.org/abs/2108.04384: “RaftMLP: How Much Can Be Done Without Attention and With Less Spatial Locality?”, Yuki Tatsunami, Masato Taki
- https://arxiv.org/abs/2108.01072#baidu: “S²-MLPv2: Improved Spatial-Shift MLP Architecture for Vision”, Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li
- https://arxiv.org/abs/2107.10224: “CycleMLP: A MLP-like Architecture for Dense Prediction”, Shoufa Chen, Enze Xie, Chongjian Ge, Runjian Chen, Ding Liang, Ping Luo
- https://arxiv.org/abs/2107.08391: “AS-MLP: An Axial Shifted MLP Architecture for Vision”, Dongze Lian, Zehao Yu, Xing Sun, Shenghua Gao
- https://arxiv.org/abs/2106.12372#nvidia: “Real-time Neural Radiance Caching for Path Tracing”, Thomas Müller, Fabrice Rousselle, Jan Novák, Alexander Keller
- https://arxiv.org/abs/2106.12368: “Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition”, Qibin Hou, Zihang Jiang, Li Yuan, Ming-Ming Cheng, Shuicheng Yan, Jiashi Feng
- https://arxiv.org/abs/2106.07477#baidu: “S²-MLP: Spatial-Shift MLP Architecture for Vision”, Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li
- https://arxiv.org/abs/2106.01548: “When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations”, Xiangning Chen, Cho-Jui Hsieh, Boqing Gong
- https://arxiv.org/abs/2106.01401: “Container: Context Aggregation Network”, Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi
- https://arxiv.org/abs/2105.08050#google: “Pay Attention to MLPs”, Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le
- https://arxiv.org/abs/2105.03824#google: “FNet: Mixing Tokens With Fourier Transforms”, James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon
- https://arxiv.org/abs/2105.02723: “Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet”, Luke Melas-Kyriazi
- https://arxiv.org/abs/2105.01883: “RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition”, Xiaohan Ding, Chunlong Xia, Xiangyu Zhang, Xiaojie Chu, Jungong Han, Guiguang Ding
- https://arxiv.org/abs/2105.01601#google: “MLP-Mixer: An All-MLP Architecture for Vision”
- 2021-power.pdf#openai: “Grokking: Generalization Beyond Overfitting On Small Algorithmic Datasets”, Alethea Power, Yuri Burda, Harri Edwards, Igor Babuschkin, Vedant Misra
- fc: “Fully-Connected Neural Nets”, Gwern Branwen
- https://greydanus.github.io/2020/12/01/scaling-down/: “Scaling down Deep Learning”, Sam Greydanus
- attention: “Efficient Attention: Breaking The Quadratic Transformer Bottleneck”, Gwern Branwen
- https://arxiv.org/abs/2005.00743#google: “Synthesizer: Rethinking Self-Attention in Transformer Models”, Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
- 2017-sabatelli.pdf#page=3: “Learning to Play Chess With Minimal Lookahead and Deep Value Neural Networks”, Matthia Sabatelli