“‘MLP NN’ Tag”, 2019-08-31:
Bibliography for tag ai/nn/fully-connected, most recent first: 2 related tags, 206 annotations, & 45 links (parent).
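For orientation: everything under this tag concerns networks built only from fully-connected (affine) layers and pointwise nonlinearities, with no convolution, recurrence, or attention. A minimal NumPy sketch of such an MLP follows; the layer sizes and initialization are hypothetical, chosen purely for illustration:

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of a fully-connected network: alternating affine
    maps and ReLU nonlinearities; no convolution or attention."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)   # hidden layer: affine transform + ReLU
    return h @ weights[-1] + biases[-1]  # linear readout

# Hypothetical sizes: 784-dimensional input (a flattened 28x28 image),
# two hidden layers of 256 units, 10 class logits.
rng = np.random.default_rng(0)
sizes = [784, 256, 256, 10]
weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
           for m, n in zip(sizes, sizes[1:])]  # He initialization
biases = [np.zeros(n) for n in sizes[1:]]
logits = mlp_forward(rng.normal(size=784), weights, biases)
print(logits.shape)  # (10,)
```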
- See Also
- Gwern
- Links
- “AUNN: Simple Implementation of Gwern’s AUNN Proposal”, 2024
- “The Slingshot Helps With Learning”, 2024
- “SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning”, et al 2024
- “NGPT: Normalized Transformer With Representation Learning on the Hypersphere”, et al 2024
- “How Feature Learning Can Improve Neural Scaling Laws”, et al 2024
- “Magika: AI-Powered Content-Type Detection”, et al 2024
- “On the Complexity of Neural Computation in Superposition”, 2024
- “Masked Mixers for Language Generation and Retrieval”, 2024
- “GSoC 2024: Differentiable Logic for Interactive Systems and Generative Music”
- “What Matters in Transformers? Not All Attention Is Needed”, et al 2024
- “When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models”, et al 2024
- “Probing the Decision Boundaries of In-Context Learning in Large Language Models”, et al 2024
- “MAR: Autoregressive Image Generation without Vector Quantization”, et al 2024
- “Grokking Modular Polynomials”, et al 2024
- “Grokfast: Accelerated Grokking by Amplifying Slow Gradients”, et al 2024
- “Lateralization MLP: A Simple Brain-Inspired Architecture for Diffusion”, 2024
- “MLPs Learn In-Context”, 2024
- “Verified Neural Compressed Sensing”, et al 2024
- “Neural Redshift: Random Networks Are Not Random Functions”, et al 2024
- “Neural Spline Fields for Burst Image Fusion and Layer Separation”, et al 2023
- “SwitchHead: Accelerating Transformers With Mixture-Of-Experts Attention”, et al 2023
- “SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration”, et al 2023
- “Grokking Group Multiplication With Cosets”, et al 2023
- “Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks As an Alternative to Attention Layers in Transformers”, et al 2023
- “HyperFields: Towards Zero-Shot Generation of NeRFs from Text”, et al 2023
- “Grokking Beyond Neural Networks: An Empirical Exploration With Model Complexity”, et al 2023
- “To Grok or Not to Grok: Disentangling Generalization and Memorization on Corrupted Algorithmic Datasets”, et al 2023
- “Polynomial Time Cryptanalytic Extraction of Neural Network Models”, et al 2023
- “One Wide Feedforward Is All You Need”, et al 2023
- “Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla”, et al 2023
- “Self Expanding Neural Networks”, et al 2023
- “The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks”, et al 2023
- “Scaling MLPs: A Tale of Inductive Bias”, et al 2023
- “Any Deep ReLU Network Is Shallow”, 2023
- “Does the First Letter of One’s Name Affect Life Decisions? A Natural Language Processing Examination of Nominative Determinism”, et al 2023
- “Learning and Memorization”, 2023
- “How Does GPT-2 Compute Greater-Than?: Interpreting Mathematical Abilities in a Pre-Trained Language Model”, et al 2023
- “Two-Step Training: Adjustable Sketch Colorization via Reference Image and Text Tag”, et al 2023
- “HyperDiffusion: Generating Implicit Neural Fields With Weight-Space Diffusion”, et al 2023
- “The Quantization Model of Neural Scaling”, et al 2023
- “TSMixer: An All-MLP Architecture for Time Series Forecasting”, et al 2023
- “Loss Landscapes Are All You Need: Neural Network Generalization Can Be Explained Without the Implicit Bias of Gradient Descent”, et al 2023
- “A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations”, et al 2023
- “Looped Transformers As Programmable Computers”, et al 2023
- “Organic Reaction Mechanism Classification Using Machine Learning”, 2023
- “DataMUX: Data Multiplexing for Neural Networks”, et al 2023
- “Merging Enzymatic and Synthetic Chemistry With Computational Synthesis Planning”, et al 2022
- “Magic3D: High-Resolution Text-To-3D Content Creation”, et al 2022
- “How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers”, et al 2022
- “Deep Differentiable Logic Gate Networks”, et al 2022
- “The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers”, et al 2022
- “The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes”, et al 2022
- “Scaling Forward Gradient With Local Losses”, et al 2022
- “Omnigrok: Grokking Beyond Algorithmic Data”, et al 2022
- “DreamFusion: Text-To-3D Using 2D Diffusion”, et al 2022
- “g.pt: Learning to Learn With Generative Models of Neural Network Checkpoints”, et al 2022
- “Random Initializations Performing above Chance and How to Find Them”, et al 2022
- “Scaling Laws vs Model Architectures: How Does Inductive Bias Influence Scaling?”, et al 2022
- “Why Do Tree-Based Models Still Outperform Deep Learning on Tabular Data?”, et al 2022
- “Revisiting Pretraining Objectives for Tabular Deep Learning”, et al 2022
- “RHO-LOSS: Prioritized Training on Points That Are Learnable, Worth Learning, and Not Yet Learnt”, et al 2022
- “MLP-3D: A MLP-Like 3D Architecture With Grouped Time Mixing”, et al 2022
- “ChordMixer: A Scalable Neural Attention Model for Sequences With Different Lengths”, et al 2022
- “Sparse Mixers: Combining MoE and Mixing to Build a More Efficient BERT”, Lee-Thorp & Ainslie 2022
- “Towards Understanding Grokking: An Effective Theory of Representation Learning”, et al 2022
- “Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention”, et al 2022
- “Deep Learning Meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?”, 2022
- “Efficient Language Modeling With Sparse All-MLP”, et al 2022
- “HyperMixer: An MLP-Based Low Cost Alternative to Transformers”, et al 2022
- “MLP-ASR: Sequence-Length Agnostic All-MLP Architectures for Speech Recognition”, et al 2022
- “Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs”, et al 2022
- “PNLP-Mixer: an Efficient All-MLP Architecture for Language”, et al 2022
- “Data-Driven Emergence of Convolutional Structure in Neural Networks”, 2022
- “When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism (ShiftViT)”, et al 2022
- “ConvMixer: Patches Are All You Need?”, 2022
- “MAXIM: Multi-Axis MLP for Image Processing”, et al 2022
- “Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets [Paper]”, et al 2022
- “The GatedTabTransformer: An Enhanced Deep Learning Architecture for Tabular Modeling”, 2022
- “MLP Architectures for Vision-And-Language Modeling: An Empirical Study”, et al 2021
- “Noether Networks: Meta-Learning Useful Conserved Quantities”, et al 2021
- “Zero-Shot Text-Guided Object Generation With Dream Fields”, et al 2021
- “MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video”, et al 2021
- “PointMixer: MLP-Mixer for Point Cloud Understanding”, et al 2021
- “MetaFormer Is Actually What You Need for Vision”, et al 2021
- “Deep Learning without Shortcuts: Shaping the Kernel With Tailored Rectifiers”, et al 2021
- “ZerO Initialization: Initializing Residual Networks With Only Zeros and Ones”, et al 2021
- “Wide Neural Networks Forget Less Catastrophically”, et al 2021
- “ADOP: Approximate Differentiable One-Pixel Point Rendering”, et al 2021
- “Rapid Training of Deep Neural Networks without Skip Connections or Normalization Layers Using Deep Kernel Shaping”, et al 2021
- “Exploring the Limits of Large Scale Pre-Training”, et al 2021
- “Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?”, et al 2021
- “ConvMLP: Hierarchical Convolutional MLPs for Vision”, et al 2021
- “Sparse-MLP: A Fully-MLP Architecture With Conditional Computation”, et al 2021
- “A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP”, et al 2021
- “Hire-MLP: Vision MLP via Hierarchical Rearrangement”, et al 2021
- “RaftMLP: How Much Can Be Done Without Attention and With Less Spatial Locality?”, 2021
- “S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision”, et al 2021
- “CycleMLP: A MLP-Like Architecture for Dense Prediction”, et al 2021
- “AS-MLP: An Axial Shifted MLP Architecture for Vision”, et al 2021
- “Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition”, et al 2021
- “Real-Time Neural Radiance Caching for Path Tracing”, et al 2021
- “Towards Biologically Plausible Convolutional Networks”, et al 2021
- “Well-Tuned Simple Nets Excel on Tabular Datasets”, et al 2021
- “MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis”, et al 2021
- “PairConnect: A Compute-Efficient MLP Alternative to Attention”, et al 2021
- “S2-MLP: Spatial-Shift MLP Architecture for Vision”, et al 2021
- “When Vision Transformers Outperform ResNets without Pre-Training or Strong Data Augmentations”, et al 2021
- “Container: Context Aggregation Network”, et al 2021
- “MixerGAN: An MLP-Based Architecture for Unpaired Image-To-Image Translation”, 2021
- “One4all User Representation for Recommender Systems in E-Commerce”, et al 2021
- “Pay Attention to MLPs”, et al 2021
- “FNet: Mixing Tokens With Fourier Transforms”, Lee-Thorp et al 2021
- “ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training”, et al 2021
- “Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet”, Melas-Kyriazi 2021
- “Multi-Scale Inference of Genetic Trait Architecture Using Biologically Annotated Neural Networks”, et al 2021
- “RepMLP: Re-Parameterizing Convolutions into Fully-Connected Layers for Image Recognition”, et al 2021
- “MLP-Mixer: An All-MLP Architecture for Vision”, et al 2021
- “Grokking: Generalization Beyond Overfitting On Small Algorithmic Datasets”, et al 2021
- “Sifting out the Features by Pruning: Are Convolutional Networks the Winning Lottery Ticket of Fully Connected Ones?”, 2021
- “Revisiting Simple Neural Probabilistic Language Models”, 2021
- “KiloNeRF: Speeding up Neural Radiance Fields With Thousands of Tiny MLPs”, et al 2021
- “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows”, et al 2021
- “Attention Is Not All You Need: Pure Attention Loses Rank Doubly Exponentially With Depth”, et al 2021
- “Clusterability in Neural Networks”, et al 2021
- “Training Larger Networks for Deep Reinforcement Learning”, et al 2021
- “Explaining Neural Scaling Laws”, et al 2021
- “Neural Geometric Level of Detail: Real-Time Rendering With Implicit 3D Shapes”, et al 2021
- “Is MLP-Mixer a CNN in Disguise? As Part of This Blog Post, We Look at the MLP Mixer Architecture in Detail and Also Understand Why It Is Not Considered Convolution Free.”
- “Transformer Feed-Forward Layers Are Key-Value Memories”, et al 2020
- “AdnFM: An Attentive DenseNet Based Factorization Machine for CTR Prediction”, et al 2020
- “TabTransformer: Tabular Data Modeling Using Contextual Embeddings”, et al 2020
- “Scaling down Deep Learning”, 2020
- “Image Generators With Conditionally-Independent Pixel Synthesis”, et al 2020
- “D2RL: Deep Dense Architectures in Reinforcement Learning”, et al 2020
- “Fourier Neural Operator for Parametric Partial Differential Equations”, et al 2020
- “AFT: An Attention Free Transformer”, 2020
- “Towards Learning Convolutions from Scratch”, 2020
- “Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains”, et al 2020
- “SIREN: Implicit Neural Representations With Periodic Activation Functions”, et al 2020
- “Linformer: Self-Attention With Linear Complexity”, et al 2020
- “A Map of Object Space in Primate Inferotemporal Cortex”, et al 2020
- “Synthesizer: Rethinking Self-Attention in Transformer Models”, et al 2020
- “Deep Learning Training in Facebook Data Centers: Design of Scale-Up and Scale-Out Systems”, et al 2020
- “NeRF: Representing Scenes As Neural Radiance Fields for View Synthesis”, et al 2020
- “Cryptanalytic Extraction of Neural Network Models”, et al 2020
- “ReZero Is All You Need: Fast Convergence at Large Depth”, et al 2020
- “Train-By-Reconnect: Decoupling Locations of Weights from Their Values (LaPerm)”, 2020
- “Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?”, et al 2020
- “Quasi-Equivalence of Width and Depth of Neural Networks”, et al 2020
- “Gesticulator: A Framework for Semantically-Aware Speech-Driven Gesture Generation”, et al 2020
- “What’s Hidden in a Randomly Weighted Neural Network?”, et al 2019
- “Understanding the Generalization of ‘Lottery Tickets’ in Neural Networks”, 2019
- “The Bouncer Problem: Challenges to Remote Explainability”, 2019
- “3D Human Pose Estimation via Human Structure-Aware Fully Connected Network”, et al 2019d
- “Finding the Needle in the Haystack With Convolutions: on the Benefits of Architectural Bias”, d’Ascoli et al 2019
- “MoGlow: Probabilistic and Controllable Motion Synthesis Using Normalizing Flows”, et al 2019
- “Fixup Initialization: Residual Learning Without Normalization”, et al 2019
- “SwitchNet: a Neural Network Model for Forward and Inverse Scattering Problems”, 2018
- “A Jamming Transition from Under-Parameterization to Over-Parameterization Affects Loss Landscape and Generalization”, et al 2018
- “Neural Arithmetic Logic Units”, et al 2018
- “The Goldilocks Zone: Towards Better Understanding of Neural Network Loss Landscapes”, 2018
- “Scalable Training of Artificial Neural Networks With Adaptive Sparse Connectivity Inspired by Network Science”, et al 2018
- “Deep Learning Generalizes Because the Parameter-Function Map Is Biased towards Simple Functions”, Valle-Pérez et al 2018
- “Bidirectional Learning for Robust Neural Networks”, Pontes-Filho 2018
- “NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations”, et al 2018
- “Meta-Learning Update Rules for Unsupervised Representation Learning”, et al 2018
- “Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery”, et al 2018
- “Improving Palliative Care With Deep Learning”, Avati et al 2018
- “Learning to Play Chess With Minimal Lookahead and Deep Value Neural Networks”, 2017 (page 3)
- “Neural Collaborative Filtering”, et al 2017
- “Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU”, 2017
- “The Shattered Gradients Problem: If Resnets Are the Answer, Then What Is the Question?”, et al 2017
- “Gender-From-Iris or Gender-From-Mascara?”, et al 2017
- “Skip Connections Eliminate Singularities”, 2017
- “Deep Information Propagation”, et al 2016
- “Topology and Geometry of Half-Rectified Network Optimization”, 2016
- “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima”, et al 2016
- “Decoupled Neural Interfaces Using Synthetic Gradients”, et al 2016
- “Learning to Optimize”, 2016
- “Do Deep Convolutional Nets Really Need to Be Deep and Convolutional?”, et al 2016
- “Network Morphism”, et al 2016
- “Adding Gradient Noise Improves Learning for Very Deep Networks”, et al 2015
- “How Far Can We Go without Convolution: Improving Fully-Connected Networks”, et al 2015
- “BinaryConnect: Training Deep Neural Networks With Binary Weights during Propagations”, et al 2015
- “Tensorizing Neural Networks”, et al 2015
- “A Neural Attention Model for Abstractive Sentence Summarization”, et al 2015
- “Deep Neural Networks for Large Vocabulary Handwritten Text Recognition”, 2015
- “In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning”, et al 2014
- “The Loss Surfaces of Multilayer Networks”, et al 2014
- “On the Number of Linear Regions of Deep Neural Networks”, et al 2014
- “Do Deep Nets Really Need to Be Deep?”, 2013
- “On the Number of Response Regions of Deep Feed Forward Networks With Piece-Wise Linear Activations”, et al 2013
- “Network In Network”, et al 2013
- “Deep Big Multilayer Perceptrons for Digit Recognition”, Cireşan et al 2012
- “Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition”, et al 2010
- “Compositional Pattern Producing Networks: A Novel Abstraction of Development”, 2007
- “Extraction De Séquences Numériques Dans Des Documents Manuscrits Quelconques [Extraction of Numerical Sequences from Arbitrary Handwritten Documents]”, 2006
- “Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis”, et al 2003
- “NEAT: Evolving Neural Networks through Augmenting Topologies”, 2002
- “DARPA and the Quest for Machine Intelligence, 1983–1993”, 2002
- “Quantitative Analysis of Multivariate Data Using Artificial Neural Networks: A Tutorial Review and Applications to the Deconvolution of Pyrolysis Mass Spectra”, et al 1996
- “Statistical Mechanics of Generalization”, 1996
- “On the Ability of the Optimal Perceptron to Generalize”, et al 1990
- “Learning To Tell Two Spirals Apart”, 1988
- “Learning Internal Representations by Error Propagation”, et al 1986
- “Neural Networks and Physical Systems With Emergent Collective Computational Abilities”, 1982
- Wikipedia
- Miscellaneous
- Bibliography
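Many of the 2021–2022 vision entries above (MLP-Mixer, ResMLP, gMLP, S2-MLP, CycleMLP, …) share one core move: replacing self-attention with an MLP applied across the token dimension. As rough orientation, here is a minimal NumPy sketch of a single Mixer-style block, simplified from the MLP-Mixer design (GELU swapped for ReLU for brevity; all sizes and initializations are illustrative, not the paper’s):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the channel (last) axis.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def mlp2(x, W1, W2):
    # Two-layer MLP; MLP-Mixer uses GELU, ReLU here for brevity.
    return np.maximum(0.0, x @ W1) @ W2

def mixer_block(X, tW1, tW2, cW1, cW2):
    """One Mixer-style block over X of shape (tokens, channels): a
    token-mixing MLP (applied across patches via transpose), then a
    channel-mixing MLP, each with a residual connection."""
    Y = X + mlp2(layer_norm(X).T, tW1, tW2).T  # mix information across tokens
    return Y + mlp2(layer_norm(Y), cW1, cW2)   # mix information across channels

# Illustrative sizes: 196 patch tokens, 512 channels, hidden widths 256 & 2048.
rng = np.random.default_rng(0)
T, C, Dt, Dc = 196, 512, 256, 2048
X = rng.normal(size=(T, C))
out = mixer_block(
    X,
    rng.normal(0, T ** -0.5, (T, Dt)), rng.normal(0, Dt ** -0.5, (Dt, T)),
    rng.normal(0, C ** -0.5, (C, Dc)), rng.normal(0, Dc ** -0.5, (Dc, C)),
)
print(out.shape)  # (196, 512)
```

The transpose is the whole trick: the same weight matrices `tW1`/`tW2` are shared across every channel, so spatial mixing costs only an MLP over the (fixed-length) token axis rather than a quadratic attention map.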