“‘MoE NN’ Tag”, 2019-09-28:
Bibliography for tag ai/scaling/mixture-of-experts, most recent first: 1 related tag, 76 annotations, & 13 links (parent).
- Links
- “Mixture of Parrots: Experts Improve Memorization More Than Reasoning”, et al 2024
- “Upcycling Large Language Models into Mixture of Experts”, et al 2024
- “Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget”, et al 2024
- “Anthropic’s Latest Claude AI Model Pulls ahead of Rivals from OpenAI and Google”, 2024
- “JetMoE: Reaching LLaMA-2 Performance With 0.1M Dollars”, et al 2024
- “Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws”, Allen-Zhu & Li 2024
- “Mixture-Of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models”, et al 2024
- “MM1: Methods, Analysis & Insights from Multimodal LLM Pre-Training”, et al 2024
- “Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models”, et al 2024
- “MoE-Mamba: Efficient Selective State Space Models With Mixture of Experts”, et al 2024
- “Mixtral of Experts”, et al 2024
- “Fast Inference of Mixture-Of-Experts Language Models With Offloading”, 2023
- “LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment”, et al 2023
- “SwitchHead: Accelerating Transformers With Mixture-Of-Experts Attention”, et al 2023
- “Exponentially Faster Language Modeling”, 2023
- “Sparse Universal Transformer”, et al 2023
- “Fast Feedforward Networks”, 2023
- “Non-Determinism in GPT-4 Is Caused by Sparse MoE”, 152334H 2023
- “From Sparse to Soft Mixtures of Experts”, et al 2023
- “Brainformers: Trading Simplicity for Efficiency”, et al 2023
- “CodeCompose: A Large-Scale Industrial Deployment of AI-Assisted Code Authoring”, et al 2023
- “Bridging Discrete and Backpropagation: Straight-Through and Beyond”, et al 2023
- “Scaling Expert Language Models With Unsupervised Domain Discovery”, et al 2023
- “Sparse MoE As the New Dropout: Scaling Dense and Self-Slimmable Transformers”, et al 2023
- “AltUp: Alternating Updates for Efficient Transformers”, et al 2023
- “Sparse Upcycling: Training Mixture-Of-Experts from Dense Checkpoints”, et al 2022
- “MegaBlocks: Efficient Sparse Training With Mixture-Of-Experts”, et al 2022
- “Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production”, et al 2022
- “EDiff-I: Text-To-Image Diffusion Models With an Ensemble of Expert Denoisers”, et al 2022
- “AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers”, et al 2022
- “A Review of Sparse Expert Models in Deep Learning”, et al 2022
- “Scaling Laws vs Model Architectures: How Does Inductive Bias Influence Scaling?”, et al 2022
- “MoEC: Mixture of Expert Clusters”, et al 2022
- “Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models”, et al 2022
- “Uni-Perceiver-MoE: Learning Sparse Generalist Models With Conditional MoEs”, et al 2022
- “Tutel: Adaptive Mixture-Of-Experts at Scale”, et al 2022
- “Gating Dropout: Communication-Efficient Regularization for Sparsely Activated Transformers”, et al 2022
- “Sparse Mixers: Combining MoE and Mixing to Build a More Efficient BERT”, Lee-Thorp & Ainslie 2022
- “One Model, Multiple Modalities: A Sparsely Activated Approach for Text, Sound, Image, Video and Code”, et al 2022
- “InCoder: A Generative Model for Code Infilling and Synthesis”, et al 2022
- “WuDaoMM: A Large-Scale Multi-Modal Dataset for Pre-Training Models”, et al 2022
- “Efficient Language Modeling With Sparse All-MLP”, et al 2022
- “Mixture-Of-Experts With Expert Choice Routing”, et al 2022
- “ST-MoE: Designing Stable and Transferable Sparse Expert Models”, et al 2022
- “WuDao 2.0 With Its Lead Creator, Tang Jie”, et al 2022
- “DeepSpeed-MoE: Advancing Mixture-Of-Experts Inference and Training to Power Next-Generation AI Scale”, et al 2022
- “U.S. vs. China Rivalry Boosts Tech—And Tensions: Militarized AI Threatens a New Arms Race”, 2021
- “Efficient Large Scale Language Modeling With Mixtures of Experts”, et al 2021
- “GLaM: Efficient Scaling of Language Models With Mixture-Of-Experts”, et al 2021
- “Beyond Distillation: Task-Level Mixture-Of-Experts (TaskMoE) for Efficient Inference”, et al 2021
- “Scalable and Efficient MoE Training for Multitask Multilingual Models”, et al 2021
- “Sparse-MLP: A Fully-MLP Architecture With Conditional Computation”, et al 2021
- “Go Wider Instead of Deeper”, et al 2021
- “MCL-GAN: Generative Adversarial Networks With Multiple Specialized Discriminators”, 2021
- “CPM-2: Large-Scale Cost-Effective Pre-Trained Language Models”, et al 2021
- “V-MoE: Scaling Vision With Sparse Mixture of Experts”, et al 2021
- “Chinese AI Lab Challenges Google, OpenAI With a Model of 1.75 Trillion Parameters”, 2021
- “Exploring Sparse Expert Models and Beyond”, et al 2021
- “RetGen: A Joint Framework for Retrieval and Grounded Text Generation Modeling”, et al 2021
- “Carbon Emissions and Large Neural Network Training”, et al 2021
- “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-Scale Pretraining Model.”, 2021
- “Coordination Among Neural Modules Through a Shared Global Workspace”, et al 2021
- “Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, et al 2021
- “GShard: Scaling Giant Models With Conditional Computation and Automatic Sharding”, et al 2020
- “Efficient Content-Based Sparse Attention With Routing Transformers”, et al 2020
- “One Model To Learn Them All”, et al 2017
- “Hard Mixtures of Experts for Large Scale Weakly Supervised Vision”, et al 2017
- “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-Of-Experts Layer”, et al 2017
- “Conditional Computation in Neural Networks for Faster Models”, et al 2015
- “Distilling the Knowledge in a Neural Network”, et al 2015
- “Learning Factored Representations in a Deep Mixture of Experts”, et al 2013
- “Mixture of Experts: A Literature Survey”, 2012
- “Introduction to CPM”
- “GTC Spring 2021 Keynote With NVIDIA CEO Jensen Huang”
- “GTC 2021 Keynote With NVIDIA CEO Jensen Huang: NVIDIA CEO Jensen Huang Delivers the #GTC21 Keynote, Where He Introduced Amazing Breakthroughs in Building Virtual Worlds With NVIDIA Omniverse; in Advancing Enterprise Computing With New NVIDIA DGX Systems and Software; in Turning the Data Center into the New Unit of Computing With the New NVIDIA Grace CPU, BlueField-3 DPU, and DOCA 1.0 SDK; in Broadening the Reach of AI to All Companies and Industries With NVIDIA EGX and Aerial 5G; and in Transforming Transportation With NVIDIA DRIVE Orin and Atlan.”