“‘Knowledge Distillation’ Tag”, 2019-12-16:
Bibliography for tag ai/nn/sparsity/knowledge-distillation, most recent first: 3 related tags, 167 annotations, & 19 links (parent).
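Nearly every entry below builds on, or reacts to, the soft-target objective popularized by “Distilling the Knowledge in a Neural Network” (Hinton et al 2015), listed near the end. As a reference point only, a minimal PyTorch-style sketch of that loss follows; the tensor names (`student_logits`, `teacher_logits`, `labels`) and the temperature/weighting defaults are illustrative assumptions, not taken from any particular entry.

```python
# Minimal sketch of the classic soft-target distillation loss (Hinton et al 2015).
# Assumes PyTorch; `student_logits`/`teacher_logits` are hypothetical [batch, classes] tensors.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend temperature-softened KL against the teacher with ordinary cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so soft-target gradients match the hard-label term, per the paper
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```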
- See Also
- Gwern
- Links
- “LoLCATs: On Low-Rank Linearizing of Large Language Models”, et al 2024
- “The Mamba in the Llama: Distilling and Accelerating Hybrid Models”, et al 2024
- “Gemma 2: Improving Open Language Models at a Practical Size”, et al 2024
- “Scaling the Codebook Size of VQGAN to 100,000 With a Utilization Rate of 99%”, et al 2024
- “From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step”, et al 2024
- “Streamlining Redundant Layers to Compress Large Language Models”, et al 2024
- “SDXS: Real-Time One-Step Latent Diffusion Models With Image Conditions”, et al 2024
- “Do Not Worry If You Do Not Have Data: Building Pretrained Language Models Using Translationese”, et al 2024
- “CLLMs: Consistency Large Language Models”, et al 2024
- “Bridging the Gap: Sketch to Color Diffusion Model With Semantic Prompt Learning”, et al 2024
- “Improving Text Embeddings With Large Language Models”, et al 2023
- “ReST Meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent”, et al 2023
- “ByteDance Is Secretly Using OpenAI’s Tech to Build a Competitor”, 2023
- “SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration”, et al 2023
- “Beyond Human Data: Scaling Self-Training for Problem-Solving With Language Models (ReSTEM)”, et al 2023
- “Generative Models: What Do They Know? Do They Know Things? Let’s Find Out!”, et al 2023
- “Efficient Transformer Knowledge Distillation: A Performance Review”, et al 2023
- “Implicit Chain-Of-Thought Reasoning via Knowledge Distillation”, et al 2023
- “Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling”, et al 2023
- “HyperFields: Towards Zero-Shot Generation of NeRFs from Text”, et al 2023
- “Polynomial Time Cryptanalytic Extraction of Neural Network Models”, et al 2023
- “OSD: Online Speculative Decoding”, et al 2023
- “ReST: Reinforced Self-Training (ReST) for Language Modeling”, et al 2023
- “Composable Function-Preserving Expansions for Transformer Architectures”, 2023
- “Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events”, et al 2023
- “Explaining Competitive-Level Programming Solutions Using LLMs”, et al 2023
- “GKD: Generalized Knowledge Distillation for Auto-Regressive Sequence Models”, et al 2023
- “WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia”, et al 2023
- “VanillaNet: The Power of Minimalism in Deep Learning”, et al 2023
- “Mimetic Initialization of Self-Attention Layers”, 2023
- “TinyStories: How Small Can Language Models Be and Still Speak Coherent English?”, 2023
- “Dr. LLaMa: Improving Small Language Models in Domain-Specific QA via Generative Data Augmentation”, et al 2023
- “Distilling Step-By-Step! Outperforming Larger Language Models With Less Training Data and Smaller Model Sizes”, et al 2023
- “LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, et al 2023
- “Learning Agile Soccer Skills for a Bipedal Robot With Deep Reinforcement Learning”, et al 2023
- “A Cookbook of Self-Supervised Learning”, et al 2023
- “KD-DLGAN: Data Limited Image Generation via Knowledge Distillation”, et al 2023
- “TRACT: Denoising Diffusion Models With Transitive Closure Time-Distillation”, et al 2023
- “Learning Humanoid Locomotion With Transformers”, et al 2023
- “Consistency Models”, et al 2023
- “ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics”, et al 2023
- “Scaling Vision Transformers to 22 Billion Parameters”, et al 2023
- “BMT: Binarized Neural Machine Translation”, et al 2023
- “Use GPT-3 Incorrectly: Reduce Costs 40× and Increase Speed by 5×”, 2023
- “TinyMIM: An Empirical Study of Distilling MIM Pre-Trained Models”, et al 2023
- “Sparse Upcycling: Training Mixture-Of-Experts from Dense Checkpoints”, et al 2022
- “Solving Math Word Problems With Process & Outcome-Based Feedback”, et al 2022
- “Distilled DeepConsensus: Knowledge Distillation for Fast and Accurate DNA Sequence Correction”, et al 2022
- “MaskDistill: A Unified View of Masked Image Modeling”, 2022
- “EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, et al 2022
- “Legged Locomotion in Challenging Terrains Using Egocentric Vision”, et al 2022
- “EDiff-I: Text-To-Image Diffusion Models With an Ensemble of Expert Denoisers”, et al 2022
- “Fast DistilBERT on CPUs”, et al 2022
- “Large Language Models Can Self-Improve”, et al 2022
- “Exclusive Supermask Subnetwork Training for Continual Learning”, 2022
- “The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes”, et al 2022
- “On Distillation of Guided Diffusion Models”, et al 2022
- “Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints”, et al 2022
- “Omnigrok: Grokking Beyond Algorithmic Data”, et al 2022
- “Human-Level Atari 200× Faster”, et al 2022
- “On the Effectiveness of Compact Biomedical Transformers (✱BioBERT)”, et al 2022
- “Neural Payoff Machines: Predicting Fair and Stable Payoff Allocations Among Team Members”, et al 2022
- “Re2G: Retrieve, Rerank, Generate”, et al 2022
- “Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems”, FitzGerald et al 2022
- “SBERT Studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features”, 2022
- “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, et al 2022
- “Dataset Condensation via Efficient Synthetic-Data Parameterization”, et al 2022
- “UViM: A Unified Modeling Approach for Vision With Learned Guiding Codes”, et al 2022
- “Dialog Inpainting: Turning Documents into Dialogues”, et al 2022
- “Solving ImageNet: a Unified Scheme for Training Any Backbone to Top Results”, et al 2022
- “STaR: Bootstrapping Reasoning With Reasoning”, et al 2022
- “Knowledge Distillation: Bad Models Can Be Good Role Models”, et al 2022
- “PPCD-GAN: Progressive Pruning and Class-Aware Distillation for Large-Scale Conditional GANs Compression”, et al 2022
- “Self-Distilled StyleGAN: Towards Generation from Internet Photos”, et al 2022
- “AutoDistil: Few-Shot Task-Agnostic Neural Architecture Search for Distilling Large Language Models”, et al 2022
- “DeepSpeed-MoE: Advancing Mixture-Of-Experts Inference and Training to Power Next-Generation AI Scale”, et al 2022
- “Microdosing: Knowledge Distillation for GAN Based Compression”, et al 2022
- “ERNIE 3.0 Titan: Exploring Larger-Scale Knowledge Enhanced Pre-Training for Language Understanding and Generation”, et al 2021
- “Amortized Noisy Channel Neural Machine Translation”, et al 2021
- “Causal Distillation for Language Models”, et al 2021
- “Extrapolating from a Single Image to a Thousand Classes Using Distillation”, 2021
- “Prune Once for All: Sparse Pre-Trained Language Models”, et al 2021
- “Training Verifiers to Solve Math Word Problems”, et al 2021
- “Wav2CLIP: Learning Robust Audio Representations From CLIP”, et al 2021
- “When in Doubt, Summon the Titans: Efficient Inference With Large Models”, et al 2021
- “Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora”, et al 2021
- “Symbolic Knowledge Distillation: from General Language Models to Commonsense Models”, et al 2021
- “Language Modeling via Learning to Rank”, et al 2021
- “Beyond Pick-And-Place: Tackling Robotic Stacking of Diverse Shapes”, et al 2021
- “Unsupervised Neural Machine Translation With Generative Language Models Only”, et al 2021
- “OTTER: Data Efficient Language-Supervised Zero-Shot Recognition With Optimal Transport Distillation”, et al 2021
- “Progressive Distillation for Fast Sampling of Diffusion Models”, 2021
- “On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis”, et al 2021
- “ZSD-YOLO: Zero-Shot YOLO Detection Using Vision-Language Knowledge Distillation”, 2021
- “Beyond Distillation: Task-Level Mixture-Of-Experts (TaskMoE) for Efficient Inference”, et al 2021
- “SPLADE V2: Sparse Lexical and Expansion Model for Information Retrieval”, et al 2021
- “KroneckerBERT: Learning Kronecker Decomposition for Pre-Trained Language Models via Knowledge Distillation”, et al 2021
- “Multi-Task Self-Training for Learning General Representations”, et al 2021
- “Dataset Distillation With Infinitely Wide Convolutional Networks”, et al 2021
- “Knowledge-Adaptation Priors”, 2021
- “Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better”, 2021
- “Knowledge Distillation: A Good Teacher Is Patient and Consistent”, et al 2021
- “ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training”, et al 2021
- “DINO: Emerging Properties in Self-Supervised Vision Transformers”, et al 2021
- “Zero-Shot Detection via Vision and Language Knowledge Distillation”, et al 2021
- “Data-Efficient Language-Supervised Zero-Shot Learning With Self-Distillation”, et al 2021
- “ALD: Efficient Transformers in Reinforcement Learning Using Actor-Learner Distillation”, 2021
- “KiloNeRF: Speeding up Neural Radiance Fields With Thousands of Tiny MLPs”, et al 2021
- “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-Scale Pretraining Model.”, 2021
- “Distilling Large Language Models into Tiny and Effective Students Using PQRNN”, et al 2021
- “Training Data-Efficient Image Transformers & Distillation through Attention”, et al 2020
- “Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning”, Allen-Zhu & Li 2020
- “Towards Playing Full MOBA Games With Deep Reinforcement Learning”, et al 2020
- “A Primer in BERTology: What We Know about How BERT Works”, et al 2020
- “Dataset Meta-Learning from Kernel Ridge-Regression”, et al 2020
- “TernaryBERT: Distillation-Aware Ultra-Low Bit BERT”, et al 2020
- “SimCLRv2: Big Self-Supervised Models Are Strong Semi-Supervised Learners”, et al 2020
- “Movement Pruning: Adaptive Sparsity by Fine-Tuning”, et al 2020
- “General Purpose Text Embeddings from Pre-Trained Language Models for Scalable Inference”, et al 2020
- “Cryptanalytic Extraction of Neural Network Models”, et al 2020
- “MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers”, et al 2020
- “Towards a Conversational Agent That Can Chat About…Anything”, 2020
- “Understanding the Generalization of ‘Lottery Tickets’ in Neural Networks”, 2019
- “Self-Training With Noisy Student Improves ImageNet Classification”, et al 2019
- “On Warm-Starting Neural Network Training”, 2019
- “DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter”, et al 2019
- “TinyBERT: Distilling BERT for Natural Language Understanding”, et al 2019
- “Smaller, Faster, Cheaper, Lighter: Introducing DistilGPT, a Distilled Version of GPT”, 2019
- “Well-Read Students Learn Better: On the Importance of Pre-Training Compact Models”, et al 2019
- “ICML 2019 Notes”, 2019
- “NoGAN: Decrappification, DeOldification, and Super Resolution”, et al 2019
- “Mask-Predict: Parallel Decoding of Conditional Masked Language Models”, et al 2019
- “Distilling Policy Distillation”, et al 2019
- “Compressing GANs Using Knowledge Distillation”, et al 2019
- “Neural Probabilistic Motor Primitives for Humanoid Control”, et al 2018
- “Dataset Distillation”, et al 2018
- “Exploration by Random Network Distillation”, et al 2018
- “OCD: Optimal Completion Distillation for Sequence Learning”, et al 2018
- “Network Recasting: A Universal Method for Network Architecture Transformation”, et al 2018
- “ClariNet: Parallel Wave Generation in End-To-End Text-To-Speech”, et al 2018
- “Self-Net: Lifelong Learning via Continual Self-Modeling”, et al 2018
- “Self-Distillation: Born Again Neural Networks”, et al 2018
- “Kickstarting Deep Reinforcement Learning”, et al 2018
- “Faster Gaze Prediction With Dense Networks and Fisher Pruning”, et al 2018
- “Parallel WaveNet: Fast High-Fidelity Speech Synthesis”, et al 2017
- “Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN”, et al 2017
- “Policy Optimization by Genetic Distillation”, 2017
- “N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning”, et al 2017
- “Training Shallow and Thin Networks for Acceleration via Knowledge Distillation With Conditional Adversarial Networks”, et al 2017
- “Distral: Robust Multitask Reinforcement Learning”, et al 2017
- “Biased Importance Sampling for Deep Neural Network Training”, 2017
- “Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer”, 2016
- “FractalNet: Ultra-Deep Neural Networks without Residuals”, et al 2016
- “Do Deep Convolutional Nets Really Need to Be Deep and Convolutional?”, et al 2016
- “Face Model Compression by Distilling Knowledge from Neurons”, et al 2016
- “Policy Distillation”, et al 2015
- “Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning”, et al 2015
- “Net2Net: Accelerating Learning via Knowledge Transfer”, et al 2015
- “Bayesian Dark Knowledge”, et al 2015
- “Distilling the Knowledge in a Neural Network”, et al 2015
- “FitNets: Hints for Thin Deep Nets”, et al 2014
- “Do Deep Nets Really Need to Be Deep?”, 2013
- “Model Compression”, 2006
- “Learning Complex, Extended Sequences Using the Principle of History Compression”, 1992
- “Dota 2 With Large Scale Deep Reinforcement Learning § Pg11”, et al 2019
- “Google DeepMind’s Grandmaster-Level Chess Without Search”
- “From Vision to Language: Semi-Supervised Learning in Action…at Scale”
- Wikipedia
- Miscellaneous
- Bibliography