“‘Transformer’ Tag”, 2019-12-16:
Bibliography for tag ai/nn/transformer, most recent first: 52 related tags, 387 annotations, & 37 links.
- Links
- “Gemma 2: Improving Open Language Models at a Practical Size”, et al 2024
- “Investigating the Ability of LLMs to Recognize Their Own Writing”, 2024
- “Revealing Fine-Grained Values and Opinions in Large Language Models”, et al 2024
- “BERTs Are Generative In-Context Learners”, 2024
- “Learning to Grok: Emergence of In-Context Learning and Skill Composition in Modular Arithmetic Tasks”, et al 2024
- “Grokfast: Accelerated Grokking by Amplifying Slow Gradients”, et al 2024
- “Not All Language Model Features Are Linear”, et al 2024
- “You Only Cache Once: Decoder-Decoder Architectures for Language Models”, et al 2024
- “Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge”, et al 2024
- “Chinchilla Scaling: A Replication Attempt”, et al 2024
- “Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?”, et al 2024
- “Conformer-1: Robust ASR via Large-Scale Semi-Supervised Bootstrapping”, et al 2024
- “MiniCPM: Unveiling the Potential of Small Language Models With Scalable Training Strategies”, et al 2024
- “Language Models Accurately Infer Correlations between Psychological Items and Scales from Text Alone”, 2024
- “Privacy Backdoors: Stealing Data With Corrupted Pretrained Models”, Feng & Tramèr 2024
- “Language Models Learn Rare Phenomena from Less Rare Phenomena: The Case of the Missing AANNs”, 2024
- “A Study in Dataset Pruning for Image Super-Resolution”, et al 2024
- “AI and Memory Wall”, et al 2024
- “Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey”, et al 2024
- “Inflection-2.5: Meet the World’s Best Personal AI”, 2024
- “LTE: Training Neural Networks from Scratch With Parallel Low-Rank Adapters”, et al 2024
- “Beyond A✱: Better Planning With Transformers via Search Dynamics Bootstrapping (Searchformer)”, et al 2024
- “KARL: Knowledge-Aware Retrieval and Representations Aid Retention and Learning in Students”, et al 2024
- “Do Llamas Work in English? On the Latent Language of Multilingual Transformers”, et al 2024
- “DE-COP: Detecting Copyrighted Content in Language Models Training Data”, et al 2024
- “Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift”, et al 2024
- “The Manga Whisperer: Automatically Generating Transcriptions for Comics”, 2024
- “A Philosophical Introduction to Language Models—Part I: Continuity With Classic Debates”, 2024
- “Solving Olympiad Geometry without Human Demonstrations”, et al 2024
- “Real-Time AI & The Future of AI Hardware”, 2023
- “Seamless: Multilingual Expressive and Streaming Speech Translation”, et al 2023
- “Scaling Transformer Neural Networks for Skillful and Reliable Medium-Range Weather Forecasting”, et al 2023
- “The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning”, et al 2023
- “GIVT: Generative Infinite-Vocabulary Transformers”, et al 2023
- “Sequential Modeling Enables Scalable Learning for Large Vision Models”, et al 2023
- “DiLoCo: Distributed Low-Communication Training of Language Models”, et al 2023
- “CogVLM: Visual Expert for Pretrained Language Models”, et al 2023
- “GLaMM: Pixel Grounding Large Multimodal Model”, et al 2023
- “Don’t Make Your LLM an Evaluation Benchmark Cheater”, et al 2023
- “ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-Like Language Models”, et al 2023
- “EELBERT: Tiny Models through Dynamic Embeddings”, et al 2023
- “LLM-FP4: 4-Bit Floating-Point Quantized Transformers”, et al 2023
- “Will Releasing the Weights of Large Language Models Grant Widespread Access to Pandemic Agents?”, et al 2023
- “Model Merging by Uncertainty-Based Gradient Matching”, et al 2023
- “To Grok or Not to Grok: Disentangling Generalization and Memorization on Corrupted Algorithmic Datasets”, et al 2023
- “Sparse Universal Transformer”, et al 2023
- “Sheared LLaMA: Accelerating Language Model Pre-Training via Structured Pruning”, et al 2023
- “Language Models Represent Space and Time”, 2023
- “DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation”, et al 2023
- “Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions”, et al 2023
- “Demystifying RCE Vulnerabilities in LLM-Integrated Apps”, et al 2023
- “A Pooled Cell Painting CRISPR Screening Platform Enables de Novo Inference of Gene Function by Self-Supervised Deep Learning”, et al 2023
- “Nougat: Neural Optical Understanding for Academic Documents”, et al 2023
- “SeamlessM4T: Massively Multilingual & Multimodal Machine Translation”, et al 2023
- “Predicting Brain Activity Using Transformers”, et al 2023
- “Copy Is All You Need”, et al 2023
- “HEADLINES: A Massive Scale Semantic Similarity Dataset of Historical English”, 2023
- “Expanding the Methodological Toolbox: Machine-Based Item Desirability Ratings As an Alternative to Human-Based Ratings”, 2023
- “OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents”, et al 2023
- “RGD: Stochastic Re-Weighted Gradient Descent via Distributionally Robust Optimization”, et al 2023
- “SequenceMatch: Imitation Learning for Autoregressive Sequence Modeling With Backtracking”, 2023
- “Using Sequences of Life-Events to Predict Human Lives”, et al 2023
- “Binary and Ternary Natural Language Generation”, et al 2023
- “AWQ: Activation-Aware Weight Quantization for LLM Compression and Acceleration”, et al 2023
- “The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora With Web Data, and Web Data Only”, et al 2023
- “Learning Transformer Programs”, et al 2023
- “FERMAT: An Alternative to Accuracy for Numerical Reasoning”, 2023
- “Translatotron 3: Speech to Speech Translation With Monolingual Data”, et al 2023
- “Deep Learning Based Forecasting: a Case Study from the Online Fashion Industry”, et al 2023
- “Scaling Laws for Language Encoding Models in FMRI”, et al 2023
- “DarkBERT: A Language Model for the Dark Side of the Internet”, et al 2023
- “Mitigating Lies in Vision-Language Models”, et al 2023
- “VendorLink: An NLP Approach for Identifying & Linking Vendor Migrants & Potential Aliases on Darknet Markets”, et al 2023
- “Visual Instruction Tuning”, et al 2023
- “Segment Anything”, et al 2023
- “A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision”, et al 2023
- “When and How Artificial Intelligence Augments Employee Creativity”, et al 2023
- “Trained on 100 Million Words and Still in Shape: BERT Meets British National Corpus”, et al 2023
- “Mitigating YouTube Recommendation Polarity Using BERT and K-Means Clustering”, et al 2023
- “Model Scale versus Domain Knowledge in Statistical Forecasting of Chaotic Systems”, 2023
- “Tag2Text: Guiding Vision-Language Model via Image Tagging”, et al 2023
- “The Man of Your Dreams For $300, Replika Sells an AI Companion Who Will Never Die, Argue, or Cheat—Until His Algorithm Is Updated”, Singh-Kurtz 2023
- “Towards Democratizing Joint-Embedding Self-Supervised Learning”, et al 2023
- “MUX-PLMs: Pre-Training Language Models With Data Multiplexing”, et al 2023
- “Optical Transformers”, et al 2023
- “Scaling Vision Transformers to 22 Billion Parameters”, et al 2023
- “BMT: Binarized Neural Machine Translation”, et al 2023
- “V1T: Large-Scale Mouse V1 Response Prediction Using a Vision Transformer”, et al 2023
- “The BabyLM Challenge: Sample-Efficient Pretraining on a Developmentally Plausible Corpus”, et al 2023
- “SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient”, et al 2023
- “XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models”, et al 2023
- “ClimaX: A Foundation Model for Weather and Climate”, et al 2023
- “DataMUX: Data Multiplexing for Neural Networks”, et al 2023
- “Progress Measures for Grokking via Mechanistic Interpretability”, et al 2023
- “Scaling Laws for Generative Mixed-Modal Language Models”, et al 2023
- “Vision Transformers Are Good Mask Auto-Labelers”, et al 2023
- “Why Do Nearest Neighbor Language Models Work?”, et al 2023
- “Cramming: Training a Language Model on a Single GPU in One Day”, 2022
- “Less Is More: Parameter-Free Text Classification With Gzip”, et al 2022
- “NBC-Softmax: Darkweb Author Fingerprinting and Migration Tracking”, et al 2022
- “What Do Vision Transformers Learn? A Visual Exploration”, et al 2022
- “POM: A Principal Odor Map Unifies Diverse Tasks in Human Olfactory Perception”, et al 2022
- “MAGVIT: Masked Generative Video Transformer”, et al 2022
- “VindLU: A Recipe for Effective Video-And-Language Pretraining”, et al 2022
- “Text Embeddings by Weakly-Supervised Contrastive Pre-Training”, et al 2022
- “Discovering Latent Knowledge in Language Models Without Supervision”, et al 2022
- “NPM: Nonparametric Masked Language Modeling”, et al 2022
- “BARTSmiles: Generative Masked Language Models for Molecular Representations”, et al 2022
- “RGB No More: Minimally-Decoded JPEG Vision Transformers”, 2022
- “Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models”, et al 2022
- “A Deep Learning and Digital Archaeology Approach for Mosquito Repellent Discovery”, et al 2022
- “GENIUS: Sketch-Based Language Model Pre-Training via Extreme and Selective Masking for Text Generation and Augmentation”, et al 2022
- “UniSumm: Unified Few-Shot Summarization With Multi-Task Pre-Training and Prefix-Tuning”, et al 2022
- “Uni-Perceiver V2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks”, et al 2022
- “Distilled DeepConsensus: Knowledge Distillation for Fast and Accurate DNA Sequence Correction”, et al 2022
- “Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities”, et al 2022
- “OneFormer: One Transformer to Rule Universal Image Segmentation”, et al 2022
- “Characterizing Intrinsic Compositionality in Transformers With Tree Projections”, et al 2022
- “Fast DistilBERT on CPUs”, et al 2022
- “n-Gram Is Back: Residual Learning of Neural Text Generation With n-Gram Language Model”, et al 2022
- “Same Pre-Training Loss, Better Downstream: Implicit Bias Matters for Language Models”, et al 2022
- “The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers”, et al 2022
- “Noise-Robust De-Duplication at Scale”, et al 2022
- “Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints”, et al 2022
- “Improving Sample Quality of Diffusion Models Using Self-Attention Guidance”, et al 2022
- “Semantic Scene Descriptions As an Objective of Human Vision”, et al 2022
- “SetFit: Efficient Few-Shot Learning Without Prompts”, et al 2022
- “A Generalist Neural Algorithmic Learner”, et al 2022
- “Machine Reading, Fast and Slow: When Do Models ‘Understand’ Language?”, et al 2022
- “On the Effectiveness of Compact Biomedical Transformers (✱BioBERT)”, et al 2022
- “Analyzing Transformers in Embedding Space”, et al 2022
- “ASR2K: Speech Recognition for Around 2,000 Languages without Audio”, et al 2022
- “MeloForm: Generating Melody With Musical Form Based on Expert Systems and Neural Networks”, et al 2022
- “CorpusBrain: Pre-Train a Generative Retrieval Model for Knowledge-Intensive Language Tasks”, et al 2022
- “PatchDropout: Economizing Vision Transformers Using Patch Dropout”, et al 2022
- “Why Do Tree-Based Models Still Outperform Deep Learning on Tabular Data?”, et al 2022
- “Re2G: Retrieve, Rerank, Generate”, et al 2022
- “Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling”, 2022
- “TabPFN: Meta-Learning a Real-Time Tabular AutoML Method For Small Data”, et al 2022
- “Neural Networks and the Chomsky Hierarchy”, et al 2022
- “Do Loyal Users Enjoy Better Recommendations? Understanding Recommender Accuracy from a Time Perspective”, et al 2022
- “Transfer Learning With Deep Tabular Models”, et al 2022
- “BertNet: Harvesting Knowledge Graphs from Pretrained Language Models”, et al 2022
- “ProGen2: Exploring the Boundaries of Protein Language Models”, et al 2022
- “SBERT Studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features”, 2022
- “RHO-LOSS: Prioritized Training on Points That Are Learnable, Worth Learning, and Not Yet Learnt”, et al 2022
- “LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling”, et al 2022
- “Language Models Are General-Purpose Interfaces”, et al 2022
- “Uni-Perceiver-MoE: Learning Sparse Generalist Models With Conditional MoEs”, et al 2022
- “Reconstructing the Cascade of Language Processing in the Brain Using the Internal Computations of a Transformer-Based Language Model”, et al 2022
- “A Neural Corpus Indexer for Document Retrieval”, et al 2022
- “XTC: Extreme Compression for Pre-Trained Transformers Made Simple and Efficient”, et al 2022
- “Toward a Realistic Model of Speech Processing in the Brain With Self-Supervised Learning”, et al 2022
- “Text2Human: Text-Driven Controllable Human Image Generation”, et al 2022
- “Anime Character Recognition Using Intermediate Features Aggregation”, et al 2022
- “Towards Learning Universal Hyperparameter Optimizers With Transformers”, et al 2022
- “FLEURS: Few-Shot Learning Evaluation of Universal Representations of Speech”, et al 2022
- “HTPS: HyperTree Proof Search for Neural Theorem Proving”, et al 2022
- “On the Paradox of Learning to Reason from Data”, et al 2022
- “Housekeep: Tidying Virtual Households Using Commonsense Reasoning”, et al 2022
- “UViM: A Unified Modeling Approach for Vision With Learned Guiding Codes”, et al 2022
- “Tradformer: A Transformer Model of Traditional Music Transcriptions”, 2022
- “Continual Pre-Training Mitigates Forgetting in Language and Vision”, et al 2022
- “PLAID: An Efficient Engine for Late Interaction Retrieval”, et al 2022
- “Few-Shot Parameter-Efficient Fine-Tuning Is Better and Cheaper Than In-Context Learning”, et al 2022
- “SymphonyNet: Symphony Generation With Permutation Invariant Language Model”, et al 2022
- “When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet”, et al 2022
- “A Challenging Benchmark of Anime Style Recognition”, et al 2022
- “Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers”, et al 2022
- “Masked Siamese Networks for Label-Efficient Learning”, et al 2022
- “DualPrompt: Complementary Prompting for Rehearsal-Free Continual Learning”, et al 2022
- “Language Models That Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion”, et al 2022
- “On Embeddings for Numerical Features in Tabular Deep Learning”, et al 2022
- “In-Context Learning and Induction Heads”, et al 2022
- “LiteTransformerSearch: Training-Free Neural Architecture Search for Efficient Language Models”, et al 2022
- “Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words”, et al 2022
- “OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-To-Sequence Learning Framework”, et al 2022
- “TACTiS: Transformer-Attentional Copulas for Time Series”, et al 2022
- “AutoDistil: Few-Shot Task-Agnostic Neural Architecture Search for Distilling Large Language Models”, et al 2022
- “FIGARO: Generating Symbolic Music With Fine-Grained Artistic Control”, et al 2022
- “Robust Contrastive Learning against Noisy Views”, et al 2022
- “HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning”, et al 2022
- “A Mathematical Framework for Transformer Circuits”, et al 2021
- “PFNs: Transformers Can Do Bayesian Inference”, et al 2021
- “XGLM: Few-Shot Learning With Multilingual Language Models”, et al 2021
- “An Empirical Investigation of the Role of Pre-Training in Lifelong Learning”, et al 2021
- “AI Improvements in Chemical Calculations”, 2021
- “You Only Need One Model for Open-Domain Question Answering”, et al 2021
- “Human Parity on CommonsenseQA: Augmenting Self-Attention With External Attention”, et al 2021
- “ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction”, et al 2021
- “Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks”, et al 2021
- “Inducing Causal Structure for Interpretable Neural Networks (IIT)”, et al 2021
- “OCR-Free Document Understanding Transformer”, et al 2021
- “FQ-ViT: Fully Quantized Vision Transformer without Retraining”, et al 2021
- “Semi-Supervised Music Tagging Transformer”, et al 2021
- “LEMON: Scaling Up Vision-Language Pre-Training for Image Captioning”, et al 2021
- “UNICORN: Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling”, et al 2021
- “Compositional Transformers for Scene Generation”, 2021
- “It’s About Time: Analog Clock Reading in the Wild”, et al 2021
- “XLS-R: Self-Supervised Cross-Lingual Speech Representation Learning at Scale”, et al 2021
- “A Survey of Visual Transformers”, et al 2021
- “Improving Visual Quality of Image Synthesis by A Token-Based Generator With Transformers”, et al 2021
- “The Efficiency Misnomer”, et al 2021
- “STransGAN: An Empirical Study on Transformer in GANs”, et al 2021
- “Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora”, et al 2021
- “The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail”, 2021
- “Palette: Image-To-Image Diffusion Models”, et al 2021
- “Transformers Are Meta-Reinforcement Learners”, 2021
- “Autoregressive Latent Video Prediction With High-Fidelity Image Generator”, et al 2021
- “Skill Induction and Planning With Latent Language”, et al 2021
- “Text2Brain: Synthesis of Brain Activation Maps from Free-Form Text Query”, et al 2021
- “Understanding and Overcoming the Challenges of Efficient Transformer Quantization”, et al 2021
- “BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition”, et al 2021
- “TrOCR: Transformer-Based Optical Character Recognition With Pre-Trained Models”, et al 2021
- “MeLT: Message-Level Transformer With Masked Document Representations As Pre-Training for Stance Detection”, et al 2021
- “KroneckerBERT: Learning Kronecker Decomposition for Pre-Trained Language Models via Knowledge Distillation”, et al 2021
- “Block Pruning For Faster Transformers”, et al 2021
- “The Sensory Neuron As a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning”, 2021
- “DeepConsensus: Gap-Aware Sequence Transformers for Sequence Correction”, et al 2021
- “A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP”, et al 2021
- “Data and Parameter Scaling Laws for Neural Machine Translation”, et al 2021
- “ImageBART: Bidirectional Context With Multinomial Diffusion for Autoregressive Image Synthesis”, et al 2021
- “Modeling Protein Using Large-Scale Pretrain Language Model”, et al 2021
- “Billion-Scale Pretraining With Vision Transformers for Multi-Task Visual Representations”, et al 2021
- “EVA: An Open-Domain Chinese Dialogue System With Large-Scale Generative Pre-Training”, et al 2021
- “Internet-Augmented Dialogue Generation”, et al 2021
- “HTLM: Hyper-Text Pre-Training and Prompting of Language Models”, et al 2021
- “SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking”, et al 2021
- “ViTGAN: Training GANs With Vision Transformers”, et al 2021
- “ARM-Net: Adaptive Relation Modeling Network for Structured Data”, et al 2021
- “SCARF: Self-Supervised Contrastive Learning Using Random Feature Corruption”, et al 2021
- “Charformer: Fast Character Transformers via Gradient-Based Subword Tokenization”, et al 2021
- “BitFit: Simple Parameter-Efficient Fine-Tuning for Transformer-Based Masked Language-Models”, et al 2021
- “Revisiting the Calibration of Modern Neural Networks”, et al 2021
- “Scaling Laws for Acoustic Models”, 2021
- “CoAtNet: Marrying Convolution and Attention for All Data Sizes”, et al 2021
- “Chasing Sparsity in Vision Transformers: An End-To-End Exploration”, et al 2021
- “Tabular Data: Deep Learning Is Not All You Need”, Shwartz-Ziv & Armon 2021
- “Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning”, et al 2021
- “Exploring Transfer Learning Techniques for Named Entity Recognition in Noisy User-Generated Text”, 2021
- “SegFormer: Simple and Efficient Design for Semantic Segmentation With Transformers”, et al 2021
- “Maximizing 3-D Parallelism in Distributed Training for Huge Neural Networks”, et al 2021
- “One4all User Representation for Recommender Systems in E-Commerce”, et al 2021
- “QASPER: A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers”, et al 2021
- “MathBERT: A Pre-Trained Model for Mathematical Formula Understanding”, et al 2021
- “MDETR—Modulated Detection for End-To-End Multi-Modal Understanding”, et al 2021
- “XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond”, et al 2021
- “[Ali Released PLUG: 27 Billion Parameters, the Largest Pre-Trained Language Model in the Chinese Community]”, 2021
- “SimCSE: Simple Contrastive Learning of Sentence Embeddings”, et al 2021
- “Robust Open-Vocabulary Translation from Visual Text Representations”, et al 2021
- “Memorization versus Generalization in Pre-Trained Language Models”, et al 2021
- “Retrieval Augmentation Reduces Hallucination in Conversation”, et al 2021
- “Gradient-Based Adversarial Attacks against Text Transformers”, et al 2021
- “TSDAE: Using Transformer-Based Sequential Denoising Autoencoder for Unsupervised Sentence Embedding Learning”, et al 2021
- “Machine Translation Decoding beyond Beam Search”, et al 2021
- “An Empirical Study of Training Self-Supervised Vision Transformers”, et al 2021
- “ChinAI #137: Year 3 of ChinAI: Reflections on the Newsworthiness of Machine Translation”, 2021
- “GPV-1: Towards General Purpose Vision Systems”, et al 2021
- “DeepViT: Towards Deeper Vision Transformer”, et al 2021
- “ConViT: Improving Vision Transformers With Soft Convolutional Inductive Biases”, d’Ascoli et al 2021
- “Get Your Vitamin C! Robust Fact Verification With Contrastive Evidence (VitaminC)”, et al 2021
- “Learning from Videos to Understand the World”, et al 2021
- “Are NLP Models Really Able to Solve Simple Math Word Problems?”, et al 2021
- “CANINE: Pre-Training an Efficient Tokenization-Free Encoder for Language Representation”, et al 2021
- “TransGAN: Two Transformers Can Make One Strong GAN”, et al 2021
- “Baller2vec: A Multi-Entity Transformer For Multi-Agent Spatiotemporal Modeling”, 2021
- “ViLT: Vision-And-Language Transformer Without Convolution or Region Supervision”, et al 2021
- “Video Transformer Network”, et al 2021
- “Tokens-To-Token ViT: Training Vision Transformers from Scratch on ImageNet”, et al 2021
- “BENDR: Using Transformers and a Contrastive Self-Supervised Learning Task to Learn from Massive Amounts of EEG Data”, et al 2021
- “Bottleneck Transformers for Visual Recognition”, et al 2021
- “DAF:re: A Challenging, Crowd-Sourced, Large-Scale, Long-Tailed Dataset For Anime Character Recognition”, et al 2021
- “UPDeT: Universal Multi-Agent Reinforcement Learning via Policy Decoupling With Transformers”, et al 2021
- “MSR-VTT: A Large Video Description Dataset for Bridging Video and Language”, et al 2021
- “XMC-GAN: Cross-Modal Contrastive Learning for Text-To-Image Generation”, et al 2021
- “Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words”, et al 2021
- “Training Data-Efficient Image Transformers & Distillation through Attention”, et al 2020
- “VQ-GAN: Taming Transformers for High-Resolution Image Synthesis”, et al 2020
- “Object-Based Attention for Spatio-Temporal Reasoning: Outperforming Neuro-Symbolic Models With Flexible Distributed Architectures”, et al 2020
- “Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences”, et al 2020
- “Progressively Stacking 2.0: A Multi-Stage Layerwise Training Method for BERT Training Speedup”, et al 2020
- “TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game”, et al 2020
- “A Recurrent Vision-And-Language BERT for Navigation”, et al 2020
- “A Primer in BERTology: What We Know about How BERT Works”, et al 2020
- “CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters”, et al 2020
- “TernaryBERT: Distillation-Aware Ultra-Low Bit BERT”, et al 2020
- “Weird AI Yankovic: Generating Parody Lyrics”, 2020
- “It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners”, Schick & Schütze 2020
- “DeepSpeed: Extreme-Scale Model Training for Everyone”, et al 2020
- “Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing”, et al 2020
- “CoVoST 2 and Massively Multilingual Speech-To-Text Translation”, et al 2020
- “Modern Hopfield Networks and Attention for Immune Repertoire Classification”, et al 2020
- “Hopfield Networks Is All You Need”, et al 2020
- “Can Neural Networks Acquire a Structural Bias from Raw Linguistic Data?”, 2020
- “DeepSinger: Singing Voice Synthesis With Data Mined From the Web”, et al 2020
- “Data Movement Is All You Need: A Case Study on Optimizing Transformers”, et al 2020
- “Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations”, et al 2020
- “PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training”, et al 2020
- “Learning to Learn With Feedback and Local Plasticity”, Lindsey & Litwin-Kumar 2020
- “Improving GAN Training With Probability Ratio Clipping and Sample Reweighting”, et al 2020
- “DeBERTa: Decoding-Enhanced BERT With Disentangled Attention”, et al 2020
- “DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations”, et al 2020
- “DETR: End-To-End Object Detection With Transformers”, et al 2020
- “Open-Retrieval Conversational Question Answering”, et al 2020
- “TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data”, et al 2020
- “ForecastQA: A Question Answering Challenge for Event Forecasting With Temporal Text Data”, et al 2020
- “VLN-BERT: Improving Vision-And-Language Navigation With Image-Text Pairs from the Web”, et al 2020
- “Blender: A State-Of-The-Art Open Source Chatbot”, et al 2020
- “General Purpose Text Embeddings from Pre-Trained Language Models for Scalable Inference”, et al 2020
- “Recipes for Building an Open-Domain Chatbot”, et al 2020
- “Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks”, et al 2020
- “On the Effect of Dropping Layers of Pre-Trained Transformer Models”, et al 2020
- “Rapformer: Conditional Rap Lyrics Generation With Denoising Autoencoders”, et al 2020
- “TAPAS: Weakly Supervised Table Parsing via Pre-Training”, et al 2020
- “A Hundred Visions and Revisions”, 2020
- “Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited”, et al 2020
- “AraBERT: Transformer-Based Model for Arabic Language Understanding”, et al 2020
- “MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers”, et al 2020
- “GNS: Learning to Simulate Complex Physics With Graph Networks”, Sanchez-Gonzalez et al 2020
- “Do We Need Zero Training Loss After Achieving Zero Training Error?”, et al 2020
- “Bayesian Deep Learning and a Probabilistic Perspective of Generalization”, 2020
- “Transformers As Soft Reasoners over Language”, et al 2020
- “Towards a Conversational Agent That Can Chat About…Anything”, 2020
- “Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference”, Schick & 2020
- “Improving Transformer Optimization Through Better Initialization”, 2020
- “VIME: Extending the Success of Self-Supervised and Semi-Supervised Learning to Tabular Domain”, et al 2020
- “Measuring Compositional Generalization: A Comprehensive Method on Realistic Data”, et al 2019
- “Mastering Complex Control in MOBA Games With Deep Reinforcement Learning”, et al 2019
- “PEGASUS: Pre-Training With Extracted Gap-Sentences for Abstractive Summarization”, et al 2019
- “Encoding Musical Style With Transformer Autoencoders”, et al 2019
- “Deep Double Descent: We Show That the Double Descent Phenomenon Occurs in CNNs, ResNets, and Transformers: Performance First Improves, Then Gets Worse, and Then Improves Again With Increasing Model Size, Data Size, or Training Time”, et al 2019
- “Detecting GAN Generated Errors”, et al 2019
- “SimpleBooks: Long-Term Dependency Book Dataset With Simplified English Vocabulary for Word-Level Language Modeling”, 2019
- “Unsupervised Cross-Lingual Representation Learning at Scale”, et al 2019
- “DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter”, et al 2019
- “TinyBERT: Distilling BERT for Natural Language Understanding”, et al 2019
- “Do NLP Models Know Numbers? Probing Numeracy in Embeddings”, et al 2019
- “PubMedQA: A Dataset for Biomedical Research Question Answering”, et al 2019
- “Frustratingly Easy Natural Question Answering”, et al 2019
- “Distributionally Robust Language Modeling”, et al 2019
- “Language Models As Knowledge Bases?”, et al 2019
- “Encode, Tag, Realize: High-Precision Text Editing”, et al 2019
- “Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks”, 2019
- “Well-Read Students Learn Better: On the Importance of Pre-Training Compact Models”, et al 2019
- “TabNet: Attentive Interpretable Tabular Learning”, 2019
- “StructBERT: Incorporating Language Structures into Pre-Training for Deep Language Understanding”, et al 2019
- “What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models”, 2019
- “RoBERTa: A Robustly Optimized BERT Pretraining Approach”, et al 2019
- “Theoretical Limitations of Self-Attention in Neural Sequence Models”, 2019
- “Energy and Policy Considerations for Deep Learning in NLP”, et al 2019
- “Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned”, et al 2019
- “HellaSwag: Can a Machine Really Finish Your Sentence?”, et al 2019
- “UniLM: Unified Language Model Pre-Training for Natural Language Understanding and Generation”, et al 2019
- “MASS: Masked Sequence to Sequence Pre-Training for Language Generation”, et al 2019
- “Mask-Predict: Parallel Decoding of Conditional Masked Language Models”, et al 2019
- “Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes”, et al 2019
- “LIGHT: Learning to Speak and Act in a Fantasy Text Adventure Game”, et al 2019
- “Insertion Transformer: Flexible Sequence Generation via Insertion Operations”, et al 2019
- “Adapter: Parameter-Efficient Transfer Learning for NLP”, et al 2019
- “Learning and Evaluating General Linguistic Intelligence”, et al 2019
- “BioBERT: a Pre-Trained Biomedical Language Representation Model for Biomedical Text Mining”, et al 2019
- “Efficient Training of BERT by Progressively Stacking”, et al 2019
- “Bayesian Layers: A Module for Neural Network Uncertainty”, et al 2018
- “Blockwise Parallel Decoding for Deep Autoregressive Models”, et al 2018
- “Object Hallucination in Image Captioning”, et al 2018
- “Self-Attention Generative Adversarial Networks”, et al 2018
- “Universal Sentence Encoder”, et al 2018
- “Self-Attention With Relative Position Representations”, et al 2018
- “Learning Longer-Term Dependencies in RNNs With Auxiliary Losses”, et al 2018
- “Generating Structured Music through Self-Attention”, et al 2018
- “GPipe: Easy Scaling With Micro-Batch Pipeline Parallelism § Pg4”, 2018
- “A Simple Neural Attentive Meta-Learner”, et al 2017
- “Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer”, 2016
- “QRNNs: Quasi-Recurrent Neural Networks”, et al 2016
- “Gaussian Error Linear Units (GELUs)”, 2016
- “Pointer Networks”, et al 2015
- “No Physics? No Problem. AI Weather Forecasting Is Already Making Huge Strides.”
- “Huggingface: transformers Repo”, 2024
- “Transformers in Vision”
- “The Illustrated GPT-2 (Visualizing Transformer Language Models)”
- “The Illustrated Transformer”
- “Autoregressive Long-Context Music Generation With Perceiver AR”
- “The Transformer—Attention Is All You Need.”
- “Understanding BERT Transformer: Attention Isn’t All You Need”, 2024
- “Etched Is Making the Biggest Bet in AI”
- “Was Linguistic A.I. Created by Accident?”
- “Transformers Are a Very Exciting Family of Machine Learning Architectures”, 2024