See Also

Links
- “DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI”, Zhang et al 2023
- “No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models”, Kaddour et al 2023
- “GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models”, Agarwal et al 2023
- “PaLI-X: On Scaling up a Multilingual Vision and Language Model”, Chen et al 2023
- “SoundStorm: Efficient Parallel Audio Generation”, Borsos et al 2023
- “LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, Wu et al 2023
- “TANGO: Text-to-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”, Ghosal et al 2023
- “Learning to Compress Prompts With Gist Tokens”, Mu et al 2023
- “BiLD: Big Little Transformer Decoder”, Kim et al 2023
- “Speak, Read and Prompt (SPEAR-TTS): High-Fidelity Text-to-Speech With Minimal Supervision”, Kharitonov et al 2023
- “BLIP-2: Bootstrapping Language-Image Pre-training With Frozen Image Encoders and Large Language Models”, Li et al 2023
- “InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, Boytsov et al 2023
- “Muse: Text-To-Image Generation via Masked Generative Transformers”, Chang et al 2023
- “One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”, Su et al 2022
- “Unnatural Instructions: Tuning Language Models With (Almost) No Human Labor”, Honovich et al 2022
- “ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages”, Chai et al 2022
- “Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints”, Komatsuzaki et al 2022
- “Fast Inference from Transformers via Speculative Decoding”, Leviathan et al 2022
- “I Can’t Believe There’s No Images! Learning Visual Tasks Using Only Language Data”, Gu et al 2022
- “TART: Task-aware Retrieval With Instructions”, Asai et al 2022
- “BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning”, Muennighoff et al 2022
- “eDiff-I: Text-to-Image Diffusion Models With an Ensemble of Expert Denoisers”, Balaji et al 2022
- “ProMoT: Preserving In-Context Learning Ability in Large Language Model Fine-tuning”, Wang et al 2022
- “Help Me Write a Poem: Instruction Tuning As a Vehicle for Collaborative Poetry Writing (CoPoet)”, Chakrabarty et al 2022
- “FLAN: Scaling Instruction-Finetuned Language Models”, Chung et al 2022
- “Table-To-Text Generation and Pre-training With TabT5”, Andrejczuk et al 2022
- “GLM-130B: An Open Bilingual Pre-trained Model”, Zeng et al 2022
- “SAP: Bidirectional Language Models Are Also Few-shot Learners”, Patel et al 2022
- “FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation”, Hofstätter et al 2022
- “PaLI: A Jointly-Scaled Multilingual Language-Image Model”, Chen et al 2022
- “Training a T5 Using Lab-sized Resources”, Ciosici & Derczynski 2022
- “PEER: A Collaborative Language Model”, Schick et al 2022
- “Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization”, He et al 2022
- “RealTime QA: What’s the Answer Right Now?”, Kasai et al 2022
- “Forecasting Future World Events With Neural Networks”, Zou et al 2022
- “RST: ReStructured Pre-training”, Yuan & Liu 2022
- “Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems”, FitzGerald et al 2022
- “Boosting Search Engines With Interactive Agents”, Ciaramita et al 2022
- “CT0: Fine-tuned Language Models Are Continual Learners”, Scialom et al 2022
- “EdiT5: Semi-Autoregressive Text-Editing With T5 Warm-Start”, Mallinson et al 2022
- “Imagen: Photorealistic Text-to-Image Diffusion Models With Deep Language Understanding”, Saharia et al 2022
- “Automated Crossword Solving”, Wallace et al 2022
- “Unifying Language Learning Paradigms”, Tay et al 2022
- “Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks”, Wang et al 2022
- “What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?”, Wang et al 2022
- “ByT5 Model for Massively Multilingual Grapheme-to-phoneme Conversion”, Zhu et al 2022
- “Pathways: Asynchronous Distributed Dataflow for ML”, Barham et al 2022
- “HyperPrompt: Prompt-based Task-Conditioning of Transformers”, He et al 2022
- “UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training”, Khashabi et al 2022
- “Using Natural Language Prompts for Machine Translation”, Garcia & Firat 2022
- “Mixture-of-Experts With Expert Choice Routing”, Zhou et al 2022
- “InPars: Data Augmentation for Information Retrieval Using Large Language Models”, Bonifacio et al 2022
- “Reasoning Like Program Executors”, Pi et al 2022
- “FRUIT: Faithfully Reflecting Updated Information in Text”, Logan et al 2021
- “LongT5: Efficient Text-To-Text Transformer for Long Sequences”, Guo et al 2021
- “Large Dual Encoders Are Generalizable Retrievers”, Ni et al 2021
- “Scaling Language Models: Methods, Analysis & Insights from Training Gopher”, Rae et al 2021
- “ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning”, Aribandi et al 2021
- “Fast Model Editing at Scale”, Mitchell et al 2021
- “T0: Multitask Prompted Training Enables Zero-Shot Task Generalization”, Sanh et al 2021
- “Can Machines Learn Morality? The Delphi Experiment”, Jiang et al 2021
- “LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5”, Qin & Joty 2021
- “Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers”, Tay et al 2021
- “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Lin et al 2021
- “General-Purpose Question-Answering With Macaw”, Tafjord & Clark 2021
- “Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models”, Ni et al 2021
- “Time-Aware Language Models As Temporal Knowledge Bases”, Dhingra et al 2021
- “Implicit Representations of Meaning in Neural Language Models”, Li et al 2021
- “Explainable Multi-hop Verbal Reasoning Through Internal Monologue”, Liang et al 2021
- “ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Xue et al 2021
- “Carbon Emissions and Large Neural Network Training”, Patterson et al 2021
- “The Power of Scale for Parameter-Efficient Prompt Tuning”, Lester et al 2021
- “UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark”, Lourie et al 2021
- “Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
- “VL-T5: Unifying Vision-and-Language Tasks via Text Generation”, Cho et al 2021
- “Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, Fedus et al 2021
- “mT5: A Massively Multilingual Pre-trained Text-to-text Transformer”, Xue et al 2020
- “TextSETTR: Few-Shot Text Style Extraction and Tunable Targeted Restyling”, Riley et al 2020
- “MMLU: Measuring Massive Multitask Language Understanding”, Hendrycks et al 2020
- “ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing”, Elnaggar et al 2020
- “Leveraging Passage Retrieval With Generative Models for Open Domain Question Answering”, Izacard & Grave 2020
- “UnifiedQA: Crossing Format Boundaries With a Single QA System”, Khashabi et al 2020
- “TTTTTackling WinoGrande Schemas”, Lin et al 2020
- “How Much Knowledge Can You Pack Into the Parameters of a Language Model?”, Roberts et al 2020
- “CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019
- “T5: Exploring the Limits of Transfer Learning With a Unified Text-to-Text Transformer”, Raffel et al 2019
- colinraffel
- “Transformer-VAE for Program Synthesis”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
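A minimal sketch of that procedure, assuming the annotation embeddings are already available as a NumPy array and using hypothetical names (`magic_sort`, `n_sections`) rather than the site’s real implementation:

```python
# Hypothetical sketch (not Gwern.net's actual code): order annotations by greedy
# nearest-neighbor chaining over their embeddings, starting from the newest one,
# then cluster the chain into a handful of auto-labeled sections.
import numpy as np
from sklearn.cluster import KMeans

def magic_sort(embeddings: np.ndarray, n_sections: int = 5):
    """embeddings: (n_annotations, dim) array; row 0 is the newest annotation."""
    # Normalize rows so dot products are cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order = [0]                      # start the progression at the newest annotation
    remaining = set(range(1, len(emb)))
    while remaining:
        last = emb[order[-1]]
        # Append the unvisited annotation most similar to the current chain end.
        nxt = max(remaining, key=lambda i: float(emb[i] @ last))
        order.append(nxt)
        remaining.remove(nxt)
    # Cluster the ordered annotations into sections; labels index the inferred 'tags'.
    labels = KMeans(n_clusters=n_sections, n_init=10).fit_predict(emb[order])
    return order, labels
```

The greedy chain yields the topic progression; the clustering step supplies section labels like those listed below (the actual tag names would come from a separate auto-labeling step).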
- language-models
- multilingual-models
- interactive-agents
- neural-forecasting
- ethics
Miscellaneous
- /doc/ai/nn/transformer/t5/2022-patel-figure2-mt5fewshotpromptingbootstrapselfdistillationprocess.png
- /doc/ai/nn/transformer/t5/2021-tay-figure1-t5pretrainingvsfinetuningtransferscaling.png
- /doc/ai/nn/transformer/t5/2019-raffel-figure6-effectsofdatasetduplicationont5traininglosscurves.png
- https://colab.research.google.com/drive/1-ROO7L09EupLFLQM-TWgDHa5-FIOdLLh
- https://github.com/google-research/google-research/tree/master/ul2
- https://twitter.com/RamaswmySridhar/status/1621870497070981121
- https://www.forbes.com/sites/rashishrivastava/2023/04/11/writer-generative-ai/
Link Bibliography
- https://arxiv.org/abs/2307.06440: “No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models”, Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
- https://arxiv.org/abs/2305.09636#google: “SoundStorm: Efficient Parallel Audio Generation”, Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
- https://arxiv.org/abs/2304.13731: “TANGO: Text-to-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria
- https://arxiv.org/abs/2304.08467: “Learning to Compress Prompts With Gist Tokens”, Jesse Mu, Xiang Lisa Li, Noah Goodman
- https://arxiv.org/abs/2301.12597#salesforce: “BLIP-2: Bootstrapping Language-Image Pre-training With Frozen Image Encoders and Large Language Models”, Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi
- https://arxiv.org/abs/2301.00704#google: “Muse: Text-To-Image Generation via Masked Generative Transformers”
- https://arxiv.org/abs/2212.09741: “One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”
- https://arxiv.org/abs/2212.05055#google: “Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints”
- https://arxiv.org/abs/2211.01324#nvidia: “eDiff-I: Text-to-Image Diffusion Models With an Ensemble of Expert Denoisers”
- https://arxiv.org/abs/2210.13669: “Help Me Write a Poem: Instruction Tuning As a Vehicle for Collaborative Poetry Writing (CoPoet)”, Tuhin Chakrabarty, Vishakh Padmakumar, He He
- https://arxiv.org/abs/2210.11416#google: “FLAN: Scaling Instruction-Finetuned Language Models”
- https://arxiv.org/abs/2210.02414#baai: “GLM-130B: An Open Bilingual Pre-trained Model”
- https://arxiv.org/abs/2209.14500: “SAP: Bidirectional Language Models Are Also Few-shot Learners”, Ajay Patel, Bryan Li, Mohammad Sadegh Rasooli, Noah Constant, Colin Raffel, Chris Callison-Burch
- https://arxiv.org/abs/2208.11663#facebook: “PEER: A Collaborative Language Model”
- https://arxiv.org/abs/2208.09770#microsoft: “Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization”
- https://arxiv.org/abs/2206.15474: “Forecasting Future World Events With Neural Networks”
- https://arxiv.org/abs/2206.07808#amazon: “Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems”
- https://openreview.net/forum?id=0ZbPmmB61g#google: “Boosting Search Engines With Interactive Agents”
- https://arxiv.org/abs/2205.12393: “CT0: Fine-tuned Language Models Are Continual Learners”, Thomas Scialom, Tuhin Chakrabarty, Smaranda Muresan
- https://arxiv.org/abs/2205.12209#google: “EdiT5: Semi-Autoregressive Text-Editing With T5 Warm-Start”, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn
- https://arxiv.org/abs/2205.11487#google: “Imagen: Photorealistic Text-to-Image Diffusion Models With Deep Language Understanding”
- https://arxiv.org/abs/2205.09665#bair: “Automated Crossword Solving”, Eric Wallace, Nicholas Tomlin, Albert Xu, Kevin Yang, Eshaan Pathak, Matthew Ginsberg, Dan Klein
- https://arxiv.org/abs/2205.05131#google: “Unifying Language Learning Paradigms”
- https://arxiv.org/abs/2204.07705: “Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks”
- https://arxiv.org/abs/2204.03067: “ByT5 Model for Massively Multilingual Grapheme-to-phoneme Conversion”, Jian Zhu, Cong Zhang, David Jurgens
- https://arxiv.org/abs/2203.00759: “HyperPrompt: Prompt-based Task-Conditioning of Transformers”
- https://arxiv.org/abs/2202.11822#google: “Using Natural Language Prompts for Machine Translation”, Xavier Garcia, Orhan Firat
- https://arxiv.org/abs/2202.09368#google: “Mixture-of-Experts With Expert Choice Routing”
- https://arxiv.org/abs/2201.11473#microsoft: “Reasoning Like Program Executors”, Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Yan Gao, Qiang Fu, Jian-Guang Lou, Weizhu Chen
- https://arxiv.org/abs/2112.07916#google: “LongT5: Efficient Text-To-Text Transformer for Long Sequences”, Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang
- https://arxiv.org/abs/2112.07899#google: “Large Dual Encoders Are Generalizable Retrievers”
- https://arxiv.org/abs/2112.11446#deepmind: “Scaling Language Models: Methods, Analysis & Insights from Training Gopher”
- https://arxiv.org/abs/2110.11309: “Fast Model Editing at Scale”, Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning
- https://arxiv.org/abs/2109.10686#google: “Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers”
- https://arxiv.org/abs/2109.07958: “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Stephanie Lin, Jacob Hilton, Owain Evans
- https://arxiv.org/abs/2109.02593#allen: “General-Purpose Question-Answering With Macaw”, Oyvind Tafjord, Peter Clark
- https://arxiv.org/abs/2108.08877#google: “Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models”, Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, Yinfei Yang
- https://arxiv.org/abs/2106.00737: “Implicit Representations of Meaning in Neural Language Models”, Belinda Z. Li, Maxwell Nye, Jacob Andreas
- https://arxiv.org/abs/2105.13626#google: “ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel
- https://arxiv.org/abs/2104.10350#google: “Carbon Emissions and Large Neural Network Training”
- https://arxiv.org/abs/2104.08691#google: “The Power of Scale for Parameter-Efficient Prompt Tuning”, Brian Lester, Rami Al-Rfou, Noah Constant
- https://arxiv.org/abs/2103.13009#allen: “UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark”, Nicholas Lourie, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
- https://arxiv.org/abs/2102.13019: “Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin
- https://arxiv.org/abs/2101.03961#google: “Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, William Fedus, Barret Zoph, Noam Shazeer
- https://arxiv.org/abs/2009.03300: “MMLU: Measuring Massive Multitask Language Understanding”, Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
- https://arxiv.org/abs/2007.06225: “ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing”