- See Also
- Links
- “BiLD: Big Little Transformer Decoder”, Kim et al 2023
- “Speak, Read and Prompt (SPEAR-TTS): High-Fidelity Text-to-Speech With Minimal Supervision”, Kharitonov et al 2023
- “BLIP-2: Bootstrapping Language-Image Pre-training With Frozen Image Encoders and Large Language Models”, Li et al 2023
- “InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, Boytsov et al 2023
- “Muse: Text-To-Image Generation via Masked Generative Transformers”, Chang et al 2023
- “One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”, Su et al 2022
- “Unnatural Instructions: Tuning Language Models With (Almost) No Human Labor”, Honovich et al 2022
- “ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages”, Et Al 2022
- “Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints”, Komatsuzaki et al 2022
- “I Can’t Believe There’s No Images! Learning Visual Tasks Using Only Language Data”, Et Al 2022
- “TART: Task-aware Retrieval With Instructions”, Asai et al 2022
- “BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning”, Muennighoff et al 2022
- “EDiff-I: Text-to-Image Diffusion Models With an Ensemble of Expert Denoisers”, Balaji et al 2022
- “ProMoT: Preserving In-Context Learning Ability in Large Language Model Fine-tuning”, Et Al 2022
- “Help Me Write a Poem: Instruction Tuning As a Vehicle for Collaborative Poetry Writing (CoPoet)”, Chakrabarty et al 2022
- “FLAN: Scaling Instruction-Finetuned Language Models”, Chung et al 2022
- “Table-To-Text Generation and Pre-training With TabT5”, Et Al 2022
- “GLM-130B: An Open Bilingual Pre-trained Model”, Zeng et al 2022
- “SAP: Bidirectional Language Models Are Also Few-shot Learners”, Patel et al 2022
- “FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation”, Hofstätter et al 2022
- “PaLI: A Jointly-Scaled Multilingual Language-Image Model”, Chen et al 2022
- “Training a T5 Using Lab-sized Resources”, 2022
- “PEER: A Collaborative Language Model”, Schick et al 2022
- “RST: ReStructured Pre-training”, 2022
- “Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems”, FitzGerald et al 2022
- “Boosting Search Engines With Interactive Agents”, Et Al 2022
- “CT0: Fine-tuned Language Models Are Continual Learners”, Scialom et al 2022
- “EdiT5: Semi-Autoregressive Text-Editing With T5 Warm-Start”, Mallinson et al 2022
- “Imagen: Photorealistic Text-to-Image Diffusion Models With Deep Language Understanding”, Saharia et al 2022
- “Automated Crossword Solving”, Wallace et al 2022
- “Unifying Language Learning Paradigms”, Tay et al 2022
- “TK-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks”, Wang et al 2022
- “What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?”, Wang et al 2022
- “ByT5 Model for Massively Multilingual Grapheme-to-phoneme Conversion”, Zhu et al 2022
- “Pathways: Asynchronous Distributed Dataflow for ML”, Barham et al 2022
- “HyperPrompt: Prompt-based Task-Conditioning of Transformers”, He et al 2022
- “Using Natural Language Prompts for Machine Translation”, Garcia & Firat 2022
- “Mixture-of-Experts With Expert Choice Routing”, Zhou et al 2022
- “InPars: Data Augmentation for Information Retrieval Using Large Language Models”, Bonifacio et al 2022
- “Reasoning Like Program Executors”, Pi et al 2022
- “FRUIT: Faithfully Reflecting Updated Information in Text”, Logan et al 2021
- “LongT5: Efficient Text-To-Text Transformer for Long Sequences”, Guo et al 2021
- “Large Dual Encoders Are Generalizable Retrievers”, Ni et al 2021
- “Scaling Language Models: Methods, Analysis & Insights from Training Gopher”, Rae et al 2021
- “ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning”, Aribandi et al 2021
- “Fast Model Editing at Scale”, Mitchell et al 2021
- “T0: Multitask Prompted Training Enables Zero-Shot Task Generalization”, Sanh et al 2021
- “LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5”, 2021
- “Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers”, Tay et al 2021
- “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Lin et al 2021
- “General-Purpose Question-Answering With Macaw”, Tafjord & Clark 2021
- “Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models”, Ni et al 2021
- “Time-Aware Language Models As Temporal Knowledge Bases”, Dhingra et al 2021
- “Implicit Representations of Meaning in Neural Language Models”, Li et al 2021
- “Explainable Multi-hop Verbal Reasoning Through Internal Monologue”, Et Al 2021
- “ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Xue et al 2021
- “Carbon Emissions and Large Neural Network Training”, Patterson et al 2021
- “The Power of Scale for Parameter-Efficient Prompt Tuning”, Lester et al 2021
- “UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark”, Lourie et al 2021
- “Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
- “VL-T5: Unifying Vision-and-Language Tasks via Text Generation”, Cho et al 2021
- “Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, Fedus et al 2021
- “MT5: A Massively Multilingual Pre-trained Text-to-text Transformer”, Xue et al 2020
- “TextSETTR: Few-Shot Text Style Extraction and Tunable Targeted Restyling”, Riley et al 2020
- “MMLU: Measuring Massive Multitask Language Understanding”, Hendrycks et al 2020
- “ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing”, Elnaggar et al 2020
- “TTTTTackling WinoGrande Schemas”, Et Al 2020
- “How Much Knowledge Can You Pack Into the Parameters of a Language Model?”, Roberts et al 2020
- “T5: Exploring the Limits of Transfer Learning With a Unified Text-to-Text Transformer”, Raffel et al 2019
- “I Recently Came across https://arxiv.org/abs/2004.08900, Which ‘Assumes 2-3 Runs’ Of T5-11B. In Fact, We Trained T5-11B Once. That’s Why We Spend 35 Pages Figuring out How We Should Train Before We Start Training. You Don’t Want to Mess up a Training Run That Big.”
- “Transformer-VAE for Program Synthesis”
- Miscellaneous
- Link Bibliography
See Also
Links
“BiLD: Big Little Transformer Decoder”, Kim et al 2023
“BiLD: Big Little Transformer Decoder”, 2023-02-15
“Speak, Read and Prompt (SPEAR-TTS): High-Fidelity Text-to-Speech With Minimal Supervision”, Kharitonov et al 2023
“Speak, Read and Prompt (SPEAR-TTS): High-Fidelity Text-to-Speech with Minimal Supervision”, 2023-02-07
“BLIP-2: Bootstrapping Language-Image Pre-training With Frozen Image Encoders and Large Language Models”, Li et al 2023
“BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models”, 2023-01-30
“InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, Boytsov et al 2023
“InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, 2023-01-08
“Muse: Text-To-Image Generation via Masked Generative Transformers”, Chang et al 2023
“Muse: Text-To-Image Generation via Masked Generative Transformers”, 2023-01-02
“One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”, Su et al 2022
“One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”, 2022-12-19
“Unnatural Instructions: Tuning Language Models With (Almost) No Human Labor”, Honovich et al 2022
“Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor”, 2022-12-19
“ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages”, Et Al 2022
“ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages”, 2022-12-13
“Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints”, Komatsuzaki et al 2022
“Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints”, 2022-12-09
“I Can’t Believe There’s No Images! Learning Visual Tasks Using Only Language Data”, Et Al 2022
“I Can’t Believe There’s No Images! Learning Visual Tasks Using only Language Data”, 2022-11-17
“TART: Task-aware Retrieval With Instructions”, Asai et al 2022
“TART: Task-aware Retrieval with Instructions”, 2022-11-16
“BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning”, Muennighoff et al 2022
“BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning”, 2022-11-03
“EDiff-I: Text-to-Image Diffusion Models With an Ensemble of Expert Denoisers”, Balaji et al 2022
“eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers”, 2022-11-02
“ProMoT: Preserving In-Context Learning Ability in Large Language Model Fine-tuning”, Et Al 2022
“ProMoT: Preserving In-Context Learning ability in Large Language Model Fine-tuning”, 2022-11-01
“Help Me Write a Poem: Instruction Tuning As a Vehicle for Collaborative Poetry Writing (CoPoet)”, Chakrabarty et al 2022
“Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing (CoPoet)”, 2022-10-25
“FLAN: Scaling Instruction-Finetuned Language Models”, Chung et al 2022
“FLAN: Scaling Instruction-Finetuned Language Models”, 2022-10-20
“Table-To-Text Generation and Pre-training With TabT5”, Et Al 2022
“Table-To-Text generation and pre-training with TabT5”, 2022-10-17
“GLM-130B: An Open Bilingual Pre-trained Model”, Zeng et al 2022
“GLM-130B: An Open Bilingual Pre-trained Model”, 2022-10-05
“SAP: Bidirectional Language Models Are Also Few-shot Learners”, Patel et al 2022
“SAP: Bidirectional Language Models Are Also Few-shot Learners”, 2022-09-29
“FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation”, Hofstätter et al 2022
“FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation”, 2022-09-28
“PaLI: A Jointly-Scaled Multilingual Language-Image Model”, Chen et al 2022
“PaLI: A Jointly-Scaled Multilingual Language-Image Model”, 2022-09-14
“Training a T5 Using Lab-sized Resources”, 2022
“Training a T5 Using Lab-sized Resources”, 2022-08-25
“PEER: A Collaborative Language Model”, Schick et al 2022
“PEER: A Collaborative Language Model”, 2022-08-24
“RST: ReStructured Pre-training”, 2022
“RST: reStructured Pre-training”, 2022-06-22
“Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems”, FitzGerald et al 2022
“Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems”, 2022-06-15
“Boosting Search Engines With Interactive Agents”, Et Al 2022
“Boosting Search Engines with Interactive Agents”, 2022-06-04
“CT0: Fine-tuned Language Models Are Continual Learners”, Scialom et al 2022
“CT0: Fine-tuned Language Models are Continual Learners”, 2022-05-24
“EdiT5: Semi-Autoregressive Text-Editing With T5 Warm-Start”, Mallinson et al 2022
“EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start”, 2022-05-24
“Imagen: Photorealistic Text-to-Image Diffusion Models With Deep Language Understanding”, Saharia et al 2022
“Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding”, 2022-05-23
“Automated Crossword Solving”, Wallace et al 2022
“Automated Crossword Solving”, 2022-05-19
“Unifying Language Learning Paradigms”, Tay et al 2022
“Unifying Language Learning Paradigms”, 2022-05-10
“TK-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks”, Wang et al 2022
“Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks”, 2022-04-16
“What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?”, Wang et al 2022
“What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?”, 2022-04-12
“ByT5 Model for Massively Multilingual Grapheme-to-phoneme Conversion”, Zhu et al 2022
“ByT5 model for massively multilingual grapheme-to-phoneme conversion”, 2022-04-06
“Pathways: Asynchronous Distributed Dataflow for ML”, Barham et al 2022
“Pathways: Asynchronous Distributed Dataflow for ML”, 2022-03-23
“HyperPrompt: Prompt-based Task-Conditioning of Transformers”, He et al 2022
“HyperPrompt: Prompt-based Task-Conditioning of Transformers”, 2022-03-01
“Using Natural Language Prompts for Machine Translation”, Garcia & Firat 2022
“Using natural language prompts for machine translation”, 2022-02-23
“Mixture-of-Experts With Expert Choice Routing”, Zhou et al 2022
“Mixture-of-Experts with Expert Choice Routing”, 2022-02-18
“InPars: Data Augmentation for Information Retrieval Using Large Language Models”, Bonifacio et al 2022
“InPars: Data Augmentation for Information Retrieval using Large Language Models”, 2022-02-10
“Reasoning Like Program Executors”, Pi et al 2022
“Reasoning Like Program Executors”, 2022-01-27
“FRUIT: Faithfully Reflecting Updated Information in Text”, Logan et al 2021
“FRUIT: Faithfully Reflecting Updated Information in Text”, 2021-12-16
“LongT5: Efficient Text-To-Text Transformer for Long Sequences”, Guo et al 2021
“LongT5: Efficient Text-To-Text Transformer for Long Sequences”, 2021-12-15
“Large Dual Encoders Are Generalizable Retrievers”, Ni et al 2021
“Large Dual Encoders Are Generalizable Retrievers”, 2021-12-15
“Scaling Language Models: Methods, Analysis & Insights from Training Gopher”, Rae et al 2021
“Scaling Language Models: Methods, Analysis & Insights from Training Gopher”, 2021-12-08
“ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning”, Aribandi et al 2021
“ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning”, 2021-11-22
“Fast Model Editing at Scale”, Mitchell et al 2021
“Fast Model Editing at Scale”, 2021-10-21
“T0: Multitask Prompted Training Enables Zero-Shot Task Generalization”, Sanh et al 2021
“T0: Multitask Prompted Training Enables Zero-Shot Task Generalization”, 2021-10-15
“LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5”, 2021
“LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5”, 2021-10-14
“Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers”, Tay et al 2021
“Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers”, 2021-09-22
“TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Lin et al 2021
“TruthfulQA: Measuring How Models Mimic Human Falsehoods”, 2021-09-08
“General-Purpose Question-Answering With Macaw”, Tafjord & Clark 2021
“General-Purpose Question-Answering with Macaw”, 2021-09-06
“Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models”, Ni et al 2021
“Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models”, 2021-08-19
“Time-Aware Language Models As Temporal Knowledge Bases”, Dhingra et al 2021
“Time-Aware Language Models as Temporal Knowledge Bases”, 2021-06-29
“Implicit Representations of Meaning in Neural Language Models”, Li et al 2021
“Implicit Representations of Meaning in Neural Language Models”, 2021-06-01
“Explainable Multi-hop Verbal Reasoning Through Internal Monologue”, Et Al 2021
“Explainable Multi-hop Verbal Reasoning Through Internal Monologue”, 2021-06
“ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Xue et al 2021
“ByT5: Towards a token-free future with pre-trained byte-to-byte models”, 2021-05-28
“Carbon Emissions and Large Neural Network Training”, Patterson et al 2021
“Carbon Emissions and Large Neural Network Training”, 2021-04-21
“The Power of Scale for Parameter-Efficient Prompt Tuning”, Lester et al 2021
“The Power of Scale for Parameter-Efficient Prompt Tuning”, 2021-04-18
“UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark”, Lourie et al 2021
“UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark”, 2021-03-24
“Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
“Investigating the Limitations of the Transformers with Simple Arithmetic Tasks”, 2021-02-25
“VL-T5: Unifying Vision-and-Language Tasks via Text Generation”, Cho et al 2021
“VL-T5: Unifying Vision-and-Language Tasks via Text Generation”, 2021-02-04
“Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, Fedus et al 2021
“Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity”, 2021-01-11
“MT5: A Massively Multilingual Pre-trained Text-to-text Transformer”, Xue et al 2020
“mT5: A massively multilingual pre-trained text-to-text transformer”, 2020-10-22
“TextSETTR: Few-Shot Text Style Extraction and Tunable Targeted Restyling”, Riley et al 2020
“TextSETTR: Few-Shot Text Style Extraction and Tunable Targeted Restyling”, 2020-10-08
“MMLU: Measuring Massive Multitask Language Understanding”, Hendrycks et al 2020
“MMLU: Measuring Massive Multitask Language Understanding”, 2020-09-07
“ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing”, Elnaggar et al 2020
“ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing”, 2020-07-13
“TTTTTackling WinoGrande Schemas”, Et Al 2020
“TTTTTackling WinoGrande Schemas”, 2020-03-18
“How Much Knowledge Can You Pack Into the Parameters of a Language Model?”, Roberts et al 2020
“How Much Knowledge Can You Pack Into the Parameters of a Language Model?”, 2020-02-10
“T5: Exploring the Limits of Transfer Learning With a Unified Text-to-Text Transformer”, Raffel et al 2019
“T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”, 2019-10-23
“I Recently Came across https://arxiv.org/abs/2004.08900, Which ‘Assumes 2-3 Runs’ Of T5-11B. In Fact, We Trained T5-11B Once. That’s Why We Spend 35 Pages Figuring out How We Should Train Before We Start Training. You Don’t Want to Mess up a Training Run That Big.”
“Transformer-VAE for Program Synthesis”
Miscellaneous
Link Bibliography
- https://arxiv.org/abs/2301.12597#salesforce: “BLIP-2: Bootstrapping Language-Image Pre-training With Frozen Image Encoders and Large Language Models”, Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi
- https://arxiv.org/abs/2301.00704#google: “Muse: Text-To-Image Generation via Masked Generative Transformers”
- https://arxiv.org/abs/2212.09741: “One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)”
- https://arxiv.org/abs/2212.05055#google: “Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints”
- https://arxiv.org/abs/2211.01324#nvidia: “EDiff-I: Text-to-Image Diffusion Models With an Ensemble of Expert Denoisers”
- https://arxiv.org/abs/2210.13669: “Help Me Write a Poem: Instruction Tuning As a Vehicle for Collaborative Poetry Writing (CoPoet)”, Tuhin Chakrabarty, Vishakh Padmakumar, He He
- https://arxiv.org/abs/2210.11416#google: “FLAN: Scaling Instruction-Finetuned Language Models”
- https://arxiv.org/abs/2210.02414#baai: “GLM-130B: An Open Bilingual Pre-trained Model”
- https://arxiv.org/abs/2209.14500: “SAP: Bidirectional Language Models Are Also Few-shot Learners”, Ajay Patel, Bryan Li, Mohammad Sadegh Rasooli, Noah Constant, Colin Raffel, Chris Callison-Burch
- https://arxiv.org/abs/2208.11663#facebook: “PEER: A Collaborative Language Model”
- https://arxiv.org/abs/2206.07808#amazon: “Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems”
- https://openreview.net/forum?id=0ZbPmmB61g#google: “Boosting Search Engines With Interactive Agents”
- https://arxiv.org/abs/2205.12393#facebook: “CT0: Fine-tuned Language Models Are Continual Learners”, Thomas Scialom, Tuhin Chakrabarty, Smaranda Muresan
- https://arxiv.org/abs/2205.12209#google: “EdiT5: Semi-Autoregressive Text-Editing With T5 Warm-Start”, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn
- https://arxiv.org/abs/2205.11487#google: “Imagen: Photorealistic Text-to-Image Diffusion Models With Deep Language Understanding”
- https://arxiv.org/abs/2205.09665#bair: “Automated Crossword Solving”, Eric Wallace, Nicholas Tomlin, Albert Xu, Kevin Yang, Eshaan Pathak, Matthew Ginsberg, Dan Klein
- https://arxiv.org/abs/2205.05131#google: “Unifying Language Learning Paradigms”
- https://arxiv.org/abs/2204.07705: “Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks”
- https://arxiv.org/abs/2204.03067: “ByT5 Model for Massively Multilingual Grapheme-to-phoneme Conversion”, Jian Zhu, Cong Zhang, David Jurgens
- https://arxiv.org/abs/2203.00759: “HyperPrompt: Prompt-based Task-Conditioning of Transformers”
- https://arxiv.org/abs/2202.11822#google: “Using Natural Language Prompts for Machine Translation”, Xavier Garcia, Orhan Firat
- https://arxiv.org/abs/2202.09368#google: “Mixture-of-Experts With Expert Choice Routing”
- https://arxiv.org/abs/2201.11473#microsoft: “Reasoning Like Program Executors”, Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Yan Gao, Qiang Fu, Jian-Guang Lou, Weizhu Chen
- https://arxiv.org/abs/2112.07916#google: “LongT5: Efficient Text-To-Text Transformer for Long Sequences”, Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang
- https://arxiv.org/abs/2112.07899#google: “Large Dual Encoders Are Generalizable Retrievers”
- https://arxiv.org/abs/2112.11446#deepmind: “Scaling Language Models: Methods, Analysis & Insights from Training Gopher”
- https://arxiv.org/abs/2110.11309: “Fast Model Editing at Scale”, Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning
- https://arxiv.org/abs/2109.10686#google: “Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers”
- https://arxiv.org/abs/2109.07958: “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Stephanie Lin, Jacob Hilton, Owain Evans
- https://arxiv.org/abs/2109.02593#allen: “General-Purpose Question-Answering With Macaw”, Oyvind Tafjord, Peter Clark
- https://arxiv.org/abs/2108.08877#google: “Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models”, Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, Yinfei Yang
- https://arxiv.org/abs/2106.00737: “Implicit Representations of Meaning in Neural Language Models”, Belinda Z. Li, Maxwell Nye, Jacob Andreas
- https://arxiv.org/abs/2105.13626#google: “ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel
- https://arxiv.org/abs/2104.10350#google: “Carbon Emissions and Large Neural Network Training”
- https://arxiv.org/abs/2104.08691#google: “The Power of Scale for Parameter-Efficient Prompt Tuning”, Brian Lester, Rami Al-Rfou, Noah Constant
- https://arxiv.org/abs/2103.13009#allen: “UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark”, Nicholas Lourie, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
- https://arxiv.org/abs/2102.13019: “Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin
- https://arxiv.org/abs/2101.03961#google: “Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity”, William Fedus, Barret Zoph, Noam Shazeer
- https://arxiv.org/abs/2009.03300: “MMLU: Measuring Massive Multitask Language Understanding”, Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
- https://arxiv.org/abs/2007.06225: “ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing”