ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Time Vectors: Time is Encoded in the Weights of Finetuned Language Models
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers
UT5: Pretraining Non-Autoregressive T5 with Unrolled Denoising
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
TANGO: Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Speak, Read and Prompt (SPEAR-TTS): High-Fidelity Text-to-Speech with Minimal Supervision
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers
Muse: Text-To-Image Generation via Masked Generative Transformers
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
I Can’t Believe There’s No Images! Learning Visual Tasks Using only Language Data
BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
ProMoT: Preserving In-Context Learning ability in Large Language Model Fine-tuning
Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing (CoPoet)
SAP: Bidirectional Language Models Are Also Few-shot Learners
FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation
Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization
Limitations of Language Models in Arithmetic and Symbolic Induction
Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems
EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
ByT5 model for massively multilingual grapheme-to-phoneme conversion
HyperPrompt: Prompt-based Task-Conditioning of Transformers
UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training
InPars: Data Augmentation for Information Retrieval using Large Language Models
CommonsenseQA 2.0: Exposing the Limits of AI through Gamification
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
T0: Multitask Prompted Training Enables Zero-Shot Task Generalization
LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
Implicit Representations of Meaning in Neural Language Models
Explainable Multi-hop Verbal Reasoning Through Internal Monologue
ByT5: Towards a token-free future with pre-trained byte-to-byte models
UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Investigating the Limitations of the Transformers with Simple Arithmetic Tasks
VL-T5: Unifying Vision-and-Language Tasks via Text Generation
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
mT5: A massively multilingual pre-trained text-to-text transformer
TextSETTR: Few-Shot Text Style Extraction and Tunable Targeted Restyling
ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
UnifiedQA: Crossing Format Boundaries With a Single QA System
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
What Happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives
"I recently came across https://arxiv.org/abs/2004.08900, which 'assumes 2-3 runs' of T5-11B. In fact, we trained T5-11B once. That's why we spend 35 pages figuring out how we should train before we start training. You don't want to mess up a training run that big."
2022-patel-figure2-mt5fewshotpromptingbootstrapselfdistillationprocess.png
2022-scialom-figure2-t0languagemodelpreservesperformanceofpreviouslylearnedtasksasnewtasksareintroducedwithminimalrehearsalsolvingcontinuallearning.jpg
2022-scialom-table5-ablationofsheerparameterscalevsscaleduppretraininginenablingcontinuallearningwithoutforgetting.png
2022-wallace-figure2-berkeleycrosswordsolversolutionpipelineofqatoloopybeliefpropagationtobyt5trefinement.jpg
2021-tay-figure1-t5pretrainingvsfinetuningtransferscaling.png
2021-liu-figure1-characterawarevsbpeblindedimagegenerationoftextinsideanimagedemonstratingthatcharacterawaremodelsgeneratetextwell.png
2021-liu-figure12-randomsamplesforwritingthewordexquisiteusingbyt5vst5showingbyt5usuallyright.jpg
2021-liu-figure4-accuracyof10imagegenerationmodelsondrawingtextshowsbyt5best.png
2021-liu-table1-spellingtestforbyt5vst5vspalmshowsbyt5spellsmuchbetter.png
2019-raffel-figure6-effectsofdatasetduplicationont5traininglosscurves.png
https://colab.research.google.com/drive/1-ROO7L09EupLFLQM-TWgDHa5-FIOdLLh
https://github.com/THUDM/ChatGLM2-6B/blob/main/README_EN.md
https://github.com/google-research/google-research/tree/master/ul2
https://threadreaderapp.com/thread/1187161460033458177.html
https://www.forbes.com/sites/rashishrivastava/2023/04/11/writer-generative-ai/
Wikipedia Bibliography: