Bibliography:

  1. ‘Transformer’ tag

  2. Chronos: Learning the Language of Time Series

  3. ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

  4. How to Train Data-Efficient LLMs

  5. Time Vectors: Time is Encoded in the Weights of Finetuned Language Models

  6. Rich Human Feedback for Text-to-Image Generation

  7. Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

  8. Instruction-tuning Aligns LLMs to the Human Brain

  9. PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers

  10. UT5: Pretraining Non-autoregressive T5 with unrolled denoising

  11. FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

  12. MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

  13. LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

  14. RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models

  15. Learning to Model the World with Language

  16. DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

  17. No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models

  18. GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models

  19. PaLI-X: On Scaling up a Multilingual Vision and Language Model

  20. Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery

  21. SoundStorm: Efficient Parallel Audio Generation

  22. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

  23. LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

  24. TANGO: Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

  25. Learning to Compress Prompts with Gist Tokens

  26. BiLD: Big Little Transformer Decoder

  27. Speak, Read and Prompt (SPEAR-TTS): High-Fidelity Text-to-Speech with Minimal Supervision

  28. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

  29. InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers

  30. Muse: Text-To-Image Generation via Masked Generative Transformers

  31. Character-Aware Models Improve Visual Text Rendering

  32. Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

  33. One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)

  34. ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

  35. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

  36. Fast Inference from Transformers via Speculative Decoding

  37. I Can’t Believe There’s No Images! Learning Visual Tasks Using only Language Data

  38. TART: Task-aware Retrieval with Instructions

  39. BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning

  40. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

  41. ProMoT: Preserving In-Context Learning Ability in Large Language Model Fine-tuning

  42. Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing (CoPoet)

  43. FLAN: Scaling Instruction-Finetuned Language Models

  44. Table-To-Text generation and pre-training with TabT5

  45. GLM-130B: An Open Bilingual Pre-trained Model

  46. SAP: Bidirectional Language Models Are Also Few-shot Learners

  47. FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation

  48. PaLI: A Jointly-Scaled Multilingual Language-Image Model

  49. Training a T5 Using Lab-sized Resources

  50. PEER: A Collaborative Language Model

  51. Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

  52. Limitations of Language Models in Arithmetic and Symbolic Induction

  53. RealTime QA: What’s the Answer Right Now?

  54. Forecasting Future World Events with Neural Networks

  55. RST: reStructured Pre-training

  56. Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems

  57. Boosting Search Engines with Interactive Agents

  58. EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start

  59. CT0: Fine-tuned Language Models are Continual Learners

  60. Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

  61. Automated Crossword Solving

  62. Unifying Language Learning Paradigms

  63. Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

  64. What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

  65. ByT5 model for massively multilingual grapheme-to-phoneme conversion

  66. Pathways: Asynchronous Distributed Dataflow for ML

  67. HyperPrompt: Prompt-based Task-Conditioning of Transformers

  68. Using natural language prompts for machine translation

  69. UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training

  70. Mixture-of-Experts with Expert Choice Routing

  71. InPars: Data Augmentation for Information Retrieval using Large Language Models

  72. Reasoning Like Program Executors

  73. CommonsenseQA 2.0: Exposing the Limits of AI through Gamification

  74. QuALITY: Question Answering with Long Input Texts, Yes!

  75. FRUIT: Faithfully Reflecting Updated Information in Text

  76. Large Dual Encoders Are Generalizable Retrievers

  77. LongT5: Efficient Text-To-Text Transformer for Long Sequences

  78. Scaling Language Models: Methods, Analysis & Insights from Training Gopher

  79. ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

  80. Fast Model Editing at Scale

  81. T0: Multitask Prompted Training Enables Zero-Shot Task Generalization

  82. LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5

  83. Can Machines Learn Morality? The Delphi Experiment

  84. Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

  85. TruthfulQA: Measuring How Models Mimic Human Falsehoods

  86. General-Purpose Question-Answering with Macaw

  87. Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models

  88. Time-Aware Language Models as Temporal Knowledge Bases

  89. Implicit Representations of Meaning in Neural Language Models

  90. Explainable Multi-hop Verbal Reasoning Through Internal Monologue

  91. ByT5: Towards a token-free future with pre-trained byte-to-byte models

  92. Carbon Emissions and Large Neural Network Training

  93. The Power of Scale for Parameter-Efficient Prompt Tuning

  94. UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark

  95. GLM: General Language Model Pretraining with Autoregressive Blank Infilling

  96. Investigating the Limitations of the Transformers with Simple Arithmetic Tasks

  97. VL-T5: Unifying Vision-and-Language Tasks via Text Generation

  98. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

  99. mT5: A massively multilingual pre-trained text-to-text transformer

  100. TextSETTR: Few-Shot Text Style Extraction and Tunable Targeted Restyling

  101. MMLU: Measuring Massive Multitask Language Understanding

  102. ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing

  103. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

  104. UnifiedQA: Crossing Format Boundaries With a Single QA System

  105. TTTTTackling WinoGrande Schemas

  106. How Much Knowledge Can You Pack Into the Parameters of a Language Model?

  107. CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

  108. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  109. Colin Raffel

  110. Transformer-VAE for Program Synthesis

  111. Jason Wei

  112. What Happened to BERT & T5? On Transformer Encoders, PrefixLM and Denoising Objectives

  113. I recently came across https://arxiv.org/abs/2004.08900, which ‘assumes 2-3 runs’ of T5-11B. In fact, we trained T5-11B once. That’s why we spend 35 pages figuring out how we should train before we start training. You don’t want to mess up a training run that big.

  114. Patel 2022, Figure 2: mT5 few-shot prompting bootstrap self-distillation process

  115. Scialom 2022, Figure 2: T0 language model preserves performance of previously-learned tasks as new tasks are introduced, with minimal rehearsal, solving continual learning

  116. Scialom 2022, Table 5: ablation of sheer parameter scale vs. scaled-up pretraining in enabling continual learning without forgetting

  117. Wallace 2022, Figure 2: Berkeley Crossword Solver solution pipeline of QA, to loopy belief propagation, to ByT5 refinement

  118. Tay 2021, Figure 1: T5 pretraining vs. finetuning transfer scaling

  119. Liu 2021, Figure 1: character-aware vs. BPE-blinded generation of text inside an image, demonstrating that character-aware models generate text well

  120. Liu 2021, Figure 12: random samples of writing the word ‘exquisite’ using ByT5 vs. T5, showing ByT5 usually right

  121. Liu 2021, Figure 4: accuracy of 10 image-generation models on drawing text, showing ByT5 best

  122. Liu 2021, Table 1: spelling test for ByT5 vs. T5 vs. PaLM, showing ByT5 spells much better

  123. Raffel 2019, Figure 6: effects of dataset duplication on T5 training loss curves

  124. https://aclanthology.org/2023.findings-emnlp.18/

  125. https://blog.eleuther.ai/pile-t5/

  126. https://brianfitzgerald.xyz/prompt-augmentation/

  127. https://colab.research.google.com/drive/1-ROO7L09EupLFLQM-TWgDHa5-FIOdLLh

  128. https://github.com/PiotrNawrot/nanoT5

  129. https://github.com/THUDM/ChatGLM2-6B/blob/main/README_EN.md

  130. https://github.com/deep-floyd/IF

  131. https://github.com/google-research/byt5

  132. https://github.com/google-research/google-research/tree/master/ul2

  133. https://github.com/mbzuai-nlp/LaMini-LM

  134. https://huggingface.co/models?search=flan-t5

  135. https://threadreaderapp.com/thread/1187161460033458177.html

  136. https://www.forbes.com/sites/rashishrivastava/2023/04/11/writer-generative-ai/

  137. https://www.yitay.net/blog/flan-ul2-20b

  138. https://x.com/Mascobot/status/1618246707267141632

  139. https://x.com/RamaswmySridhar/status/1621870497070981121

  140. https://x.com/ShayneRedford/status/1620805305801261058

  141. https://x.com/_jasonwei/status/1621333297891790848

  142. https://x.com/abacaj/status/1618050431657324545

  143. https://x.com/marktenenholtz/status/1787893010753015841
