Bibliography:

  1. ‘GPT’ tag

  2. ‘ML dataset’ tag

  3. ‘Codex’ tag

  4. ‘inner monologue (AI)’ tag

  5. ‘LW surveys’ tag

  6. ‘Decision Transformer’ tag

  7. SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

  8. Instruction Following without Instruction Tuning

  9. Hermes 3 Technical Report

  10. State Soup: In-Context Skill Learning, Retrieval and Mixing

  11. Auto Evol-Instruct: Automatic Instruction Evolving for Large Language Models

  12. Instruction Modeling: Instruction Tuning With Loss Over Instructions

  13. The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

  14. Best Practices and Lessons Learned on Synthetic Data for Language Models

  15. RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

  16. Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

  17. COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning

  18. MetaAligner: Conditional Weak-to-Strong Correction for Generalizable Multi-Objective Alignment of Language Models

  19. Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models

  20. StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

  21. How to Train Data-Efficient LLMs

  22. Rephrasing the Web (WRAP): A Recipe for Compute and Data-Efficient Language Modeling

  23. WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation

  24. VILA: On Pre-training for Visual Language Models

  25. Instruction-tuning Aligns LLMs to the Human Brain

  26. R-Tuning: Teaching Large Language Models to Refuse Unknown Questions

  27. When ‘A Helpful Assistant’ Is Not Really Helpful: Personas in System Prompts Do Not Improve Performances of Large Language Models

  28. Language Models are Super Mario (DARE): Absorbing Abilities from Homologous Models as a Free Lunch

  29. ChipNeMo: Domain-Adapted LLMs for Chip Design

  30. Mistral-7B

  31. LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

  32. LLaVA-1.5: Improved Baselines with Visual Instruction Tuning

  33. UltraFeedback: Boosting Language Models with High-quality Feedback

  34. AceGPT, Localizing Large Language Models in Arabic

  35. Can Programming Languages Boost Each Other via Instruction Tuning?

  36. DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

  37. LLaMA-2: Open Foundation and Fine-Tuned Chat Models

  38. AlpaGasus: Training A Better Alpaca with Fewer Data

  39. Instruction Mining: High-Quality Instruction Data Selection for Large Language Models

  40. Lost in the Middle: How Language Models Use Long Contexts

  41. On the Exploitability of Instruction Tuning

  42. ChessGPT: Bridging Policy Learning and Language Modeling

  43. Dr. LLaMA: Improving Small Language Models in Domain-Specific QA via Generative Data Augmentation

  44. SELF-ALIGN: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

  45. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

  46. LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

  47. WizardLM: Empowering Large Language Models to Follow Complex Instructions

  48. TANGO: Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

  49. Phoenix: Democratizing ChatGPT across Languages

  50. How well do Large Language Models perform in Arithmetic tasks?

  51. Larger language models do in-context learning differently

  52. LLaMA-1: Open and Efficient Foundation Language Models

  53. How Does In-Context Learning Help Prompt Tuning?

  54. Med-PaLM: Large Language Models Encode Clinical Knowledge

  55. Self-Instruct: Aligning Language Models with Self-Generated Instructions

  56. Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

  57. One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)

  58. HALIE: Evaluating Human-Language Model Interaction

  59. BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning

  60. Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing (CoPoet)

  61. FLAN: Scaling Instruction-Finetuned Language Models

  62. Language Models are Multilingual Chain-of-Thought Reasoners

  63. LINGUIST: Language Model Instruction Tuning to Generate Annotated Utterances for Intent Classification and Slot Tagging

  64. Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

  65. Few-shot Adaptation Works with UnpredicTable Data

  66. RST: reStructured Pre-training

  67. InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning

  68. CT0: Fine-tuned Language Models are Continual Learners

  69. Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

  70. What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

  71. UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training

  72. Reasoning Like Program Executors

  73. ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization

  74. ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

  75. MetaICL: Learning to Learn In Context

  76. T0: Multitask Prompted Training Enables Zero-Shot Task Generalization

  77. FLAN: Finetuned Language Models Are Zero-Shot Learners

  78. Cross-Task Generalization via Natural Language Crowdsourcing Instructions

  79. CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

  80. Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections

  81. Muppet: Massive Multi-task Representations with Pre-Finetuning

  82. UnifiedQA: Crossing Format Boundaries With a Single QA System

  83. The Natural Language Decathlon: Multitask Learning as Question Answering

  84. No Robots: Look Ma, an instruction dataset that wasn’t generated by GPTs!

  85. The RetroInstruct Guide To Synthetic Text Data

  87. Wu et al 2023, Figure 5: human evaluation of instruction-finetuned models by size on 114 tasks vs. the GPT-3.5-turbo teacher

  88. Chung et al 2022, Figure 2: 1,836 tasks for instruction-finetuning (FLAN-PaLM)

  89. Chung et al 2022, Figure 4: scaling of instruction-finetuning by model size and task count

  90. Chung et al 2022, main results figure: Codex vs. DaVinci vs. PaLM vs. FLAN-PaLM

  91. Chung et al 2022, Table 1: average 5-shot MMLU scores for FLAN-PaLM, shattering Metaculus/Hypermind forecasts about AI progress

  92. Chung et al 2022, Table 2: the small cost of instruction-tuning vs. original training cost

  93. Gupta et al 2022, Figure 4: InstructDial instruction-tuned model performance increases with number of training tasks, showing blessings of scale

  94. Su et al 2022, Figure 5: INSTRUCTOR models benefit from longer, detailed descriptions of the desired embedding functionality

  95. Su et al 2022, Figure 6: INSTRUCTOR models benefit from scaling up model size

  96. Wang et al 2022, Figure 5: scaling trends of models by number of training tasks vs. datapoints per task

  97. Xu et al 2022, Figure 1: ZeroPrompt task scaling vs. model scaling on AUC

  98. Aghajanyan et al 2021, Figure 1: pre-finetuning scaling with dataset count *n*

  99. https://crfm.stanford.edu/2023/03/13/alpaca.html

  100. https://eugeneyan.com/writing/synthetic/

  101. https://github.com/bigscience-workshop/architecture-objective

  102. https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints

  103. https://github.com/mbzuai-nlp/LaMini-LM

  104. https://huggingface.co/google/flan-t5-large

  105. https://research.google/blog/introducing-flan-more-generalizable-language-models-with-instruction-fine-tuning/

  106. https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html

  108. https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm

  110. https://www.yitay.net/blog/flan-ul2-20b

  112. https://x.com/ShayneRedford/status/1620805305801261058

  113. https://x.com/austinc3301/status/1861084272431390819

  114. https://x.com/fluffykittnmeow/status/1729072654420680908

  115. SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

  116. https%253A%252F%252Farxiv.org%252Fabs%252F2410.10629%2523nvidia.html

  117. StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

  118. https%253A%252F%252Farxiv.org%252Fabs%252F2402.16671.html

  119. Rephrasing the Web (WARP): A Recipe for Compute and Data-Efficient Language Modeling

  120. https%253A%252F%252Farxiv.org%252Fabs%252F2401.16380%2523apple.html

  121. Mistral-7B

  122. Teven Le Scao

  123. Thomas Wang

  124. https%253A%252F%252Farxiv.org%252Fabs%252F2310.06825%2523mistral.html

  125. LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

  126. https%253A%252F%252Farxiv.org%252Fabs%252F2310.05736.html

  127. UltraFeedback: Boosting Language Models with High-quality Feedback

  128. Ning Ding

  129. https%253A%252F%252Farxiv.org%252Fabs%252F2310.01377.html

  130. AceGPT, Localizing Large Language Models in Arabic

  131. https%253A%252F%252Farxiv.org%252Fabs%252F2309.12053.html

  132. AlpaGasus: Training A Better Alpaca with Fewer Data

  133. https%253A%252F%252Farxiv.org%252Fabs%252F2307.08701%2523samsung.html

  134. Dr. LLaMa: Improving Small Language Models in Domain-Specific QA via Generative Data Augmentation

  135. https%253A%252F%252Farxiv.org%252Fabs%252F2305.07804.html

  136. SELF-ALIGN: Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

  137. https://www.cs.cmu.edu/~./yiming/

  138. https%253A%252F%252Farxiv.org%252Fabs%252F2305.03047%2523ibm.html

  139. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

  140. https%253A%252F%252Farxiv.org%252Fabs%252F2305.02301%2523google.html

  141. WizardLM: Empowering Large Language Models to Follow Complex Instructions

  142. https%253A%252F%252Farxiv.org%252Fabs%252F2304.12244.html

  143. TANGO: Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

  144. https%253A%252F%252Farxiv.org%252Fabs%252F2304.13731.html

  145. How well do Large Language Models perform in Arithmetic tasks?

  146. https%253A%252F%252Farxiv.org%252Fabs%252F2304.02015%2523alibaba.html

  147. Larger language models do in-context learning differently

  148. Jason Wei

  149. Yi Tay

  150. https%253A%252F%252Farxiv.org%252Fabs%252F2303.03846%2523google.html

  151. Med-PaLM: Large Language Models Encode Clinical Knowledge

  152. Jason Wei

  153. https%253A%252F%252Farxiv.org%252Fabs%252F2212.13138%2523google.html

  154. Self-Instruct: Aligning Language Models with Self-Generated Instructions

  155. Yizhong Wang—University of Washington

  156. Hannaneh Hajishirzi—University of Washington

  157. https%253A%252F%252Farxiv.org%252Fabs%252F2212.10560.html

  158. One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)

  159. Yizhong Wang—University of Washington

  160. Luke Zettlemoyer

  161. https%253A%252F%252Farxiv.org%252Fabs%252F2212.09741.html

  162. BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning

  163. Thomas Wang

  164. Stella Biderman

  165. Teven Le Scao

  166. Sheng Shen’s Homepage

  167. Colin Raffel

  168. https%253A%252F%252Farxiv.org%252Fabs%252F2211.01786.html

  169. Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing (CoPoet)

  170. https%253A%252F%252Farxiv.org%252Fabs%252F2210.13669.html

  171. FLAN: Scaling Instruction-Finetuned Language Models

  172. Barret Zoph

  173. Yi Tay

  174. Jason Wei

  175. https%253A%252F%252Farxiv.org%252Fabs%252F2210.11416%2523google.html

  176. Language Models are Multilingual Chain-of-Thought Reasoners

  177. Yi Tay

  178. Jason Wei

  179. https%253A%252F%252Farxiv.org%252Fabs%252F2210.03057%2523google.html

  180. Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

  181. Jianfeng Gao at Microsoft Research

  182. https%253A%252F%252Farxiv.org%252Fabs%252F2208.09770%2523microsoft.html

  183. CT0: Fine-tuned Language Models are Continual Learners

  184. https%253A%252F%252Farxiv.org%252Fabs%252F2205.12393.html

  185. Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

  186. Yizhong Wang—University of Washington

  187. Noah A. Smith

  188. Hannaneh Hajishirzi—University of Washington

  189. https%253A%252F%252Farxiv.org%252Fabs%252F2204.07705.html

  190. Reasoning Like Program Executors

  191. https%253A%252F%252Farxiv.org%252Fabs%252F2201.11473%2523microsoft.html

  192. ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization

  193. Zhilin Yang

  194. https%253A%252F%252Farxiv.org%252Fabs%252F2201.06910.html