Bibliography:

  1. ‘GPT’ tag

  2. ‘GPT-2 fiction’ tag

  3. ‘GPT-2 nonfiction’ tag

  4. /doc/ai/nn/transformer/gpt/2/poetry

  5. Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification

  6. Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review

  7. Improving Pretraining Data Using Perplexity Correlations

  8. Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

  9. Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

  10. The Scaling Law in Stellar Light Curves

  11. From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step

  12. Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

  13. Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models

  14. Test-Time Augmentation to solve ARC

  15. σ-GPTs: A New Approach to Autoregressive Models

  16. Language Imbalance Can Boost Cross-lingual Generalization

  17. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

  18. Do language models plan ahead for future tokens?

  19. Neural Redshift: Random Networks are not Random Functions

  20. A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task

  21. Mission: Impossible Language Models

  22. A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

  23. Language Model Alignment with Elastic Reset

  24. Eliciting Language Model Behaviors using Reverse Language Models

  25. Controlled Text Generation via Language Model Arithmetic

  26. Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

  27. Tokenizer Choice For LLM Training: Negligible or Crucial?

  28. What OpenAI Really Wants

  29. Linearity of Relation Decoding in Transformer Language Models

  30. Accelerating LLM Inference with Staged Speculative Decoding

  31. Stay on topic with Classifier-Free Guidance

  32. Likelihood-Based Diffusion Language Models

  33. Mimetic Initialization of Self-Attention Layers

  34. How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model

  35. LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

  36. Tractable Control for Autoregressive Language Generation

  37. How Does In-Context Learning Help Prompt Tuning?

  38. MarioGPT: Open-Ended Text2Level Generation through Large Language Models

  39. GPT-3 as Knowledge Worker: A Zero-Shot Evaluation of AI CPA Capabilities

  40. Geographic and Geopolitical Biases of Language Models

  41. Structured Prompting: Scaling In-Context Learning to 1,000 Examples

  42. Contrastive Decoding: Open-ended Text Generation as Optimization

  43. Contrastive Search Is What You Need For Neural Text Generation

  44. Perfectly Secure Steganography Using Minimum Entropy Coupling

  45. Fine-Tuning Pre-trained Transformers into Decaying Fast Weights

  46. Semantic reconstruction of continuous language from non-invasive brain recordings

  47. Deep language algorithms predict semantic comprehension from brain activity

  48. Correspondence between the layered structure of deep language models and temporal structure of natural language processing in the human brain

  49. DIRECTOR: Generator-Classifiers For Supervised Language Modeling

  50. Offline RL for Natural Language Generation with Implicit Language Q Learning

  51. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

  52. Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

  53. AdaVAE: Exploring Adaptive GPT-2s in Variational Autoencoders for Language Modeling

  54. FLOTA: An Embarrassingly Simple Method to Mitigate Undesirable Properties of Pretrained Language Model Tokenizers

  55. Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space

  56. Time Control: Language modeling via stochastic processes

  57. Quantifying and alleviating political bias in language models

  58. Controllable Natural Language Generation with Contrastive Prefixes

  59. LID: Pre-Trained Language Models for Interactive Decision-Making

  60. Typical Decoding for Natural Language Generation

  61. Can Wikipedia Help Offline Reinforcement Learning?

  62. ClipCap: CLIP Prefix for Image Captioning

  63. Mapping Language Models to Grounded Conceptual Spaces

  64. Relating Neural Text Degeneration to Exposure Bias

  65. TruthfulQA: Measuring How Models Mimic Human Falsehoods

  66. Scarecrow: A Framework for Scrutinizing Machine Text

  67. LoRA: Low-Rank Adaptation of Large Language Models

  68. Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development

  69. GPT-J-6B: 6B JAX-Based Transformer

  70. LHOPT: A Generalizable Approach to Learning Optimizers

  71. A hierarchy of linguistic predictions during natural language comprehension

  72. Why are tar.xz files 15× smaller when using Python’s tar library compared to macOS tar?

  73. The Pile: An 800GB Dataset of Diverse Text for Language Modeling

  74. Prefix-Tuning: Optimizing Continuous Prompts for Generation

  75. Bot-Adversarial Dialogue for Safe Conversational Agents

  76. Extracting Training Data from Large Language Models

  77. NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints

  78. Interacting with GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation

  79. GeDi: Generative Discriminator Guided Sequence Generation

  80. Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study

  81. Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size

  82. The Chess Transformer: Mastering Play using Generative Language Models

  83. true_poetry: Poetry generator by GPT-2 with meter and rhyme constraints

  84. TREC CAsT 2019: The Conversational Assistance Track Overview

  85. OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for ‘Most Tedious Game in History’

  86. Reducing Non-Normative Text Generation from Language Models

  87. How Novelists Use Generative Language Models: An Exploratory User Study

  88. Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics

  89. Controlling Text Generation with Plug and Play Language Models

  90. AI Dungeon 2

  91. Release Strategies and the Social Impacts of Language Models

  92. GPT-2: 1.5B Release

  93. GPT-2 Folk Music

  94. Fine-Tuning GPT-2 from Human Preferences § Bugs can optimize for bad behavior

  95. Fine-Tuning GPT-2 from Human Preferences

  96. Fine-Tuning Language Models from Human Preferences

  97. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

  98. lm-human-preferences

  99. How To Make Custom AI-Generated Text With GPT-2

  100. OpenGPT-2: We Replicated GPT-2-1.5b Because You Can Too

  101. Universal Adversarial Triggers for Attacking and Analyzing NLP

  102. GPT-2: 6-Month Follow-Up

  103. MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism

  104. Addendum: Evaluation of My Model

  105. Replicating GPT-2-1.5B

  106. Unraveling the JPEG: What Lossy Compression Removes That Our Eyes Can’t See

  107. Some Pretty Impressive Machine-Learning Generated Poetry Courtesy of GPT-2

  108. LM Explorer (alpha)

  109. GPT-2 As Step Toward General Intelligence

  110. Language Models are Unsupervised Multitask Learners

  111. Better Language Models and Their Implications

  112. Talk To Transformer

  113. Notes on a New Philosophy of Empirical Science

  114. Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers

  115. Timm S. Mueller

  117. The Difficulties of Text Generation Using Autoregressive Language Models: A Brief Overview

  119. Let’s Reproduce GPT-2 (1.6B): One 8×H100 Node, 24 Hours, $672

  120. Alec Radford

  121. TensorFlow Research Cloud (TRC): Accelerate your cutting-edge machine learning research with free Cloud TPUs

  122. design#future-tag-features

  123. 2020-nadeem-figure1-gpt2samplingqualityvsdiversity.png

  124. 2019-12-21-gwern-gpt2-preferencelearning-abc-combinedmodel-divergence.png

  125. http://neoscientists.org/~tmueller/binsort/

  127. https://bellard.org/nncp/

  128. https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce

  129. https://github.com/ak9250/gpt-2-colab

  130. https://github.com/ckolivas/lrzip

  131. https://github.com/minimaxir/gpt-2-simple

  132. https://github.com/montemac/activation_additions

  133. https://reasoning-tokens.ghost.io/reasoning-tokens/

  134. https://wiki.archlinux.org/title/Lrzip

  135. https://www.aiweirdness.com/d-and-d-character-bios-now-making-19-03-15/

  137. https://www.lesswrong.com/posts/CNPvESPru3XNqsw7A/what-s-up-with-all-the-non-mormons-weirdly-specific

  138. https://www.lesswrong.com/posts/ukTLGe5CQq9w8FMne/inducing-unprompted-misalignment-in-llms

  139. https://www.reddit.com/r/mlscaling/comments/1d3a793/andrej_karpathy_gpt2_124m_in_llmc_in_90_minutes/

  141. https://x.com/karpathy/status/1859305141385691508