See Also
Links
- “Language Imbalance Can Boost Cross-Lingual Generalization”, Schäfer et al 2024
- “Do Language Models Plan Ahead for Future Tokens?”, Wu et al 2024
- “Controlled Text Generation via Language Model Arithmetic”, Dekoninck et al 2023
- “Tokenizer Choice For LLM Training: Negligible or Crucial?”, Ali et al 2023
- “What OpenAI Really Wants”, Levy 2023
- “Linearity of Relation Decoding in Transformer Language Models”, Hernandez et al 2023
- “Accelerating LLM Inference With Staged Speculative Decoding”, Spector & Re 2023
- “Stay on Topic With Classifier-Free Guidance”, Sanchez et al 2023
- “How Does GPT-2 Compute Greater-Than?: Interpreting Mathematical Abilities in a Pre-Trained Language Model”, Hanna et al 2023
- “LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, Wu et al 2023
- “Tractable Control for Autoregressive Language Generation”, Zhang et al 2023
- “How Does In-Context Learning Help Prompt Tuning?”, Sun et al 2023
- “MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, Sudhakaran et al 2023
- “GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities”, Bommarito et al 2023
- “Geographic and Geopolitical Biases of Language Models”, Faisal & Anastasopoulos 2022
- “Structured Prompting: Scaling In-Context Learning to 1,000 Examples”, Hao et al 2022
- “Contrastive Decoding: Open-Ended Text Generation As Optimization”, Li et al 2022
- “Contrastive Search Is What You Need For Neural Text Generation”, Su & Collier 2022
- “Perfectly Secure Steganography Using Minimum Entropy Coupling”, Witt et al 2022
- “Fine-Tuning Pre-Trained Transformers into Decaying Fast Weights”, Mao 2022
- “Semantic Reconstruction of Continuous Language from Non-Invasive Brain Recordings”, Tang et al 2022
- “Deep Language Algorithms Predict Semantic Comprehension from Brain Activity”, Caucheteux et al 2022
- “Correspondence between the Layered Structure of Deep Language Models and Temporal Structure of Natural Language Processing in the Human Brain”, Goldstein et al 2022
- “DIRECTOR: Generator-Classifiers For Supervised Language Modeling”, Arora et al 2022
- “Offline RL for Natural Language Generation With Implicit Language Q Learning”, Snell et al 2022
- “FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, Dao et al 2022
- “FLOTA: An Embarrassingly Simple Method to Mitigate Und-Es-Ira-Ble Properties of Pretrained Language Model Tokenizers”, Hofmann et al 2022
- “Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space”, Geva et al 2022
- “Time Control: Language Modeling via Stochastic Processes”, Wang et al 2022
- “Quantifying and Alleviating Political Bias in Language Models”, Liu et al 2022c
- “Controllable Natural Language Generation With Contrastive Prefixes”, Qian et al 2022
- “LID: Pre-Trained Language Models for Interactive Decision-Making”, Li et al 2022
- “Can Wikipedia Help Offline Reinforcement Learning?”, Reid et al 2022
- “ClipCap: CLIP Prefix for Image Captioning”, Mokady et al 2021
- “Mapping Language Models to Grounded Conceptual Spaces”, Patel & Pavlick 2021
- “Relating Neural Text Degeneration to Exposure Bias”, Chiang & Chen 2021
- “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Lin et al 2021
- “Scarecrow: A Framework for Scrutinizing Machine Text”, Dou et al 2021
- “LoRA: Low-Rank Adaptation of Large Language Models”, Hu et al 2021
- “Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development”, Götz et al 2021
- “GPT-J-6B: 6B JAX-Based Transformer”, EleutherAI 2021
- “LHOPT: A Generalizable Approach to Learning Optimizers”, Almeida et al 2021
- “A Hierarchy of Linguistic Predictions during Natural Language Comprehension”, Heilbron et al 2021
- “Why Are Tar.xz Files 15× Smaller When Using Python’s Tar Library Compared to MacOS Tar?”, Lindestøkke 2021
- “The Pile: An 800GB Dataset of Diverse Text for Language Modeling”, Gao et al 2021
- “Prefix-Tuning: Optimizing Continuous Prompts for Generation”, Li & Liang 2021
- “Bot-Adversarial Dialogue for Safe Conversational Agents”, Xu et al 2021
- “Extracting Training Data from Large Language Models”, Carlini et al 2020
- “NeuroLogic Decoding: (Un)supervised Neural Text Generation With Predicate Logic Constraints”, Lu et al 2020
- “Interacting With GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation”, Geerlings & Meroño-Peñuela 2020
- “GeDi: Generative Discriminator Guided Sequence Generation”, Krause et al 2020
- “Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, Bahri et al 2020
- “Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size”, Yoshida et al 2020
- “The Chess Transformer: Mastering Play Using Generative Language Models”, Noever et al 2020
- “true_poetry: Poetry Generator by GPT-2 With Meter and Rhyme Constraints”, Summers-Stay 2020
- “TREC CAsT 2019: The Conversational Assistance Track Overview”, Dalton et al 2020
- “OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for 'Most Tedious Game in History'”, Whalen 2020
- “Reducing Non-Normative Text Generation from Language Models”, Peng et al 2020
- “How Novelists Use Generative Language Models: An Exploratory User Study”, Calderwood et al 2020
- “Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics”, Barrio 2020
- “Controlling Text Generation With Plug and Play Language Models”, Liu et al 2019
- “AI Dungeon 2”, Walton 2019
- “Release Strategies and the Social Impacts of Language Models”, Solaiman et al 2019
- “GPT-2: 1.5B Release”, Solaiman et al 2019
- “GPT-2 Folk Music”, Gwern & Presser 2019
- “Fine-Tuning GPT-2 from Human Preferences”, Ziegler et al 2019
- “Fine-Tuning GPT-2 from Human Preferences § Bugs Can Optimize for Bad Behavior”, Ziegler et al 2019
- “Fine-Tuning Language Models from Human Preferences”, Ziegler et al 2019
- “Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism”, Shoeybi et al 2019
- “lm-human-preferences”, Ziegler et al 2019
- “How To Make Custom AI-Generated Text With GPT-2”, Woolf 2019
- “OpenGPT-2: We Replicated GPT-2-1.5b Because You Can Too”, Gokaslan & Cohen 2019
- “Universal Adversarial Triggers for Attacking and Analyzing NLP”, Wallace et al 2019
- “GPT-2: 6-Month Follow-Up”, OpenAI 2019
- “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, NVIDIA ADLR 2019
- “Addendum: Evaluation of My Model”, Leahy 2019
- “Replicating GPT-2-1.5B”, Leahy 2019
- “Unraveling the JPEG: JPEG Images Are Everywhere in Our Digital Lives, but behind the Veil of Familiarity Lie Algorithms That Remove Details That Are Imperceptible to the Human Eye. This Produces the Highest Visual Quality With the Smallest File Size—But What Does That Look Like? Let’s See What Our Eyes Can’t See!”, Shehata 2019
- “LM Explorer (alpha)”, Allen Institute for Artificial Intelligence 2019
- “GPT-2 As Step Toward General Intelligence”, Alexander 2019
- “Language Models Are Unsupervised Multitask Learners”, Radford et al 2019
- “Better Language Models and Their Implications”, Radford et al 2019
- “Talk To Transformer”, King 2019
- “Notes on a New Philosophy of Empirical Science”, Burfoot 2011
- “Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers”, Ren et al 2010
- “Timm S. Mueller”
- “TensorFlow Research Cloud (TRC): Accelerate Your Cutting-Edge Machine Learning Research With Free Cloud TPUs”, TRC 2024
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses each annotation’s embedding to find its nearest-neighbor annotations, creating a progression of topics; a minimal sketch of this ordering follows the tag list below.
- inference-optimization
- contextual-learning
- model-parallelism
- language-generation
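As a rough illustration of the nearest-neighbor ordering described above (a hypothetical sketch, not the site’s actual code: `sort_by_similarity` and the random toy vectors are invented for this example):

```python
# Greedy nearest-neighbor ordering over annotation embeddings.
# Assumption: embeddings are precomputed, one row per annotation,
# with row 0 being the newest annotation.
import numpy as np

def sort_by_similarity(embeddings: np.ndarray) -> list[int]:
    """Return a row ordering where each item is followed by its
    most-similar (cosine) not-yet-visited neighbor, starting from row 0."""
    # Normalize rows so that dot products equal cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order, remaining = [0], set(range(1, len(unit)))
    while remaining:
        last = unit[order[-1]]
        # Greedily pick the unvisited annotation closest to the last one placed.
        nxt = max(remaining, key=lambda i: float(unit[i] @ last))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Toy usage: 5 random 8-dimensional "annotation embeddings".
rng = np.random.default_rng(0)
print(sort_by_similarity(rng.normal(size=(5, 8))))
```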
Wikipedia
Miscellaneous
- /doc/ai/nn/transformer/gpt/2/2020-nadeem-figure1-gpt2samplingqualityvsdiversity.png
- https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce
- https://www.aiweirdness.com/d-and-d-character-bios-now-making-19-03-15/
- https://www.lesswrong.com/posts/ukTLGe5CQq9w8FMne/inducing-unprompted-misalignment-in-llms
Link Bibliography
- https://www.wired.com/story/what-openai-really-wants/: “What OpenAI Really Wants”, Steven Levy
- https://arxiv.org/abs/2306.17806#eleutherai: “Stay on Topic With Classifier-Free Guidance”, Guillaume Sanchez, Honglu Fan, Alexander Spangher, Elad Levi, Pawan Sasanka Ammanamanchi, Stella Biderman
- https://arxiv.org/abs/2302.05981: “MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, Shyam Sudhakaran, Miguel González-Duque, Claire Glanois, Matthias Freiberger, Elias Najarro, Sebastian Risi
- https://arxiv.org/abs/2301.04408: “GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities”, Jillian Bommarito, Michael Bommarito, Daniel Martin Katz, Jessica Katz
- https://arxiv.org/abs/2210.15097: “Contrastive Decoding: Open-Ended Text Generation As Optimization”
- https://arxiv.org/abs/2210.14140: “Contrastive Search Is What You Need For Neural Text Generation”, Yixuan Su, Nigel Collier
- https://arxiv.org/abs/2210.04243: “Fine-Tuning Pre-Trained Transformers into Decaying Fast Weights”, Huanru Henry Mao
- https://www.nature.com/articles/s41598-022-20460-9: “Deep Language Algorithms Predict Semantic Comprehension from Brain Activity”, Charlotte Caucheteux, Alexandre Gramfort, Jean-Rémi King
- https://arxiv.org/abs/2205.14135: “FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
- https://aclanthology.org/2022.acl-short.43.pdf: “FLOTA: An Embarrassingly Simple Method to Mitigate Und-Es-Ira-Ble Properties of Pretrained Language Model Tokenizers”, Valentin Hofmann, Hinrich Schuetze, Janet Pierrehumbert
- 2022-liu-3.pdf: “Quantifying and Alleviating Political Bias in Language Models”, Ruibo Liu, Chenyan Jia, Jason Wei, Guangxuan Xu, Soroush Vosoughi
- https://arxiv.org/abs/2111.09734: “ClipCap: CLIP Prefix for Image Captioning”, Ron Mokady, Amir Hertz, Amit H. Bermano
- https://openreview.net/forum?id=gJcEM8sxHK: “Mapping Language Models to Grounded Conceptual Spaces”, Roma Patel, Ellie Pavlick
- https://arxiv.org/abs/2109.07958: “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Stephanie Lin, Jacob Hilton, Owain Evans
- https://arxiv.org/abs/2107.01294#allen: “Scarecrow: A Framework for Scrutinizing Machine Text”, Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski, Noah A. Smith, Yejin Choi
- https://arxiv.org/abs/2106.09685#microsoft: “LoRA: Low-Rank Adaptation of Large Language Models”, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
- https://osf.io/preprints/psyarxiv/m6s28/: “Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development”, Friedrich Götz, Rakoen Maertens, Sander van der Linden
- https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/: “GPT-J-6B: 6B JAX-Based Transformer”, EleutherAI
- https://arxiv.org/abs/2106.00958#openai: “LHOPT: A Generalizable Approach to Learning Optimizers”, Diogo Almeida, Clemens Winter, Jie Tang, Wojciech Zaremba
- https://arxiv.org/abs/2101.00027#eleutherai: “The Pile: An 800GB Dataset of Diverse Text for Language Modeling”
- https://arxiv.org/abs/2101.00190: “Prefix-Tuning: Optimizing Continuous Prompts for Generation”, Xiang Lisa Li, Percy Liang
- https://aclanthology.org/2021.naacl-main.235.pdf#facebook: “Bot-Adversarial Dialogue for Safe Conversational Agents”, Jing Xu, Da Ju, Margaret Li, Y-Lan Boureau, Jason Weston, Emily Dinan
- https://www.newsweek.com/openai-text-generator-gpt-2-video-game-walkthrough-most-tedious-1488334: “OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for 'Most Tedious Game in History'”, Andrew Whalen
- 2020-calderwood.pdf: “How Novelists Use Generative Language Models: An Exploratory User Study”, Alex Calderwood, Vivian Qiu, Katy Ilonka Gero, Lydia B. Chilton
- https://www.uber.com/blog/pplm/: “Controlling Text Generation With Plug and Play Language Models”, Rosanne Liu, Sumanth Dathathri, Andrea Madotto, Piero Molino, Jason Yosinski
- https://play.aidungeon.com/main/home: “AI Dungeon 2”, Nick Walton
- gpt-2-music: “GPT-2 Folk Music”, Gwern, Shawn Presser
- https://arxiv.org/abs/1909.08053#nvidia: “Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism”, Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro
- https://minimaxir.com/2019/09/howto-gpt2/: “How To Make Custom AI-Generated Text With GPT-2”, Max Woolf
- https://medium.com/@vanya_cohen/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc: “OpenGPT-2: We Replicated GPT-2-1.5b Because You Can Too”, Aaron Gokaslan, Vanya Cohen
- https://nv-adlr.github.io/MegatronLM: “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, NVIDIA ADLR
- https://medium.com/@NPCollapse/replicating-gpt2-1-5b-86454a7f26af: “Replicating GPT-2-1.5B”, Connor Leahy
- https://openai.com/research/better-language-models: “Better Language Models and Their Implications”, Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever
- https://sites.research.google/trc/: “TensorFlow Research Cloud (TRC): Accelerate Your Cutting-Edge Machine Learning Research With Free Cloud TPUs”, TRC