Update: Upgrading to 1.5B GPT-2, and adding 22 new subreddit-bots
https://research.google/blog/transformer-a-novel-neural-network-architecture-for-language-understanding/
The Illustrated GPT-2 (Visualizing Transformer Language Models)
Character-Level Language Modeling with Deeper Self-Attention
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformer-XL—Combining Transformers and RNNs Into a State-Of-The-Art Language Model
Understanding BERT Transformer: Attention Isn’t All You Need
Transformers are a very exciting family of machine learning architectures
The Transformer Family: Attention and Self-Attention · Multi-Head Self-Attention · Transformer · Adaptive Computation Time (ACT) · Improved Attention Span: (Longer Attention Span (Transformer-XL) / Adaptive Attention Span / Localized Attention Span (Image Transformer)) · Less Time and Memory Cost: (Sparse Attention Matrix Factorization (Sparse Transformers) / Locality-Sensitive Hashing (Reformer)) · Make It Recurrent (Universal Transformer) · Stabilization for RL (GTrXL)
The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
Karpathy/minGPT: A Minimal PyTorch Re-Implementation of the OpenAI GPT (Generative Pretrained Transformer) Training
Efficient Attention: Breaking The Quadratic Transformer Bottleneck
GPT-1: Improving Language Understanding with Unsupervised Learning
Humans Who Are Not Concentrating Are Not General Intelligences
https://colab.research.google.com/drive/1BXry0kcm869-RVHHiY6NZmY9uBzbkf1Q
XLNet: Generalized Autoregressive Pretraining for Language Understanding
MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
https://colab.research.google.com/drive/1-ROO7L09EupLFLQM-TWgDHa5-FIOdLLh
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Talking to Myself or How I Trained GPT-2-1.5b for Rubber Ducking Using My Facebook Chat Data: Using Only Google Colab
https://www.reddit.com/r/slatestarcodex/comments/as8ke7/an_eternal_howl/
FridAI: ‘Water, water, everywhere’, as read by Artificial Intelligence
https://www.reddit.com/r/MachineLearning/comments/coc09l/p_these_lyrics_do_not_exist/
Testing The Limits of GROVER The Neural Fake News Detector. Can It Write Fiction? Can It Write Riddles?
https://www.reddit.com/r/SubSimulatorGPT2Meta/comments/ccvspt/update_experimenting_with_generating_hybrid/
CTRL: A Conditional Transformer Language Model For Controllable Generation
Conditional Transformer Language Model for Controllable Generation
https://papergains.co/pdfs/Transformer_Poetry-978-1-7341647-0-1.pdf#page=3
345M-GPT-2 After James Wright: Can AI Generate Convincing Contemporary Poetry?
Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics
Nshepperd/gpt-2: Code for the Paper "Language Models Are Unsupervised Multitask Learners"
ConnorJL/GPT2: An Implementation of Training for GPT-2, Supports TPUs
A Small Module Meant for Use in Text Generators That Lets You Filter Strings for Bad Words
The Unreasonable Effectiveness of Recurrent Neural Networks
https://mega.nz/#!HXhRwS7R!yl4qZM-gMWdn4Qc3scavOBKqdLNAcZ_WYd2gVPqabPg
rnn-metadata#inline-metadata-trick
https://mega.nz/#!2PhghaZD!_IJPpErXIRIDwRI0ktq2UKUZClDEoY7z8UpF28_qme8
https://mega.nz/#!zX4lzCzK!TNo_1uDlvszGkBUEdd5R_cQ-7Dfv0gyaaaq8BVzw1jA
Semantic projection: recovering human knowledge of multiple, distinct object features from word embeddings
Verb Physics: Relative Physical Knowledge of Actions and Objects
Grounding the Ungrounded: Estimating Locations of Unknown Place Names from Linguistic Associations and Grounded Representations
UniLM: Unified Language Model Pre-training for Natural Language Understanding and Generation
Fitting Larger Networks into Memory: TL;DR: We Release the Python/TensorFlow Package openai/gradient-checkpointing, That Lets You Fit 10× Larger Neural Nets into Memory at the Cost of an Additional 20% Computation Time
MuseNet: a deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles
Averaging Weights Leads to Wider Optima and Better Generalization
https://mega.nz/#!XMl3UI7b!KNLvp5wuxe_WAgJwkMVDiyyNmDl9XDXuipl-dQ6Phow
This Is a Python Script As Described in XKCD #1263: ‘Reassuring’. It Generates Thousands of Reassuring Parables about Things Humans Are Better Than Computers at Every Second.
2019-05-24-gpt2-poetry-yeatssecondcoming-500completions.txt
https://www.awanderingmind.blog/posts/2024-01-14-tao-te-ching-by-an-llm.html
https://web.archive.org/web/20200209040154/https://decaut.org/situ/index.php/ttc-compilation/
https://mega.nz/#!m5FWGCgZ!cjvMgViPbBqep_6HqYDb2D3Kl8Tt-RsUnwg7457IfDk
Release Strategies and the Social Impacts of Language Models
Swarm Training: We Demonstrate a New Technique to Train ML Models Using Dozens of Independent TPUs.
Shawwn/gpt-2: Code for the Paper "Language Models Are Unsupervised Multitask Learners"
Danbooru2019 Is a Large-Scale Anime Image Database With 3.69m+ Images Annotated With 108m+ Tags; It Can Be Useful for Machine Learning Purposes such as Image Recognition and Generation.
The Google SRE Handbook: Chapter 4—Service Level Objectives
HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
TensorFlow Research Cloud (TRC): Accelerate your cutting-edge machine learning research with free Cloud TPUs
GPT-1: Improving Language Understanding by Generative Pre-Training § Model specifications
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Language Models Are Unsupervised Multitask Learners § Experiments
Figure F.1: Four Uncurated Completions from a Context Suggesting the Model Compose a Poem in the Style of Wallace Stevens With the Title ‘Shadows on the Way’
true_poetry: Poetry generator by GPT-2 with meter and rhyme constraints
MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Do Massively Pretrained Language Models Make Better Storytellers?
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms
AlphaStar: Mastering the Real-Time Strategy Game StarCraft II
https://www.reddit.com/r/slatestarcodex/comments/b1b47h/gwerns_aigenerated_poetry/
Some Pretty Impressive Machine-Learning Generated Poetry Courtesy of GPT-2
Hark! from Those Shadowy Depths Thy Voice / Mournfully Echoes
Simonepri/lm-Scorer: 📃Language Model Based Sentence Scoring Library
https://web.archive.org/web/20220526054159/http://bkkaggle.github.io/blog/algpt2/2020/06/22/ALGPT2-part-1
https://web.archive.org/web/20210131134147/https://bkkaggle.github.io/blog/algpt2/2020/07/17/ALGPT2-part-2.html
Seduced, Shaggy Samson Snored: The Fictional Machine That Generated Poems, and the Real People Who Had to Translate Them
How to Build a State-Of-The-Art Conversational AI With Transfer Learning by Thomas Wolf
https://www.reddit.com/r/SubSimulatorGPT2/comments/btfhks/what_is_rsubsimulatorgpt2/
Minimaxir/gpt-2-Keyword-Generation: Method to Encode Text for GPT-2 to Generate Text Based on Provided Keywords
Introducing Aspects of Creativity in Automatic Poetry Generation
Smart Vet: Autocompleting Sentences in Veterinary Medical Records
Deepfake Bot Submissions to Federal Public Comment Websites Cannot Be Distinguished from Human Submissions
https://towardsdatascience.com/how-to-fine-tune-gpt-2-so-you-can-generate-long-form-creative-writing-7a5ae1314a61
This AI Poet Mastered Rhythm, Rhyme, and Natural Language to Write Like Shakespeare
Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme
AdapterHub - 625 Adapters for 71 Text Tasks and 97 Languages
Collaborative Storytelling with Large-scale Neural Language Models
This Article Provides an Overview of Recent Methods to Fine-Tune Large Pre-Trained Language Models
Making Pre-trained Language Models Better Few-shot Learners
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Controllable Generation from Pre-trained Language Models via Inverse Prompting
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
DART: Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners
Towards a Unified View of Parameter-Efficient Transfer Learning
2020-02-03-gpt21.5b-archiveofourownao3-model-510427-samples-topp090.txt
😇A PyTorch Implementation of the DeepMoji Model: State-Of-The-Art Deep Learning Model for Analyzing Sentiment, Emotion, Sarcasm Etc
2021-05-05-astraliteheart-purplesmartai-mylittleponygpt215b-twilightsparkledialogue.png
2021-05-05-astraliteheart-purplesmartai-mylittleponygpt215b-twilightsparkledialogue-torchmojiemotionalvoicecontrol.jpg
2021-05-05-astraliteheart-purplesmartai-mylittleponygpt215b-twilightsparkledialogue-voicedialogue.png
2021-05-05-astraliteheart-purplesmartai-mylittleponygpt215b-gpuloadgraph.png
2020-02-03-gpt21.5b-videogamewalkthrough-model-174925-samples-topp090.txt
OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for ‘Most Tedious Game in History’
https://mega.nz/file/WPQVFAJQ#mj2bP7Eba00aAaTGm_qqHW4JScGo5sC-F00pJJXe6Zg
Wikipedia Bibliography: