Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification
Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Eliciting Language Model Behaviors using Reverse Language Models
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Linearity of Relation Decoding in Transformer Language Models
Accelerating LLM Inference with Staged Speculative Decoding
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
MarioGPT: Open-Ended Text2Level Generation through Large Language Models
GPT-3 as Knowledge Worker: A Zero-Shot Evaluation of AI CPA Capabilities
Structured Prompting: Scaling In-Context Learning to 1,000 Examples
Contrastive Decoding: Open-ended Text Generation as Optimization
Contrastive Search Is What You Need For Neural Text Generation
Perfectly Secure Steganography Using Minimum Entropy Coupling
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
Semantic reconstruction of continuous language from non-invasive brain recordings
Deep language algorithms predict semantic comprehension from brain activity
Correspondence between the layered structure of deep language models and temporal structure of natural language processing in the human brain
DIRECTOR: Generator-Classifiers For Supervised Language Modeling
Offline RL for Natural Language Generation with Implicit Language Q Learning
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
AdaVAE: Exploring Adaptive GPT-2s in Variational Autoencoders for Language Modeling
FLOTA: An Embarrassingly Simple Method to Mitigate Und-es-ira-ble Properties of Pretrained Language Model Tokenizers
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Quantifying and alleviating political bias in language models
Controllable Natural Language Generation with Contrastive Prefixes
LID: Pre-Trained Language Models for Interactive Decision-Making
Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development
A hierarchy of linguistic predictions during natural language comprehension
Why are tar.xz files 15× smaller when using Python’s tar library compared to macOS tar?
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Prefix-Tuning: Optimizing Continuous Prompts for Generation
NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints
Interacting with GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation
Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study
Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size
The Chess Transformer: Mastering Play using Generative Language Models
true_poetry: Poetry generator by GPT-2 with meter and rhyme constraints
TREC CAsT 2019: The Conversational Assistance Track Overview
OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for ‘Most Tedious Game in History’
Reducing Non-Normative Text Generation from Language Models
How Novelists Use Generative Language Models: An Exploratory User Study
Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics
Controlling Text Generation with Plug and Play Language Models
Release Strategies and the Social Impacts of Language Models
Fine-Tuning GPT-2 from Human Preferences § Bugs can optimize for bad behavior
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Universal Adversarial Triggers for Attacking and Analyzing NLP
MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism
Unraveling the JPEG: JPEG images are everywhere in our digital lives, but behind the veil of familiarity lie algorithms that remove details that are imperceptible to the human eye. This produces the highest visual quality with the smallest file size—but what does that look like? Let’s see what our eyes can’t see!
Some Pretty Impressive Machine-Learning Generated Poetry Courtesy of GPT-2
Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers
The Difficulties of Text Generation Using Autoregressive Language Models: A Brief Overview
Let’s Reproduce GPT-2 (1.6B): One 8×H100 Node, 24 Hours, $672
TensorFlow Research Cloud (TRC): Accelerate your cutting-edge machine learning research with free Cloud TPUs
2019-12-21-gwern-gpt2-preferencelearning-abc-combinedmodel-divergence.png
https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce
https://www.aiweirdness.com/d-and-d-character-bios-now-making-19-03-15/
https://www.lesswrong.com/posts/CNPvESPru3XNqsw7A/what-s-up-with-all-the-non-mormons-weirdly-specific
https://www.lesswrong.com/posts/ukTLGe5CQq9w8FMne/inducing-unprompted-misalignment-in-llms
https://www.reddit.com/r/mlscaling/comments/1d3a793/andrej_karpathy_gpt2_124m_in_llmc_in_90_minutes/
https://lab42.global/community-interview-jack-cole/
https://www.wired.com/story/what-openai-really-wants/
https://arxiv.org/abs/2306.17806#eleutherai
Deep language algorithms predict semantic comprehension from brain activity
https://www.nature.com/articles/s41598-022-20460-9
FLOTA: An Embarrassingly Simple Method to Mitigate Und-es-ira-ble Properties of Pretrained Language Model Tokenizers
https://aclanthology.org/2022.acl-short.43.pdf
Quantifying and alleviating political bias in language models
/doc/ai/nn/transformer/gpt/2/2022-liu-3.pdf
https://openreview.net/forum?id=gJcEM8sxHK
https://arxiv.org/abs/2107.01294#allen
https://arxiv.org/abs/2106.09685#microsoft
Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development
https://osf.io/preprints/psyarxiv/m6s28/
https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/
https://arxiv.org/abs/2106.00958#openai
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
https://arxiv.org/abs/2101.00027#eleutherai
Prefix-Tuning: Optimizing Continuous Prompts for Generation
https://aclanthology.org/2021.naacl-main.235.pdf#facebook
OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for ‘Most Tedious Game in History’
https://www.newsweek.com/openai-text-generator-gpt-2-video-game-walkthrough-most-tedious-1488334
How Novelists Use Generative Language Models: An Exploratory User Study
/doc/ai/nn/transformer/gpt/fiction/2020-calderwood.pdf
Controlling Text Generation with Plug and Play Language Models
https://www.uber.com/blog/pplm/
https://play.aidungeon.com/main/home
https://openai.com/research/fine-tuning-gpt-2
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
https://arxiv.org/abs/1909.08053#nvidia
https://minimaxir.com/2019/09/howto-gpt2/
https://medium.com/@vanya_cohen/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc
MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism
https://medium.com/@NPCollapse/replicating-gpt2-1-5b-86454a7f26af
https://openai.com/index/better-language-models/
TensorFlow Research Cloud (TRC): Accelerate your cutting-edge machine learning research with free Cloud TPUs
https://sites.research.google/trc/