“‘GPT-2’ Tag”, 2019-09-27
Bibliography for tag ai/nn/transformer/gpt/2, most recent first: 3 related tags, 114 annotations, & 16 links (parent).
- See Also
- Links
- “Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review”, et al 2024
- “Improving Pretraining Data Using Perplexity Correlations”, et al 2024
- “Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process”, et al 2024
- “Super(ficial)-Alignment: Strong Models May Deceive Weak Models in Weak-To-Strong Generalization”, et al 2024
- “The Scaling Law in Stellar Light Curves”, et al 2024
- “From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step”, et al 2024
- “Grokked Transformers Are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization”, et al 2024
- “Fishing for Magikarp: Automatically Detecting Under-Trained Tokens in Large Language Models”, 2024
- “Test-Time Augmentation to Solve ARC”, 2024
- “Σ-GPTs: A New Approach to Autoregressive Models”, et al 2024
- “Language Imbalance Can Boost Cross-Lingual Generalization”, et al 2024
- “Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws”, Allen-Zhu & Li 2024
- “Do Language Models Plan Ahead for Future Tokens?”, et al 2024
- “Neural Redshift: Random Networks Are Not Random Functions”, et al 2024
- “A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task”, et al 2024
- “Mission: Impossible Language Models”, et al 2024
- “A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity”, et al 2024
- “Language Model Alignment With Elastic Reset”, et al 2023
- “Eliciting Language Model Behaviors Using Reverse Language Models”, et al 2023
- “Controlled Text Generation via Language Model Arithmetic”, et al 2023
- “Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks”, et al 2023
- “Tokenizer Choice For LLM Training: Negligible or Crucial?”, et al 2023
- “What OpenAI Really Wants”, 2023
- “Linearity of Relation Decoding in Transformer Language Models”, et al 2023
- “Accelerating LLM Inference With Staged Speculative Decoding”, 2023
- “Stay on Topic With Classifier-Free Guidance”, et al 2023
- “Likelihood-Based Diffusion Language Models”, 2023
- “Mimetic Initialization of Self-Attention Layers”, 2023
- “How Does GPT-2 Compute Greater-Than?: Interpreting Mathematical Abilities in a Pre-Trained Language Model”, et al 2023
- “LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, et al 2023
- “Tractable Control for Autoregressive Language Generation”, et al 2023
- “How Does In-Context Learning Help Prompt Tuning?”, et al 2023
- “MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, et al 2023
- “GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of AI CPA Capabilities”, et al 2023
- “Geographic and Geopolitical Biases of Language Models”, 2022
- “Structured Prompting: Scaling In-Context Learning to 1,000 Examples”, et al 2022
- “Contrastive Decoding: Open-Ended Text Generation As Optimization”, et al 2022
- “Contrastive Search Is What You Need For Neural Text Generation”, 2022
- “Perfectly Secure Steganography Using Minimum Entropy Coupling”, et al 2022
- “Fine-Tuning Pre-Trained Transformers into Decaying Fast Weights”, 2022
- “Semantic Reconstruction of Continuous Language from Non-Invasive Brain Recordings”, et al 2022
- “Deep Language Algorithms Predict Semantic Comprehension from Brain Activity”, et al 2022
- “Correspondence between the Layered Structure of Deep Language Models and Temporal Structure of Natural Language Processing in the Human Brain”, et al 2022
- “DIRECTOR: Generator-Classifiers For Supervised Language Modeling”, et al 2022
- “Offline RL for Natural Language Generation With Implicit Language Q Learning”, et al 2022
- “FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, et al 2022
- “Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”, et al 2022
- “AdaVAE: Exploring Adaptive GPT-2s in Variational Autoencoders for Language Modeling”, et al 2022
- “FLOTA: An Embarrassingly Simple Method to Mitigate Und-Es-Ira-Ble Properties of Pretrained Language Model Tokenizers”, et al 2022
- “Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space”, et al 2022
- “Time Control: Language Modeling via Stochastic Processes”, et al 2022
- “Quantifying and Alleviating Political Bias in Language Models”, et al 2022c
- “Controllable Natural Language Generation With Contrastive Prefixes”, et al 2022
- “LID: Pre-Trained Language Models for Interactive Decision-Making”, et al 2022
- “Typical Decoding for Natural Language Generation”, et al 2022
- “Can Wikipedia Help Offline Reinforcement Learning?”, et al 2022
- “ClipCap: CLIP Prefix for Image Captioning”, et al 2021
- “Mapping Language Models to Grounded Conceptual Spaces”, 2021
- “Relating Neural Text Degeneration to Exposure Bias”, 2021
- “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, et al 2021
- “Scarecrow: A Framework for Scrutinizing Machine Text”, et al 2021
- “LoRA: Low-Rank Adaptation of Large Language Models”, et al 2021
- “Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development”, et al 2021
- “GPT-J-6B: 6B JAX-Based Transformer”, EleutherAI 2021
- “LHOPT: A Generalizable Approach to Learning Optimizers”, et al 2021
- “A Hierarchy of Linguistic Predictions during Natural Language Comprehension”, et al 2021
- “Why Are Tar.xz Files 15× Smaller When Using Python’s Tar Library Compared to MacOS Tar?”, 2021
- “The Pile: An 800GB Dataset of Diverse Text for Language Modeling”, et al 2021
- “Prefix-Tuning: Optimizing Continuous Prompts for Generation”, 2021
- “Bot-Adversarial Dialogue for Safe Conversational Agents”, et al 2021
- “Extracting Training Data from Large Language Models”, et al 2020
- “NeuroLogic Decoding: (Un)supervised Neural Text Generation With Predicate Logic Constraints”, et al 2020
- “Interacting With GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation”, Geerlings & Meroño-Peñuela 2020
- “GeDi: Generative Discriminator Guided Sequence Generation”, et al 2020
- “Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, et al 2020
- “Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size”, et al 2020
- “The Chess Transformer: Mastering Play Using Generative Language Models”, et al 2020
- “True_poetry: Poetry Generator by GPT-2 With Meter and Rhyme Constraints”, Summers-Stay 2020
- “TREC CAsT 2019: The Conversational Assistance Track Overview”, et al 2020
- “OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for ‘Most Tedious Game in History’”, 2020
- “Reducing Non-Normative Text Generation from Language Models”, et al 2020
- “How Novelists Use Generative Language Models: An Exploratory User Study”, et al 2020
- “Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics”, 2020
- “Controlling Text Generation With Plug and Play Language Models”, et al 2019
- “AI Dungeon 2”, 2019
- “Release Strategies and the Social Impacts of Language Models”, et al 2019
- “GPT-2: 1.5B Release”, et al 2019
- “GPT-2 Folk Music”, 2019
- “Fine-Tuning GPT-2 from Human Preferences § Bugs Can Optimize for Bad Behavior”, et al 2019
- “Fine-Tuning GPT-2 from Human Preferences”, et al 2019
- “Fine-Tuning Language Models from Human Preferences”, et al 2019
- “Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism”, et al 2019
- “Lm-Human-Preferences”, et al 2019
- “How To Make Custom AI-Generated Text With GPT-2”, 2019
- “OpenGPT-2: We Replicated GPT-2-1.5b Because You Can Too”, 2019
- “Universal Adversarial Triggers for Attacking and Analyzing NLP”, et al 2019
- “GPT-2: 6-Month Follow-Up”, OpenAI 2019
- “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, ADLR 2019
- “Addendum: Evaluation of My Model”, 2019
- “Replicating GPT-2-1.5B”, 2019
- “Unraveling the JPEG: JPEG Images Are Everywhere in Our Digital Lives, but behind the Veil of Familiarity Lie Algorithms That Remove Details That Are Imperceptible to the Human Eye. This Produces the Highest Visual Quality With the Smallest File Size—But What Does That Look Like? Let’s See What Our Eyes Can’t See!”, 2019
- “Some Pretty Impressive Machine-Learning Generated Poetry Courtesy of GPT-2”
- “LM Explorer (alpha)”, 2019
- “GPT-2 As Step Toward General Intelligence”, 2019
- “Language Models Are Unsupervised Multitask Learners”, et al 2019
- “Better Language Models and Their Implications”, et al 2019
- “Talk To Transformer”, 2019
- “Notes on a New Philosophy of Empirical Science”, 2011
- “Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers”, et al 2010
- “Timm S. Mueller”
- “The Difficulties of Text Generation Using Autoregressive Language Models: A Brief Overview”, 2024
- “Let’s Reproduce GPT-2 (1.6B): One 8×H100 Node, 24 Hours, $672”
- “Alec Radford”
- “TensorFlow Research Cloud (TRC): Accelerate Your Cutting-Edge Machine Learning Research With Free Cloud TPUs”, TRC 2024
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography