See Also
Links
- “Sparse Autoencoders Find Highly Interpretable Features in Language Models”, Cunningham et al 2023
- “When Less Is More: Investigating Data Pruning for Pretraining LLMs at Scale”, Marion et al 2023
- “Accelerating LLM Inference With Staged Speculative Decoding”, Spector & Re 2023
- “Multimodal Neurons in Pretrained Text-Only Transformers”, Schwettmann et al 2023
- “Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models”, Chen et al 2023
- “Investigating the Existence of ‘Secret Language’ in Language Models”, Wang et al 2023
- “Stay on Topic With Classifier-Free Guidance”, Sanchez et al 2023
- “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Roger 2023
- “SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, Dettmers et al 2023
- “DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”, Xie et al 2023
- “Memorization for Good: Encryption With Autoregressive Language Models”, Stevens & Su 2023
- “MEGABYTE: Predicting Million-byte Sequences With Multiscale Transformers”, Yu et al 2023
- “Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”, Konrad 2023
- “Finding Neurons in a Haystack: Case Studies With Sparse Probing”, Gurnee et al 2023
- “How Does GPT-2 Compute Greater-than?: Interpreting Mathematical Abilities in a Pre-trained Language Model”, Hanna et al 2023
- “Emergent and Predictable Memorization in Large Language Models”, Biderman et al 2023
- “Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition”, Muffo et al 2023
- “Tractable Control for Autoregressive Language Generation”, Zhang et al 2023
- “Shall We Pretrain Autoregressive Language Models With Retrieval? A Comprehensive Study”, Wang et al 2023
- “How Large-Language Models Can Revolutionize Military Planning”, Jensen & Tadross 2023
- “8 Things to Know about Large Language Models”, Bowman 2023
- “BloombergGPT: A Large Language Model for Finance”, Wu et al 2023
- “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023
- “Consistency Analysis of ChatGPT”, Jang & Lukasiewicz 2023
- “Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”, Kataoka 2023
- “Rewarding Chatbots for Real-World Engagement With Millions of Users”, Irvine et al 2023
- “SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Zhu et al 2023
- “A Prompt Pattern Catalog to Enhance Prompt Engineering With ChatGPT”, White et al 2023
- “BiLD: Big Little Transformer Decoder”, Kim et al 2023
- “MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, Sudhakaran et al 2023
- “Is ChatGPT a General-Purpose Natural Language Processing Task Solver?”, Qin et al 2023
- “Use GPT-3 Incorrectly: Reduce Costs 40× and Increase Speed by 5×”, Pullen 2023
- “Co-Writing With Opinionated Language Models Affects Users’ Views”, Jakesch et al 2023
- “In-Context Retrieval-Augmented Language Models”, Ram et al 2023
- “Crawling the Internal Knowledge-Base of Language Models”, Cohen et al 2023
- “Big Tech Was Moving Cautiously on AI. Then Came ChatGPT. Google, Facebook and Microsoft Helped Build the Scaffolding of AI. Smaller Companies Are Taking It to the Masses, Forcing Big Tech to React.”, Tiku et al 2023
- “The Inside Story of ChatGPT: How OpenAI Founder Sam Altman Built the World’s Hottest Technology With Billions from Microsoft”, Kahn 2023
- “Rock Guitar Tablature Generation via Natural Language Processing”, Casco-Rodriguez 2023
- “GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities”, Bommarito et al 2023
- “InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, Boytsov et al 2023
- “GPT-3 Takes the Bar Exam”, Bommarito II & Katz 2022
- “A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A New Wave of Chat Bots like ChatGPT Use Artificial Intelligence That Could Reinvent or Even Replace the Traditional Internet Search Engine”, Grant & Metz 2022
- “Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers”, Dai et al 2022
- “Precise Zero-Shot Dense Retrieval without Relevance Labels”, Gao et al 2022
- “Emergent Analogical Reasoning in Large Language Models”, Webb et al 2022
- “Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, Wiggers 2022
- “Interpreting Neural Networks through the Polytope Lens”, Black et al 2022
- “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022
- “InstructPix2Pix: Learning to Follow Image Editing Instructions”, Brooks et al 2022
- “Galactica: A Large Language Model for Science”, Taylor et al 2022
- “The CRINGE Loss: Learning What Language Not to Model”, Adolphs et al 2022
- “LMentry: A Language Model Benchmark of Elementary Language Tasks”, Efrat et al 2022
- “What Is My Math Transformer Doing? – 3 Results on Interpretability and Generalization”, Charton 2022
- “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Frantar et al 2022
- “When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels”, Shi et al 2022
- “Can Language Models Handle Recursively Nested Grammatical Structures? A Case Study on Comparing Models and Humans”, Lampinen 2022
- “Contrastive Decoding: Open-ended Text Generation As Optimization”, Li et al 2022
- “Contrastive Search Is What You Need For Neural Text Generation”, Su & Collier 2022
- “Evaluating Parameter Efficient Learning for Generation”, Xu et al 2022
- “Language-Conditioned Absolute Unit NNs”, Gwern 2022
- “BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining”, Luo et al 2022
- “Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”, Vilnis et al 2022
- “MTEB: Massive Text Embedding Benchmark”, Muennighoff et al 2022
- “Foundation Transformers”, Wang et al 2022
- “Fine-Tuning Pre-trained Transformers into Decaying Fast Weights”, Mao 2022
- “Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Arora et al 2022
- “Deep Language Algorithms Predict Semantic Comprehension from Brain Activity”, Caucheteux et al 2022
- “Semantic Reconstruction of Continuous Language from Non-invasive Brain Recordings”, Tang et al 2022
- “Sparrow: Improving Alignment of Dialogue Agents via Targeted Human Judgements”, Glaese et al 2022
- “Generate rather than Retrieve (GenRead): Large Language Models Are Strong Context Generators”, Yu et al 2022
- “Out of One, Many: Using Language Models to Simulate Human Samples”, Argyle et al 2022
- “Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest”, Hessel et al 2022
- “FP8 Formats for Deep Learning”, Micikevicius et al 2022
- “What Does a Platypus Look Like? Generating Customized Prompts for Zero-shot Image Classification (CuPL)”, Pratt et al 2022
- “Petals: Collaborative Inference and Fine-tuning of Large Models”, Borzunov et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- “Using Large Language Models to Simulate Multiple Humans”, Aher et al 2022
- “LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale”, Dettmers et al 2022
- “Meaning without Reference in Large Language Models”, Piantadosi & Hill 2022
- “Effidit: Your AI Writing Assistant”, Shi et al 2022
- “What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”, Garg et al 2022
- “RealTime QA: What’s the Answer Right Now?”, Kasai et al 2022
- “Correspondence between the Layered Structure of Deep Language Models and Temporal Structure of Natural Language Processing in the Human Brain”, Goldstein et al 2022
- “Language Models Show Human-like Content Effects on Reasoning”, Dasgupta et al 2022
- “LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Shah et al 2022
- “GODEL: Large-Scale Pre-Training for Goal-Directed Dialog”, Peng et al 2022
- “DIRECTOR: Generator-Classifiers For Supervised Language Modeling”, Arora et al 2022
- “Can Foundation Models Talk Causality?”, Willig et al 2022
- “NOAH: Neural Prompt Search”, Zhang et al 2022
- “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
- “FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, Dao et al 2022
- “Quark: Controllable Text Generation With Reinforced Unlearning”, Lu et al 2022
- “NaturalProver: Grounded Mathematical Proof Generation With Language Models”, Welleck et al 2022
- “Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”, Tirumala et al 2022
- “RankGen: Improving Text Generation With Large Ranking Models”, Krishna et al 2022
- “OPT: Open Pre-trained Transformer Language Models”, Zhang et al 2022
- “Opal: Multimodal Image Generation for News Illustration”, Liu et al 2022
- “What Language Model to Train If You Have One Million GPU Hours?”, Scao et al 2022
- “WAVPROMPT: Towards Few-Shot Spoken Language Understanding With Frozen Language Models”, Gao et al 2022
- “Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space”, Geva et al 2022
- “Time Control: Language Modeling via Stochastic Processes”, Wang et al 2022
- “Shared Computational Principles for Language Processing in Humans and Deep Language Models”, Goldstein et al 2022
- “InstructGPT: Training Language Models to Follow Instructions With Human Feedback”, Ouyang et al 2022
- “Vector-quantized Image Modeling With Improved VQGAN”, Yu et al 2022
- “Quantifying and Alleviating Political Bias in Language Models”, Liu et al 2022c
- “Controllable Natural Language Generation With Contrastive Prefixes”, Qian et al 2022
- “Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, Min et al 2022
- “Brains and Algorithms Partially Converge in Natural Language Processing”, Caucheteux & King 2022
- “Impact of Pretraining Term Frequencies on Few-Shot Reasoning”, Razeghi et al 2022
- “A Contrastive Framework for Neural Text Generation”, Su et al 2022
- “ROME: Locating and Editing Factual Associations in GPT”, Meng et al 2022
- “InPars: Data Augmentation for Information Retrieval Using Large Language Models”, Bonifacio et al 2022
- “AdaPrompt: Adaptive Model Training for Prompt-based NLP”, Chen et al 2022
- “Cedille: A Large Autoregressive French Language Model”, Müller & Laurent 2022
- “Data Scaling Laws in NMT: The Effect of Noise and Architecture”, Bansal et al 2022
- “LID: Pre-Trained Language Models for Interactive Decision-Making”, Li et al 2022
- “PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts”, Bach et al 2022
- “Typical Decoding for Natural Language Generation”, Meister et al 2022
- “Contracts in the Age of Smart Readers”, Arbel & Becher 2022
- “Can Wikipedia Help Offline Reinforcement Learning?”, Reid et al 2022
- “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Smith et al 2022
- “Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents”, Huang et al 2022
- “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Liu et al 2022
- “Memory-assisted Prompt Editing to Improve GPT-3 After Deployment”, Madaan et al 2022
- “A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models”, Zhang et al 2022
- “CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, Talmor et al 2022
- “The Defeat of the Winograd Schema Challenge”, Kocijan et al 2022
- “Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution”, Tu et al 2022
- “Amortized Noisy Channel Neural Machine Translation”, Pang et al 2021
- “Learning to Prompt for Continual Learning”, Wang et al 2021
- “Learning To Retrieve Prompts for In-Context Learning”, Rubin et al 2021
- “PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021
- “Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”, Prabhumoye et al 2021
- “LMTurk: Few-Shot Learners As Crowdsourcing Workers”, Zhao et al 2021
- “Improving Language Models by Retrieving from Trillions of Tokens”, Borgeaud et al 2021
- “Linear Algebra With Transformers”, Charton 2021
- “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
- “Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic”, Tewel et al 2021
- “Long-range and Hierarchical Language Predictions in Brains and Algorithms”, Caucheteux et al 2021
- “True Few-Shot Learning With Prompts—A Real-World Perspective”, Schick & Schütze 2021
- “Few-shot Named Entity Recognition With Cloze Questions”, Gatta et al 2021
- “Mapping Language Models to Grounded Conceptual Spaces”, Patel & Pavlick 2021
- “ClipCap: CLIP Prefix for Image Captioning”, Mokady et al 2021
- “Evaluating Distributional Distortion in Neural Language Modeling”, Anonymous 2021
- “On Transferability of Prompt Tuning for Natural Language Understanding”, Su et al 2021
- “Attention Approximates Sparse Distributed Memory”, Bricken & Pehlevan 2021
- “What Can a Generative Language Model Answer About a Passage?”, Summers-Stay et al 2021
- “CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”, Mukherjee et al 2021
- “An Explanation of In-context Learning As Implicit Bayesian Inference”, Xie et al 2021
- “Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey”, Min et al 2021
- “Fast Model Editing at Scale”, Mitchell et al 2021
- “Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning”, Wu et al 2021
- “A Few More Examples May Be Worth Billions of Parameters”, Kirstain et al 2021
- “Towards a Unified View of Parameter-Efficient Transfer Learning”, He et al 2021
- “Relating Neural Text Degeneration to Exposure Bias”, Chiang & Chen 2021
- “Scaling Laws for Neural Machine Translation”, Ghorbani et al 2021
- “Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color”, Abdou et al 2021
- “What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers”, Kim et al 2021
- “Medically Aware GPT-3 As a Data Generator for Medical Dialogue Summarization”, Chintagunta et al 2021
- “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Lin et al 2021
- “General-Purpose Question-Answering With Macaw”, Tafjord & Clark 2021
- “An Empirical Exploration in Quality Filtering of Text Data”, Gao 2021
- “Want To Reduce Labeling Cost? GPT-3 Can Help”, Wang et al 2021
- “Scarecrow: A Framework for Scrutinizing Machine Text”, Dou et al 2021
- “Multimodal Few-Shot Learning With Frozen Language Models”, Tsimpoukelli et al 2021
- “Cutting Down on Prompts and Parameters: Simple Few-Shot Learning With Language Models”, Logan IV et al 2021
- “LoRA: Low-Rank Adaptation of Large Language Models”, Hu et al 2021
- “Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development”, Götz et al 2021
- “RASP: Thinking Like Transformers”, Weiss et al 2021
- “GPT-J-6B: 6B JAX-Based Transformer”, EleutherAI 2021
- “LHOPT: A Generalizable Approach to Learning Optimizers”, Almeida et al 2021
- “Anthropic Raises $124 Million to Build More Reliable, General AI Systems”, Anthropic 2021
- “ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Xue et al 2021
- “A Hierarchy of Linguistic Predictions during Natural Language Comprehension”, Heilbron et al 2021
- “Naver Unveils First ‘hyperscale’ AI Platform”, Jae-eun 2021
- “Machine Learning Scaling”, Gwern 2021
- “Scaling Laws for Language Transfer Learning”, Kim 2021
- “GPT Understands, Too”, Liu et al 2021
- “How Many Data Points Is a Prompt Worth?”, Scao & Rush 2021
- “Pretrained Transformers As Universal Computation Engines”, Lu et al 2021
- “Language Models Have a Moral Dimension”, Schramowski et al 2021
- “Learning Chess Blindfolded: Evaluating Language Models on State Tracking”, Toshniwal et al 2021
- “Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
- “Proof Artifact Co-training for Theorem Proving With Language Models”, Han et al 2021
- “Clinical Outcome Prediction from Admission Notes Using Self-Supervised Knowledge Integration”, Aken et al 2021
- “Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling”, Lazaridou et al 2021
- “MAUVE: Measuring the Gap Between Neural Text and Human Text Using Divergence Frontiers”, Pillutla et al 2021
- “Scaling Laws for Transfer”, Hernandez et al 2021
- “Apparently ‘What Ho’ Is a Corruption Of…”, Marguerite 2021
- “Prefix-Tuning: Optimizing Continuous Prompts for Generation”, Li & Liang 2021
- “The Pile: An 800GB Dataset of Diverse Text for Language Modeling”, Gao et al 2021
- “Process for Adapting Language Models to Society (PALMS) With Values-Targeted Datasets”, Solaiman & Dennison 2021
- “Bot-Adversarial Dialogue for Safe Conversational Agents”, Xu et al 2021
- “Making Pre-trained Language Models Better Few-shot Learners”, Gao et al 2020
- “Extracting Training Data from Large Language Models”, Carlini et al 2020
- “Thinking Ahead: Prediction in Context As a Keystone of Language in Humans and Machines”, Goldstein et al 2020
- “CPM: A Large-scale Generative Chinese Pre-trained Language Model”, Zhang et al 2020
- “Scaling Laws for Autoregressive Generative Modeling”, Henighan et al 2020
- “NeuroLogic Decoding: (Un)supervised Neural Text Generation With Predicate Logic Constraints”, Lu et al 2020
- “L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020
- “Interacting With GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation”, Geerlings & Meroño-Peñuela 2020
- “Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries”, Sun et al 2020
- “The Neural Architecture of Language: Integrative Reverse-engineering Converges on a Model for Predictive Processing”, Schrimpf et al 2020
- “RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text”, Dugan et al 2020
- “GPT-3: Its Nature, Scope, Limits, and Consequences”, Floridi & Chiriatti 2020
- “A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation”, Nadeem et al 2020
- “GeDi: Generative Discriminator Guided Sequence Generation”, Krause et al 2020
- “Generative Language Modeling for Automated Theorem Proving”, Polu & Sutskever 2020
- “MMLU: Measuring Massive Multitask Language Understanding”, Hendrycks et al 2020
- “Learning to Summarize from Human Feedback”, Stiennon et al 2020
- “Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, Bahri et al 2020
- “Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size”, Yoshida et al 2020
- “Aligning AI With Shared Human Values”, Hendrycks et al 2020
- “The Chess Transformer: Mastering Play Using Generative Language Models”, Noever et al 2020
- “Mirostat: A Neural Text Decoding Algorithm That Directly Controls Perplexity”, Basu et al 2020
- “Efficient Attention: Breaking The Quadratic Transformer Bottleneck”, Gwern 2020
- “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”, Bender & Koller 2020
- “Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
- “OpenAI API Beta Homepage”, OpenAI 2020
- “GPT-3: Language Models Are Few-Shot Learners”, Brown et al 2020
- “The Scaling Hypothesis”, Gwern 2020
- “True_poetry: Poetry Generator by GPT-2 With Meter and Rhyme Constraints”, Summers-Stay 2020
- “Scaling Laws from the Data Manifold Dimension”, Sharma & Kaplan 2020
- “Trading Off Diversity and Quality in Natural Language Generation”, Zhang et al 2020
- “Unigram LM: Byte Pair Encoding Is Suboptimal for Language Model Pretraining”, Bostrom & Durrett 2020
- “OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for 'Most Tedious Game in History'”, Whalen 2020
- “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, Hasson et al 2020
- “Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions”, Huang & Yang 2020
- “Reducing Non-Normative Text Generation from Language Models”, Peng et al 2020
- “Scaling Laws for Neural Language Models”, Kaplan et al 2020
- “What Does BERT Dream Of? A Visual Investigation of Nightmares in Sesame Street”, Bäuerle & Wexler 2020
- “Reformer: The Efficient Transformer”, Kitaev et al 2020
- “Generative Language Modeling for Automated Theorem Proving § Experiments”, Polu & Sutskever 2020 (page 11)
- “Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics”, Barrio 2020
- “Controlling Text Generation With Plug and Play Language Models”, Liu et al 2019
- “Plug and Play Language Models: A Simple Approach to Controlled Text Generation”, Dathathri et al 2019
- “AI Dungeon 2”, Walton 2019
- “How Can We Know What Language Models Know?”, Jiang et al 2019
- “CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019
- “GPT-2: 1.5B Release”, Solaiman et al 2019
- “Release Strategies and the Social Impacts of Language Models”, Solaiman et al 2019
- “DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation”, Zhang et al 2019
- “Generalization through Memorization: Nearest Neighbor Language Models”, Khandelwal et al 2019
- “GPT-2 Folk Music”, Branwen & Presser 2019
- “Fine-Tuning GPT-2 from Human Preferences § Bugs Can Optimize for Bad Behavior”, Ziegler et al 2019
- “Fine-Tuning GPT-2 from Human Preferences”, Ziegler et al 2019
- “Fine-Tuning Language Models from Human Preferences”, Ziegler et al 2019
- “Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism”, Shoeybi et al 2019
- “lm-human-preferences”, Ziegler et al 2019
- “CTRL: A Conditional Transformer Language Model For Controllable Generation”, Keskar et al 2019
- “How To Make Custom AI-Generated Text With GPT-2”, Woolf 2019
- “Language Modelling State-of-the-art Leaderboards”, paperswithcode.com 2019
- “Smaller, Faster, Cheaper, Lighter: Introducing DistilGPT, a Distilled Version of GPT”, Sanh 2019
- “OpenGPT-2: We Replicated GPT-2-1.5b Because You Can Too”, Gokaslan & Cohen 2019
- “GPT-2: 6-Month Follow-Up”, OpenAI 2019
- “Universal Adversarial Triggers for Attacking and Analyzing NLP”, Wallace et al 2019
- “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, ADLR 2019
- “Neural Text Generation With Unlikelihood Training”, Welleck et al 2019
- “Addendum: Evaluation of My Model”, Leahy 2019
- “Replicating GPT-2-1.5B”, Leahy 2019
- “GROVER: Defending Against Neural Fake News”, Zellers et al 2019
- “MuseNet: a Deep Neural Network That Can Generate 4-minute Musical Compositions With 10 Different Instruments, and Can Combine Styles from Country to Mozart to the Beatles”, Payne 2019
- “Generative Modeling With Sparse Transformers: We’ve Developed the Sparse Transformer, a Deep Neural Network Which Sets New Records at Predicting What Comes next in a Sequence—whether Text, Images, or Sound. It Uses an Algorithmic Improvement of the attention Mechanism to Extract Patterns from Sequences 30× Longer Than Possible Previously”, Child & Gray 2019
- “The Curious Case of Neural Text Degeneration”, Holtzman et al 2019
- “Smart Vet: Autocompleting Sentences in Veterinary Medical Records”, Ginn 2019
- “LM Explorer (alpha)”, Allen Institute for AI 2019
- “GPT-2 As Step Toward General Intelligence”, Alexander 2019
- “Better Language Models and Their Implications”, Radford et al 2019
- “Language Models Are Unsupervised Multitask Learners”, Radford et al 2019
- “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Dai et al 2019
- “Talk To Transformer”, King 2019
- “Music Transformer: Generating Music With Long-Term Structure”, Huang et al 2018
- “Universal Transformers”, Dehghani et al 2018
- “Adversarial Reprogramming of Neural Networks”, Elsayed et al 2018
- “GPT-1: Improving Language Understanding With Unsupervised Learning”, OpenAI 2018
- “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Radford et al 2018 (page 5)
- “GPT-1: Improving Language Understanding by Generative Pre-Training”, Radford et al 2018
- “Deep Reinforcement Learning from Human Preferences § Appendix A.2: Atari”, Christiano et al 2017 (page 15)
- “Learning to Generate Reviews and Discovering Sentiment”, Radford et al 2017
- “Research Ideas”, Gwern 2017
- “Design a Role-playing Game Using 200 Words or Less.”
- “AI Dungeon: Dragon Model Upgrade—You Can Now Play AI Dungeon With One of the Most Powerful AI Models in the World.”
- “Introducing AI Dungeon Translate: AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button. [ 🤔 💯 🤷♂️ 🤔 🤔 🤔 💯]”
- “OpenAI API Alchemy: Emoji Storytelling 🤖”
- “Transformers As Variational Autoencoders”
- “Math: OpenAI API Can Do Some Math out of the Gate, but Most Math It Seems It Has to Learn. Many Times, the Numbers That It Spits out Are Just Random. However, including Different Priming Prompts Can Result in Decent Results.”
- “Deep Learning for Assisting the Process of Music Composition (part 3)”
- “Using GPT-3 to Explain Jokes”
- “Homepage of Paul F. Christiano”, Christiano 2023
- “TensorFlow Research Cloud (TRC): Accelerate Your Cutting-edge Machine Learning Research With Free Cloud TPUs”, TRC 2023
- “Meditations on Moloch”
- “Humans Who Are Not Concentrating Are Not General Intelligences”
- nickwalton00
- “This Is the OpenAI API. It Makes Spookily Good Twitter Bots. 13⁄10 Would Retweet”
- “AlphaStar: Mastering the Real-Time Strategy Game StarCraft II”
- “Interpreting GPT: the Logit Lens”
- “A Robot Wrote This Entire Article. Are You Scared Yet, Human? We Asked GPT-3, OpenAI’s Powerful New Language Generator, to Write an Essay for Us from Scratch. The Assignment? To Convince Us Robots Come in Peace | For More about GPT-3 and How This Essay Was Written and Edited, Please Read Our Editor’s Note Below”
- Sort By Magic
- Miscellaneous
- Link Bibliography
See Also
Links
“Sparse Autoencoders Find Highly Interpretable Features in Language Models”, Cunningham et al 2023
“Sparse Autoencoders Find Highly Interpretable Features in Language Models”
“When Less Is More: Investigating Data Pruning for Pretraining LLMs at Scale”, Marion et al 2023
“When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale”
“Accelerating LLM Inference With Staged Speculative Decoding”, Spector & Re 2023
“Accelerating LLM Inference with Staged Speculative Decoding”
“Multimodal Neurons in Pretrained Text-Only Transformers”, Schwettmann et al 2023
“Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models”, Chen et al 2023
“Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models”
“Investigating the Existence of "Secret Language” in Language Models”, Wang et al 2023
[“Investigating the Existence of "Secret Language” in Language Models”](https://arxiv.org/abs/2307.12507 “‘Investigating the Existence of “Secret Language” in Language Models’, et al 2023 ”){.link-annotated ..include-annotation .include-replace-container}
“Stay on Topic With Classifier-Free Guidance”, Sanchez et al 2023
“Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Roger 2023
“Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”
“SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, Dettmers et al 2023
“SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”
“DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”, Xie et al 2023
“DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”
“Memorization for Good: Encryption With Autoregressive Language Models”, Stevens & Su 2023
“Memorization for Good: Encryption with Autoregressive Language Models”
“MEGABYTE: Predicting Million-byte Sequences With Multiscale Transformers”, Yu et al 2023
“MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers”
“Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”, Konrad 2023
“Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”
“Finding Neurons in a Haystack: Case Studies With Sparse Probing”, Gurnee et al 2023
“Finding Neurons in a Haystack: Case Studies with Sparse Probing”
“How Does GPT-2 Compute Greater-than?: Interpreting Mathematical Abilities in a Pre-trained Language Model”, Hanna et al 2023
“Emergent and Predictable Memorization in Large Language Models”, Biderman et al 2023
“Emergent and Predictable Memorization in Large Language Models”
“Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition”, Muffo et al 2023
“Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition”
“Tractable Control for Autoregressive Language Generation”, Zhang et al 2023
“Shall We Pretrain Autoregressive Language Models With Retrieval? A Comprehensive Study”, Wang et al 2023
“Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study”
“How Large-Language Models Can Revolutionize Military Planning”, Jensen & Tadross 2023
“How Large-Language Models Can Revolutionize Military Planning”
“8 Things to Know about Large Language Models”, Bowman 2023
“BloombergGPT: A Large Language Model for Finance”, Wu et al 2023
“Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023
“Consistency Analysis of ChatGPT”, Jang & Lukasiewicz 2023
“Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”, Kataoka 2023
“Rewarding Chatbots for Real-World Engagement With Millions of Users”, Irvine et al 2023
“Rewarding Chatbots for Real-World Engagement with Millions of Users”
“SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Zhu et al 2023
“SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks”
“A Prompt Pattern Catalog to Enhance Prompt Engineering With ChatGPT”, White et al 2023
“A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT”
“BiLD: Big Little Transformer Decoder”, Kim et al 2023
“MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, Sudhakaran et al 2023
“MarioGPT: Open-Ended Text2Level Generation through Large Language Models”
“Is ChatGPT a General-Purpose Natural Language Processing Task Solver?”, Qin et al 2023
“Is ChatGPT a General-Purpose Natural Language Processing Task Solver?”
“Use GPT-3 Incorrectly: Reduce Costs 40× and Increase Speed by 5×”, Pullen 2023
“Use GPT-3 incorrectly: reduce costs 40× and increase speed by 5×”
“Co-Writing With Opinionated Language Models Affects Users’ Views”, Jakesch et al 2023
“Co-Writing with Opinionated Language Models Affects Users’ Views”
“In-Context Retrieval-Augmented Language Models”, Ram et al 2023
“Crawling the Internal Knowledge-Base of Language Models”, Cohen et al 2023
“Big Tech Was Moving Cautiously on AI. Then Came ChatGPT. Google, Facebook and Microsoft Helped Build the Scaffolding of AI. Smaller Companies Are Taking It to the Masses, Forcing Big Tech to React.”, Tiku et al 2023
“The inside Story of ChatGPT: How OpenAI Founder Sam Altman Built the World’s Hottest Technology With Billions from Microsoft”, Kahn 2023
“Rock Guitar Tablature Generation via Natural Language Processing”, Casco-Rodriguez 2023
“Rock Guitar Tablature Generation via Natural Language Processing”
“GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities”, Bommarito et al 2023
“GPT-3 as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities”
“InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, Boytsov et al 2023
“InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”
“GPT-3 Takes the Bar Exam”, II & Katz 2022
“A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A New Wave of Chat Bots like ChatGPT Use Artificial Intelligence That Could Reinvent or Even Replace the Traditional Internet Search Engine”, Grant & Metz 2022
“Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers”, Dai et al 2022
“Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers”
“Precise Zero-Shot Dense Retrieval without Relevance Labels”, Gao et al 2022
“Precise Zero-Shot Dense Retrieval without Relevance Labels”
“Emergent Analogical Reasoning in Large Language Models”, Webb et al 2022
“Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, Wiggers 2022
“Harvey, which uses AI to answer legal questions, lands cash from OpenAI”
“Interpreting Neural Networks through the Polytope Lens”, Black et al 2022
“SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022
“SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”
“InstructPix2Pix: Learning to Follow Image Editing Instructions”, Brooks et al 2022
“InstructPix2Pix: Learning to Follow Image Editing Instructions”
“Galactica: A Large Language Model for Science”, Taylor et al 2022
“The CRINGE Loss: Learning What Language Not to Model”, Adolphs et al 2022
“LMentry: A Language Model Benchmark of Elementary Language Tasks”, Efrat et al 2022
“LMentry: A Language Model Benchmark of Elementary Language Tasks”
“What Is My Math Transformer Doing? – 3 Results on Interpretability and Generalization”, Charton 2022
“What is my math transformer doing? – 3 results on interpretability and generalization”
“GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Frantar et al 2022
“GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”
“When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels”, Shi et al 2022
“Can Language Models Handle Recursively Nested Grammatical Structures? A Case Study on Comparing Models and Humans”, Lampinen 2022
“Contrastive Decoding: Open-ended Text Generation As Optimization”, Li et al 2022
“Contrastive Decoding: Open-ended Text Generation as Optimization”
“Contrastive Search Is What You Need For Neural Text Generation”, Su & Collier 2022
“Contrastive Search Is What You Need For Neural Text Generation”
“Evaluating Parameter Efficient Learning for Generation”, Xu et al 2022
“Language-Conditioned Absolute Unit NNs”, Gwern 2022
“BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining”, Luo et al 2022
“BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining”
“Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”, Vilnis et al 2022
“Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”
“MTEB: Massive Text Embedding Benchmark”, Muennighoff et al 2022
“Foundation Transformers”, Wang et al 2022
“Fine-Tuning Pre-trained Transformers into Decaying Fast Weights”, Mao 2022
“Fine-Tuning Pre-trained Transformers into Decaying Fast Weights”
“Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Arora et al 2022
“Ask Me Anything (AMA): A simple strategy for prompting language models”
“Deep Language Algorithms Predict Semantic Comprehension from Brain Activity”, Caucheteux et al 2022
“Deep language algorithms predict semantic comprehension from brain activity”
“Semantic Reconstruction of Continuous Language from Non-invasive Brain Recordings”, Tang et al 2022
“Semantic reconstruction of continuous language from non-invasive brain recordings”
“Sparrow: Improving Alignment of Dialogue Agents via Targeted Human Judgements”, Glaese et al 2022
“Sparrow: Improving alignment of dialogue agents via targeted human judgements”
“Generate rather than Retrieve (GenRead): Large Language Models Are Strong Context Generators”, Yu et al 2022
“Generate rather than Retrieve (GenRead): Large Language Models are Strong Context Generators”
“Out of One, Many: Using Language Models to Simulate Human Samples”, Argyle et al 2022
“Out of One, Many: Using Language Models to Simulate Human Samples”
“Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest”, Hessel et al 2022
“FP8 Formats for Deep Learning”, Micikevicius et al 2022
“What Does a Platypus Look Like? Generating Customized Prompts for Zero-shot Image Classification (CuPL)”, Pratt et al 2022
“Petals: Collaborative Inference and Fine-tuning of Large Models”, Borzunov et al 2022
“Petals: Collaborative Inference and Fine-tuning of Large Models”
“Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
“Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”
“Using Large Language Models to Simulate Multiple Humans”, Aher et al 2022
“LLM.int8()
: 8-bit Matrix Multiplication for Transformers at Scale”, Dettmers et al 2022
“LLM.int8()
: 8-bit Matrix Multiplication for Transformers at Scale”
“Meaning without Reference in Large Language Models”, Piantadosi & Hill 2022
“Effidit: Your AI Writing Assistant”, Shi et al 2022
“What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”, Garg et al 2022
“What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”
“RealTime QA: What’s the Answer Right Now?”, Kasai et al 2022
“Correspondence between the Layered Structure of Deep Language Models and Temporal Structure of Natural Language Processing in the Human Brain”, Goldstein et al 2022
“Language Models Show Human-like Content Effects on Reasoning”, Dasgupta et al 2022
“Language models show human-like content effects on reasoning”
“LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Shah et al 2022
“LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action”
“GODEL: Large-Scale Pre-Training for Goal-Directed Dialog”, Peng et al 2022
“DIRECTOR: Generator-Classifiers For Supervised Language Modeling”, Arora et al 2022
“DIRECTOR: Generator-Classifiers For Supervised Language Modeling”
“Can Foundation Models Talk Causality?”, Willig et al 2022
“NOAH: Neural Prompt Search”, Zhang et al 2022
“ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
“ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”
“FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, Dao et al 2022
“FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness”
“Quark: Controllable Text Generation With Reinforced Unlearning”, Lu et al 2022
“Quark: Controllable Text Generation with Reinforced Unlearning”
“NaturalProver: Grounded Mathematical Proof Generation With Language Models”, Welleck et al 2022
“NaturalProver: Grounded Mathematical Proof Generation with Language Models”
“Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”, Tirumala et al 2022
“Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”
“RankGen: Improving Text Generation With Large Ranking Models”, Krishna et al 2022
“RankGen: Improving Text Generation with Large Ranking Models”
“OPT: Open Pre-trained Transformer Language Models”, Zhang et al 2022
“Opal: Multimodal Image Generation for News Illustration”, Liu et al 2022
“What Language Model to Train If You Have One Million GPU Hours?”, Scao et al 2022
“What Language Model to Train if You Have One Million GPU Hours?”
“WAVPROMPT: Towards Few-Shot Spoken Language Understanding With Frozen Language Models”, Gao et al 2022
“WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models”
“Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space”, Geva et al 2022
“Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space”
“Time Control: Language Modeling via Stochastic Processes”, Wang et al 2022
“Shared Computational Principles for Language Processing in Humans and Deep Language Models”, Goldstein et al 2022
“Shared computational principles for language processing in humans and deep language models”
“InstructGPT: Training Language Models to Follow Instructions With Human Feedback”, Ouyang et al 2022
“InstructGPT: Training language models to follow instructions with human feedback”
“Vector-quantized Image Modeling With Improved VQGAN”, Yu et al 2022
“Quantifying and Alleviating Political Bias in Language Models”, Liu et al 2022c
“Quantifying and alleviating political bias in language models”
“Controllable Natural Language Generation With Contrastive Prefixes”, Qian et al 2022
“Controllable Natural Language Generation with Contrastive Prefixes”
“Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, Min et al 2022
“Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”
“Brains and Algorithms Partially Converge in Natural Language Processing”, Caucheteux & King 2022
“Brains and algorithms partially converge in natural language processing”
“Impact of Pretraining Term Frequencies on Few-Shot Reasoning”, Razeghi et al 2022
“Impact of Pretraining Term Frequencies on Few-Shot Reasoning”
“A Contrastive Framework for Neural Text Generation”, Su et al 2022
“ROME: Locating and Editing Factual Associations in GPT”, Meng et al 2022
“InPars: Data Augmentation for Information Retrieval Using Large Language Models”, Bonifacio et al 2022
“InPars: Data Augmentation for Information Retrieval using Large Language Models”
“AdaPrompt: Adaptive Model Training for Prompt-based NLP”, Chen et al 2022
“Cedille: A Large Autoregressive French Language Model”, Müller & Laurent 2022
“Data Scaling Laws in NMT: The Effect of Noise and Architecture”, Bansal et al 2022
“Data Scaling Laws in NMT: The Effect of Noise and Architecture”
“LID: Pre-Trained Language Models for Interactive Decision-Making”, Li et al 2022
“LID: Pre-Trained Language Models for Interactive Decision-Making”
“PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts”, Bach et al 2022
“PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts”
“Typical Decoding for Natural Language Generation”, Meister et al 2022
“Contracts in the Age of Smart Readers”, Arbel & Becher 2022
“Can Wikipedia Help Offline Reinforcement Learning?”, Reid et al 2022
“Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Smith et al 2022
“Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents”, Huang et al 2022
“Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents”
“WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Liu et al 2022
“WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”
“Memory-assisted Prompt Editing to Improve GPT-3 After Deployment”, Madaan et al 2022
“Memory-assisted prompt editing to improve GPT-3 after deployment”
“A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models”, Zhang et al 2022
“A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models”
“CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, Talmor et al 2022
“CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”
“The Defeat of the Winograd Schema Challenge”, Kocijan et al 2022
“Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution”, Tu et al 2022
“Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution”
“Amortized Noisy Channel Neural Machine Translation”, Pang et al 2021
“Learning to Prompt for Continual Learning”, Wang et al 2021
“Learning To Retrieve Prompts for In-Context Learning”, Rubin et al 2021
“PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021
“PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”
“Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”, Prabhumoye et al 2021
“Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”
“LMTurk: Few-Shot Learners As Crowdsourcing Workers”, Zhao et al 2021
“Improving Language Models by Retrieving from Trillions of Tokens”, Borgeaud et al 2021
“Improving language models by retrieving from trillions of tokens”
“Linear Algebra With Transformers”, Charton 2021
“A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
“A General Language Assistant as a Laboratory for Alignment”
“Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic”, Tewel et al 2021
“Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic”
“Long-range and Hierarchical Language Predictions in Brains and Algorithms”, Caucheteux et al 2021
“Long-range and hierarchical language predictions in brains and algorithms”
“True Few-Shot Learning With Prompts—A Real-World Perspective”, Schick & Schütze 2021
“True Few-Shot Learning with Prompts—A Real-World Perspective”
“Few-shot Named Entity Recognition With Cloze Questions”, Gatta et al 2021
“Mapping Language Models to Grounded Conceptual Spaces”, Patel & Pavlick 2021
“ClipCap: CLIP Prefix for Image Captioning”, Mokady et al 2021
“Evaluating Distributional Distortion in Neural Language Modeling”, Anonymous 2021
“Evaluating Distributional Distortion in Neural Language Modeling”
“On Transferability of Prompt Tuning for Natural Language Understanding”, Su et al 2021
“On Transferability of Prompt Tuning for Natural Language Understanding”
“Attention Approximates Sparse Distributed Memory”, Bricken & Pehlevan 2021
“What Can a Generative Language Model Answer About a Passage?”, Summers-Stay et al 2021
“What Can a Generative Language Model Answer About a Passage?”
“CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”, Mukherjee et al 2021
“CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”
“An Explanation of In-context Learning As Implicit Bayesian Inference”, Xie et al 2021
“An Explanation of In-context Learning as Implicit Bayesian Inference”
“Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey”, Min et al 2021
“Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey”
“Fast Model Editing at Scale”, Mitchell et al 2021
“Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning”, Wu et al 2021
“Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning”
“A Few More Examples May Be Worth Billions of Parameters”, Kirstain et al 2021
“Towards a Unified View of Parameter-Efficient Transfer Learning”, He et al 2021
“Towards a Unified View of Parameter-Efficient Transfer Learning”
“Relating Neural Text Degeneration to Exposure Bias”, Chiang & Chen 2021
“Scaling Laws for Neural Machine Translation”, Ghorbani et al 2021
“Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color”, Abdou et al 2021
“Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color”
“What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers”, Kim et al 2021
“Medically Aware GPT-3 As a Data Generator for Medical Dialogue Summarization”, Chintagunta et al 2021
“Medically Aware GPT-3 as a Data Generator for Medical Dialogue Summarization”
“TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Lin et al 2021
“General-Purpose Question-Answering With Macaw”, Tafjord & Clark 2021
“An Empirical Exploration in Quality Filtering of Text Data”, Gao 2021
“An Empirical Exploration in Quality Filtering of Text Data”
“Want To Reduce Labeling Cost? GPT-3 Can Help”, Wang et al 2021
“Scarecrow: A Framework for Scrutinizing Machine Text”, Dou et al 2021
“Multimodal Few-Shot Learning With Frozen Language Models”, Tsimpoukelli et al 2021
“Cutting Down on Prompts and Parameters: Simple Few-Shot Learning With Language Models”, IV et al 2021
“Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models”
“LoRA: Low-Rank Adaptation of Large Language Models”, Hu et al 2021
“Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development”, Götz et al 2021
“RASP: Thinking Like Transformers”, Weiss et al 2021
“GPT-J-6B: 6B JAX-Based Transformer”, EleutherAI 2021
“LHOPT: A Generalizable Approach to Learning Optimizers”, Almeida et al 2021
“Anthropic Raises $124 Million to Build More Reliable, General AI Systems”, Anthropic 2021
“Anthropic raises $124 million to build more reliable, general AI systems”
“ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Xue et al 2021
“ByT5: Towards a token-free future with pre-trained byte-to-byte models”
“A Hierarchy of Linguistic Predictions during Natural Language Comprehension”, Heilbron et al 2021
“A hierarchy of linguistic predictions during natural language comprehension”
“Naver Unveils First ‘hyperscale’ AI Platform”, Jae-eun 2021
“Machine Learning Scaling”, Gwern 2021
“Scaling Laws for Language Transfer Learning”, Kim 2021
“GPT Understands, Too”, Liu et al 2021
“How Many Data Points Is a Prompt Worth?”, Scao & Rush 2021
“Pretrained Transformers As Universal Computation Engines”, Lu et al 2021
“Language Models Have a Moral Dimension”, Schramowski et al 2021
“Learning Chess Blindfolded: Evaluating Language Models on State Tracking”, Toshniwal et al 2021
“Learning Chess Blindfolded: Evaluating Language Models on State Tracking”
“Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
“Investigating the Limitations of the Transformers with Simple Arithmetic Tasks”
“Proof Artifact Co-training for Theorem Proving With Language Models”, Han et al 2021
“Proof Artifact Co-training for Theorem Proving with Language Models”
“Clinical Outcome Prediction from Admission Notes Using Self-Supervised Knowledge Integration”, Aken et al 2021
“Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration”
“Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling”, Lazaridou et al 2021
“Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling”
“MAUVE: Measuring the Gap Between Neural Text and Human Text Using Divergence Frontiers”, Pillutla et al 2021
“MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers”
“Scaling Laws for Transfer”, Hernandez et al 2021
“Apparently ‘what Ho’ Is a Corruption Of…”, Marguerite 2021
“Prefix-Tuning: Optimizing Continuous Prompts for Generation”, Li & Liang 2021
“Prefix-Tuning: Optimizing Continuous Prompts for Generation”
“The Pile: An 800GB Dataset of Diverse Text for Language Modeling”, Gao et al 2021
“The Pile: An 800GB Dataset of Diverse Text for Language Modeling”
“Process for Adapting Language Models to Society (PALMS) With Values-Targeted Datasets”, Solaiman & Dennison 2021
“Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets”
“Bot-Adversarial Dialogue for Safe Conversational Agents”, Xu et al 2021
“Making Pre-trained Language Models Better Few-shot Learners”, Gao et al 2020
“Making Pre-trained Language Models Better Few-shot Learners”
“Extracting Training Data from Large Language Models”, Carlini et al 2020
“Thinking Ahead: Prediction in Context As a Keystone of Language in Humans and Machines”, Goldstein et al 2020
“Thinking ahead: prediction in context as a keystone of language in humans and machines”
“CPM: A Large-scale Generative Chinese Pre-trained Language Model”, Zhang et al 2020
“CPM: A Large-scale Generative Chinese Pre-trained Language Model”
“Scaling Laws for Autoregressive Generative Modeling”, Henighan et al 2020
“NeuroLogic Decoding: (Un)supervised Neural Text Generation With Predicate Logic Constraints”, Lu et al 2020
“NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints”
“L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020
“L2L: Training Large Neural Networks with Constant Memory using a New Execution Algorithm”
“Interacting With GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation”, Geerlings & Meroño-Peñuela 2020
“Interacting with GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation”
“Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries”, Sun et al 2020
“The Neural Architecture of Language: Integrative Reverse-engineering Converges on a Model for Predictive Processing”, Schrimpf et al 2020
“RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text”, Dugan et al 2020
“RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text”
“GPT-3: Its Nature, Scope, Limits, and Consequences”, Floridi & Chiriatti 2020
“A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation”, Nadeem et al 2020
“A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation”
“GeDi: Generative Discriminator Guided Sequence Generation”, Krause et al 2020
“Generative Language Modeling for Automated Theorem Proving”, Polu & Sutskever 2020
“Generative Language Modeling for Automated Theorem Proving”
“MMLU: Measuring Massive Multitask Language Understanding”, Hendrycks et al 2020
“Learning to Summarize from Human Feedback”, Stiennon et al 2020
“Generative Models Are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”, Bahri et al 2020
“Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study”
“Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size”, Yoshida et al 2020
“Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size”
“Aligning AI With Shared Human Values”, Hendrycks et al 2020
“The Chess Transformer: Mastering Play Using Generative Language Models”, Noever et al 2020
“The Chess Transformer: Mastering Play using Generative Language Models”
“Mirostat: A Neural Text Decoding Algorithm That Directly Controls Perplexity”, Basu et al 2020
“Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity”
“Efficient Attention: Breaking The Quadratic Transformer Bottleneck”, Gwern 2020
“Efficient Attention: Breaking The Quadratic Transformer Bottleneck”
“Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”, Bender & Koller 2020
“Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”
“Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
“Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention”
“OpenAI API Beta Homepage”, OpenAI 2020
“GPT-3: Language Models Are Few-Shot Learners”, Brown et al 2020
“The Scaling Hypothesis”, Gwern 2020
“True_poetry: Poetry Generator by GPT-2 With Meter and Rhyme Constraints”, Summers-Stay 2020
“true_poetry: Poetry generator by GPT-2 with meter and rhyme constraints”
“Trading Off Diversity and Quality in Natural Language Generation”, Zhang et al 2020
“Trading Off Diversity and Quality in Natural Language Generation”
“Unigram LM: Byte Pair Encoding Is Suboptimal for Language Model Pretraining”, Bostrom & Durrett 2020
“Unigram LM: Byte Pair Encoding is Suboptimal for Language Model Pretraining”
“OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for 'Most Tedious Game in History'”, Whalen 2020
“OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for 'Most Tedious Game in History'”
“Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, Hasson et al 2020
“Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”
“Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions”, Huang & Yang 2020
“Reducing Non-Normative Text Generation from Language Models”, Peng et al 2020
“Scaling Laws for Neural Language Models”, Kaplan et al 2020
“What Does BERT Dream Of? A Visual Investigation of Nightmares in Sesame Street”, Bäuerle & Wexler 2020
“Reformer: The Efficient Transformer”, Kitaev et al 2020
“Generative Language Modeling for Automated Theorem Proving § Experiments”, Polu & Sutskever 2020 (page 11)
“Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics”, Barrio 2020
“Controlling Text Generation With Plug and Play Language Models”, Liu et al 2019
“Plug and Play Language Models: A Simple Approach to Controlled Text Generation”, Dathathri et al 2019
“AI Dungeon 2”, Walton 2019
“How Can We Know What Language Models Know?”, Jiang et al 2019
“CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019
“GPT-2: 1.5B Release”, Solaiman et al 2019
“Release Strategies and the Social Impacts of Language Models”, Solaiman et al 2019
“DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation”, Zhang et al 2019
“Generalization through Memorization: Nearest Neighbor Language Models”, Khandelwal et al 2019
“GPT-2 Folk Music”, Branwen & Presser 2019
“Fine-Tuning GPT-2 from Human Preferences § Bugs Can Optimize for Bad Behavior”, Ziegler et al 2019
“Fine-Tuning GPT-2 from Human Preferences”, Ziegler et al 2019
“Fine-Tuning Language Models from Human Preferences”, Ziegler et al 2019
“Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism”, Shoeybi et al 2019
“Lm-human-preferences”, Ziegler et al 2019
“CTRL: A Conditional Transformer Language Model For Controllable Generation”, Keskar et al 2019
“How To Make Custom AI-Generated Text With GPT-2”, Woolf 2019
“Language Modelling State-of-the-art Leaderboards”, paperswithcode.com 2019
“Smaller, Faster, Cheaper, Lighter: Introducing DistilGPT, a Distilled Version of GPT”, Sanh 2019
“OpenGPT-2: We Replicated GPT-2-1.5b Because You Can Too”, Gokaslan & Cohen 2019
“GPT-2: 6-Month Follow-Up”, OpenAI 2019
“Universal Adversarial Triggers for Attacking and Analyzing NLP”, Wallace et al 2019
“MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, ADLR 2019
“Neural Text Generation With Unlikelihood Training”, Welleck et al 2019
“Addendum: Evaluation of My Model”, Leahy 2019
“Replicating GPT-2-1.5B”, Leahy 2019
“GROVER: Defending Against Neural Fake News”, Zellers et al 2019
“MuseNet: a Deep Neural Network That Can Generate 4-minute Musical Compositions With 10 Different Instruments, and Can Combine Styles from Country to Mozart to the Beatles”, Payne 2019
“Generative Modeling With Sparse Transformers: We’ve Developed the Sparse Transformer, a Deep Neural Network Which Sets New Records at Predicting What Comes next in a Sequence—whether Text, Images, or Sound. It Uses an Algorithmic Improvement of the attention Mechanism to Extract Patterns from Sequences 30× Longer Than Possible Previously”, Child & Gray 2019
“The Curious Case of Neural Text Degeneration”, Holtzman et al 2019
“Smart Vet: Autocompleting Sentences in Veterinary Medical Records”, Ginn 2019
“LM Explorer (alpha)”, Intelligence 2019
“GPT-2 As Step Toward General Intelligence”, Alexander 2019
“Better Language Models and Their Implications”, Radford et al 2019
“Language Models Are Unsupervised Multitask Learners”, Radford et al 2019
“Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Dai et al 2019
“Talk To Transformer”, King 2019
“Music Transformer: Generating Music With Long-Term Structure”, Huang et al 2018
“Universal Transformers”, Dehghani et al 2018
“Adversarial Reprogramming of Neural Networks”, Elsayed et al 2018
“GPT-1: Improving Language Understanding With Unsupervised Learning”, OpenAI 2018
“GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Radford et al 2018 (page 5)
“GPT-1: Improving Language Understanding by Generative Pre-Training”, Radford et al 2018
“Deep Reinforcement Learning from Human Preferences § Appendix A.2: Atari”, Christiano et al 2017 (page 15)
“Learning to Generate Reviews and Discovering Sentiment”, Radford et al 2017
“Research Ideas”, Gwern 2017
“Design a Role-playing Game Using 200 Words or Less.”
“AI Dungeon: Dragon Model Upgrade—You Can Now Play AI Dungeon With One of the Most Powerful AI Models in the World.”
“Introducing AI Dungeon Translate: AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button. [ 🤔 💯 🤷♂️ 🤔 🤔 🤔 💯]”
“OpenAI API Alchemy: Emoji Storytelling 🤖”
“Transformers As Variational Autoencoders”
“Math: OpenAI API Can Do Some Math out of the Gate, but Most Math It Seems It Has to Learn. Many Times, the Numbers That It Spits out Are Just Random. However, including Different Priming Prompts Can Result in Decent Results.”
“Deep Learning for Assisting the Process of Music Composition (part 3)”
“Using GPT-3 to Explain Jokes”
“Homepage of Paul F. Christiano”, Christiano 2023
“TensorFlow Research Cloud (TRC): Accelerate Your Cutting-edge Machine Learning Research With Free Cloud TPUs”, TRC 2023
“Meditations on Moloch”
“Humans Who Are Not Concentrating Are Not General Intelligences”
nickwalton00
“This Is the OpenAI API. It Makes Spookily Good Twitter Bots. 13⁄10 Would Retweet”
“AlphaStar: Mastering the Real-Time Strategy Game StarCraft II”
“Interpreting GPT: the Logit Lens”
“A Robot Wrote This Entire Article. Are You Scared Yet, Human? We Asked GPT-3, OpenAI’s Powerful New Language Generator, to Write an Essay for Us from Scratch. The Assignment? To Convince Us Robots Come in Peace | For More about GPT-3 and How This Essay Was Written and Edited, Please Read Our Editor’s Note Below”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
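As a rough illustration of that sorting procedure, here is a minimal sketch in Python, assuming a generic text-embedding model: the `embed` function below is only a placeholder stand-in (not the embedding model actually used for this page), and the greedy nearest-neighbor walk is one plausible reading of the description above, not the exact implementation.

```python
# Sketch of the annotation ordering described above: start from the newest
# annotation and repeatedly append the most-similar unvisited annotation,
# yielding a greedy nearest-neighbor "progression of topics".
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder embedding model: returns one L2-normalized vector per text."""
    rng = np.random.default_rng(0)  # stand-in for a real text-embedding model
    vecs = rng.normal(size=(len(texts), 384))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def sort_by_magic(annotations: list[str]) -> list[str]:
    """Order annotations by greedy nearest-neighbor similarity, newest first."""
    vecs = embed(annotations)
    order = [0]                                  # index 0 = the newest annotation
    remaining = set(range(1, len(annotations)))
    while remaining:
        last = vecs[order[-1]]
        # cosine similarity reduces to a dot product on normalized vectors
        nearest = max(remaining, key=lambda i: float(vecs[i] @ last))
        order.append(nearest)
        remaining.remove(nearest)
    return [annotations[i] for i in order]
```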
language-models
language-models, large-models, transformer, pre-trained, replication
Language-Modeling
Language-Generation
Miscellaneous
- /doc/ai/nn/transformer/gpt/2023-qin-figure1-chatgptvsgpt35on20nlpdatasets.png
- /doc/ai/nn/transformer/gpt/2023-bommarito-figure2-progressofgpt3overtimeoncpaaccountingexam.png
- /doc/ai/nn/transformer/gpt/2023-bommarito-figure1-gpt3cpaaccountingexamperformancebyexamsection.png
- /doc/ai/nn/transformer/gpt/2022-08-06-gwern-meme-netflixliegirl-studyingdeeplearningscaling.png
- /doc/ai/nn/transformer/gpt/2022-05-22-gwern-meme-tintinwhataweekhuh-2ndanniversaryofgpt3paper.png
- /doc/ai/nn/transformer/gpt/2022-bommarito-figure2-increaseofgpt3modelaccuracyonbarexambysize.png
- /doc/ai/nn/transformer/gpt/2022-bommarito-figure1-gpt3performanceonbarexambycategory.png
- /doc/ai/nn/transformer/gpt/2021-05-25-naver-hyperclova-computescaling0137bto82b.png
- /doc/ai/nn/transformer/gpt/2021-01-11-gwern-meme-dogbarkcanthurtyou-aiscaling.jpg
- /doc/ai/nn/transformer/gpt/2021-nogueira-figure1-additionperformanceofnumberorthographies.png
- /doc/ai/nn/transformer/gpt/2021-kim-figure5-transferfromenglishtochinesespanishgerman.png
- /doc/ai/nn/transformer/gpt/2021-kim-figure4-datatransferfromenglishtochinese.png
- /doc/ai/nn/transformer/gpt/2021-hernandez-transferlearning-figure2-transferscaling.png
- /doc/ai/nn/transformer/gpt/2021-dou-figure4-errorsbydecodingsamplingstrategyhyperparameters.png
- /doc/ai/nn/transformer/gpt/2021-dou-figure3-errorsbytype.png
- /doc/ai/nn/transformer/gpt/2021-dou-figure2-errorsbymodel.png
- /doc/ai/nn/transformer/gpt/2021-almeida-figure2-lhoptgpt3hyperparametertuningscalinglaw.png
- /doc/ai/nn/transformer/gpt/2020-07-19-oceaninthemiddleofanisland-gpt3-chinesepoetrytranslation.png
- /doc/ai/nn/transformer/gpt/2020-06-21-openai-beta-gpt3-playgroundui.png
- /doc/ai/nn/transformer/gpt/2020-06-18-karpathy-expandingbrainmeme-gpt3metalearning.jpg
- /doc/ai/nn/transformer/gpt/2020-04-01-gwern-gpt2-5k-midi-training.png
- /doc/ai/nn/transformer/gpt/2020-02-03-gpt21.5b-videogamewalkthrough-model-174925-samples-topp090.txt
- /doc/ai/nn/transformer/gpt/2020-02-03-gpt21.5b-archiveofourownao3-model-510427-samples-topp090.txt
- /doc/ai/nn/transformer/gpt/2020-01-20-gwern-gpt2-25k-midi-training.png
- /doc/ai/nn/transformer/gpt/2020-zhang-figure1-thelikelihoodtrap.png
- /doc/ai/nn/transformer/gpt/2020-nadeem-figure1-gpt2samplingqualityvsdiversity.png
- /doc/ai/nn/transformer/gpt/2020-kaplan-figure7-rnnsvstransformers.png
- /doc/ai/nn/transformer/gpt/2020-kaplan-figure15-projectingscaling.png
- /doc/ai/nn/transformer/gpt/2020-kaplan-figure1-dlscaling.png
- /doc/ai/nn/transformer/gpt/2020-kaplan-appendix1-summaryofneurallanguagemodelscalingpowerlaws.png
- /doc/ai/nn/transformer/gpt/2020-henighan-table1-autoregressivemodelsscalingpowerlaws.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure31-qandamodelscaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure3-domainmodelsizescaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure2-universalmodelsizescaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure11-pretrainingimageclassificationscaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure1-scalingacrossdomains.png
- /doc/ai/nn/transformer/gpt/2020-hendrycks-figure1b-gpt3-qascaling.png
- /doc/ai/nn/transformer/gpt/2020-brown-gpt3-figure13-meanperformancescalingcurve.png
- /doc/ai/nn/transformer/gpt/2020-brown-figure313-humanabilitytodetectmodelgeneratednewsstories.png
- /doc/ai/nn/transformer/gpt/2020-brown-figure31-gpt3scaling.png
- /doc/ai/nn/transformer/gpt/2020-bostrom-unigramlm-figure1-unigramlmvsbpe.png
- /doc/ai/nn/transformer/gpt/2019-12-21-gwern-gpt2-preferencelearning-abc-combinedmodel-divergence.png
- /doc/ai/nn/transformer/gpt/2019-12-17-gwern-gpt2-preferencelearning-abc-terminal.png
- /doc/ai/nn/transformer/gpt/2019-12-16-gwern-gpt2-15b-poetry-tensorboard-100tputraining.png
- /doc/ai/nn/transformer/gpt/2019-12-13-gwern-gpt2-preferencelearning-abc-combinedmodel-halfbounce.png
- /doc/ai/nn/transformer/gpt/2019-12-13-gwern-gpt2-15b-poetry-tensorboard-97tputraining.png
- /doc/ai/nn/transformer/gpt/2019-12-12-gwern-gpt2-abc-score-polkaebbbab.png
- /doc/ai/nn/transformer/gpt/2019-11-19-gwern-gpt2-15b-poetry-tensorboard-1tputraining.jpg
- /doc/ai/nn/transformer/gpt/2019-11-07-amodei-aiandcompute-twodistincteras-gpt3modified.png
- /doc/ai/nn/transformer/gpt/2019-ziegler-preferencelearning-figure1-architecture.png
- /doc/ai/nn/transformer/gpt/2019-radford-figure4-gpt2validationloss.png
- /doc/ai/nn/transformer/gpt/2019-openai-gpt2-demo-recyclingtextsample.png
- /doc/ai/nn/transformer/gpt/2019-keskar-table7-datasetsandcontrolcodesmetadata.png
- /doc/ai/nn/transformer/gpt/2019-keskar-table2-ctrltextsamplesusingonlymetadatawithoutaprompt.png
- /doc/ai/nn/transformer/gpt/2018-huang-magenta-musictransformer-attentionvisualization.png
- https://analyticsindiamag.com/when-chatgpt-attempted-upsc-exam/
- https://blog.research.google/2017/08/transformer-novel-neural-network.html
- https://colab.research.google.com/drive/1c6VccMPsOMAUQCKU4BVDRd5Y32qkozmK
- https://homepages.inf.ed.ac.uk/abmayne/publications/sennrich2016NAACL.pdf
- https://huggingface.co/Gustavosta/MagicPrompt-Stable-Diffusion
- https://mi.eng.cam.ac.uk/projects/cued-rnnlm/papers/Interspeech15.pdf
- https://nautil.us/your-next-new-best-friend-might-be-a-robot-235779/
- https://soundcloud.com/seaandsailor/sets/char-rnn-composes-irish-folk-music
- https://techtualist.substack.com/p/i-wrote-a-script-for-gpt-3-to-take
- https://thezvi.substack.com/p/jailbreaking-the-chatgpt-on-release
- https://twitter.com/BlancheMinerva/status/1662521904727756801
- https://twitter.com/OfficialLoganK/status/1664476604658069511
- https://twitter.com/RiversHaveWings/status/1459646450275553285
- https://twitter.com/mathemagic1an/status/1595410144522813440
- https://www.alignmentforum.org/posts/rtEtTybuCcDWLk7N9/ama-conjecture-a-new-alignment-startup
- https://www.forefront.ai/blog-posts/how-to-fine-tune-gpt-neox
- https://www.freaktakes.com/p/the-past-and-present-of-computer
- https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology
- https://www.lesswrong.com/posts/EzuBSASuui5qekhLA/assessing-alephalphas-multimodal-model
- https://www.lesswrong.com/posts/PDLfpRwSynu73mxGw/basic-facts-about-language-model-internals-1
- https://www.lesswrong.com/posts/YKfNZAmiLdepDngwi/gpt-175bee
- https://www.lesswrong.com/posts/a3FuA7fGgpTQ7mX3W/is-gpt3-a-good-rationalist-instructgpt3-2-2
- https://www.lesswrong.com/posts/jfq2BH5kfQqu2vYv3/we-are-conjecture-a-new-alignment-research-startup
- https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse-due-to-rlhf
- https://www.lesswrong.com/posts/thePw6qdyabD8XR4y/interpreting-openai-s-whisper
- https://www.lesswrong.com/posts/yZb5eFvDoaqB337X5/investigating-causal-understanding-in-llms
- https://www.lesswrong.com/posts/ydeaHqDPJ5REJWvat/a-one-question-turing-test-for-gpt-3
- https://www.nytimes.com/interactive/2023/04/26/upshot/gpt-from-scratch.html
- https://www.reddit.com/r/ChatGPT/comments/15y4mqx/i_asked_chatgpt_to_maximize_its_censorship/
- https://www.reddit.com/r/GPT3/comments/ra6nk4/had_gpt3_generate_the_onion_headlines/
- https://www.reddit.com/r/GPT3/comments/tgud2t/my_new_favorite_thing_is_making_gpt3_create/
- https://www.reddit.com/r/MachineLearning/comments/v42pej/p_this_is_the_worst_ai_ever_gpt4chan_model/
- https://www.sfchronicle.com/projects/2021/jessica-simulation-artificial-intelligence/
Link Bibliography
- https://arxiv.org/abs/2306.17806#eleutherai: “Stay on Topic With Classifier-Free Guidance”, Guillaume Sanchez, Honglu Fan, Alexander Spangher, Elad Levi, Pawan Sasanka Ammanamanchi, Stella Biderman
- https://arxiv.org/abs/2306.07567: “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Fabien Roger
- https://arxiv.org/abs/2305.10429#google: “DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”,
- https://www.forbes.com/sites/alexkonrad/2023/05/02/inflection-ai-ex-deepmind-launches-pi-chatbot/: “Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”, Alex Konrad
- https://arxiv.org/abs/2304.06762#nvidia: “Shall We Pretrain Autoregressive Language Models With Retrieval? A Comprehensive Study”,
- https://warontherocks.com/2023/04/how-large-language-models-can-revolutionize-military-planning/: “How Large-Language Models Can Revolutionize Military Planning”, Benjamin Jensen, Dan Tadross
- https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-int-3-and: “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org
- https://osf.io/5uxra/: “Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”, Yuki Kataoka
- https://arxiv.org/abs/2302.13939: “SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Rui-Jie Zhu, Qihang Zhao, Jason K. Eshraghian
- https://arxiv.org/abs/2302.05981: “MarioGPT: Open-Ended Text2Level Generation through Large Language Models”, Shyam Sudhakaran, Miguel González-Duque, Claire Glanois, Matthias Freiberger, Elias Najarro, Sebastian Risi
- https://arxiv.org/abs/2302.06476: “Is ChatGPT a General-Purpose Natural Language Processing Task Solver?”, Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, Diyi Yang
- https://arxiv.org/abs/2302.00560: “Co-Writing With Opinionated Language Models Affects Users’ Views”, Maurice Jakesch, Advait Bhat, Daniel Buschek, Lior Zalmanson, Mor Naaman
- https://arxiv.org/abs/2301.04408: “GPT-3 As Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities”, Jillian Bommarito, Michael Bommarito, Daniel Martin Katz, Jessica Katz
- https://arxiv.org/abs/2212.14402: “GPT-3 Takes the Bar Exam”, Michael Bommarito II, Daniel Martin Katz
- https://www.nytimes.com/2022/12/21/technology/ai-chatgpt-google-search.html: “A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A New Wave of Chat Bots like ChatGPT Use Artificial Intelligence That Could Reinvent or Even Replace the Traditional Internet Search Engine”, Nico Grant, Cade Metz
- https://arxiv.org/abs/2212.10496: “Precise Zero-Shot Dense Retrieval without Relevance Labels”, Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan
- https://techcrunch.com/2022/11/23/harvey-which-uses-ai-to-answer-legal-questions-lands-cash-from-openai/: “Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, Kyle Wiggers
- https://arxiv.org/abs/2211.10438: “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Guangxuan Xiao, Ji Lin, Mickael Seznec, Julien Demouth, Song Han
- https://arxiv.org/abs/2211.09800: “InstructPix2Pix: Learning to Follow Image Editing Instructions”, Tim Brooks, Aleksander Holynski, Alexei A. Efros
- https://arxiv.org/abs/2211.09085#facebook: “Galactica: A Large Language Model for Science”,
- https://arxiv.org/abs/2210.17323: “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh
- https://arxiv.org/abs/2210.15097: “Contrastive Decoding: Open-ended Text Generation As Optimization”,
- https://arxiv.org/abs/2210.14140: “Contrastive Search Is What You Need For Neural Text Generation”, Yixuan Su, Nigel Collier
- https://arxiv.org/abs/2210.13673#nvidia: “Evaluating Parameter Efficient Learning for Generation”,
- aunn-papyrus: “Language-Conditioned Absolute Unit NNs”, Gwern
- https://arxiv.org/abs/2210.10341#microsoft: “BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining”, Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu
- https://arxiv.org/abs/2210.15458#google: “Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”, Luke Vilnis, Yury Zemlyanskiy, Patrick Murray, Alexandre Passos, Sumit Sanghai
- https://arxiv.org/abs/2210.06423#microsoft: “Foundation Transformers”,
- https://arxiv.org/abs/2210.04243: “Fine-Tuning Pre-trained Transformers into Decaying Fast Weights”, Huanru Henry Mao
- https://arxiv.org/abs/2210.02441: “Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”,
- https://www.nature.com/articles/s41598-022-20460-9: “Deep Language Algorithms Predict Semantic Comprehension from Brain Activity”, Charlotte Caucheteux, Alexandre Gramfort, Jean-Rémi King
- https://arxiv.org/abs/2209.03320: “What Does a Platypus Look Like? Generating Customized Prompts for Zero-shot Image Classification (CuPL)”, Sarah Pratt, Rosanne Liu, Ali Farhadi
- https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”,
- https://arxiv.org/abs/2208.01066: “What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”, Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
- https://arxiv.org/abs/2207.04429: “LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Dhruv Shah, Blazej Osinski, Brian Ichter, Sergey Levine
- https://arxiv.org/abs/2206.01861#microsoft: “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He
- https://arxiv.org/abs/2205.14135: “FlashAttention: Fast and Memory-Efficient Exact Attention With IO-Awareness”, Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
- https://arxiv.org/abs/2205.12910#allen: “NaturalProver: Grounded Mathematical Proof Generation With Language Models”, Sean Welleck, Jiacheng Liu, Ximing Lu, Hannaneh Hajishirzi, Yejin Choi
- https://www.nature.com/articles/s41593-022-01026-4: “Shared Computational Principles for Language Processing in Humans and Deep Language Models”,
- https://arxiv.org/abs/2110.04627#google: “Vector-quantized Image Modeling With Improved VQGAN”,
- 2022-liu-3.pdf: “Quantifying and Alleviating Political Bias in Language Models”, Ruibo Liu, Chenyan Jia, Jason Wei, Guangxuan Xu, Soroush Vosoughi
- https://arxiv.org/abs/2202.12837#facebook: “Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
- https://www.nature.com/articles/s42003-022-03036-1: “Brains and Algorithms Partially Converge in Natural Language Processing”, Charlotte Caucheteux, Jean-Rémi King
- https://arxiv.org/abs/2201.11990#microsoftnvidia: “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”,
- https://swabhs.com/assets/pdf/wanli.pdf#allen: “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
- https://arxiv.org/abs/2201.05320#allen: “CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, Alon Talmor, Ori Yoran, Ronan Le Bras, Chandra Bhagavatula, Yoav Goldberg, Yejin Choi, Jonathan Berant
- 2022-tu.pdf: “Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution”, Sean Tu, Amy Cyphert, Sam Perl
- https://arxiv.org/abs/2112.04426#deepmind: “Improving Language Models by Retrieving from Trillions of Tokens”,
- https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment”,
- https://openreview.net/forum?id=gJcEM8sxHK: “Mapping Language Models to Grounded Conceptual Spaces”, Roma Patel, Ellie Pavlick
- https://arxiv.org/abs/2111.09734: “ClipCap: CLIP Prefix for Image Captioning”, Ron Mokady, Amir Hertz, Amit H. Bermano
- https://aclanthology.org/2021.mrqa-1.7.pdf: “What Can a Generative Language Model Answer About a Passage?”, Douglas Summers-Stay, Claire Bonial, Clare Voss
- https://arxiv.org/abs/2111.02570#microsoft: “CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”,
- https://arxiv.org/abs/2110.11309: “Fast Model Editing at Scale”, Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning
- https://arxiv.org/abs/2109.07958: “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Stephanie Lin, Jacob Hilton, Owain Evans
- https://arxiv.org/abs/2109.02593#allen: “General-Purpose Question-Answering With Macaw”, Oyvind Tafjord, Peter Clark
- https://arxiv.org/abs/2107.01294#allen: “Scarecrow: A Framework for Scrutinizing Machine Text”, Yao Dou, Maxwell Forbes, Rik Koncel-Kedziorski, Noah A. Smith, Yejin Choi
- https://arxiv.org/abs/2106.09685: “LoRA: Low-Rank Adaptation of Large Language Models”, Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
- https://psyarxiv.com/m6s28/: “Let the Algorithm Speak: How to Use Neural Networks for Automatic Item Generation in Psychological Scale Development”, Friedrich Götz, Rakoen Maertens, Sander van der Linden
- https://arxiv.org/abs/2106.06981: “RASP: Thinking Like Transformers”, Gail Weiss, Yoav Goldberg, Eran Yahav
- https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/: “GPT-J-6B: 6B JAX-Based Transformer”, EleutherAI
- https://arxiv.org/abs/2106.00958#openai: “LHOPT: A Generalizable Approach to Learning Optimizers”, Diogo Almeida, Clemens Winter, Jie Tang, Wojciech Zaremba
- https://arxiv.org/abs/2105.13626#google: “ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel
- https://m.koreaherald.com/view.php?ud=20210525000824#naver: “Naver Unveils First ‘hyperscale’ AI Platform”, Kang Jae-eun
- scaling: “Machine Learning Scaling”, Gwern
- https://arxiv.org/abs/2102.13019: “Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Rodrigo Nogueira, Zhiying Jiang, Jimmy Li
- https://arxiv.org/abs/2102.01951#scaling&org=deepmind: “Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling”,
- https://arxiv.org/abs/2101.00190: “Prefix-Tuning: Optimizing Continuous Prompts for Generation”, Xiang Lisa Li, Percy Liang
- https://arxiv.org/abs/2101.00027#eleutherai: “The Pile: An 800GB Dataset of Diverse Text for Language Modeling”,
- https://aclanthology.org/2021.naacl-main.235.pdf#facebook: “Bot-Adversarial Dialogue for Safe Conversational Agents”, Jing Xu, Da Ju, Margaret Li, Y-Lan Boureau, Jason Weston, Emily Dinan
- https://arxiv.org/abs/2010.14701#openai: “Scaling Laws for Autoregressive Generative Modeling”,
- https://arxiv.org/abs/2009.03393#openai: “Generative Language Modeling for Automated Theorem Proving”, Stanislas Polu, Ilya Sutskever
- https://arxiv.org/abs/2009.03300: “MMLU: Measuring Massive Multitask Language Understanding”, Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
- attention: “Efficient Attention: Breaking The Quadratic Transformer Bottleneck”, Gwern
- scaling-hypothesis: “The Scaling Hypothesis”, Gwern
- https://arxiv.org/abs/2004.10802: “Scaling Laws from the Data Manifold Dimension”, Utkarsh Sharma, Jared Kaplan
- https://www.newsweek.com/openai-text-generator-gpt-2-video-game-walkthrough-most-tedious-1488334: “OpenAI Text Generator GPT-2 Creates Video Game Walkthrough for 'Most Tedious Game in History'”, Andrew Whalen
- https://arxiv.org/abs/2001.08361#openai: “Scaling Laws for Neural Language Models”,
- https://arxiv.org/abs/2001.04451#google: “Reformer: The Efficient Transformer”, Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya
- https://www.uber.com/blog/pplm/: “Controlling Text Generation With Plug and Play Language Models”, Rosanne Liu, Sumanth Dathathri, Andrea Madotto, Piero Molino, Jason Yosinski
- https://play.aidungeon.io/main/home: “AI Dungeon 2”, Nick Walton
- gpt-2-music: “GPT-2 Folk Music”, Gwern Branwen, Shawn Presser
- https://arxiv.org/abs/1909.08053#nvidia: “Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism”, Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro
- https://arxiv.org/abs/1909.05858#salesforce: “CTRL: A Conditional Transformer Language Model For Controllable Generation”, Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher (Salesforce)
- https://minimaxir.com/2019/09/howto-gpt2/: “How To Make Custom AI-Generated Text With GPT-2”, Max Woolf
- https://medium.com/@vanya_cohen/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc: “OpenGPT-2: We Replicated GPT-2-1.5b Because You Can Too”, Aaron Gokaslan, Vanya Cohen
- https://nv-adlr.github.io/MegatronLM: “MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism”, NVIDIA ADLR
- https://medium.com/@NPCollapse/replicating-gpt2-1-5b-86454a7f26af: “Replicating GPT-2-1.5B”, Connor Leahy
- https://openai.com/research/musenet: “MuseNet: a Deep Neural Network That Can Generate 4-minute Musical Compositions With 10 Different Instruments, and Can Combine Styles from Country to Mozart to the Beatles”, Christine Payne
- https://openai.com/research/better-language-models: “Better Language Models and Their Implications”, Alec Radford, Jeffrey Wu, Dario Amodei, Daniela Amodei, Jack Clark, Miles Brundage, Ilya Sutskever
- https://magenta.tensorflow.org/music-transformer: “Music Transformer: Generating Music With Long-Term Structure”, Cheng-Zhi Anna Huang, Ian Simon, Monica Dinculescu
- https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf#page=5: “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
- idea: “Research Ideas”, Gwern
- https://paulfchristiano.com/: “Homepage of Paul F. Christiano”, Paul F. Christiano
- https://sites.research.google/trc/: “TensorFlow Research Cloud (TRC): Accelerate Your Cutting-edge Machine Learning Research With Free Cloud TPUs”, TRC