‘GPT’ tag
- See Also
- Gwern
- Links
- “Continuous Autoregressive Models With Noise Augmentation Avoid Error Accumulation”, Pasini et al 2024
- “Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?”, Yang et al 2024
- “2:4 Sparse Llama: Smaller Models for Efficient GPU Inference”, Kurtić et al 2024
- “Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?”, Jeong et al 2024
- “Model Equality Testing: Which Model Is This API Serving?”, Gao et al 2024
- “Centaur: a Foundation Model of Human Cognition”, Binz et al 2024
- “Do LLMs Estimate Uncertainty Well in Instruction-Following?”, Heo et al 2024
- “Interpretable Contrastive Monte Carlo Tree Search Reasoning”, Gao et al 2024
- “NGPT: Normalized Transformer With Representation Learning on the Hypersphere”, Loshchilov et al 2024
- “LLM Applications I Want To See”, Constantin 2024
- “Ensemble Everything Everywhere: Multi-Scale Aggregation for Adversarial Robustness”, Fort & Lakshminarayanan 2024
- “Token Erasure As a Footprint of Implicit Vocabulary Items in LLMs”, Feucht et al 2024
- “Resolving Discrepancies in Compute-Optimal Scaling of Language Models”, Porian et al 2024
- “When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models”, Chang et al 2024
- “Nemotron-4 340B Technical Report”, Adler et al 2024
- “DataComp-LM: In Search of the next Generation of Training Sets for Language Models”, Li et al 2024
- “How Do Large Language Models Acquire Factual Knowledge During Pretraining?”, Chang et al 2024
- “Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs”, Hans et al 2024
- “Discovering Preference Optimization Algorithms With and for Large Language Models”, Lu et al 2024
- “MCTSr: Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine With LLaMA-3-8B”, Zhang et al 2024
- “For Chinese Students, the New Tactic Against AI Checks: More AI”, Qitong 2024
- “MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series”, Zhang et al 2024
- “Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass”, Shen et al 2024
- “SpaceByte: Towards Deleting Tokenization from Large Language Modeling”, Slagle 2024
- “Towards Smaller, Faster Decoder-Only Transformers: Architectural Variants and Their Implications”, Suresh & P 2024
- “Design of Highly Functional Genome Editors by Modeling the Universe of CRISPR-Cas Sequences”, Ruffolo et al 2024
- “From r to Q✱: Your Language Model Is Secretly a Q-Function”, Rafailov et al 2024
- “CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models”, Lee et al 2024
- “CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack Of) Multicultural Knowledge”, Chiu et al 2024
- “Training LLMs over Neurally Compressed Text”, Lester et al 2024
- “Reverse Training to Nurse the Reversal Curse”, Golovneva et al 2024
- “Evolutionary Optimization of Model Merging Recipes”, Akiba et al 2024
- “Yi: Open Foundation Models by 01.AI”, Young et al 2024
- “Actions Speak Louder Than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)”, Zhai et al 2024
- “Fast Adversarial Attacks on Language Models In One GPU Minute”, Sadasivan et al 2024
- “Autonomous Data Selection With Language Models for Mathematical Texts”, Zhang et al 2024
- “Grandmaster-Level Chess Without Search”, Ruoss et al 2024
- “Neural Networks Learn Statistics of Increasing Complexity”, Belrose et al 2024
- “Arrows of Time for Large Language Models”, Papadopoulos et al 2024
- “SliceGPT: Compress Large Language Models by Deleting Rows and Columns”, Ashkboos et al 2024
- “Excuse Me, Sir? Your Language Model Is Leaking (information)”, Zamir 2024
- “TinyLlama: An Open-Source Small Language Model”, Zhang et al 2024
- “LLaMA Pro: Progressive LLaMA With Block Expansion”, Wu et al 2024
- “Generative AI Is Already Widespread in the Public Sector”, Bright et al 2024
- “Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws”, Sardana & Frankle 2023
- “TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones”, Yuan et al 2023
- “Reasons to Reject? Aligning Language Models With Judgments”, Xu et al 2023
- “Generative Multimodal Models Are In-Context Learners”, Sun et al 2023
- “Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning”, Dutta et al 2023
- “Object Recognition As Next Token Prediction”, Yue et al 2023
- “MEDITRON-70B: Scaling Medical Pretraining for Large Language Models”, Chen et al 2023
- “Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching”, Campbell et al 2023
- “OpenAI Researchers Warned Board of AI Breakthrough ahead of CEO Ouster, Sources Say”, Tong et al 2023
- “Positional Description Matters for Transformers Arithmetic”, Shen et al 2023
- “Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models”, Zhang et al 2023
- “Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game”, Toyer et al 2023
- “Learn Your Tokens: Word-Pooled Tokenization for Language Modeling”, Thawani et al 2023
- “Llemma: An Open Language Model For Mathematics”, Azerbayev et al 2023
- “In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries”, Shi et al 2023
- “OSD: Online Speculative Decoding”, Liu et al 2023
- “Let Models Speak Ciphers: Multiagent Debate through Embeddings”, Pham et al 2023
- “OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”, Paster et al 2023
- “XVal: A Continuous Number Encoding for Large Language Models”, Golkar et al 2023
- “MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models”, Yu et al 2023
- “Language Modeling Is Compression”, Delétang et al 2023
- “Sparse Autoencoders Find Highly Interpretable Features in Language Models”, Cunningham et al 2023
- “Anchor Points: Benchmarking Models With Much Fewer Examples”, Vivek et al 2023
- “When Less Is More: Investigating Data Pruning for Pretraining LLMs at Scale”, Marion et al 2023
- “Language Reward Modulation for Pretraining Reinforcement Learning”, Adeniji et al 2023
- “ReST: Reinforced Self-Training (ReST) for Language Modeling”, Gulcehre et al 2023
- “Studying Large Language Model Generalization With Influence Functions”, Grosse et al 2023
- “Multimodal Neurons in Pretrained Text-Only Transformers”, Schwettmann et al 2023
- “Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models”, Chen et al 2023
- “Length Generalization in Arithmetic Transformers”, Jelassi et al 2023
- “Are Aligned Neural Networks Adversarially Aligned?”, Carlini et al 2023
- “Improving Long-Horizon Imitation Through Instruction Prediction”, Hejna et al 2023
- “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Roger 2023
- “SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, Dettmers et al 2023
- “Undetectable Watermarks for Language Models”, Christ et al 2023
- “Improving Language Models With Advantage-Based Offline Policy Gradients”, Baheti et al 2023
- “Accelerating Transformer Inference for Translation via Parallel Decoding”, Santilli et al 2023
- “DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”, Xie et al 2023
- “Memorization for Good: Encryption With Autoregressive Language Models”, Stevens & Su 2023
- “MEGABYTE: Predicting Million-Byte Sequences With Multiscale Transformers”, Yu et al 2023
- “Finding Neurons in a Haystack: Case Studies With Sparse Probing”, Gurnee et al 2023
- “Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”, Konrad 2023
- “Emergent and Predictable Memorization in Large Language Models”, Biderman et al 2023
- “A Comparative Study between Full-Parameter and LoRA-Based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model”, Sun et al 2023
- “Shall We Pretrain Autoregressive Language Models With Retrieval? A Comprehensive Study”, Wang et al 2023
- “How Large-Language Models Can Revolutionize Military Planning”, Jensen & Tadross 2023
- “Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling”, Biderman et al 2023
- “8 Things to Know about Large Language Models”, Bowman 2023
- “BloombergGPT: A Large Language Model for Finance”, Wu et al 2023
- “The Quantization Model of Neural Scaling”, Michaud et al 2023
- “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023
- “Consistency Analysis of ChatGPT”, Jang & Lukasiewicz 2023
- “Rewarding Chatbots for Real-World Engagement With Millions of Users”, Irvine et al 2023
- “Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”, Kataoka 2023
- “SpikeGPT: Generative Pre-Trained Language Model With Spiking Neural Networks”, Zhu et al 2023
- “A Prompt Pattern Catalog to Enhance Prompt Engineering With ChatGPT”, White et al 2023
- “BiLD: Big Little Transformer Decoder”, Kim et al 2023
- “Data Selection for Language Models via Importance Resampling”, Xie et al 2023
- “In-Context Retrieval-Augmented Language Models”, Ram et al 2023
- “Crawling the Internal Knowledge-Base of Language Models”, Cohen et al 2023
- “Big Tech Was Moving Cautiously on AI. Then Came ChatGPT. Google, Facebook and Microsoft Helped Build the Scaffolding of AI. Smaller Companies Are Taking It to the Masses, Forcing Big Tech to React”, Tiku et al 2023
- “Rock Guitar Tablature Generation via Natural Language Processing”, Casco-Rodriguez 2023
- “InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, Boytsov et al 2023
- “A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A New Wave of Chat Bots like ChatGPT Use Artificial Intelligence That Could Reinvent or Even Replace the Traditional Internet Search Engine”, Grant & Metz 2022
- “Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers”, Dai et al 2022
- “Rethinking the Role of Scale for In-Context Learning: An Interpretability-Based Case Study at 66 Billion Scale”, Bansal et al 2022
- “Interpreting Neural Networks through the Polytope Lens”, Black et al 2022
- “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022
- “InstructPix2Pix: Learning to Follow Image Editing Instructions”, Brooks et al 2022
- “Galactica: A Large Language Model for Science”, Taylor et al 2022
- “Large Language Models Struggle to Learn Long-Tail Knowledge”, Kandpal et al 2022
- “The CRINGE Loss: Learning What Language Not to Model”, Adolphs et al 2022
- “Mysteries of Mode Collapse § Inescapable Wedding Parties”, Janus 2022
- “GPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers”, Frantar et al 2022
- “What Is My Math Transformer Doing? – 3 Results on Interpretability and Generalization”, Charton 2022
- “When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels”, Shi et al 2022
- “Can Language Models Handle Recursively Nested Grammatical Structures? A Case Study on Comparing Models and Humans”, Lampinen 2022
- “Evaluating Parameter Efficient Learning for Generation”, Xu et al 2022
- “BioGPT: Generative Pre-Trained Transformer for Biomedical Text Generation and Mining”, Luo et al 2022
- “Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”, Vilnis et al 2022
- “MTEB: Massive Text Embedding Benchmark”, Muennighoff et al 2022
- “Foundation Transformers”, Wang et al 2022
- “Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Arora et al 2022
- “Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, Ramamurthy et al 2022
- “Sparrow: Improving Alignment of Dialogue Agents via Targeted Human Judgements”, Glaese et al 2022
- “Generate rather than Retrieve (GenRead): Large Language Models Are Strong Context Generators”, Yu et al 2022
- “FP8 Formats for Deep Learning”, Micikevicius et al 2022
- “Petals: Collaborative Inference and Fine-Tuning of Large Models”, Borzunov et al 2022
- “LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale”, Dettmers et al 2022
- “Meaning without Reference in Large Language Models”, Piantadosi & Hill 2022
- “Effidit: Your AI Writing Assistant”, Shi et al 2022
- “Language Models Show Human-Like Content Effects on Reasoning”, Dasgupta et al 2022
- “LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Shah et al 2022
- “Can Foundation Models Talk Causality?”, Willig et al 2022
- “NOAH: Neural Prompt Search”, Zhang et al 2022
- “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
- “Quark: Controllable Text Generation With Reinforced Unlearning”, Lu et al 2022
- “RankGen: Improving Text Generation With Large Ranking Models”, Krishna et al 2022
- “Opal: Multimodal Image Generation for News Illustration”, Liu et al 2022
- “What Language Model to Train If You Have One Million GPU Hours?”, Scao et al 2022
- “WAVPROMPT: Towards Few-Shot Spoken Language Understanding With Frozen Language Models”, Gao et al 2022
- “Shared Computational Principles for Language Processing in Humans and Deep Language Models”, Goldstein et al 2022
- “Vector-Quantized Image Modeling With Improved VQGAN”, Yu et al 2022
- “Brains and Algorithms Partially Converge in Natural Language Processing”, Caucheteux & King 2022
- “Quantifying Memorization Across Neural Language Models”, Carlini et al 2022
- “A Contrastive Framework for Neural Text Generation”, Su et al 2022
- “AdaPrompt: Adaptive Model Training for Prompt-Based NLP”, Chen et al 2022
- “InPars: Data Augmentation for Information Retrieval Using Large Language Models”, Bonifacio et al 2022
- “ROME: Locating and Editing Factual Associations in GPT”, Meng et al 2022
- “Cedille: A Large Autoregressive French Language Model”, Müller & Laurent 2022
- “Data Scaling Laws in NMT: The Effect of Noise and Architecture”, Bansal et al 2022
- “PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts”, Bach et al 2022
- “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Smith et al 2022
- “Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents”, Huang et al 2022
- “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Liu et al 2022
- “A Survey of Controllable Text Generation Using Transformer-Based Pre-Trained Language Models”, Zhang et al 2022
- “The Defeat of the Winograd Schema Challenge”, Kocijan et al 2022
- “Learning To Retrieve Prompts for In-Context Learning”, Rubin et al 2021
- “Learning to Prompt for Continual Learning”, Wang et al 2021
- “Amortized Noisy Channel Neural Machine Translation”, Pang et al 2021
- “Few-Shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”, Prabhumoye et al 2021
- “PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021
- “LMTurk: Few-Shot Learners As Crowdsourcing Workers”, Zhao et al 2021
- “Improving Language Models by Retrieving from Trillions of Tokens”, Borgeaud et al 2021
- “Linear Algebra With Transformers”, Charton 2021
- “Zero-Shot Image-To-Text Generation for Visual-Semantic Arithmetic”, Tewel et al 2021
- “Long-Range and Hierarchical Language Predictions in Brains and Algorithms”, Caucheteux et al 2021
- “True Few-Shot Learning With Prompts—A Real-World Perspective”, Schick & Schütze 2021
- “Few-Shot Named Entity Recognition With Cloze Questions”, Gatta et al 2021
- “Evaluating Distributional Distortion in Neural Language Modeling”, Anonymous 2021
- “On Transferability of Prompt Tuning for Natural Language Understanding”, Su et al 2021
- “CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”, Mukherjee et al 2021
- “Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey”, Min et al 2021
- “Fast Model Editing at Scale”, Mitchell et al 2021
- “Yuan 1.0: Large-Scale Pre-Trained Language Model in Zero-Shot and Few-Shot Learning”, Wu et al 2021
- “Towards a Unified View of Parameter-Efficient Transfer Learning”, He et al 2021
- “A Few More Examples May Be Worth Billions of Parameters”, Kirstain et al 2021
- “Scaling Laws for Neural Machine Translation”, Ghorbani et al 2021
- “Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color”, Abdou et al 2021
- “What Changes Can Large-Scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-Scale Korean Generative Pretrained Transformers”, Kim et al 2021
- “Medically Aware GPT-3 As a Data Generator for Medical Dialogue Summarization”, Chintagunta et al 2021
- “General-Purpose Question-Answering With Macaw”, Tafjord & Clark 2021
- “An Empirical Exploration in Quality Filtering of Text Data”, Gao 2021
- “Want To Reduce Labeling Cost? GPT-3 Can Help”, Wang et al 2021
- “Multimodal Few-Shot Learning With Frozen Language Models”, Tsimpoukelli et al 2021
- “Cutting Down on Prompts and Parameters: Simple Few-Shot Learning With Language Models”, Logan et al 2021
- “RASP: Thinking Like Transformers”, Weiss et al 2021
- “ByT5: Towards a Token-Free Future With Pre-Trained Byte-To-Byte Models”, Xue et al 2021
- “Anthropic Raises $124 Million to Build More Reliable, General AI Systems”, Anthropic 2021
- “Naver Unveils First ‘Hyperscale’ AI Platform”, Jae-eun 2021
- “Scaling Laws for Language Transfer Learning”, Kim 2021
- “GPT Understands, Too”, Liu et al 2021
- “How Many Data Points Is a Prompt Worth?”, Scao & Rush 2021
- “Pretrained Transformers As Universal Computation Engines”, Lu et al 2021
- “Language Models Have a Moral Dimension”, Schramowski et al 2021
- “Learning Chess Blindfolded: Evaluating Language Models on State Tracking”, Toshniwal et al 2021
- “Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
- “Proof Artifact Co-Training for Theorem Proving With Language Models”, Han et al 2021
- “Clinical Outcome Prediction from Admission Notes Using Self-Supervised Knowledge Integration”, Aken et al 2021
- “Scaling Laws for Transfer”, Hernandez et al 2021
- “MAUVE: Measuring the Gap Between Neural Text and Human Text Using Divergence Frontiers”, Pillutla et al 2021
- “Apparently ‘What Ho’ Is a Corruption Of…”, Marguerite 2021
- “Making Pre-Trained Language Models Better Few-Shot Learners”, Gao et al 2020
- “Thinking Ahead: Prediction in Context As a Keystone of Language in Humans and Machines”, Goldstein et al 2020
- “CPM: A Large-Scale Generative Chinese Pre-Trained Language Model”, Zhang et al 2020
- “L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020
- “Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries”, Sun et al 2020
- “The Neural Architecture of Language: Integrative Reverse-Engineering Converges on a Model for Predictive Processing”, Schrimpf et al 2020
- “RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text”, Dugan et al 2020
- “A Systematic Characterization of Sampling Algorithms for Open-Ended Language Generation”, Nadeem et al 2020
- “Generative Language Modeling for Automated Theorem Proving”, Polu & Sutskever 2020
- “Learning to Summarize from Human Feedback”, Stiennon et al 2020
- “ETHICS: Aligning AI With Shared Human Values”, Hendrycks et al 2020
- “Mirostat: A Neural Text Decoding Algorithm That Directly Controls Perplexity”, Basu et al 2020
- “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”, Bender & Koller 2020
- “Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
- “OpenAI API Beta Homepage”, OpenAI 2020
- “Trading Off Diversity and Quality in Natural Language Generation”, Zhang et al 2020
- “Scaling Laws from the Data Manifold Dimension”, Sharma & Kaplan 2020
- “Unigram LM: Byte Pair Encoding Is Suboptimal for Language Model Pretraining”, Bostrom & Durrett 2020
- “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, Hasson et al 2020
- “Pop Music Transformer: Beat-Based Modeling and Generation of Expressive Pop Piano Compositions”, Huang & Yang 2020
- “Scaling Laws for Neural Language Models”, Kaplan et al 2020
- “Reformer: The Efficient Transformer”, Kitaev et al 2020
- “What Does BERT Dream Of? A Visual Investigation of Nightmares in Sesame Street”, Bäuerle & Wexler 2020
- “Generative Language Modeling for Automated Theorem Proving § Experiments”, Polu & Sutskever 2020 (page 11 org openai)
- “Plug and Play Language Models: A Simple Approach to Controlled Text Generation”, Dathathri et al 2019
- “How Can We Know What Language Models Know?”, Jiang et al 2019
- “CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019
- “Generalization through Memorization: Nearest Neighbor Language Models”, Khandelwal et al 2019
- “DialoGPT: Large-Scale Generative Pre-Training for Conversational Response Generation”, Zhang et al 2019
- “CTRL: A Conditional Transformer Language Model For Controllable Generation”, Keskar et al 2019
- “Smaller, Faster, Cheaper, Lighter: Introducing DistilGPT, a Distilled Version of GPT”, Sanh 2019
- “Language Modeling State-Of-The-Art Leaderboards”, paperswithcode.com 2019
- “Neural Text Generation With Unlikelihood Training”, Welleck et al 2019
- “GROVER: Defending Against Neural Fake News”, Zellers et al 2019
- “Generative Modeling With Sparse Transformers: We’ve Developed the Sparse Transformer, a Deep Neural Network Which Sets New Records at Predicting What Comes next in a Sequence—Whether Text, Images, or Sound. It Uses an Algorithmic Improvement of the attention Mechanism to Extract Patterns from Sequences 30× Longer Than Possible Previously”, Child & Gray 2019
- “The Curious Case of Neural Text Degeneration”, Holtzman et al 2019
- “Smart Vet: Autocompleting Sentences in Veterinary Medical Records”, Ginn 2019
- “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Dai et al 2019
- “Music Transformer: Generating Music With Long-Term Structure”, Huang et al 2018
- “Universal Transformers”, Dehghani et al 2018
- “Adversarial Reprogramming of Neural Networks”, Elsayed et al 2018
- “GPT-1: Improving Language Understanding With Unsupervised Learning”, OpenAI 2018
- “GPT-1: Improving Language Understanding by Generative Pre-Training”, Radford et al 2018
- “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Radford et al 2018 (page 5)
- “Deep Reinforcement Learning from Human Preferences § Appendix A.2: Atari”, Christiano et al 2017 (page 15 org openai)
- “Learning to Generate Reviews and Discovering Sentiment”, Radford et al 2017
- “Design a Role-Playing Game Using 200 Words or Less.”
- “How Does In-Context Learning Work? A Framework for Understanding the Differences from Traditional Supervised Learning”
- “AI Dungeon: Dragon Model Upgrade—You Can Now Play AI Dungeon With One of the Most Powerful AI Models in the World.”
- “Introducing AI Dungeon Translate: AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button. [ 🤔 💯 🤷♂️ 🤔 🤔 🤔 💯]”
- “OpenAI API Alchemy: Emoji Storytelling 🤖”
- “Llama-3.1-405B Now Runs at 969 Tokens/s on Cerebras Inference”
- “I Blew $720 on 100 Notebooks from Alibaba and Started a Paper Website Business”
- “AlphaStar: Mastering the Real-Time Strategy Game StarCraft II”
- “Transformers As Variational Autoencoders”
- “BlinkDL/RWKV-LM: RWKV Is an RNN With Transformer-Level LLM Performance. It Can Be Directly Trained like a GPT (parallelizable). So It’s Combining the Best of RNN and Transformer—Great Performance, Fast Inference, Saves VRAM, Fast Training, "Infinite" Ctx_len, and Free Sentence Embedding.”
- “Efficient, Reusable RNNs and LSTMs for Torch”
- “Updated Training?”
- “Karpathy/minGPT: A Minimal PyTorch Re-Implementation of the OpenAI GPT (Generative Pretrained Transformer) Training”
- “Minimaxir/textgenrnn: Easily Train Your Own Text-Generating Neural Network of Any Size and Complexity on Any Text Dataset With a Few Lines of Code.”
- “Loom: Multiversal Tree Writing Interface for Human-AI Collaboration”, Janus 2024
- “Zphang/minimal-Opt”
- “Math: OpenAI API Can Do Some Math out of the Gate, but Most Math It Seems It Has to Learn. Many Times, the Numbers That It Spits out Are Just Random. However, including Different Priming Prompts Can Result in Decent Results.”
- “Deep Learning for Assisting the Process of Music Composition (part 3)”
- “Google DeepMind’s Grandmaster-Level Chess Without Search”
- “The Technology Behind BLOOM Training”
- “Psych-101 Dataset [For Centaur]”
- The Gostak
- “Imprompter”
- “Your Next New Best Friend Might Be a Robot”
- “I Made a Custom GPT That Incorporates Advertisement/Product Placement With Its...”
- “The Annotated Transformer”
- “Homepage of Paul F. Christiano”, Christiano 2024
- “Data Exfiltration from Slack AI via Indirect Prompt Injection”, PromptArmor 2024
- “Introductory Antimemetics (abandoned First Draft)”, Hughes 2024
- “Jared Kaplan”
- “Meditations on Moloch”
- “Stream Seaandsailor”
- “Humans Who Are Not Concentrating Are Not General Intelligences”
- “Monitor: An AI-Driven Observability Interface”
- “This Is the OpenAI API. It Makes Spookily Good Twitter Bots. 13⁄10 Would Retweet”
- “AMA Conjecture, A New Alignment Startup”
- “WikiCrow”
- “ChatGPT As Muse, Not Oracle”, Litt 2024
- “Interpreting GPT: the Logit Lens”
- “Assessing AlephAlpha’s Multimodal Model”
- “Is GPT-3 a Good Rationalist?”
- “We Are Conjecture, A New Alignment Research Startup”
- “Investigating Causal Understanding in LLMs”
- “A One-Question Turing Test for GPT-3”
- “This Mystical Book Was Co-Authored by a Disturbingly Realistic AI”
- “The Guy Behind the Fake AI Halloween Parade Listing Says You’ve Got It All Wrong”
- “Season 1 Ep. 22 OpenAI's Ilya Sutskever: The Man Who Made AI Work”
- “WELM”
- nickwalton00
- sama
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography
See Also
Gwern
“GPT-3 Semantic Derealization”, Gwern 2024
“Research Ideas”, Gwern 2017
“You Should Write More Online—It’s Still a Good Time”, Gwern 2024
“Machine Learning Scaling”, Gwern 2021
Links
“Continuous Autoregressive Models With Noise Augmentation Avoid Error Accumulation”, Pasini et al 2024
Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation
“Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?”, Yang et al 2024
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
“2:4 Sparse Llama: Smaller Models for Efficient GPU Inference”, Kurtić et al 2024
2:4 Sparse Llama: Smaller Models for Efficient GPU Inference
“Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?”, Jeong et al 2024
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?
“Model Equality Testing: Which Model Is This API Serving?”, Gao et al 2024
“Centaur: a Foundation Model of Human Cognition”, Binz et al 2024
“Do LLMs Estimate Uncertainty Well in Instruction-Following?”, Heo et al 2024
“Interpretable Contrastive Monte Carlo Tree Search Reasoning”, Gao et al 2024
“NGPT: Normalized Transformer With Representation Learning on the Hypersphere”, Loshchilov et al 2024
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
“LLM Applications I Want To See”, Constantin 2024
“Ensemble Everything Everywhere: Multi-Scale Aggregation for Adversarial Robustness”, Fort & Lakshminarayanan 2024
Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness
“Token Erasure As a Footprint of Implicit Vocabulary Items in LLMs”, Feucht et al 2024
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs
“Resolving Discrepancies in Compute-Optimal Scaling of Language Models”, Porian et al 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
“When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models”, Chang et al 2024
When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
“Nemotron-4 340B Technical Report”, Adler et al 2024
“DataComp-LM: In Search of the next Generation of Training Sets for Language Models”, Li et al 2024
DataComp-LM: In search of the next generation of training sets for language models
“How Do Large Language Models Acquire Factual Knowledge During Pretraining?”, Chang et al 2024
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
“Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs”, Hans et al 2024
Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs
“Discovering Preference Optimization Algorithms With and for Large Language Models”, Lu et al 2024
Discovering Preference Optimization Algorithms with and for Large Language Models
“MCTSr: Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine With LLaMA-3-8B”, Zhang et al 2024
“For Chinese Students, the New Tactic Against AI Checks: More AI”, Qitong 2024
For Chinese Students, the New Tactic Against AI Checks: More AI
“MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series”, Zhang et al 2024
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
“Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass”, Shen et al 2024
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
“SpaceByte: Towards Deleting Tokenization from Large Language Modeling”, Slagle 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
“Towards Smaller, Faster Decoder-Only Transformers: Architectural Variants and Their Implications”, Suresh & P 2024
Towards smaller, faster decoder-only transformers: Architectural variants and their implications
“Design of Highly Functional Genome Editors by Modeling the Universe of CRISPR-Cas Sequences”, Ruffolo et al 2024
Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences
“From r to Q✱: Your Language Model Is Secretly a Q-Function”, Rafailov et al 2024
“CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models”, Lee et al 2024
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
“CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack Of) Multicultural Knowledge”, Chiu et al 2024
“Training LLMs over Neurally Compressed Text”, Lester et al 2024
“Reverse Training to Nurse the Reversal Curse”, Golovneva et al 2024
“Evolutionary Optimization of Model Merging Recipes”, Akiba et al 2024
“Yi: Open Foundation Models by 01.AI”, Young et al 2024
“Actions Speak Louder Than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)”, Zhai et al 2024
“Fast Adversarial Attacks on Language Models In One GPU Minute”, Sadasivan et al 2024
Fast Adversarial Attacks on Language Models In One GPU Minute
“Autonomous Data Selection With Language Models for Mathematical Texts”, Zhang et al 2024
Autonomous Data Selection with Language Models for Mathematical Texts
“Grandmaster-Level Chess Without Search”, Ruoss et al 2024
“Neural Networks Learn Statistics of Increasing Complexity”, Belrose et al 2024
“Arrows of Time for Large Language Models”, Papadopoulos et al 2024
“SliceGPT: Compress Large Language Models by Deleting Rows and Columns”, Ashkboos et al 2024
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
“Excuse Me, Sir? Your Language Model Is Leaking (information)”, Zamir 2024
Excuse me, sir? Your language model is leaking (information)
“TinyLlama: An Open-Source Small Language Model”, Zhang et al 2024
“LLaMA Pro: Progressive LLaMA With Block Expansion”, Wu et al 2024
“Generative AI Is Already Widespread in the Public Sector”, Bright et al 2024
“Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws”, Sardana & Frankle 2023
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
“TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones”, Yuan et al 2023
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
“Reasons to Reject? Aligning Language Models With Judgments”, Xu et al 2023
“Generative Multimodal Models Are In-Context Learners”, Sun et al 2023
“Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning”, Dutta et al 2023
Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning
“Object Recognition As Next Token Prediction”, Yue et al 2023
“MEDITRON-70B: Scaling Medical Pretraining for Large Language Models”, Chen et al 2023
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
“Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching”, Campbell et al 2023
“OpenAI Researchers Warned Board of AI Breakthrough ahead of CEO Ouster, Sources Say”, Tong et al 2023
OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say
“Positional Description Matters for Transformers Arithmetic”, Shen et al 2023
“Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models”, Zhang et al 2023
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
“Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game”, Toyer et al 2023
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
“Learn Your Tokens: Word-Pooled Tokenization for Language Modeling”, Thawani et al 2023
Learn Your Tokens: Word-Pooled Tokenization for Language Modeling
“Llemma: An Open Language Model For Mathematics”, Azerbayev et al 2023
“In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries”, Shi et al 2023
In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries
“OSD: Online Speculative Decoding”, Liu et al 2023
“Let Models Speak Ciphers: Multiagent Debate through Embeddings”, Pham et al 2023
Let Models Speak Ciphers: Multiagent Debate through Embeddings
“OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”, Paster et al 2023
OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text
“XVal: A Continuous Number Encoding for Large Language Models”, Golkar et al 2023
xVal: A Continuous Number Encoding for Large Language Models
“MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models”, Yu et al 2023
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
“Language Modeling Is Compression”, Delétang et al 2023
“Sparse Autoencoders Find Highly Interpretable Features in Language Models”, Cunningham et al 2023
Sparse Autoencoders Find Highly Interpretable Features in Language Models
“Anchor Points: Benchmarking Models With Much Fewer Examples”, Vivek et al 2023
“When Less Is More: Investigating Data Pruning for Pretraining LLMs at Scale”, Marion et al 2023
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
“Language Reward Modulation for Pretraining Reinforcement Learning”, Adeniji et al 2023
Language Reward Modulation for Pretraining Reinforcement Learning
“ReST: Reinforced Self-Training (ReST) for Language Modeling”, Gulcehre et al 2023
“Studying Large Language Model Generalization With Influence Functions”, Grosse et al 2023
Studying Large Language Model Generalization with Influence Functions
“Multimodal Neurons in Pretrained Text-Only Transformers”, Schwettmann et al 2023
“Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models”, Chen et al 2023
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
“Length Generalization in Arithmetic Transformers”, Jelassi et al 2023
“Are Aligned Neural Networks Adversarially Aligned?”, Carlini et al 2023
“Improving Long-Horizon Imitation Through Instruction Prediction”, Hejna et al 2023
Improving Long-Horizon Imitation Through Instruction Prediction
“Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Roger 2023
Large Language Models Sometimes Generate Purely Negatively-Reinforced Text
“SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, Dettmers et al 2023
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
“Undetectable Watermarks for Language Models”, Christ et al 2023
“Improving Language Models With Advantage-Based Offline Policy Gradients”, Baheti et al 2023
Improving Language Models with Advantage-based Offline Policy Gradients
“Accelerating Transformer Inference for Translation via Parallel Decoding”, Santilli et al 2023
Accelerating Transformer Inference for Translation via Parallel Decoding
“DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”, Xie et al 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
“Memorization for Good: Encryption With Autoregressive Language Models”, Stevens & Su 2023
Memorization for Good: Encryption with Autoregressive Language Models
“MEGABYTE: Predicting Million-Byte Sequences With Multiscale Transformers”, Yu et al 2023
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
“Finding Neurons in a Haystack: Case Studies With Sparse Probing”, Gurnee et al 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
“Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”, Konrad 2023
Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot
“Emergent and Predictable Memorization in Large Language Models”, Biderman et al 2023
Emergent and Predictable Memorization in Large Language Models
“A Comparative Study between Full-Parameter and LoRA-Based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model”, Sun et al 2023
“Shall We Pretrain Autoregressive Language Models With Retrieval? A Comprehensive Study”, Wang et al 2023
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
“How Large-Language Models Can Revolutionize Military Planning”, Jensen & Tadross 2023
How Large-Language Models Can Revolutionize Military Planning
“Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling”, Biderman et al 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
“8 Things to Know about Large Language Models”, Bowman 2023
“BloombergGPT: A Large Language Model for Finance”, Wu et al 2023
“The Quantization Model of Neural Scaling”, Michaud et al 2023
“Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023
“Consistency Analysis of ChatGPT”, Jang & Lukasiewicz 2023
“Rewarding Chatbots for Real-World Engagement With Millions of Users”, Irvine et al 2023
Rewarding Chatbots for Real-World Engagement with Millions of Users
“Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”, Kataoka 2023
“SpikeGPT: Generative Pre-Trained Language Model With Spiking Neural Networks”, Zhu et al 2023
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
“A Prompt Pattern Catalog to Enhance Prompt Engineering With ChatGPT”, White et al 2023
A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT
“BiLD: Big Little Transformer Decoder”, Kim et al 2023
“Data Selection for Language Models via Importance Resampling”, Xie et al 2023
Data Selection for Language Models via Importance Resampling
“In-Context Retrieval-Augmented Language Models”, Ram et al 2023
“Crawling the Internal Knowledge-Base of Language Models”, Cohen et al 2023
“Big Tech Was Moving Cautiously on AI. Then Came ChatGPT. Google, Facebook and Microsoft Helped Build the Scaffolding of AI. Smaller Companies Are Taking It to the Masses, Forcing Big Tech to React”, Tiku et al 2023
“Rock Guitar Tablature Generation via Natural Language Processing”, Casco-Rodriguez 2023
Rock Guitar Tablature Generation via Natural Language Processing
“InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, Boytsov et al 2023
InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers
“A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A New Wave of Chat Bots like ChatGPT Use Artificial Intelligence That Could Reinvent or Even Replace the Traditional Internet Search Engine”, Grant & Metz 2022
“Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers”, Dai et al 2022
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
“Rethinking the Role of Scale for In-Context Learning: An Interpretability-Based Case Study at 66 Billion Scale”, Bansal et al 2022
“Interpreting Neural Networks through the Polytope Lens”, Black et al 2022
“SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
“InstructPix2Pix: Learning to Follow Image Editing Instructions”, Brooks et al 2022
InstructPix2Pix: Learning to Follow Image Editing Instructions
“Galactica: A Large Language Model for Science”, Taylor et al 2022
“Large Language Models Struggle to Learn Long-Tail Knowledge”, Kandpal et al 2022
“The CRINGE Loss: Learning What Language Not to Model”, Adolphs et al 2022
“Mysteries of Mode Collapse § Inescapable Wedding Parties”, Janus 2022
“GPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers”, Frantar et al 2022
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
“What Is My Math Transformer Doing? – 3 Results on Interpretability and Generalization”, Charton 2022
What is my math transformer doing? – 3 results on interpretability and generalization
“When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels”, Shi et al 2022
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels
“Can Language Models Handle Recursively Nested Grammatical Structures? A Case Study on Comparing Models and Humans”, Lampinen 2022
“Evaluating Parameter Efficient Learning for Generation”, Xu et al 2022
“BioGPT: Generative Pre-Trained Transformer for Biomedical Text Generation and Mining”, Luo et al 2022
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
“Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”, Vilnis et al 2022
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
“MTEB: Massive Text Embedding Benchmark”, Muennighoff et al 2022
“Foundation Transformers”, Wang et al 2022
“Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Arora et al 2022
Ask Me Anything (AMA): A simple strategy for prompting language models
“Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, Ramamurthy et al 2022
“Sparrow: Improving Alignment of Dialogue Agents via Targeted Human Judgements”, Glaese et al 2022
Sparrow: Improving alignment of dialogue agents via targeted human judgements
“Generate rather than Retrieve (GenRead): Large Language Models Are Strong Context Generators”, Yu et al 2022
Generate rather than Retrieve (GenRead): Large Language Models are Strong Context Generators
“FP8 Formats for Deep Learning”, Micikevicius et al 2022
“Petals: Collaborative Inference and Fine-Tuning of Large Models”, Borzunov et al 2022
Petals: Collaborative Inference and Fine-tuning of Large Models
“LLM.int8()
: 8-Bit Matrix Multiplication for Transformers at Scale”, Dettmers et al 2022
LLM.int8()
: 8-bit Matrix Multiplication for Transformers at Scale
“Meaning without Reference in Large Language Models”, Piantadosi & Hill 2022
“Effidit: Your AI Writing Assistant”, Shi et al 2022
“Language Models Show Human-Like Content Effects on Reasoning”, Dasgupta et al 2022
Language models show human-like content effects on reasoning
“LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Shah et al 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
“Can Foundation Models Talk Causality?”, Willig et al 2022
“NOAH: Neural Prompt Search”, Zhang et al 2022
“ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
“Quark: Controllable Text Generation With Reinforced Unlearning”, Lu et al 2022
Quark: Controllable Text Generation with Reinforced Unlearning
“RankGen: Improving Text Generation With Large Ranking Models”, Krishna et al 2022
RankGen: Improving Text Generation with Large Ranking Models
“Opal: Multimodal Image Generation for News Illustration”, Liu et al 2022
“What Language Model to Train If You Have One Million GPU Hours?”, Scao et al 2022
What Language Model to Train if You Have One Million GPU Hours?
“WAVPROMPT: Towards Few-Shot Spoken Language Understanding With Frozen Language Models”, Gao et al 2022
WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models
“Shared Computational Principles for Language Processing in Humans and Deep Language Models”, Goldstein et al 2022
Shared computational principles for language processing in humans and deep language models
“Vector-Quantized Image Modeling With Improved VQGAN”, Yu et al 2022
“Brains and Algorithms Partially Converge in Natural Language Processing”, Caucheteux & King 2022
Brains and algorithms partially converge in natural language processing
“Quantifying Memorization Across Neural Language Models”, Carlini et al 2022
“A Contrastive Framework for Neural Text Generation”, Su et al 2022
“AdaPrompt: Adaptive Model Training for Prompt-Based NLP”, Chen et al 2022
“InPars: Data Augmentation for Information Retrieval Using Large Language Models”, Bonifacio et al 2022
InPars: Data Augmentation for Information Retrieval using Large Language Models
“ROME: Locating and Editing Factual Associations in GPT”, Meng et al 2022
“Cedille: A Large Autoregressive French Language Model”, Müller & Laurent 2022
“Data Scaling Laws in NMT: The Effect of Noise and Architecture”, Bansal et al 2022
Data Scaling Laws in NMT: The Effect of Noise and Architecture
“PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts”, Bach et al 2022
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
“Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Smith et al 2022
“Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents”, Huang et al 2022
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
“WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Liu et al 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
“A Survey of Controllable Text Generation Using Transformer-Based Pre-Trained Language Models”, Zhang et al 2022
A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
“The Defeat of the Winograd Schema Challenge”, Kocijan et al 2022
“Learning To Retrieve Prompts for In-Context Learning”, Rubin et al 2021
“Learning to Prompt for Continual Learning”, Wang et al 2021
“Amortized Noisy Channel Neural Machine Translation”, Pang et al 2021
“Few-Shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”, Prabhumoye et al 2021
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases
“PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021
PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts
“LMTurk: Few-Shot Learners As Crowdsourcing Workers”, Zhao et al 2021
“Improving Language Models by Retrieving from Trillions of Tokens”, Borgeaud et al 2021
Improving language models by retrieving from trillions of tokens
“Linear Algebra With Transformers”, Charton 2021
“Zero-Shot Image-To-Text Generation for Visual-Semantic Arithmetic”, Tewel et al 2021
Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
“Long-Range and Hierarchical Language Predictions in Brains and Algorithms”, Caucheteux et al 2021
Long-range and hierarchical language predictions in brains and algorithms
“True Few-Shot Learning With Prompts—A Real-World Perspective”, Schick & Schütze 2021
True Few-Shot Learning with Prompts—A Real-World Perspective
“Few-Shot Named Entity Recognition With Cloze Questions”, Gatta et al 2021
“Evaluating Distributional Distortion in Neural Language Modeling”, Anonymous 2021
Evaluating Distributional Distortion in Neural Language Modeling
“On Transferability of Prompt Tuning for Natural Language Understanding”, Su et al 2021
On Transferability of Prompt Tuning for Natural Language Understanding
“CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”, Mukherjee et al 2021
CLUES: Few-Shot Learning Evaluation in Natural Language Understanding
“Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey”, Min et al 2021
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
“Fast Model Editing at Scale”, Mitchell et al 2021
“Yuan 1.0: Large-Scale Pre-Trained Language Model in Zero-Shot and Few-Shot Learning”, Wu et al 2021
Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning
“Towards a Unified View of Parameter-Efficient Transfer Learning”, He et al 2021
Towards a Unified View of Parameter-Efficient Transfer Learning
“A Few More Examples May Be Worth Billions of Parameters”, Kirstain et al 2021
“Scaling Laws for Neural Machine Translation”, Ghorbani et al 2021
“Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color”, Abdou et al 2021
Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color
“What Changes Can Large-Scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-Scale Korean Generative Pretrained Transformers”, Kim et al 2021
“Medically Aware GPT-3 As a Data Generator for Medical Dialogue Summarization”, Chintagunta et al 2021
Medically Aware GPT-3 as a Data Generator for Medical Dialogue Summarization
“General-Purpose Question-Answering With Macaw”, Tafjord & Clark 2021
“An Empirical Exploration in Quality Filtering of Text Data”, Gao 2021
“Want To Reduce Labeling Cost? GPT-3 Can Help”, Wang et al 2021
“Multimodal Few-Shot Learning With Frozen Language Models”, Tsimpoukelli et al 2021
“Cutting Down on Prompts and Parameters: Simple Few-Shot Learning With Language Models”, IV et al 2021
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
“RASP: Thinking Like Transformers”, Weiss et al 2021
“ByT5: Towards a Token-Free Future With Pre-Trained Byte-To-Byte Models”, Xue et al 2021
ByT5: Towards a token-free future with pre-trained byte-to-byte models
“Anthropic Raises $124 Million to Build More Reliable, General AI Systems”, Anthropic 2021
Anthropic raises $124 million to build more reliable, general AI systems
“Naver Unveils First ‘Hyperscale’ AI Platform”, Jae-eun 2021
“Scaling Laws for Language Transfer Learning”, Kim 2021
“GPT Understands, Too”, Liu et al 2021
“How Many Data Points Is a Prompt Worth?”, Scao & Rush 2021
“Pretrained Transformers As Universal Computation Engines”, Lu et al 2021
“Language Models Have a Moral Dimension”, Schramowski et al 2021
“Learning Chess Blindfolded: Evaluating Language Models on State Tracking”, Toshniwal et al 2021
Learning Chess Blindfolded: Evaluating Language Models on State Tracking
“Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
Investigating the Limitations of the Transformers with Simple Arithmetic Tasks
“Proof Artifact Co-Training for Theorem Proving With Language Models”, Han et al 2021
Proof Artifact Co-training for Theorem Proving with Language Models
“Clinical Outcome Prediction from Admission Notes Using Self-Supervised Knowledge Integration”, Aken et al 2021
Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration
“Scaling Laws for Transfer”, Hernandez et al 2021
“MAUVE: Measuring the Gap Between Neural Text and Human Text Using Divergence Frontiers”, Pillutla et al 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
“Apparently ‘What Ho’ Is a Corruption Of…”, Marguerite 2021
“Making Pre-Trained Language Models Better Few-Shot Learners”, Gao et al 2020
“Thinking Ahead: Prediction in Context As a Keystone of Language in Humans and Machines”, Goldstein et al 2020
Thinking ahead: prediction in context as a keystone of language in humans and machines
“CPM: A Large-Scale Generative Chinese Pre-Trained Language Model”, Zhang et al 2020
CPM: A Large-scale Generative Chinese Pre-trained Language Model
“L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020
L2L: Training Large Neural Networks with Constant Memory using a New Execution Algorithm
“Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries”, Sun et al 2020
“The Neural Architecture of Language: Integrative Reverse-Engineering Converges on a Model for Predictive Processing”, Schrimpf et al 2020
“RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text”, Dugan et al 2020
RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text
“A Systematic Characterization of Sampling Algorithms for Open-Ended Language Generation”, Nadeem et al 2020
A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation
“Generative Language Modeling for Automated Theorem Proving”, Polu & Sutskever 2020
“Learning to Summarize from Human Feedback”, Stiennon et al 2020
“ETHICS: Aligning AI With Shared Human Values”, Hendrycks et al 2020
“Mirostat: A Neural Text Decoding Algorithm That Directly Controls Perplexity”, Basu et al 2020
Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity
“Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”, Bender & Koller 2020
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
“Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
“OpenAI API Beta Homepage”, OpenAI 2020
“Trading Off Diversity and Quality in Natural Language Generation”, Zhang et al 2020
Trading Off Diversity and Quality in Natural Language Generation
“Unigram LM: Byte Pair Encoding Is Suboptimal for Language Model Pretraining”, Bostrom & Durrett 2020
Unigram LM: Byte Pair Encoding is Suboptimal for Language Model Pretraining
“Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, Hasson et al 2020
Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks
“Pop Music Transformer: Beat-Based Modeling and Generation of Expressive Pop Piano Compositions”, Huang & Yang 2020
Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions
“Scaling Laws for Neural Language Models”, Kaplan et al 2020
“Reformer: The Efficient Transformer”, Kitaev et al 2020
“What Does BERT Dream Of? A Visual Investigation of Nightmares in Sesame Street”, Bäuerle & Wexler 2020
What does BERT dream of? A visual investigation of nightmares in Sesame Street
“Generative Language Modeling for Automated Theorem Proving § Experiments”, Polu & Sutskever 2020 (page 11 org openai)
Generative Language Modeling for Automated Theorem Proving § Experiments
“Plug and Play Language Models: A Simple Approach to Controlled Text Generation”, Dathathri et al 2019
“How Can We Know What Language Models Know?”, Jiang et al 2019
“CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019
“Generalization through Memorization: Nearest Neighbor Language Models”, Khandelwal et al 2019
“DialoGPT: Large-Scale Generative Pre-Training for Conversational Response Generation”, Zhang et al 2019
“CTRL: A Conditional Transformer Language Model For Controllable Generation”, Keskar et al 2019
“Smaller, Faster, Cheaper, Lighter: Introducing DistilGPT, a Distilled Version of GPT”, Sanh 2019
“Language Modeling State-Of-The-Art Leaderboards”, paperswithcode.com 2019
“Neural Text Generation With Unlikelihood Training”, Welleck et al 2019
“GROVER: Defending Against Neural Fake News”, Zellers et al 2019
“Generative Modeling With Sparse Transformers: We’ve Developed the Sparse Transformer, a Deep Neural Network Which Sets New Records at Predicting What Comes next in a Sequence—Whether Text, Images, or Sound. It Uses an Algorithmic Improvement of the attention Mechanism to Extract Patterns from Sequences 30× Longer Than Possible Previously”, Child & Gray 2019
“The Curious Case of Neural Text Degeneration”, Holtzman et al 2019
“Smart Vet: Autocompleting Sentences in Veterinary Medical Records”, Ginn 2019
“Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Dai et al 2019
“Music Transformer: Generating Music With Long-Term Structure”, Huang et al 2018
“Universal Transformers”, Dehghani et al 2018
“Adversarial Reprogramming of Neural Networks”, Elsayed et al 2018
“GPT-1: Improving Language Understanding With Unsupervised Learning”, OpenAI 2018
“GPT-1: Improving Language Understanding by Generative Pre-Training”, Radford et al 2018
“GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Radford et al 2018 (page 5)
“Deep Reinforcement Learning from Human Preferences § Appendix A.2: Atari”, Christiano et al 2017 (page 15 org openai)
“Learning to Generate Reviews and Discovering Sentiment”, Radford et al 2017
“Design a Role-Playing Game Using 200 Words or Less.”
“How Does In-Context Learning Work? A Framework for Understanding the Differences from Traditional Supervised Learning”
“AI Dungeon: Dragon Model Upgrade—You Can Now Play AI Dungeon With One of the Most Powerful AI Models in the World.”
“Introducing AI Dungeon Translate: AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button. [ 🤔 💯 🤷♂️ 🤔 🤔 🤔 💯]”
“OpenAI API Alchemy: Emoji Storytelling 🤖”
https://andrewmayne.com/2020/06/24/open-ai-alchemy-emoji-storytelling/
“Llama-3.1-405B Now Runs at 969 Tokens/s on Cerebras Inference”
“I Blew $720 on 100 Notebooks from Alibaba and Started a Paper Website Business”
“AlphaStar: Mastering the Real-Time Strategy Game StarCraft II”
“Transformers As Variational Autoencoders”
“BlinkDL/RWKV-LM: RWKV Is an RNN With Transformer-Level LLM Performance. It Can Be Directly Trained like a GPT (parallelizable). So It’s Combining the Best of RNN and Transformer—Great Performance, Fast Inference, Saves VRAM, Fast Training, "Infinite" Ctx_len, and Free Sentence Embedding.”
“Efficient, Reusable RNNs and LSTMs for Torch”
“Updated Training?”
“Karpathy/minGPT: A Minimal PyTorch Re-Implementation of the OpenAI GPT (Generative Pretrained Transformer) Training”
“Minimaxir/textgenrnn: Easily Train Your Own Text-Generating Neural Network of Any Size and Complexity on Any Text Dataset With a Few Lines of Code.”
“Loom: Multiversal Tree Writing Interface for Human-AI Collaboration”, Janus 2024
“Zphang/minimal-Opt”
“Math: OpenAI API Can Do Some Math out of the Gate, but Most Math It Seems It Has to Learn. Many Times, the Numbers That It Spits out Are Just Random. However, including Different Priming Prompts Can Result in Decent Results.”
“Deep Learning for Assisting the Process of Music Composition (part 3)”
“Google DeepMind’s Grandmaster-Level Chess Without Search”
“The Technology Behind BLOOM Training”
“Psych-101 Dataset [For Centaur]”
The Gostak
“Imprompter”
“Your Next New Best Friend Might Be a Robot”
https://nautil.us/your-next-new-best-friend-might-be-a-robot-235779/
“I Made a Custom GPT That Incorporates Advertisement/Product Placement With Its...”
“The Annotated Transformer”
“Homepage of Paul F. Christiano”, Christiano 2024
“Data Exfiltration from Slack AI via Indirect Prompt Injection”, PromptArmor 2024
“Introductory Antimemetics (abandoned First Draft)”, Hughes 2024
“Jared Kaplan”
“Meditations on Moloch”
“Stream Seaandsailor”
“Humans Who Are Not Concentrating Are Not General Intelligences”
“Monitor: An AI-Driven Observability Interface”
“This Is the OpenAI API. It Makes Spookily Good Twitter Bots. 13⁄10 Would Retweet”
“AMA Conjecture, A New Alignment Startup”
“WikiCrow”
“ChatGPT As Muse, Not Oracle”, Litt 2024
“Interpreting GPT: the Logit Lens”
“Assessing AlephAlpha’s Multimodal Model”
“Is GPT-3 a Good Rationalist?”
“We Are Conjecture, A New Alignment Research Startup”
“Investigating Causal Understanding in LLMs”
“A One-Question Turing Test for GPT-3”
“This Mystical Book Was Co-Authored by a Disturbingly Realistic AI”
“The Guy Behind the Fake AI Halloween Parade Listing Says You’ve Got It All Wrong”
“Season 1 Ep. 22 OpenAI's Ilya Sutskever: The Man Who Made AI Work”
“WELM”
nickwalton00
sama
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date, one can browse in topic order, with the sorted list automatically clustered into multiple sections and auto-labeled for easier browsing.
Beginning with the newest annotation, the sort uses each annotation's embedding to build a chain of nearest-neighbor annotations, creating a progression of topics (a rough sketch of the idea follows the tag list below). For more details, see the link.
controllable-text
neural-interpretability
compression
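As a rough illustration of that embedding-based sort, here is a minimal Python sketch under stated assumptions, not the site's actual code: the function name `sort_by_magic`, the cluster count `k`, and the use of NumPy plus scikit-learn's KMeans are all illustrative.

```python
# Hypothetical sketch (not the site's actual implementation): order annotations
# by a greedy nearest-neighbor walk over their embeddings, starting from the
# newest one, then cluster the result into sections.
import numpy as np
from sklearn.cluster import KMeans

def sort_by_magic(embeddings: np.ndarray, k: int = 3):
    """embeddings: (n, d) array of annotation embeddings, row 0 = newest."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order, remaining = [0], set(range(1, len(emb)))
    while remaining:  # greedily append the nearest unvisited annotation
        last = emb[order[-1]]
        nxt = max(remaining, key=lambda i: float(emb[i] @ last))
        order.append(nxt)
        remaining.remove(nxt)
    # assign each (topic-ordered) annotation to one of k sections
    sections = KMeans(n_clusters=k, n_init=10).fit_predict(emb[order])
    return order, sections

# toy usage: 10 random 64-dimensional "embeddings"
rng = np.random.default_rng(0)
order, sections = sort_by_magic(rng.normal(size=(10, 64)))
```

The section labels above ('controllable-text', 'neural-interpretability', 'compression') would presumably come from a separate auto-labeling step, e.g. naming each cluster after keywords of its most central annotations.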
Wikipedia
Miscellaneous
- /doc/ai/nn/transformer/gpt/2023-bommarito-figure1-gpt3cpaaccountingexamperformancebyexamsection.jpg
- /doc/ai/nn/transformer/gpt/2023-bommarito-figure2-progressofgpt3overtimeoncpaaccountingexam.jpg
- /doc/ai/nn/transformer/gpt/2023-qin-figure1-chatgptvsgpt35on20nlpdatasets.png
- /doc/ai/nn/transformer/gpt/2022-08-06-gwern-meme-netflixliegirl-studyingdeeplearningscaling.jpg
- /doc/ai/nn/transformer/gpt/2022-05-22-gwern-meme-tintinwhataweekhuh-2ndanniversaryofgpt3paper.png
- /doc/ai/nn/transformer/gpt/2022-bommarito-figure1-gpt3performanceonbarexambycategory.jpg
- /doc/ai/nn/transformer/gpt/2022-bommarito-figure2-increaseofgpt3modelaccuracyonbarexambysize.jpg
- /doc/ai/nn/transformer/gpt/2021-05-25-naver-hyperclova-computescaling0137bto82b.jpg
- /doc/ai/nn/transformer/gpt/2021-01-11-gwern-meme-dogbarkcanthurtyou-aiscaling.jpg
- /doc/ai/nn/transformer/gpt/2021-almeida-figure2-lhoptgpt3hyperparametertuningscalinglaw.jpg
- /doc/ai/nn/transformer/gpt/2021-dou-figure2-errorsbymodel.png
- /doc/ai/nn/transformer/gpt/2021-dou-figure3-errorsbytype.png
- /doc/ai/nn/transformer/gpt/2021-dou-figure4-errorsbydecodingsamplingstrategyhyperparameters.png
- /doc/ai/nn/transformer/gpt/2021-hernandez-transferlearning-figure2-transferscaling.png
- /doc/ai/nn/transformer/gpt/2021-kim-figure4-datatransferfromenglishtochinese.jpg
- /doc/ai/nn/transformer/gpt/2021-kim-figure5-transferfromenglishtochinesespanishgerman.jpg
- /doc/ai/nn/transformer/gpt/2021-nogueira-figure1-additionperformanceofnumberorthographies.png
- /doc/ai/nn/transformer/gpt/2020-06-21-openai-beta-gpt3-playgroundui.png
- /doc/ai/nn/transformer/gpt/2020-06-18-karpathy-expandingbrainmeme-gpt3metalearning.jpg
- /doc/ai/nn/transformer/gpt/2020-04-01-gwern-gpt2-5k-midi-training.png
- /doc/ai/nn/transformer/gpt/2020-02-03-gpt21.5b-archiveofourownao3-model-510427-samples-topp090.txt
- /doc/ai/nn/transformer/gpt/2020-02-03-gpt21.5b-videogamewalkthrough-model-174925-samples-topp090.txt
- /doc/ai/nn/transformer/gpt/2020-01-20-gwern-gpt2-25k-midi-training.png
- /doc/ai/nn/transformer/gpt/2020-bostrom-unigramlm-figure1-unigramlmvsbpe.png
- /doc/ai/nn/transformer/gpt/2020-brown-figure31-gpt3scaling.png
- /doc/ai/nn/transformer/gpt/2020-brown-figure313-humanabilitytodetectmodelgeneratednewsstories.jpg
- /doc/ai/nn/transformer/gpt/2020-brown-gpt3-figure13-meanperformancescalingcurve.png
- /doc/ai/nn/transformer/gpt/2020-hendrycks-figure1b-gpt3-qascaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure1-scalingacrossdomains.jpg
- /doc/ai/nn/transformer/gpt/2020-henighan-figure11-pretrainingimageclassificationscaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure2-universalmodelsizescaling.jpg
- /doc/ai/nn/transformer/gpt/2020-henighan-figure3-domainmodelsizescaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure31-qandamodelscaling.jpg
- /doc/ai/nn/transformer/gpt/2020-henighan-table1-autoregressivemodelsscalingpowerlaws.png
- /doc/ai/nn/transformer/gpt/2020-kaplan-appendix1-summaryofneurallanguagemodelscalingpowerlaws.png
- /doc/ai/nn/transformer/gpt/2020-kaplan-figure1-dlscaling.jpg
- /doc/ai/nn/transformer/gpt/2020-kaplan-figure15-projectingscaling.png
- /doc/ai/nn/transformer/gpt/2020-kaplan-figure7-scalingrnnsvstransformersshowsrnnplateau.png
- /doc/ai/nn/transformer/gpt/2020-zhang-figure1-thelikelihoodtrap.png
- /doc/ai/nn/transformer/gpt/2019-12-17-gwern-gpt2-preferencelearning-abc-terminal.png
- /doc/ai/nn/transformer/gpt/2019-12-16-gwern-gpt2-15b-poetry-tensorboard-100tputraining.png
- /doc/ai/nn/transformer/gpt/2019-12-13-gwern-gpt2-15b-poetry-tensorboard-97tputraining.png
- /doc/ai/nn/transformer/gpt/2019-12-13-gwern-gpt2-preferencelearning-abc-combinedmodel-halfbounce.png
- /doc/ai/nn/transformer/gpt/2019-12-12-gwern-gpt2-abc-score-polkaebbbab.png
- /doc/ai/nn/transformer/gpt/2019-11-19-gwern-gpt2-15b-poetry-tensorboard-1tputraining.jpg
- /doc/ai/nn/transformer/gpt/2019-keskar-table2-ctrltextsamplesusingonlymetadatawithoutaprompt.png
- /doc/ai/nn/transformer/gpt/2019-keskar-table7-datasetsandcontrolcodesmetadata.png
- /doc/ai/nn/transformer/gpt/2019-openai-gpt2-demo-recyclingtextsample.jpg
- /doc/ai/nn/transformer/gpt/2019-radford-figure4-gpt2validationloss.jpg
- /doc/ai/nn/transformer/gpt/2019-ziegler-preferencelearning-figure1-architecture.png
- /doc/ai/nn/transformer/gpt/2018-huang-magenta-musictransformer-attentionvisualization.jpg
- https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html
- https://adversa.ai/blog/universal-llm-jailbreak-chatgpt-gpt-4-bard-bing-anthropic-and-beyond/
- https://analyticsindiamag.com/when-chatgpt-attempted-upsc-exam/
- https://blog.eleuther.ai/trlx-exploratory-analysis/
- https://colab.research.google.com/drive/1c6VccMPsOMAUQCKU4BVDRd5Y32qkozmK
- https://davidrozado.substack.com/p/the-political-preferences-of-llms
- https://eprint.iacr.org/2021/686
- https://github.com/jujumilk3/leaked-system-prompts/tree/main
- https://hedgehogreview.com/issues/markets-and-the-good/articles/language-machinery
- https://homepages.inf.ed.ac.uk/abmayne/publications/sennrich2016NAACL.pdf
- https://huggingface.co/Gustavosta/MagicPrompt-Stable-Diffusion
- https://mi.eng.cam.ac.uk/projects/cued-rnnlm/papers/Interspeech15.pdf
- https://platform.openai.com/docs/guides/gpt-best-practices
- https://promptarmor.substack.com/p/data-exfiltration-from-writercom
- https://simonwillison.net/2023/Apr/14/worst-that-can-happen/
- https://techtualist.substack.com/p/i-wrote-a-script-for-gpt-3-to-take
- https://thezvi.substack.com/p/jailbreaking-the-chatgpt-on-release
- https://web.archive.org/web/20240102075620/https://www.jailbreakchat.com/
- https://www.buildt.ai/blog/viral-ripout (HTML mirror, 19MB: /doc/www/www.buildt.ai/578673ced29982f87eb8e930f5e6d692a44fed4e.html)
- https://www.forbes.com/sites/thomasbrewster/2023/11/16/chatgpt-becomes-a-social-media-spy-assistant/
- https://www.forefront.ai/blog-posts/how-to-fine-tune-gpt-neox
- https://www.freaktakes.com/p/the-past-and-present-of-computer
- https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology
- https://www.lesswrong.com/posts/PDLfpRwSynu73mxGw/basic-facts-about-language-model-internals-1
- https://www.lesswrong.com/posts/YKfNZAmiLdepDngwi/gpt-175bee
- https://www.lesswrong.com/posts/etoMr4vcnP7joQHWa/notes-from-a-prompt-factory
- https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/
- https://www.nytimes.com/interactive/2023/04/26/upshot/gpt-from-scratch.html
- https://www.oneusefulthing.org/p/working-with-ai-two-paths-to-prompting
- https://www.politico.eu/article/italian-privacy-regulator-bans-chatgpt/
- https://www.reddit.com/r/ChatGPT/comments/12xai7j/spamming_the_word_stop_2300_times_or_probably_any/
- https://www.reddit.com/r/ChatGPT/comments/15y4mqx/i_asked_chatgpt_to_maximize_its_censorship/
- https://www.reddit.com/r/GPT3/comments/ra6nk4/had_gpt3_generate_the_onion_headlines/
- https://www.reddit.com/r/GPT3/comments/tgud2t/my_new_favorite_thing_is_making_gpt3_create/
- https://www.reddit.com/r/MachineLearning/comments/v42pej/p_this_is_the_worst_ai_ever_gpt4chan_model/
- https://www.sfchronicle.com/projects/2021/jessica-simulation-artificial-intelligence/
Bibliography
- https://arxiv.org/abs/2410.01707 : “Interpretable Contrastive Monte Carlo Tree Search Reasoning”
- https://arxiv.org/abs/2408.05446 : “Ensemble Everything Everywhere: Multi-Scale Aggregation for Adversarial Robustness”
- https://arxiv.org/abs/2406.20086 : “Token Erasure As a Footprint of Implicit Vocabulary Items in LLMs”
- https://arxiv.org/abs/2406.19146 : “Resolving Discrepancies in Compute-Optimal Scaling of Language Models”
- https://arxiv.org/abs/2406.13131 : “When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models”
- https://arxiv.org/abs/2406.11794 : “DataComp-LM: In Search of the next Generation of Training Sets for Language Models”
- https://arxiv.org/abs/2406.07394 : “MCTSr: Accessing GPT-4 Level Mathematical Olympiad Solutions via Monte Carlo Tree Self-Refine With LLaMA-3-8B”
- https://arxiv.org/abs/2405.18400 : “Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass”
- https://arxiv.org/abs/2404.12358 : “From r to Q✱: Your Language Model Is Secretly a Q-Function”
- https://arxiv.org/abs/2404.06664 : “CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack Of) Multicultural Knowledge”
- https://arxiv.org/abs/2402.17152#facebook : “Actions Speak Louder Than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)”
- https://arxiv.org/abs/2402.15570 : “Fast Adversarial Attacks on Language Models In One GPU Minute”
- https://arxiv.org/abs/2402.07625 : “Autonomous Data Selection With Language Models for Mathematical Texts”
- https://arxiv.org/abs/2402.04494#deepmind : “Grandmaster-Level Chess Without Search”
- https://arxiv.org/abs/2401.15024#microsoft : “SliceGPT: Compress Large Language Models by Deleting Rows and Columns”
- https://arxiv.org/abs/2401.02385 : “TinyLlama: An Open-Source Small Language Model”
- https://arxiv.org/abs/2312.16862 : “TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones”
- https://arxiv.org/abs/2311.16079 : “MEDITRON-70B: Scaling Medical Pretraining for Large Language Models”
- https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/ : “OpenAI Researchers Warned Board of AI Breakthrough ahead of CEO Ouster, Sources Say”
- https://arxiv.org/abs/2310.06786 : “OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”
- https://arxiv.org/abs/2309.12284 : “MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models”
- https://arxiv.org/abs/2309.10668#deepmind : “Language Modeling Is Compression”
- https://arxiv.org/abs/2306.07567 : “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”
- https://arxiv.org/abs/2305.10429#google : “DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”
- https://www.forbes.com/sites/alexkonrad/2023/05/02/inflection-ai-ex-deepmind-launches-pi-chatbot/ : “Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”
- https://arxiv.org/abs/2304.06762#nvidia : “Shall We Pretrain Autoregressive Language Models With Retrieval? A Comprehensive Study”
- https://warontherocks.com/2023/04/how-large-language-models-can-revolutionize-military-planning/ : “How Large-Language Models Can Revolutionize Military Planning”
- https://arxiv.org/abs/2303.13506 : “The Quantization Model of Neural Scaling”
- https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-int-3-and : “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”
- https://osf.io/5uxra/ : “Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”
- https://arxiv.org/abs/2302.13939 : “SpikeGPT: Generative Pre-Trained Language Model With Spiking Neural Networks”
- https://arxiv.org/abs/2302.03169 : “Data Selection for Language Models via Importance Resampling”
- https://www.nytimes.com/2022/12/21/technology/ai-chatgpt-google-search.html : “A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A New Wave of Chat Bots like ChatGPT Use Artificial Intelligence That Could Reinvent or Even Replace the Traditional Internet Search Engine”
- https://arxiv.org/abs/2211.10438 : “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”
- https://arxiv.org/abs/2211.09800 : “InstructPix2Pix: Learning to Follow Image Editing Instructions”
- https://arxiv.org/abs/2211.09085#facebook : “Galactica: A Large Language Model for Science”
- https://arxiv.org/abs/2211.08411 : “Large Language Models Struggle to Learn Long-Tail Knowledge”
- https://arxiv.org/abs/2210.17323 : “GPTQ: Accurate Post-Training Quantization for Generative Pre-Trained Transformers”
- https://arxiv.org/abs/2210.13673#nvidia : “Evaluating Parameter Efficient Learning for Generation”
- https://arxiv.org/abs/2210.10341#microsoft : “BioGPT: Generative Pre-Trained Transformer for Biomedical Text Generation and Mining”
- https://arxiv.org/abs/2210.15458#google : “Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”
- https://arxiv.org/abs/2210.06423#microsoft : “Foundation Transformers”
- https://arxiv.org/abs/2210.02441 : “Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”
- https://arxiv.org/abs/2210.01241 : “Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”
- https://arxiv.org/abs/2207.04429 : “LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”
- https://arxiv.org/abs/2206.01861#microsoft : “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”
- https://www.nature.com/articles/s41593-022-01026-4 : “Shared Computational Principles for Language Processing in Humans and Deep Language Models”
- https://arxiv.org/abs/2110.04627#google : “Vector-Quantized Image Modeling With Improved VQGAN”
- https://www.nature.com/articles/s42003-022-03036-1 : “Brains and Algorithms Partially Converge in Natural Language Processing”
- https://arxiv.org/abs/2201.11990#microsoftnvidia : “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”
- https://swabhs.com/assets/pdf/wanli.pdf#allen : “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”
- https://arxiv.org/abs/2112.04426#deepmind : “Improving Language Models by Retrieving from Trillions of Tokens”
- https://arxiv.org/abs/2111.13440 : “True Few-Shot Learning With Prompts—A Real-World Perspective”
- https://arxiv.org/abs/2111.02570#microsoft : “CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”
- https://arxiv.org/abs/2110.11309 : “Fast Model Editing at Scale”
- https://arxiv.org/abs/2109.02593#allen : “General-Purpose Question-Answering With Macaw”
- https://arxiv.org/abs/2106.06981 : “RASP: Thinking Like Transformers”
- https://arxiv.org/abs/2105.13626#google : “ByT5: Towards a Token-Free Future With Pre-Trained Byte-To-Byte Models”
- https://m.koreaherald.com/view.php?ud=20210525000824#naver : “Naver Unveils First ‘Hyperscale’ AI Platform”
- https://arxiv.org/abs/2009.03393#openai : “Generative Language Modeling for Automated Theorem Proving”
- https://aclanthology.org/2020.acl-main.463.pdf : “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”
- https://arxiv.org/abs/2004.10802 : “Scaling Laws from the Data Manifold Dimension”
- https://arxiv.org/abs/2001.08361#openai : “Scaling Laws for Neural Language Models”
- https://arxiv.org/abs/2001.04451#google : “Reformer: The Efficient Transformer”
- https://arxiv.org/abs/1909.05858#salesforce : “CTRL: A Conditional Transformer Language Model For Controllable Generation”
- https://paperswithcode.com/task/language-modelling : “Language Modeling State-Of-The-Art Leaderboards”
- https://arxiv.org/abs/1901.02860 : “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”
- https://magenta.tensorflow.org/music-transformer : “Music Transformer: Generating Music With Long-Term Structure”
- https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf#page=5 : “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”
- https://paulfchristiano.com/ : “Homepage of Paul F. Christiano”