Bibliography:

  1. ‘Transformer’ tag

  2. ‘GPT-2 fiction’ tag

  3. ‘GPT-2’ tag

  4. ‘GPT-2 nonfiction’ tag

  5. ‘GPT-2 poetry’ tag

  6. ‘GPT-3 fiction’ tag

  7. ‘GPT-3 humor’ tag

  8. ‘GPT-3’ tag

  9. ‘GPT-3 nonfiction’ tag

  10. ‘GPT-3 poetry’ tag

  11. ‘GPT-4 fiction’ tag

  12. ‘GPT-4’ tag

  13. ‘GPT-4 nonfiction’ tag

  14. ‘GPT-4 poetry’ tag

  15. ‘Sydney (AI)’ tag

  16. ‘GPT-5’ tag

  17. ‘GPT calibration’ tag

  18. ‘Claude AI’ tag

  19. ‘Codex’ tag

  20. ‘DALL·E 1’ tag

  21. ‘DALL·E 2’ tag

  22. ‘DALL·E 3’ tag

  23. ‘DALL·E’ tag

  24. ‘GPT fiction’ tag

  25. ‘inner monologue (AI)’ tag

  26. ‘instruct-tuning LLMs’ tag

  27. ‘Jukebox’ tag

  28. ‘LaMDA’ tag

  29. ‘GPT nonfiction’ tag

  30. ‘PaLM 2’ tag

  31. ‘PaLM’ tag

  32. ‘GPT poetry’ tag

  33. ‘Whisper NN’ tag

  34. ‘LM tokenization’ tag

  35. ‘video generation’ tag

  36. GPT-3 Semantic Derealization

  37. Research Ideas

  38. You should write more online—it’s still a good time

  39. Machine Learning Scaling

  40. Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation

  41. Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?

  42. 2:4 Sparse Llama: Smaller Models for Efficient GPU Inference

  43. Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?

  44. Model Equality Testing: Which Model Is This API Serving?

  45. Centaur: a foundation model of human cognition

  46. Do LLMs estimate uncertainty well in instruction-following?

  47. Interpretable Contrastive Monte Carlo Tree Search Reasoning

  48. nGPT: Normalized Transformer with Representation Learning on the Hypersphere

  49. LLM Applications I Want To See

  50. 994c2f94d62a984842ed3fa41412926dccca6241.html

  51. Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness

  52. Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs

  53. Resolving Discrepancies in Compute-Optimal Scaling of Language Models

  54. When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models

  55. Nemotron-4 340B Technical Report

  56. DataComp-LM: In search of the next generation of training sets for language models

  57. How Do Large Language Models Acquire Factual Knowledge During Pretraining?

  58. Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs

  59. Discovering Preference Optimization Algorithms with and for Large Language Models

  60. MCTSr: Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMA-3-8B

  61. For Chinese Students, the New Tactic Against AI Checks: More AI

  62. MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

  63. Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

  64. SpaceByte: Towards Deleting Tokenization from Large Language Modeling

  65. Towards smaller, faster decoder-only transformers: Architectural variants and their implications

  66. Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences

  67. From r to Q∗: Your Language Model is Secretly a Q-Function

  68. CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

  69. CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack of) Multicultural Knowledge

  70. Training LLMs over Neurally Compressed Text

  71. Reverse Training to Nurse the Reversal Curse

  72. Evolutionary Optimization of Model Merging Recipes

  73. Yi: Open Foundation Models by 01.AI

  74. Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)

  75. Fast Adversarial Attacks on Language Models In One GPU Minute

  76. Autonomous Data Selection with Language Models for Mathematical Texts

  77. Grandmaster-Level Chess Without Search

  78. Neural Networks Learn Statistics of Increasing Complexity

  79. Arrows of Time for Large Language Models

  80. SliceGPT: Compress Large Language Models by Deleting Rows and Columns

  81. Excuse me, sir? Your language model is leaking (information)

  82. TinyLlama: An Open-Source Small Language Model

  83. LLaMA Pro: Progressive LLaMA with Block Expansion

  84. Generative AI is already widespread in the public sector

  85. Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

  86. TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

  87. Reasons to Reject? Aligning Language Models with Judgments

  88. Generative Multimodal Models are In-Context Learners

  89. Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning

  90. Object Recognition as Next Token Prediction

  91. MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

  92. Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching

  93. OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say

  94. Positional Description Matters for Transformers Arithmetic

  95. Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models

  96. Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

  97. Learn Your Tokens: Word-Pooled Tokenization for Language Modeling

  98. Llemma: An Open Language Model For Mathematics

  99. In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries

  100. OSD: Online Speculative Decoding

  101. Let Models Speak Ciphers: Multiagent Debate through Embeddings

  102. OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text

  103. xVal: A Continuous Number Encoding for Large Language Models

  104. MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

  105. Language Modeling Is Compression

  106. Sparse Autoencoders Find Highly Interpretable Features in Language Models

  107. Anchor Points: Benchmarking Models with Much Fewer Examples

  108. When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

  109. Language Reward Modulation for Pretraining Reinforcement Learning

  110. ReST: Reinforced Self-Training for Language Modeling

  111. Studying Large Language Model Generalization with Influence Functions

  112. Multimodal Neurons in Pretrained Text-Only Transformers

  113. Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

  114. Length Generalization in Arithmetic Transformers

  115. Are aligned neural networks adversarially aligned?

  116. Improving Long-Horizon Imitation Through Instruction Prediction

  117. Large Language Models Sometimes Generate Purely Negatively-Reinforced Text

  118. SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

  119. Undetectable Watermarks for Language Models

  120. Improving Language Models with Advantage-based Offline Policy Gradients

  121. Accelerating Transformer Inference for Translation via Parallel Decoding

  122. DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

  123. Memorization for Good: Encryption with Autoregressive Language Models

  124. MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

  125. Finding Neurons in a Haystack: Case Studies with Sparse Probing

  126. Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot

  127. Emergent and Predictable Memorization in Large Language Models

  128. A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model

  129. Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study

  130. How Large-Language Models Can Revolutionize Military Planning

  131. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

  132. 8 Things to Know about Large Language Models

  133. BloombergGPT: A Large Language Model for Finance

  134. The Quantization Model of Neural Scaling

  135. Int-4 LLaMa is not enough—Int-3 and beyond: More compression, easier to build apps on LLMs that run locally

  136. Consistency Analysis of ChatGPT

  137. Rewarding Chatbots for Real-World Engagement with Millions of Users

  138. Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan

  139. SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

  140. A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

  141. BiLD: Big Little Transformer Decoder

  142. Data Selection for Language Models via Importance Resampling

  143. In-Context Retrieval-Augmented Language Models

  144. Crawling the Internal Knowledge-Base of Language Models

  145. Big Tech was moving cautiously on AI. Then came ChatGPT. Google, Facebook and Microsoft helped build the scaffolding of AI. Smaller companies are taking it to the masses, forcing Big Tech to react

  146. Rock Guitar Tablature Generation via Natural Language Processing

  147. InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers

  148. A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A new wave of chat bots like ChatGPT use artificial intelligence that could reinvent or even replace the traditional internet search engine

  149. Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers

  150. Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

  151. Interpreting Neural Networks through the Polytope Lens

  152. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

  153. InstructPix2Pix: Learning to Follow Image Editing Instructions

  154. Galactica: A Large Language Model for Science

  155. Large Language Models Struggle to Learn Long-Tail Knowledge

  156. The CRINGE Loss: Learning what language not to model

  157. Mysteries of mode collapse § Inescapable wedding parties

  158. GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

  159. What is my math transformer doing? – 3 results on interpretability and generalization

  160. When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

  161. Can language models handle recursively nested grammatical structures? A case study on comparing models and humans

  162. Evaluating Parameter Efficient Learning for Generation

  163. BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining

  164. Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models

  165. MTEB: Massive Text Embedding Benchmark

  166. Foundation Transformers

  167. Ask Me Anything (AMA): A simple strategy for prompting language models

  168. Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

  169. Sparrow: Improving alignment of dialogue agents via targeted human judgements

  170. Generate rather than Retrieve (GenRead): Large Language Models are Strong Context Generators

  171. FP8 Formats for Deep Learning

  172. Petals: Collaborative Inference and Fine-tuning of Large Models

  173. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

  174. Meaning without reference in large language models

  175. Effidit: Your AI Writing Assistant

  176. Language models show human-like content effects on reasoning

  177. LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

  178. Can Foundation Models Talk Causality?

  179. NOAH: Neural Prompt Search

  180. ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

  181. Quark: Controllable Text Generation with Reinforced Unlearning

  182. RankGen: Improving Text Generation with Large Ranking Models

  183. Opal: Multimodal Image Generation for News Illustration

  184. What Language Model to Train if You Have One Million GPU Hours?

  185. WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models

  186. Shared computational principles for language processing in humans and deep language models

  187. Vector-quantized Image Modeling with Improved VQGAN

  188. Brains and algorithms partially converge in natural language processing

  189. Quantifying Memorization Across Neural Language Models

  190. A Contrastive Framework for Neural Text Generation

  191. AdaPrompt: Adaptive Model Training for Prompt-based NLP

  192. InPars: Data Augmentation for Information Retrieval using Large Language Models

  193. ROME: Locating and Editing Factual Associations in GPT

  194. Cedille: A large autoregressive French language model

  195. Data Scaling Laws in NMT: The Effect of Noise and Architecture

  196. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

  197. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

  198. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

  199. WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation

  200. A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models

  201. The Defeat of the Winograd Schema Challenge

  202. Learning To Retrieve Prompts for In-Context Learning

  203. Learning to Prompt for Continual Learning

  204. Amortized Noisy Channel Neural Machine Translation

  205. Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases

  206. Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts

  207. LMTurk: Few-Shot Learners as Crowdsourcing Workers

  208. Improving language models by retrieving from trillions of tokens

  209. Linear algebra with transformers

  210. Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

  211. Long-range and hierarchical language predictions in brains and algorithms

  212. True Few-Shot Learning with Prompts—A Real-World Perspective

  213. Few-shot Named Entity Recognition with Cloze Questions

  214. Evaluating Distributional Distortion in Neural Language Modeling

  215. On Transferability of Prompt Tuning for Natural Language Understanding

  216. CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

  217. Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

  218. Fast Model Editing at Scale

  219. Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning

  220. Towards a Unified View of Parameter-Efficient Transfer Learning

  221. A Few More Examples May Be Worth Billions of Parameters

  222. Scaling Laws for Neural Machine Translation

  223. Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color

  224. What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

  225. Medically Aware GPT-3 as a Data Generator for Medical Dialogue Summarization

  226. General-Purpose Question-Answering with Macaw

  227. An Empirical Exploration in Quality Filtering of Text Data

  228. Want To Reduce Labeling Cost? GPT-3 Can Help

  229. Multimodal Few-Shot Learning with Frozen Language Models

  230. Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

  231. RASP: Thinking Like Transformers

  232. ByT5: Towards a token-free future with pre-trained byte-to-byte models

  233. Anthropic raises $124 million to build more reliable, general AI systems

  234. Naver unveils first ‘hyperscale’ AI platform

  235. Scaling Laws for Language Transfer Learning

  236. GPT Understands, Too

  237. How Many Data Points is a Prompt Worth?

  238. Pretrained Transformers as Universal Computation Engines

  239. Language Models have a Moral Dimension

  240. Learning Chess Blindfolded: Evaluating Language Models on State Tracking

  241. Investigating the Limitations of the Transformers with Simple Arithmetic Tasks

  242. Proof Artifact Co-training for Theorem Proving with Language Models

  243. Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration

  244. Scaling Laws for Transfer

  245. MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers

  246. Apparently ‘what ho’ is a corruption of…

  247. Making Pre-trained Language Models Better Few-shot Learners

  248. Thinking ahead: prediction in context as a keystone of language in humans and machines

  249. CPM: A Large-scale Generative Chinese Pre-trained Language Model

  250. L2L: Training Large Neural Networks with Constant Memory using a New Execution Algorithm

  251. Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries

  252. The neural architecture of language: Integrative reverse-engineering converges on a model for predictive processing

  253. RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text

  254. A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation

  255. Generative Language Modeling for Automated Theorem Proving

  256. Learning to summarize from human feedback

  257. ETHICS: Aligning AI With Shared Human Values

  258. Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity

  259. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data

  260. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

  261. OpenAI API Beta homepage

  262. Trading Off Diversity and Quality in Natural Language Generation

  263. Scaling Laws from the Data Manifold Dimension

  264. Unigram LM: Byte Pair Encoding is Suboptimal for Language Model Pretraining

  265. Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks

  266. Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions

  267. Scaling Laws for Neural Language Models

  268. Reformer: The Efficient Transformer

  269. What does BERT dream of? A visual investigation of nightmares in Sesame Street

  270. Generative Language Modeling for Automated Theorem Proving § Experiments

  271. Plug and Play Language Models: A Simple Approach to Controlled Text Generation

  272. How Can We Know What Language Models Know?

  273. CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

  274. Generalization through Memorization: Nearest Neighbor Language Models

  275. DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

  276. CTRL: A Conditional Transformer Language Model For Controllable Generation

  277. Smaller, faster, cheaper, lighter: Introducing DistilGPT-2, a distilled version of GPT-2

  278. Language Modeling State-of-the-art leaderboards

  279. Neural Text Generation with Unlikelihood Training

  280. GROVER: Defending Against Neural Fake News

  281. Generative Modeling with Sparse Transformers: We’ve developed the Sparse Transformer, a deep neural network which sets new records at predicting what comes next in a sequence—whether text, images, or sound. It uses an algorithmic improvement of the attention mechanism to extract patterns from sequences 30× longer than possible previously

  282. The Curious Case of Neural Text Degeneration

  283. Smart Vet: Autocompleting Sentences in Veterinary Medical Records

  284. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

  285. Music Transformer: Generating Music with Long-Term Structure

  286. Universal Transformers

  287. Adversarial Reprogramming of Neural Networks

  288. GPT-1: Improving Language Understanding with Unsupervised Learning

  289. GPT-1: Improving Language Understanding by Generative Pre-Training

  290. GPT-1: Improving Language Understanding by Generative Pre-Training § Model specifications

  291. Deep reinforcement learning from human preferences § Appendix A.2: Atari

  292. Learning to Generate Reviews and Discovering Sentiment

  293. Design a Role-Playing Game Using 200 Words or Less.

  294. 68dc29784a54b4b94a1215b358244b267755a4e1.html

  295. How Does In-Context Learning Work? A Framework for Understanding the Differences from Traditional Supervised Learning

  296. bdf17c80e1ed5dc516811f03acef03415b220143.html

  297. AI Dungeon: Dragon Model Upgrade—You Can Now Play AI Dungeon With One of the Most Powerful AI Models in the World.

  298. 52d2fe5633e74d1f355221dba088b17ff34db79d.html

  299. Introducing AI Dungeon Translate: AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button. [ 🤔 💯 🤷‍♂️ 🤔 🤔 🤔 💯]

  300. OpenAI API Alchemy: Emoji Storytelling 🤖

  301. Llama-3.1-405B Now Runs at 969 Tokens/s on Cerebras Inference

  302. I Blew $720 on 100 Notebooks from Alibaba and Started a Paper Website Business

  303. b295888386dd5c1f2e2a679fd7b84432811d3917.html

  304. AlphaStar: Mastering the Real-Time Strategy Game StarCraft II

  305. Transformers As Variational Autoencoders

  306. BlinkDL/RWKV-LM: RWKV Is an RNN With Transformer-Level LLM Performance. It Can Be Directly Trained like a GPT (parallelizable). So It’s Combining the Best of RNN and Transformer—Great Performance, Fast Inference, Saves VRAM, Fast Training, "Infinite" Ctx_len, and Free Sentence Embedding.

  307. Efficient, Reusable RNNs and LSTMs for Torch

  308. Updated Training?

  309. Karpathy/minGPT: A Minimal PyTorch Re-Implementation of the OpenAI GPT (Generative Pretrained Transformer) Training

  310. Minimaxir/textgenrnn: Easily Train Your Own Text-Generating Neural Network of Any Size and Complexity on Any Text Dataset With a Few Lines of Code.

  311. Loom: Multiversal Tree Writing Interface for Human-AI Collaboration

  312. ab1b1b61962d42831ad82c1ecaab2a7d3aef8423.html

  313. Zphang/minimal-opt

  314. Math: OpenAI API Can Do Some Math out of the Gate, but Most Math It Seems It Has to Learn. Many Times, the Numbers That It Spits out Are Just Random. However, including Different Priming Prompts Can Result in Decent Results.

  315. Deep Learning for Assisting the Process of Music Composition (part 3)

  316. Google DeepMind’s Grandmaster-Level Chess Without Search

  317. The Technology Behind BLOOM Training

  318. Psych-101 Dataset [For Centaur]

  319. The Gostak

  320. Imprompter

  321. Your Next New Best Friend Might Be a Robot

  322. I Made a Custom GPT That Incorporates Advertisement/Product Placement With Its...

  323. The Annotated Transformer

  324. Homepage of Paul F. Christiano

  325. Data Exfiltration from Slack AI via Indirect Prompt Injection

  326. Introductory Antimemetics (Abandoned First Draft)

  327. Jared Kaplan

  328. Meditations on Moloch

  329. Stream Seaandsailor

  330. Humans Who Are Not Concentrating Are Not General Intelligences

  331. Monitor: An AI-Driven Observability Interface

  332. This Is the OpenAI API. It Makes Spookily Good Twitter Bots. 13⁄10 Would Retweet

  333. 22435625719a0806d8474097cd0740ce131d6684.html

  334. AMA Conjecture, A New Alignment Startup

  335. WikiCrow

  336. 1754102bd82fb703b3c17ea47232c04c76f7452e.html

  337. ChatGPT As Muse, Not Oracle

  338. Interpreting GPT: the Logit Lens

  339. Assessing AlephAlpha’s Multimodal Model

  340. Is GPT-3 a Good Rationalist?

  341. We Are Conjecture, A New Alignment Research Startup

  342. Investigating Causal Understanding in LLMs

  343. A One-Question Turing Test for GPT-3

  344. This Mystical Book Was Co-Authored by a Disturbingly Realistic AI

  345. The Guy Behind the Fake AI Halloween Parade Listing Says You’ve Got It All Wrong

  346. Season 1 Ep. 22 OpenAI's Ilya Sutskever: The Man Who Made AI Work

  347. WeLM

  348. I've Been Testing the Largest of @OpenAI's Models With AI Dungeon and Been Constantly Impressed at How Interesting and Dynamic the Characters Are, like This Queen, Long Thought to Be Dead, Hiding from Enemies and Not Happy about Me Prying into Her Personal Life.

  349. OpenAI now generates about 100 billion words per day

  350. /design#future-tag-features

  351. 2024-zhao-figure2-roughllmdecisionboundariesonsimplebinaryclassificationtaskdespite128examples.png

  352. 2023-03-20-gpt4-scottalexander-halfanhourbeforedawninsanfranciscosample.png

  353. 2023-bommarito-figure1-gpt3cpaaccountingexamperformancebyexamsection.jpg

  354. 2023-bommarito-figure2-progressofgpt3overtimeoncpaaccountingexam.jpg

  355. 2023-jakesch-figure10-participantsdidnotnoticemodelslanttowardsapositionaffectedtheirownargumentwriting.jpg

  356. 2023-jakesch-figure3-participantseditorialwritingaboutsocialmediabenefitswereaffectedbygpt3promptslants.jpg

  357. 2023-jakesch-figure5-hastyparticipantseditorialwritingweremoreaffectedbyslantedgpt3prompts.jpg

  358. 2023-jakesch-figure6-slantedmodelpromptschangedpeoplesopinionafterwritinganeditorial.jpg

  359. 2023-jakesch-figure9-participantsdidnotnoticemodelslanttowardsaposition.jpg

  360. 2023-qin-figure1-chatgptvsgpt35on20nlpdatasets.png

  361. 2022-08-19-gwern-meme-deathknockingatdoor-deeplearningscalingsuccesses.png

  362. 2022-08-06-gwern-meme-netflixliegirl-studyingdeeplearningscaling.jpg

  363. 2022-05-22-gwern-meme-tintinwhataweekhuh-2ndanniversaryofgpt3paper.png

  364. 2022-bommarito-figure1-gpt3performanceonbarexambycategory.jpg

  365. 2022-bommarito-figure2-increaseofgpt3modelaccuracyonbarexambysize.jpg

  366. 2022-ganguli-figure2-visualizationofsuccessfulredteamattacksonlanguagemodels.png

  367. 2021-05-25-naver-hyperclova-computescaling0137bto82b.jpg

  368. 2021-01-11-gwern-meme-dogbarkcanthurtyou-aiscaling.jpg

  369. 2021-almeida-figure2-lhoptgpt3hyperparametertuningscalinglaw.jpg

  370. 2021-almeida-figure3-lhoptlearnedhyperparameteroptimizationongpt2largewikitext103speedupdouble.jpg

  371. 2021-askell-figure2-prmptingimprovesalignmentwithmodelscalingwithdecreasingalignmenttax.jpg

  372. 2021-askell-figure5-anthropcgptlearnshumanpreferencesatn500withgreaterscale.jpg

  373. 2021-dou-figure2-errorsbymodel.png

  374. 2021-dou-figure3-errorsbytype.png

  375. 2021-dou-figure4-errorsbydecodingsamplingstrategyhyperparameters.png

  376. 2021-hernandez-transferlearning-figure2-transferscaling.png

  377. 2021-kim-figure4-datatransferfromenglishtochinese.jpg

  378. 2021-kim-figure5-transferfromenglishtochinesespanishgerman.jpg

  379. 2021-nogueira-figure1-additionperformanceofnumberorthographies.png

  380. 2021-solaiman-figure3-largergpt3modelsfinetunebetteronpalmstoxicitydataset.jpg

  381. 2020-06-21-openai-beta-gpt3-playgroundui.png

  382. 2020-06-18-karpathy-expandingbrainmeme-gpt3metalearning.jpg

  383. 2020-04-01-gwern-gpt2-5k-midi-training.png

  384. 2020-02-03-gpt21.5b-archiveofourownao3-model-510427-samples-topp090.txt

  385. 2020-02-03-gpt21.5b-videogamewalkthrough-model-174925-samples-topp090.txt

  386. 2020-01-20-gwern-gpt2-25k-midi-training.png

  387. 2020-01-15-gwern-gpt2-preferencelearning-abc-combinedmodel-klregularized-finalrun.png

  388. 2020-bostrom-unigramlm-figure1-unigramlmvsbpe.png

  389. 2020-brown-figure31-gpt3scaling.png

  390. 2020-brown-figure313-humanabilitytodetectmodelgeneratednewsstories.jpg

  391. 2020-brown-gpt3-figure13-meanperformancescalingcurve.png

  392. 2020-hendrycks-figure1b-gpt3-qascaling.png

  393. 2020-henighan-figure1-scalingacrossdomains.jpg

  394. 2020-henighan-figure11-pretrainingimageclassificationscaling.png

  395. 2020-henighan-figure2-universalmodelsizescaling.jpg

  396. 2020-henighan-figure3-domainmodelsizescaling.png

  397. 2020-henighan-figure31-qandamodelscaling.jpg

  398. 2020-henighan-table1-autoregressivemodelsscalingpowerlaws.png

  399. 2020-kaplan-appendix1-summaryofneurallanguagemodelscalingpowerlaws.png

  400. 2020-kaplan-figure1-dlscaling.jpg

  401. 2020-kaplan-figure15-projectingscaling.png

  402. 2020-kaplan-figure7-scalingrnnsvstransformersshowsrnnplateau.png

  403. 2020-zhang-figure1-thelikelihoodtrap.png

  404. 2019-12-26-gwern-gpt2-preferencelearning-abc-combinedmodel-klregularized-collapse.png

  405. 2019-12-17-gwern-gpt2-preferencelearning-abc-terminal.png

  406. 2019-12-16-gwern-gpt2-15b-poetry-tensorboard-100tputraining.png

  407. 2019-12-13-gwern-gpt2-15b-poetry-tensorboard-97tputraining.png

  408. 2019-12-13-gwern-gpt2-preferencelearning-abc-combinedmodel-halfbounce.png

  409. 2019-12-12-gwern-gpt2-abc-score-polkaebbbab.png

  410. 2019-11-19-gwern-gpt2-15b-poetry-tensorboard-1tputraining.jpg

  411. 2019-keskar-table1-ctrlsamplesdemonstratingmetadatainfluenceontextcompletions.png

  412. 2019-keskar-table2-ctrltextsamplesusingonlymetadatawithoutaprompt.png

  413. 2019-keskar-table3-ctrltextsamplesshowinginfluenceofurllinksasprefixmetadata.png

  414. 2019-keskar-table4-ctrltextsamplesusingtemplatizedcontrolcodesforspecifictaskslikeqaortranslation.png

  415. 2019-keskar-table5-ctrltextsamplesmixingzeroshotgeneralizationofmetadata.png

  416. 2019-keskar-table7-datasetsandcontrolcodesmetadata.png

  417. 2019-openai-gpt2-demo-recyclingtextsample.jpg

  418. 2019-radford-figure4-gpt2validationloss.jpg

  419. 2019-ziegler-preferencelearning-figure1-architecture.png

  420. 2018-huang-magenta-musictransformer-attentionvisualization.jpg

  421. https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html

  422. https://adversa.ai/blog/universal-llm-jailbreak-chatgpt-gpt-4-bard-bing-anthropic-and-beyond/

  423. 06933b9b9a363a8ba3702bd147712068db1eb095.html

  424. https://ai.meta.com/blog/meta-llama-3/

  425. https://amaarora.github.io/2020/02/18/annotatedGPT2.html

  426. cb2faf56d50cf24fedb37741275b7c7d7b5ab2ba.html

  427. https://analyticsindiamag.com/when-chatgpt-attempted-upsc-exam/

  428. 8981287b811c3f5fa5cc43daa446d65b4dbff19d.html

  429. https://blog.eleuther.ai/trlx-exploratory-analysis/

  430. https://blog.helix.ml/p/how-we-got-fine-tuning-mistral-7b

  431. https://carper.ai/instruct-gpt-announcement/

  432. 766cb8b990cab4b23efdc653265df90bc0acb688.html

  433. https://chat.lmsys.org/

  434. https://colab.research.google.com/drive/1c6VccMPsOMAUQCKU4BVDRd5Y32qkozmK

  435. https://culture.org/ghosts/

  436. c1635afce38b785e481517daa051f35e788f2d67.html

  437. https://davidrozado.substack.com/p/the-political-preferences-of-llms

  438. https://embracethered.com/blog/posts/2023/chatgpt-webpilot-data-exfil-via-markdown-injection/#responsible-disclosure

  439. 938a66908d685ba0973f77a6f0d816e0c639a763.html#responsible-disclosure

  440. https://eprint.iacr.org/2021/686

  441. https://gandalf.lakera.ai/

  442. https://github.com/NolanoOrg/llama-int4-quant/

  443. https://github.com/antimatter15/alpaca.cpp

  444. https://github.com/brexhq/prompt-engineering

  445. https://github.com/castorini/transformers-arithmetic

  446. https://github.com/epfLLM/meditron

  447. https://github.com/gh18l/CrawlGPT

  448. https://github.com/greshake/llm-security

  449. dee78cf7ed56d108fe08e2dc89c1ffa9152bef7f.html

  450. https://github.com/jujumilk3/leaked-system-prompts/tree/main

  451. https://github.com/langchain-ai/langchain

  452. https://github.com/qwopqwop200/GPTQ-for-LLaMa

  453. https://github.com/ray-project/llm-numbers

  454. https://github.com/sanjeevanahilan/nanoChatGPT

  455. https://github.com/sberbank-ai/ru-gpts

  456. https://github.com/sgrvinod/chess-transformers

  457. https://hedgehogreview.com/issues/markets-and-the-good/articles/language-machinery

  458. https://homepages.inf.ed.ac.uk/abmayne/publications/sennrich2016NAACL.pdf

  459. https://huggingface.co/Gustavosta/MagicPrompt-Stable-Diffusion

  460. https://huggingface.co/blog/rlhf

  461. https://jaykmody.com/blog/gpt-from-scratch/

  462. 117a3f3397e403dbf919c3b186127f6858466e66.html

  463. https://justine.lol/oneliners/

  464. 28689672dc2eb3f54bf94739d33bc4e016693ba5.html

  465. https://meteorfrom.space/

  466. e90b927ed9f69f6be08dc4013bc3c118d999de4b.html

  467. https://mi.eng.cam.ac.uk/projects/cued-rnnlm/papers/Interspeech15.pdf

  468. https://minimaxir.com/2023/03/new-chatgpt-overlord/

  469. be35928db14bc14ec4e887559e50a2562a010f80.html

  470. https://openai.com/blog/customizing-gpt-3

  471. https://openai.com/blog/our-approach-to-alignment-research/

  472. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4627587

  473. https://platform.openai.com/docs/gptbot

  474. 616202c012ede04b03ea65cc1d5466b95e632256.html

  475. https://platform.openai.com/docs/guides/gpt-best-practices

  476. 9033b0629fd31a0385ba66b2116ac2ac35de7943.html

  477. https://platform.openai.com/docs/guides/prompt-engineering

  478. https://promptarmor.substack.com/p/data-exfiltration-from-writercom

  479. https://qwenlm.github.io/blog/qwen1.5-32b/

  480. 08bde6ef08ee40f2148261db1032a29c3eb72c80.html

  481. https://research.google/blog/transformer-a-novel-neural-network-architecture-for-language-understanding/

  482. https://simonwillison.net/2023/Apr/14/worst-that-can-happen/

  483. https://simonwillison.net/2023/Aug/3/weird-world-of-llms/

  484. https://techtualist.substack.com/p/i-wrote-a-script-for-gpt-3-to-take

  485. https://thezvi.substack.com/p/jailbreaking-the-chatgpt-on-release

  486. https://til.simonwillison.net/llms/llama-7b-m2

  487. 28a21d92edf1a849d82e5c7825697e68cb30916b.html

  488. https://web.archive.org/web/20240102075620/https://www.jailbreakchat.com/

  489. https://www.alignmentforum.org/posts/YEioD8YLgxih3ydxP/why-simulator-ais-want-to-be-active-inference-ais

  490. https://www.askviable.com/blog/why-we-chose-gpt-3-embeddings-for-the-clustering-behind-our-feedback-reports

  491. 31ddb5cc20ec19860de9803a3df7abce079680cb.html

  492. https://www.brown.edu/news/2023-04-25/open-web-text

  493. c0bcbe0d3fa93bef8abde06c137af0115131a2a8.html

  494. https://www.buildt.ai/blog/viral-ripout

  495. 578673ced29982f87eb8e930f5e6d692a44fed4e.html

  496. https://www.forbes.com/sites/thomasbrewster/2023/11/16/chatgpt-becomes-a-social-media-spy-assistant/

  497. https://www.forefront.ai/blog-posts/how-to-fine-tune-gpt-neox

  498. 6e78db460ac938fc3f6a6d56837b2adf8ef99428.html

  499. https://www.freaktakes.com/p/the-past-and-present-of-computer

  500. https://www.lawfaremedia.org/article/chatgpt-unbound

  501. https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology

  502. https://www.lesswrong.com/posts/GyaDCzsyQgc48j8t3/linear-encoding-of-character-level-information-in-gpt-j

  503. https://www.lesswrong.com/posts/PDLfpRwSynu73mxGw/basic-facts-about-language-model-internals-1

  504. https://www.lesswrong.com/posts/SCqDipWAhZ49JNdmL/paper-llms-trained-on-a-is-b-fail-to-learn-b-is-a#eKhSncieBquLsFTXZ

  505. https://www.lesswrong.com/posts/YKfNZAmiLdepDngwi/gpt-175bee

  506. https://www.lesswrong.com/posts/axxnpQi8FyBPE4rbq/hutter-prize-for-prompts?commentId=WKNXFtQWzfSs9mGih

  507. https://www.lesswrong.com/posts/dFbfCLZA4pejckeKc/a-mechanistic-explanation-for-solidgoldmagikarp-like-tokens

  508. https://www.lesswrong.com/posts/etoMr4vcnP7joQHWa/notes-from-a-prompt-factory

  509. https://www.lesswrong.com/posts/wqRqb7h6ZC48iDgfK/tentatively-found-600-monosemantic-features-in-a-small-lm

  510. https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

  511. https://www.nature.com/articles/d41586-021-00530-0

  512. https://www.nature.com/articles/s41586-023-06647-8

  513. https://www.nytimes.com/interactive/2023/04/26/upshot/gpt-from-scratch.html

  514. https://www.oneusefulthing.org/p/working-with-ai-two-paths-to-prompting

  515. https://www.politico.eu/article/italian-privacy-regulator-bans-chatgpt/

  516. https://www.reddit.com/r/ChatGPT/comments/12xai7j/spamming_the_word_stop_2300_times_or_probably_any/

  517. https://www.reddit.com/r/ChatGPT/comments/15y4mqx/i_asked_chatgpt_to_maximize_its_censorship/

  518. https://www.reddit.com/r/GPT3/comments/ra6nk4/had_gpt3_generate_the_onion_headlines/

  519. e4720e5d01ba14d0e9b13bb2c42a6d37923c65df.html

  520. https://www.reddit.com/r/GPT3/comments/tgud2t/my_new_favorite_thing_is_making_gpt3_create/

  521. d13049475d40f32148a1a313cb8b0bc45edb885e.html

  522. https://www.reddit.com/r/MachineLearning/comments/12xwzt9/d_be_careful_with_user_facing_apps_using_llms/

  523. https://www.reddit.com/r/MachineLearning/comments/v42pej/p_this_is_the_worst_ai_ever_gpt4chan_model/

  524. 9d2c91d560fbecb82b5e11fc87b57cf391b9a95a.html

  525. https://www.roft.io/

  526. https://www.sfchronicle.com/projects/2021/jessica-simulation-artificial-intelligence/

  527. 70bbc07b9a27d32fd8b69bba025aceee6628bd66.html

  528. https://www.youtube.com/watch?v=dO4TPJkeaaU

  529. https://www.youtube.com/watch?v=kLC8AHZX4N8&t=394s

  530. https://x.com/Altimor/status/1825659507617460439

  531. https://x.com/ChrisJBakke/status/1736533308849443121

  532. https://x.com/D_Rod_Tweets/status/1628030272745746432

  533. https://x.com/DrJimFan/status/1631709224387624962

  534. https://x.com/ESYudkowsky/status/1635577836525469697

  535. https://x.com/NolanoOrg/status/1634027966651834370

  536. https://x.com/OfficialLoganK/status/1664476604658069511

  537. https://x.com/OpenAI/status/1676072388436594688

  538. https://x.com/RiversHaveWings/status/1459646450275553285

  539. https://x.com/arvind_io/status/1488257004783112192

  540. https://x.com/bentossell/status/1598673037976543240

  541. https://x.com/chillzaza_/status/1710795541087469647

  542. https://x.com/colin_fraser/status/1630763222671454208

  543. https://x.com/colin_fraser/status/1635350490484719618

  544. https://x.com/colin_fraser/status/1635360285187018752

  545. https://x.com/colin_fraser/status/1635450606013014016

  546. https://x.com/cyrilzakka/status/1646532570597982208

  547. https://x.com/dust4ai/status/1587104029712203778

  548. https://x.com/emollick/status/1759633391098732967

  549. https://x.com/fluffykittnmeow/status/1737639861350269213

  550. https://x.com/francoisfleuret/status/1714531085512544760

  551. https://x.com/goodside/status/1558622567635865600

  552. https://x.com/jakezward/status/1728032639402037610

  553. https://x.com/jamesjyu/status/1467568693806649346

  554. https://x.com/jheitzeb/status/1612130278293803009

  555. https://x.com/jjvincent/status/1648594881198039040

  556. https://x.com/labenz/status/1611750398712332292

  557. https://x.com/mathemagic1an/status/1595410144522813440

  558. https://x.com/mayfer/status/1581388723635523584

  559. https://x.com/mranti/status/1822959677216551095

  560. https://x.com/parisba/status/1719523035450167535

  561. https://x.com/repligate/status/1630593115407937536

  562. https://x.com/sdtoyer/status/1729933591541670287

  563. https://x.com/thesephist/status/1617747154231259137

  564. https://x.com/thisisdaleb/status/1628891229562847233

  565. https://x.com/thiteanish/status/1635188333705043969

  566. https://x.com/voooooogel/status/1865189744776507809

  567. https://x.com/wenquai/status/1748016021808595242

  568. https://x.com/woj_zaremba/status/1191773448999034880

  569. https://x.com/yacineMTB/status/1737523618832425273
