Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
2:4 Sparse Llama: Smaller Models for Efficient GPU Inference
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?
Do LLMs estimate uncertainty well in instruction-following?
Interpretable Contrastive Monte Carlo Tree Search Reasoning
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
DataComp-LM: In search of the next generation of training sets for language models
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs
Discovering Preference Optimization Algorithms with and for Large Language Models
MCTSr: Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMA-3-8B
For Chinese Students, the New Tactic Against AI Checks: More AI
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Towards smaller, faster decoder-only transformers: Architectural variants and their implications
Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences
From r to Q✱: Your Language Model is Secretly a Q-Function
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack of) Multicultural Knowledge
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)
Fast Adversarial Attacks on Language Models In One GPU Minute
Autonomous Data Selection with Language Models for Mathematical Texts
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Excuse me, sir? Your language model is leaking (information)
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Reasons to Reject? Aligning Language Models with Judgments
Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching
OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say
Positional Description Matters for Transformers Arithmetic
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Learn Your Tokens: Word-Pooled Tokenization for Language Modeling
In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries
Let Models Speak Ciphers: Multiagent Debate through Embeddings
OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text
xVal: A Continuous Number Encoding for Large Language Models
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Anchor Points: Benchmarking Models with Much Fewer Examples
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
Language Reward Modulation for Pretraining Reinforcement Learning
ReST: Reinforced Self-Training for Language Modeling
Studying Large Language Model Generalization with Influence Functions
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
Improving Long-Horizon Imitation Through Instruction Prediction
Large Language Models Sometimes Generate Purely Negatively-Reinforced Text
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Improving Language Models with Advantage-based Offline Policy Gradients
Accelerating Transformer Inference for Translation via Parallel Decoding
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Memorization for Good: Encryption with Autoregressive Language Models
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot
Emergent and Predictable Memorization in Large Language Models
A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
How Large-Language Models Can Revolutionize Military Planning
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Int-4 LLaMa is not enough—Int-3 and beyond: More compression, easier to build apps on LLMs that run locally
Rewarding Chatbots for Real-World Engagement with Millions of Users
Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT
Data Selection for Language Models via Importance Resampling
Big Tech was moving cautiously on AI. Then came ChatGPT. Google, Facebook and Microsoft helped build the scaffolding of AI. Smaller companies are taking it to the masses, forcing Big Tech to react
Rock Guitar Tablature Generation via Natural Language Processing
InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers
A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A new wave of chat bots like ChatGPT use artificial intelligence that could reinvent or even replace the traditional internet search engine
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
InstructPix2Pix: Learning to Follow Image Editing Instructions
Large Language Models Struggle to Learn Long-Tail Knowledge
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
What is my math transformer doing? – 3 results on interpretability and generalization
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels
Can language models handle recursively nested grammatical structures? A case study on comparing models and humans
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Ask Me Anything (AMA): A simple strategy for prompting language models
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Sparrow: Improving alignment of dialogue agents via targeted human judgements
Generate rather than Retrieve (GenRead): Large Language Models are Strong Context Generators
Petals: Collaborative Inference and Fine-tuning of Large Models
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Language models show human-like content effects on reasoning
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Quark: Controllable Text Generation with Reinforced Unlearning
RankGen: Improving Text Generation with Large Ranking Models
What Language Model to Train if You Have One Million GPU Hours?
WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models
Shared computational principles for language processing in humans and deep language models
Brains and algorithms partially converge in natural language processing
InPars: Data Augmentation for Information Retrieval using Large Language Models
Data Scaling Laws in NMT: The Effect of Noise and Architecture
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases
Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts
Improving language models by retrieving from trillions of tokens
Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Long-range and hierarchical language predictions in brains and algorithms
True Few-Shot Learning with Prompts—A Real-World Perspective
Evaluating Distributional Distortion in Neural Language Modeling
On Transferability of Prompt Tuning for Natural Language Understanding
CLUES: Few-Shot Learning Evaluation in Natural Language Understanding
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning
Towards a Unified View of Parameter-Efficient Transfer Learning
Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
Medically Aware GPT-3 as a Data Generator for Medical Dialogue Summarization
An Empirical Exploration in Quality Filtering of Text Data
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Anthropic raises $124 million to build more reliable, general AI systems
Learning Chess Blindfolded: Evaluating Language Models on State Tracking
Investigating the Limitations of the Transformers with Simple Arithmetic Tasks
Proof Artifact Co-training for Theorem Proving with Language Models
Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Making Pre-trained Language Models Better Few-shot Learners
Thinking ahead: prediction in context as a keystone of language in humans and machines
CPM: A Large-scale Generative Chinese Pre-trained Language Model
L2L: Training Large Neural Networks with Constant Memory using a New Execution Algorithm
Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries
The neural architecture of language: Integrative reverse-engineering converges on a model for predictive processing
RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text
A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation
Generative Language Modeling for Automated Theorem Proving
Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Trading Off Diversity and Quality in Natural Language Generation
Unigram LM: Byte Pair Encoding is Suboptimal for Language Model Pretraining
Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks
Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions
What does BERT dream of? A visual investigation of nightmares in Sesame Street
Generative Language Modeling for Automated Theorem Proving § Experiments
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning
Generalization through Memorization: Nearest Neighbor Language Models
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
CTRL: A Conditional Transformer Language Model For Controllable Generation
Smaller, faster, cheaper, lighter: Introducing DistilGPT, a distilled version of GPT
Generative Modeling with Sparse Transformers: We’ve developed the Sparse Transformer, a deep neural network which sets new records at predicting what comes next in a sequence—whether text, images, or sound. It uses an algorithmic improvement of the attention mechanism to extract patterns from sequences 30× longer than possible previously
Smart Vet: Autocompleting Sentences in Veterinary Medical Records
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Music Transformer: Generating Music with Long-Term Structure
GPT-1: Improving Language Understanding with Unsupervised Learning
GPT-1: Improving Language Understanding by Generative Pre-Training
GPT-1: Improving Language Understanding by Generative Pre-Training § Model specifications
Deep reinforcement learning from human preferences § Appendix A.2: Atari
How Does In-Context Learning Work? A Framework for Understanding the Differences from Traditional Supervised Learning
AI Dungeon: Dragon Model Upgrade—You Can Now Play AI Dungeon With One of the Most Powerful AI Models in the World.
Introducing AI Dungeon Translate: AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button. [🤔 💯 🤷‍♂️ 🤔 🤔 🤔 💯]
Llama-3.1-405B Now Runs at 969 Tokens/s on Cerebras Inference
I Blew $720 on 100 Notebooks from Alibaba and Started a Paper Website Business
AlphaStar: Mastering the Real-Time Strategy Game StarCraft II
BlinkDL/RWKV-LM: RWKV Is an RNN With Transformer-Level LLM Performance. It Can Be Directly Trained like a GPT (parallelizable). So It’s Combining the Best of RNN and Transformer—Great Performance, Fast Inference, Saves VRAM, Fast Training, "Infinite" Ctx_len, and Free Sentence Embedding.
Karpathy/minGPT: A Minimal PyTorch Re-Implementation of the OpenAI GPT (Generative Pretrained Transformer) Training
Minimaxir/textgenrnn: Easily Train Your Own Text-Generating Neural Network of Any Size and Complexity on Any Text Dataset With a Few Lines of Code.
Loom: Multiversal Tree Writing Interface for Human-AI Collaboration
Math: OpenAI API Can Do Some Math out of the Gate, but Most Math It Seems It Has to Learn. Many Times, the Numbers That It Spits out Are Just Random. However, Including Different Priming Prompts Can Result in Decent Results.
Deep Learning for Assisting the Process of Music Composition (part 3)
I Made a Custom GPT That Incorporates Advertisement/Product Placement With Its...
Data Exfiltration from Slack AI via Indirect Prompt Injection
Humans Who Are Not Concentrating Are Not General Intelligences
This Is the OpenAI API. It Makes Spookily Good Twitter Bots. 13⁄10 Would Retweet
This Mystical Book Was Co-Authored by a Disturbingly Realistic AI
The Guy Behind the Fake AI Halloween Parade Listing Says You’ve Got It All Wrong
Season 1 Ep. 22 OpenAI's Ilya Sutskever: The Man Who Made AI Work
I've Been Testing the Largest of @OpenAI's Models With AI Dungeon and Been Constantly Impressed at How Interesting and Dynamic the Characters Are, like This Queen, Long Thought to Be Dead, Hiding from Enemies and Not Happy about Me Prying into Her Personal Life.
2024-zhao-figure2-roughllmdecisionboundariesonsimplebinaryclassificationtaskdespite128examples.png
2023-03-20-gpt4-scottalexander-halfanhourbeforedawninsanfranciscosample.png
2023-bommarito-figure1-gpt3cpaaccountingexamperformancebyexamsection.jpg
2023-bommarito-figure2-progressofgpt3overtimeoncpaaccountingexam.jpg
2023-jakesch-figure10-participantsdidnotnoticemodelslanttowardsapositionaffectedtheirownargumentwriting.jpg
2023-jakesch-figure3-participantseditorialwritingaboutsocialmediabenefitswereaffectedbygpt3promptslants.jpg
2023-jakesch-figure5-hastyparticipantseditorialwritingweremoreaffectedbyslantedgpt3prompts.jpg
2023-jakesch-figure6-slantedmodelpromptschangedpeoplesopinionafterwritinganeditorial.jpg
2023-jakesch-figure9-participantsdidnotnoticemodelslanttowardsaposition.jpg
2022-08-19-gwern-meme-deathknockingatdoor-deeplearningscalingsuccesses.png
2022-08-06-gwern-meme-netflixliegirl-studyingdeeplearningscaling.jpg
2022-05-22-gwern-meme-tintinwhataweekhuh-2ndanniversaryofgpt3paper.png
2022-bommarito-figure1-gpt3performanceonbarexambycategory.jpg
2022-bommarito-figure2-increaseofgpt3modelaccuracyonbarexambysize.jpg
2022-ganguli-figure2-visualizationofsuccessfulredteamattacksonlanguagemodels.png
2021-almeida-figure2-lhoptgpt3hyperparametertuningscalinglaw.jpg
2021-almeida-figure3-lhoptlearnedhyperparameteroptimizationongpt2largewikitext103speedupdouble.jpg
2021-askell-figure2-prmptingimprovesalignmentwithmodelscalingwithdecreasingalignmenttax.jpg
2021-askell-figure5-anthropcgptlearnshumanpreferencesatn500withgreaterscale.jpg
2021-dou-figure4-errorsbydecodingsamplingstrategyhyperparameters.png
2021-hernandez-transferlearning-figure2-transferscaling.png
2021-kim-figure5-transferfromenglishtochinesespanishgerman.jpg
2021-nogueira-figure1-additionperformanceofnumberorthographies.png
2021-solaiman-figure3-largergpt3modelsfinetunebetteronpalmstoxicitydataset.jpg
2020-06-18-karpathy-expandingbrainmeme-gpt3metalearning.jpg
2020-02-03-gpt21.5b-archiveofourownao3-model-510427-samples-topp090.txt
2020-02-03-gpt21.5b-videogamewalkthrough-model-174925-samples-topp090.txt
2020-01-15-gwern-gpt2-preferencelearning-abc-combinedmodel-klregularized-finalrun.png
2020-brown-figure313-humanabilitytodetectmodelgeneratednewsstories.jpg
2020-henighan-figure11-pretrainingimageclassificationscaling.png
2020-henighan-table1-autoregressivemodelsscalingpowerlaws.png
2020-kaplan-appendix1-summaryofneurallanguagemodelscalingpowerlaws.png
2020-kaplan-figure7-scalingrnnsvstransformersshowsrnnplateau.png
2019-12-26-gwern-gpt2-preferencelearning-abc-combinedmodel-klregularized-collapse.png
2019-12-16-gwern-gpt2-15b-poetry-tensorboard-100tputraining.png
2019-12-13-gwern-gpt2-15b-poetry-tensorboard-97tputraining.png
2019-12-13-gwern-gpt2-preferencelearning-abc-combinedmodel-halfbounce.png
2019-11-19-gwern-gpt2-15b-poetry-tensorboard-1tputraining.jpg
2019-keskar-table1-ctrlsamplesdemonstratingmetadatainfluenceontextcompletions.png
2019-keskar-table2-ctrltextsamplesusingonlymetadatawithoutaprompt.png
2019-keskar-table3-ctrltextsamplesshowinginfluenceofurllinksasprefixmetadata.png
2019-keskar-table4-ctrltextsamplesusingtemplatizedcontrolcodesforspecifictaskslikeqaortranslation.png
2019-keskar-table5-ctrltextsamplesmixingzeroshotgeneralizationofmetadata.png
2018-huang-magenta-musictransformer-attentionvisualization.jpg
https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html
https://adversa.ai/blog/universal-llm-jailbreak-chatgpt-gpt-4-bard-bing-anthropic-and-beyond/
https://analyticsindiamag.com/when-chatgpt-attempted-upsc-exam/
https://colab.research.google.com/drive/1c6VccMPsOMAUQCKU4BVDRd5Y32qkozmK
https://davidrozado.substack.com/p/the-political-preferences-of-llms
https://embracethered.com/blog/posts/2023/chatgpt-webpilot-data-exfil-via-markdown-injection/#responsible-disclosure
https://github.com/jujumilk3/leaked-system-prompts/tree/main
https://hedgehogreview.com/issues/markets-and-the-good/articles/language-machinery
https://homepages.inf.ed.ac.uk/abmayne/publications/sennrich2016NAACL.pdf
https://huggingface.co/Gustavosta/MagicPrompt-Stable-Diffusion
https://mi.eng.cam.ac.uk/projects/cued-rnnlm/papers/Interspeech15.pdf
https://openai.com/blog/our-approach-to-alignment-research/
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4627587
https://platform.openai.com/docs/guides/gpt-best-practices
https://platform.openai.com/docs/guides/prompt-engineering
https://promptarmor.substack.com/p/data-exfiltration-from-writercom
https://research.google/blog/transformer-a-novel-neural-network-architecture-for-language-understanding/
https://simonwillison.net/2023/Apr/14/worst-that-can-happen/
https://techtualist.substack.com/p/i-wrote-a-script-for-gpt-3-to-take
https://thezvi.substack.com/p/jailbreaking-the-chatgpt-on-release
https://web.archive.org/web/20240102075620/https://www.jailbreakchat.com/
https://www.alignmentforum.org/posts/YEioD8YLgxih3ydxP/why-simulator-ais-want-to-be-active-inference-ais
https://www.askviable.com/blog/why-we-chose-gpt-3-embeddings-for-the-clustering-behind-our-feedback-reports
https://www.forbes.com/sites/thomasbrewster/2023/11/16/chatgpt-becomes-a-social-media-spy-assistant/
https://www.forefront.ai/blog-posts/how-to-fine-tune-gpt-neox
https://www.freaktakes.com/p/the-past-and-present-of-computer
https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology
https://www.lesswrong.com/posts/GyaDCzsyQgc48j8t3/linear-encoding-of-character-level-information-in-gpt-j
https://www.lesswrong.com/posts/PDLfpRwSynu73mxGw/basic-facts-about-language-model-internals-1
https://www.lesswrong.com/posts/SCqDipWAhZ49JNdmL/paper-llms-trained-on-a-is-b-fail-to-learn-b-is-a#eKhSncieBquLsFTXZ
https://www.lesswrong.com/posts/YKfNZAmiLdepDngwi/gpt-175bee
https://www.lesswrong.com/posts/axxnpQi8FyBPE4rbq/hutter-prize-for-prompts?commentId=WKNXFtQWzfSs9mGih
https://www.lesswrong.com/posts/dFbfCLZA4pejckeKc/a-mechanistic-explanation-for-solidgoldmagikarp-like-tokens
https://www.lesswrong.com/posts/etoMr4vcnP7joQHWa/notes-from-a-prompt-factory
https://www.lesswrong.com/posts/wqRqb7h6ZC48iDgfK/tentatively-found-600-monosemantic-features-in-a-small-lm
https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/
https://www.nytimes.com/interactive/2023/04/26/upshot/gpt-from-scratch.html
https://www.oneusefulthing.org/p/working-with-ai-two-paths-to-prompting
https://www.politico.eu/article/italian-privacy-regulator-bans-chatgpt/
https://www.reddit.com/r/ChatGPT/comments/12xai7j/spamming_the_word_stop_2300_times_or_probably_any/
https://www.reddit.com/r/ChatGPT/comments/15y4mqx/i_asked_chatgpt_to_maximize_its_censorship/
https://www.reddit.com/r/GPT3/comments/ra6nk4/had_gpt3_generate_the_onion_headlines/
https://www.reddit.com/r/GPT3/comments/tgud2t/my_new_favorite_thing_is_making_gpt3_create/
https://www.reddit.com/r/MachineLearning/comments/12xwzt9/d_be_careful_with_user_facing_apps_using_llms/
https://www.reddit.com/r/MachineLearning/comments/v42pej/p_this_is_the_worst_ai_ever_gpt4chan_model/
https://www.sfchronicle.com/projects/2021/jessica-simulation-artificial-intelligence/