“‘LM Tokenization’ Tag”, 2020-03-01:
Bibliography for tag ai/nn/tokenization, most recent first: 3 related tags, 116 annotations, & 77 links.
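Most of the links below concern subword tokenizers, especially byte-pair encoding (BPE; see the “Neural Machine Translation of Rare Words With Subword Units” entry below). As a refresher, here is a minimal sketch of the BPE merge loop on a hypothetical toy corpus; real tokenizers (GPT-style byte-level BPE, SentencePiece) add byte fallback, pretokenization regexes, and special tokens, none of which are shown here:

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a {word-as-tuple: frequency} corpus."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(words, pair):
    """Rewrite each word, replacing adjacent occurrences of `pair` with one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])  # fuse the pair
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Hypothetical toy corpus: word frequencies, with words pre-split into characters.
words = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6, tuple("wider"): 3}
for step in range(6):  # learn 6 merges; production vocabularies use tens of thousands
    pair = most_frequent_pair(words)
    if pair is None:
        break
    words = merge_pair(words, pair)
    print(f"merge {step}: {pair[0]!r} + {pair[1]!r}")
# First merge is ('e', 'r') -> 'er', the most frequent adjacent pair (14 occurrences).
```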
- See Also
- Gwern
- Links
- “The Structure of the Token Space for Large Language Models”, et al 2024
- “When a Language Model Is Optimized for Reasoning, Does It Still Show Embers of Autoregression? An Analysis of OpenAI O1”, et al 2024
- “MaskBit: Embedding-Free Image Generation via Bit Tokens”, et al 2024
- “A New Class of Glitch Tokens: BPE Sub-Token Artifacts”
- “JPEG-LM: LLMs As Image Generators With Canonical Codec Representations”, et al 2024
- “Token Erasure As a Footprint of Implicit Vocabulary Items in LLMs”, et al 2024
- “Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets”, et al 2024
- “From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step”, et al 2024
- “Zero-Shot Tokenizer Transfer”, et al 2024
- “Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models”, et al 2024
- “Fishing for Magikarp: Automatically Detecting Under-Trained Tokens in Large Language Models”, 2024
- “SpaceByte: Towards Deleting Tokenization from Large Language Modeling”, 2024
- “Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge”, et al 2024
- “Why Do Small Language Models Underperform? Studying Language Model Saturation via the Softmax Bottleneck”, et al 2024
- “Training LLMs over Neurally Compressed Text”, et al 2024
- “Mechanistic Design and Scaling of Hybrid Architectures”, et al 2024
- “Tokenization Counts: The Impact of Tokenization on Arithmetic in Frontier LLMs”, 2024
- “Tasks That Language Models Don’t Learn”, 2024
- “Getting the Most out of Your Tokenizer for Pre-Training and Domain Adaptation”, et al 2024
- “MambaByte: Token-Free Selective State Space Model”, et al 2024
- “A Long-Context Language Model for the Generation of Bacteriophage Genomes”, 2023
- “Diff History for Neural Language Agents”, et al 2023
- “TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering”, et al 2023
- “Positional Description Matters for Transformers Arithmetic”, et al 2023
- “AnyText: Multilingual Visual Text Generation And Editing”, et al 2023
- “EELBERT: Tiny Models through Dynamic Embeddings”, et al 2023
- “ChipNeMo: Domain-Adapted LLMs for Chip Design”, et al 2023
- “Learn Your Tokens: Word-Pooled Tokenization for Language Modeling”, et al 2023
- “Tokenizer Choice For LLM Training: Negligible or Crucial?”, et al 2023
- “XVal: A Continuous Number Encoding for Large Language Models”, et al 2023
- “Think Before You Speak: Training Language Models With Pause Tokens”, et al 2023
- “Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve”, et al 2023
- “Subwords As Skills: Tokenization for Sparse-Reward Reinforcement Learning”, et al 2023
- “PASTA: Pretrained Action-State Transformer Agents”, et al 2023
- “In-Context Autoencoder for Context Compression in a Large Language Model”, et al 2023
- “Teaching Arithmetic to Small Transformers”, et al 2023
- “Length Generalization in Arithmetic Transformers”, et al 2023
- “ChatGPT Is Fun, but It Is Not Funny! Humor Is Still Challenging Large Language Models”, 2023
- “Bytes Are All You Need: Transformers Operating Directly On File Bytes”, et al 2023
- “FERMAT: An Alternative to Accuracy for Numerical Reasoning”, 2023
- “MEGABYTE: Predicting Million-Byte Sequences With Multiscale Transformers”, et al 2023
- “Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition”, et al 2023
- “What’s AGI, and Why Are AI Experts Skeptical? ChatGPT and Other Bots Have Revived Conversations on Artificial General Intelligence. Scientists Say Algorithms Won’t Surpass You Any Time Soon”, 2023
- “BloombergGPT: A Large Language Model for Finance”, et al 2023
- “How Well Do Large Language Models Perform in Arithmetic Tasks?”, et al 2023
- “LLaMA-1: Open and Efficient Foundation Language Models”, et al 2023
- “Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, et al 2023
- “XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models”, et al 2023
- “Language Models Are Better Than Humans at Next-Token Prediction”, et al 2022
- “Character-Aware Models Improve Visual Text Rendering”, et al 2022
- “NPM: Nonparametric Masked Language Modeling”, et al 2022
- “Fast Inference from Transformers via Speculative Decoding”, et al 2022
- “Efficient Transformers With Dynamic Token Pooling”, et al 2022
- “Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities”, et al 2022
- “LMentry: A Language Model Benchmark of Elementary Language Tasks”, et al 2022
- “n-Gram Is Back: Residual Learning of Neural Text Generation With n-Gram Language Model”, et al 2022
- “Help Me Write a Poem: Instruction Tuning As a Vehicle for Collaborative Poetry Writing (CoPoet)”, et al 2022
- “DALL·E 2 Is Seeing Double: Flaws in Word-To-Concept Mapping in Text2Image Models”, et al 2022
- “Most Language Models Can Be Poets Too: An AI Writing Assistant and Constrained Text Generation Studio”, et al 2022
- “Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints”, et al 2022
- “AudioLM: a Language Modeling Approach to Audio Generation”, et al 2022
- “PIXEL: Language Modeling With Pixels”, et al 2022
- “N-Grammer: Augmenting Transformers With Latent n-Grams”, et al 2022
- “Forecasting Future World Events With Neural Networks”, et al 2022
- “SymphonyNet: Symphony Generation With Permutation Invariant Language Model”, et al 2022
- “FLOTA: An Embarrassingly Simple Method to Mitigate Und-Es-Ira-Ble Properties of Pretrained Language Model Tokenizers”, et al 2022
- “DALL·E 2: Hierarchical Text-Conditional Image Generation With CLIP Latents § 7. Limitations and Risks”, et al 2022 (page 16, OpenAI)
- “ByT5 Model for Massively Multilingual Grapheme-To-Phoneme Conversion”, et al 2022
- “Make-A-Scene: Scene-Based Text-To-Image Generation With Human Priors”, et al 2022
- “Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words”, et al 2022
- “Between Words and Characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP”, et al 2021
- “Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts”, et al 2021
- “OCR-Free Document Understanding Transformer”, et al 2021
- “What Changes Can Large-Scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-Scale Korean Generative Pretrained Transformers”, et al 2021
- “Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens”, 2021
- “Perceiver IO: A General Architecture for Structured Inputs & Outputs”, et al 2021
- “Charformer: Fast Character Transformers via Gradient-Based Subword Tokenization”, et al 2021
- “ByT5: Towards a Token-Free Future With Pre-Trained Byte-To-Byte Models”, et al 2021
- “Robust Open-Vocabulary Translation from Visual Text Representations”, et al 2021
- “GPT-3 vs Water Cooler Trivia Participants: A Human vs Robot Showdown”, 2021
- “CANINE: Pre-Training an Efficient Tokenization-Free Encoder for Language Representation”, et al 2021
- “There Once Was a Really Bad Poet, It Was Automated but You Didn’t Know It”, et al 2021
- “Perceiver: General Perception With Iterative Attention”, et al 2021
- “Investigating the Limitations of Transformers With Simple Arithmetic Tasks”, et al 2021
- “Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words”, et al 2021
- “Fast WordPiece Tokenization”, et al 2020
- “CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters”, et al 2020
- “Towards End-To-End In-Image Neural Machine Translation”, et al 2020
- “Unigram LM: Byte Pair Encoding Is Suboptimal for Language Model Pretraining”, 2020
- “Generative Language Modeling for Automated Theorem Proving § Experiments”, 2020 (page 11, OpenAI)
- “OTEANN: Estimating the Transparency of Orthographies With an Artificial Neural Network”, 2019
- “GPT-2 Folk Music”, 2019
- “BPE-Dropout: Simple and Effective Subword Regularization”, et al 2019
- “BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance”, Schick & Schütze 2019
- “Do NLP Models Know Numbers? Probing Numeracy in Embeddings”, et al 2019
- “Generating Text With Recurrent Neural Networks”, et al 2019
- “SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing”, 2018
- “Character-Level Language Modeling With Deeper Self-Attention”, Al-Rfou et al 2018
- “Deep-Speare: A Joint Neural Model of Poetic Language, Meter and Rhyme”, et al 2018
- “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, et al 2018 (page 5)
- “One Big Net For Everything”, 2018
- “Breaking the Softmax Bottleneck: A High-Rank RNN Language Model”, et al 2017
- “DeepTingle”, et al 2017
- “Multiplicative LSTM for Sequence Modeling”, et al 2016
- “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, et al 2016
- “BPEs: Neural Machine Translation of Rare Words With Subword Units”, et al 2015
- “Scaling Language Models: Methods, Analysis & Insights from Training Gopher § Table A40: Conversations Can Create the Illusion of Creativity”
- “Commas vs Integers”
- “FineWeb: Decanting the Web for the Finest Text Data at Scale”
- “The Bouba/Kiki Effect And Sound Symbolism In CLIP”
- “BPE Blues”
- “BPE Blues+”
- “The Art of Prompt Design: Prompt Boundaries and Token Healing”
- “Monitor: An AI-Driven Observability Interface”
- “A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More”
- NineOfNein
- Wikipedia
- Miscellaneous
- Bibliography