- See Also
- Links
- “Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, Huang et al 2023
- “XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models”, Liang et al 2023
- “NPM: Nonparametric Masked Language Modeling”, Min et al 2022
- “Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities”, Et Al 2022
- “LMentry: A Language Model Benchmark of Elementary Language Tasks”, Efrat et al 2022
- “N-gram Is Back: Residual Learning of Neural Text Generation With N-gram Language Model”, Et Al 2022
- “Help Me Write a Poem: Instruction Tuning As a Vehicle for Collaborative Poetry Writing (CoPoet)”, Chakrabarty et al 2022
- “Most Language Models Can Be Poets Too: An AI Writing Assistant and Constrained Text Generation Studio”, Roush et al 2022
- “Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints”, Et Al 2022
- “AudioLM: a Language Modeling Approach to Audio Generation”, Borsos et al 2022
- “PIXEL: Language Modelling With Pixels”, Rust et al 2022
- “N-Grammer: Augmenting Transformers With Latent N-grams”, Roy et al 2022
- “SymphonyNet: Symphony Generation With Permutation Invariant Language Model”, Et Al 2022
- “DALL·E 2: Hierarchical Text-Conditional Image Generation With CLIP Latents § 7. Limitations and Risks”, Ramesh et al 2022 (page 16, OpenAI)
- “Make-A-Scene: Scene-Based Text-to-Image Generation With Human Priors”, Gafni et al 2022
- “Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words”, Et Al 2022
- “PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021
- “What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers”, Kim et al 2021
- “Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens”, Itzhak & Levy 2021
- “Perceiver IO: A General Architecture for Structured Inputs & Outputs”, Jaegle et al 2021
- “Charformer: Fast Character Transformers via Gradient-based Subword Tokenization”, Tay et al 2021
- “ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Xue et al 2021
- “Robust Open-Vocabulary Translation from Visual Text Representations”, Salesky et al 2021
- “GPT-3 vs Water Cooler Trivia Participants: A Human vs Robot Showdown”, 2021
- “CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation”, Clark et al 2021
- “There Once Was a Really Bad Poet, It Was Automated but You Didn’t Know It”, Wang et al 2021
- “Perceiver: General Perception With Iterative Attention”, Jaegle et al 2021
- “Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
- “Fast WordPiece Tokenization”, Song et al 2020
- “Towards End-to-End In-Image Neural Machine Translation”, Mansimov et al 2020
- “CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters”, El Boukkouri et al 2020
- “GPT-3 Nonfiction”, Branwen 2020
- “GPT-3 Creative Fiction”, Branwen 2020
- “Unigram LM: Byte Pair Encoding Is Suboptimal for Language Model Pretraining”, Bostrom & Durrett 2020
- “Generative Language Modeling for Automated Theorem Proving § Experiments”, Polu & Sutskever 2020 (page 11, OpenAI)
- “GPT-2 Folk Music”, Branwen & Presser 2019
- “BPE-Dropout: Simple and Effective Subword Regularization”, Provilkov et al 2019
- “BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance”, Schick & Schütze 2019
- “Do NLP Models Know Numbers? Probing Numeracy in Embeddings”, Wallace et al 2019
- “Generating Text With Recurrent Neural Networks”, Et Al 2019
- “Character-Level Language Modeling With Deeper Self-Attention”, Al-Rfou et al 2018
- “Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme”, Lau et al 2018
- “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Radford et al 2018 (page 5)
- “One Big Net For Everything”, Schmidhuber 2018
- “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, Wu et al 2016
- “BPEs: Neural Machine Translation of Rare Words With Subword Units”, Sennrich et al 2015
- “Scaling Language Models: Methods, Analysis & Insights from Training Gopher § Table A40: Conversations Can Create the Illusion of Creativity”
- “Commas vs Integers”
- “The Bouba/Kiki Effect And Sound Symbolism In CLIP”
- “Tokens Are Definitely Shorter Than English, but the Performance Even Worse. Getting It to Explain Its Thinking, It Clearly Can’t Tell at All Which Sentences/words Sound the Same, Which Is Odd, Since Homonyms Tend to Have the Same Letters in Russian…On the Other Hand Strength of the Model Definitely Not As Good outside of English.”
- “BPE Blues”
- “BPE Blues+”
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Links
“Language Is Not All You Need: Aligning Perception With Language Models (Kosmos-1)”, Huang et al 2023
“Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1)”, 2023-02-27 (similar)
“XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models”, Liang et al 2023
“XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models”, 2023-01-25 (similar)
“NPM: Nonparametric Masked Language Modeling”, Min et al 2022
“NPM: Nonparametric Masked Language Modeling”, 2022-12-02 (similar; bibliography)
“Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities”, Et Al 2022
“Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities”, 2022-11-10 (similar)
“LMentry: A Language Model Benchmark of Elementary Language Tasks”, Efrat et al 2022
“LMentry: A Language Model Benchmark of Elementary Language Tasks”, 2022-11-03 (backlinks; similar)
“N-gram Is Back: Residual Learning of Neural Text Generation With N-gram Language Model”, Et Al 2022
“n-gram Is Back: Residual Learning of Neural Text Generation with n-gram Language Model”, 2022-10-26 (similar)
“Help Me Write a Poem: Instruction Tuning As a Vehicle for Collaborative Poetry Writing (CoPoet)”, Chakrabarty et al 2022
“Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing (CoPoet)”, 2022-10-25 (backlinks; similar; bibliography)
“Most Language Models Can Be Poets Too: An AI Writing Assistant and Constrained Text Generation Studio”, Roush et al 2022
“Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio”, 2022-10-12 (backlinks; similar; bibliography)
“Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints”, Et Al 2022
“Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints”, 2022-10-06 (similar)
“AudioLM: a Language Modeling Approach to Audio Generation”, Borsos et al 2022
“AudioLM: a Language Modeling Approach to Audio Generation”, 2022-09-07 (similar)
“PIXEL: Language Modelling With Pixels”, Rust et al 2022
“PIXEL: Language Modelling with Pixels”, 2022-07-14 (backlinks; similar; bibliography)
“N-Grammer: Augmenting Transformers With Latent N-grams”, Roy et al 2022
“N-Grammer: Augmenting Transformers with latent n-grams”, 2022-07-13 (similar)
“SymphonyNet: Symphony Generation With Permutation Invariant Language Model”, Et Al 2022
“SymphonyNet: Symphony Generation with Permutation Invariant Language Model”, 2022-05-10 (similar)
“DALL·E 2: Hierarchical Text-Conditional Image Generation With CLIP Latents § 7. Limitations and Risks”, Ramesh et al 2022 (page 16, OpenAI)
“DALL·E 2: Hierarchical Text-Conditional Image Generation with CLIP Latents § 7. Limitations and Risks”, 2022-04-13 (similar; bibliography)
“Make-A-Scene: Scene-Based Text-to-Image Generation With Human Priors”, Gafni et al 2022
“Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors”, 2022-03-24 (similar; bibliography)
“Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words”, Et Al 2022
“Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words”, 2022-02-24 (similar)
“PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021
“PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, 2021-12-15 (similar)
“What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers”, Kim et al 2021
“What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers”, 2021-09-10 (similar)
“Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens”, Itzhak & Levy 2021
“Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens”, 2021-08-25 (backlinks; similar; bibliography)
“Perceiver IO: A General Architecture for Structured Inputs & Outputs”, Jaegle et al 2021
“Perceiver IO: A General Architecture for Structured Inputs & Outputs”, 2021-07-30 (similar; bibliography)
“Charformer: Fast Character Transformers via Gradient-based Subword Tokenization”, Tay et al 2021
“Charformer: Fast Character Transformers via Gradient-based Subword Tokenization”, 2021-06-23 (similar; bibliography)
“ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Xue et al 2021
“ByT5: Towards a token-free future with pre-trained byte-to-byte models”, 2021-05-28 (similar; bibliography)
“Robust Open-Vocabulary Translation from Visual Text Representations”, Salesky et al 2021
“Robust Open-Vocabulary Translation from Visual Text Representations”, 2021-04-16 (backlinks; similar)
“GPT-3 vs Water Cooler Trivia Participants: A Human vs Robot Showdown”, 2021
“GPT-3 vs Water Cooler Trivia participants: A Human vs Robot Showdown”, 2021-03-12 (backlinks; similar)
“CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation”, Clark et al 2021
“CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation”, 2021-03-11 (similar)
“There Once Was a Really Bad Poet, It Was Automated but You Didn’t Know It”, Wang et al 2021
“There Once Was a Really Bad Poet, It Was Automated but You Didn’t Know It”, 2021-03-05 (backlinks; similar)
“Perceiver: General Perception With Iterative Attention”, Jaegle et al 2021
“Perceiver: General Perception with Iterative Attention”, 2021-03-04 (similar; bibliography)
“Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
“Investigating the Limitations of the Transformers with Simple Arithmetic Tasks”, 2021-02-25 (backlinks; similar; bibliography)
“Fast WordPiece Tokenization”, Song et al 2020
“Fast WordPiece Tokenization”, 2020-12-31 (similar; bibliography)
“Towards End-to-End In-Image Neural Machine Translation”, Mansimov et al 2020
“Towards End-to-End In-Image Neural Machine Translation”, 2020-10-20 (similar)
“CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters”, El Boukkouri et al 2020
“CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters”, 2020-10-20 (backlinks; similar)
“GPT-3 Nonfiction”, Branwen 2020
“GPT-3 Nonfiction”, 2020-06-19 (backlinks; similar; bibliography)
“GPT-3 Creative Fiction”, Branwen 2020
“GPT-3 Creative Fiction”, 2020-06-19 (backlinks; similar; bibliography)
“Unigram LM: Byte Pair Encoding Is Suboptimal for Language Model Pretraining”, Bostrom & Durrett 2020
“Unigram LM: Byte Pair Encoding is Suboptimal for Language Model Pretraining”, 2020-04-07 (backlinks; similar)
“Generative Language Modeling for Automated Theorem Proving § Experiments”, Polu & Sutskever 2020 (page 11, OpenAI)
“GPT-2 Folk Music”, Branwen & Presser 2019
“GPT-2 Folk Music”, 2019-11-01 (backlinks; similar; bibliography)
“BPE-Dropout: Simple and Effective Subword Regularization”, Provilkov et al 2019
“BPE-Dropout: Simple and Effective Subword Regularization”, 2019-10-29 (similar)
“BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance”, Schick & Schütze 2019
“BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance”, 2019-10-16 (backlinks; similar)
“Do NLP Models Know Numbers? Probing Numeracy in Embeddings”, Wallace et al 2019
“Do NLP Models Know Numbers? Probing Numeracy in Embeddings”, 2019-09-17 (backlinks; similar)
“Generating Text With Recurrent Neural Networks”, Et Al 2019
“Generating Text with Recurrent Neural Networks”, 2019-07-16 (similar)
“Character-Level Language Modeling With Deeper Self-Attention”, Al-Rfou et al 2018
“Character-Level Language Modeling with Deeper Self-Attention”, 2018-08-09 (backlinks; similar)
“Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme”, Lau et al 2018
“Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme”, 2018-07-10 (backlinks; similar)
“GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Radford et al 2018 (page 5)
“GPT-1: Improving Language Understanding by Generative Pre-Training § Model specifications”, 2018-06-08 (similar; bibliography)
“One Big Net For Everything”, Schmidhuber 2018
“One Big Net For Everything”, 2018-02-24 (similar)
“Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, Wu et al 2016
“Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, 2016-09-26 (similar)
“BPEs: Neural Machine Translation of Rare Words With Subword Units”, Sennrich et al 2015
“BPEs: Neural Machine Translation of Rare Words with Subword Units”, 2015-08-31 (backlinks; similar)
“Scaling Language Models: Methods, Analysis & Insights from Training Gopher § Table A40: Conversations Can Create the Illusion of Creativity”
“Commas vs Integers”
“The Bouba/Kiki Effect And Sound Symbolism In CLIP”
“Tokens Are Definitely Shorter Than English, but the Performance Even Worse. Getting It to Explain Its Thinking, It Clearly Can’t Tell at All Which Sentences/words Sound the Same, Which Is Odd, Since Homonyms Tend to Have the Same Letters in Russian…On the Other Hand Strength of the Model Definitely Not As Good outside of English.”
“Tokens are definitely shorter than English, but the performance even worse. Getting it to explain its thinking, it clearly can't tell at all which sentences/words sound the same, which is odd, since homonyms tend to have the same letters in Russian...On the other hand strength of the model definitely not as good outside of English.” (backlinks)
“BPE Blues”
“BPE Blues+”
Wikipedia
Miscellaneous
- https://nitter.moomoo.me/arankomatsuzaki/status/1619548480795734016
- https://nitter.moomoo.me/repligate/status/1620949459902529537
- https://nitter.moomoo.me/tomgoldsteincs/status/1601113497592795136
- https://nitter.moomoo.me/tomgoldsteincs/status/1601113501803552768
- https://nitter.moomoo.me/tomgoldsteincs/status/1601113505998204928
Link Bibliography
- https://arxiv.org/abs/2212.01349#facebook: “NPM: Nonparametric Masked Language Modeling”, Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer
- https://arxiv.org/abs/2210.13669: “Help Me Write a Poem: Instruction Tuning As a Vehicle for Collaborative Poetry Writing (CoPoet)”, Tuhin Chakrabarty, Vishakh Padmakumar, He He
- https://aclanthology.org/2022.cai-1.2.pdf: “Most Language Models Can Be Poets Too: An AI Writing Assistant and Constrained Text Generation Studio”, Allen Roush, Sanjay Basu, Akshay Moorthy, Dmitry Dubovoy
- https://arxiv.org/abs/2207.06991: “PIXEL: Language Modelling With Pixels”, Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott
- https://arxiv.org/pdf/2204.06125.pdf#page=16&org=openai: “DALL·E 2: Hierarchical Text-Conditional Image Generation With CLIP Latents § 7. Limitations and Risks”, Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
- https://arxiv.org/abs/2203.13131#facebook: “Make-A-Scene: Scene-Based Text-to-Image Generation With Human Priors”, Oran Gafni, Adam Polyak, Oron Ashual, Shelly Sheynin, Devi Parikh, Yaniv Taigman
- https://arxiv.org/abs/2108.11193: “Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens”, Itay Itzhak, Omer Levy
- https://arxiv.org/abs/2107.14795#deepmind: “Perceiver IO: A General Architecture for Structured Inputs & Outputs”
- https://arxiv.org/abs/2106.12672#google: “Charformer: Fast Character Transformers via Gradient-based Subword Tokenization”
- https://arxiv.org/abs/2105.13626#google: “ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel
- https://arxiv.org/abs/2103.03206#deepmind: “Perceiver: General Perception With Iterative Attention”, Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira
- https://arxiv.org/abs/2102.13019: “Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Rodrigo Nogueira, Zhiying Jiang, Jimmy Li
- https://arxiv.org/abs/2012.15524#google: “Fast WordPiece Tokenization”, Xinying Song, Alex Salcianu, Yang Song, Dave Dopson, Denny Zhou
- gpt-3-nonfiction: “GPT-3 Nonfiction”, Gwern Branwen
- gpt-3: “GPT-3 Creative Fiction”, Gwern Branwen
- gpt-2-music: “GPT-2 Folk Music”, Gwern Branwen, Shawn Presser
- https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf#page=5: “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever