“NPM: Nonparametric Masked Language Modeling”, Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer (2022-12-02):

Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval.
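The core idea can be illustrated with a toy sketch (names and embeddings are illustrative, not from the NPM codebase): rather than a softmax over a fixed output vocabulary, the encoder's representation of a masked position is compared against embeddings of every phrase in a reference corpus, and the distribution is taken over those phrases.

```python
import math
import random

random.seed(0)
dim = 8

# Toy "reference corpus": phrases paired with stand-in encoder embeddings.
corpus_phrases = ["Thessaloniki", "Seattle", "language model", "softmax"]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

corpus_emb = [normalize([random.gauss(0, 1) for _ in range(dim)])
              for _ in corpus_phrases]

def predict_mask(query_emb):
    """Nonparametric prediction: a softmax over corpus *phrases* (via
    similarity to the masked-position embedding), not over a vocabulary."""
    q = normalize(query_emb)
    sims = [sum(a * b for a, b in zip(e, q)) for e in corpus_emb]
    z = sum(math.exp(s) for s in sims)
    probs = [math.exp(s) / z for s in sims]
    return corpus_phrases[max(range(len(probs)), key=probs.__getitem__)]

# A query embedding near the "Seattle" entry retrieves "Seattle".
query = [x + 0.01 * random.gauss(0, 1) for x in corpus_emb[1]]
print(predict_mask(query))  # -> Seattle
```

Because the "vocabulary" is the corpus itself, new phrases can be added at inference time without retraining; the paper's contrastive training and in-batch approximation make learning this retrieval tractable without scoring the full corpus at every step.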

Zero-shot evaluation on 9 closed-set tasks and 7 open-set tasks demonstrates that NPM outperforms larger parametric models, with or without a retrieve-and-generate approach.

It is particularly effective at handling rare patterns (word senses or facts) and at predicting rare or nearly unseen words (e.g. non-Latin script). We release the model and code at github.com/facebookresearch/NPM.

[No comparison to alternate tokenizations, especially ByT5.]