"Less Is More: Parameter-Free Text Classification With Gzip", 2022-12-19:
[buggy? deeply misleading?] Deep neural networks (DNNs) are often used for text classification because they usually achieve high accuracy. However, DNNs can be computationally intensive, with billions of parameters, and require large amounts of labeled data, which makes them expensive to use, to optimize, and to transfer to out-of-distribution (OOD) cases in practice.
In this paper, we propose a non-parametric alternative to DNNs that is easy, lightweight, and universal in text classification: a combination of a simple compressor like gzip with a k-nearest-neighbor classifier. Without any training, pre-training, or fine-tuning, our method achieves results competitive with non-pretrained deep learning methods on 6 in-distribution datasets. It even outperforms BERT on all 5 OOD datasets, including 4 low-resource languages.
Our method also performs particularly well in few-shot settings, where labeled data are too scarce for DNNs to achieve satisfactory accuracy.
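The abstract's "compressor + kNN" recipe can be sketched in a few lines of Python. The paper's distance measure is the Normalized Compression Distance, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the gzip-compressed length; the function and variable names below are illustrative, not from the paper's code:

```python
import gzip

def clen(s: str) -> int:
    # C(x): length in bytes of the gzip-compressed UTF-8 encoding of s
    return len(gzip.compress(s.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    # Normalized Compression Distance:
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    cx, cy, cxy = clen(x), clen(y), clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_classify(test_doc: str, train_set: list[tuple[str, str]], k: int = 3) -> str:
    # train_set is a list of (text, label) pairs; rank all training
    # documents by NCD to the test document and majority-vote over the top k.
    neighbors = sorted(train_set, key=lambda pair: ncd(test_doc, pair[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)
```

There is genuinely no training step: classifying one document costs one compression of every training document concatenated with it, which is why the method is "parameter-free" but also slow at scale.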
[Like all "stupid compressor tricks", this will be more amusing & overhyped than useful or relevant long-term. cf. N-Grammer, Copy is all You Need]