“Gzip versus Bag-Of-Words for Text Classification With k-NN”, Juri Opitz, 2023-07-27:

The effectiveness of compression distance in k-NN-based text classification (gzip) has recently garnered attention.

In this note we show that simpler means can also be effective, and compression may not be needed. Indeed, a bag-of-words matching can achieve similar or better results, and is more efficient.
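
To make the comparison concrete, here is a minimal sketch of k-NN classification with two interchangeable distance functions. It is an illustration under stated assumptions, not the paper's implementation: `ncd` uses the usual normalized compression distance over gzip-compressed lengths, while `bow_distance` uses one minus the Jaccard overlap of word sets as a stand-in for "bag-of-words matching" (the note's exact measure may differ). The function names, the whitespace tokenization, and the toy data are all hypothetical.

```python
import gzip
from collections import Counter


def ncd(a: str, b: str) -> float:
    """Normalized compression distance based on gzip-compressed lengths."""
    ca = len(gzip.compress(a.encode("utf-8")))
    cb = len(gzip.compress(b.encode("utf-8")))
    cab = len(gzip.compress((a + " " + b).encode("utf-8")))
    return (cab - min(ca, cb)) / max(ca, cb)


def bow_distance(a: str, b: str) -> float:
    """Assumed bag-of-words distance: 1 - Jaccard overlap of lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(wa & wb) / len(wa | wb)


def knn_predict(query, train, distance, k=1):
    """Return the majority label among the k training texts closest to query.

    train is a list of (text, label) pairs; distance is any function
    taking two strings and returning a float.
    """
    nearest = sorted(train, key=lambda ex: distance(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy training set, for illustration only.
train = [
    ("the team won the football match", "sports"),
    ("the bank raised interest rates", "finance"),
]
prediction = knn_predict("football match tonight", train, bow_distance)
```

Because `knn_predict` takes the distance as a parameter, passing `ncd` instead of `bow_distance` switches methods without touching the classifier. The bag-of-words variant needs only set operations per pair, whereas the gzip variant must compress every query-training concatenation, which is one plausible reading of the efficiency claim.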