“Gzip versus Bag-Of-Words for Text Classification With k-NN”, 2023-07-27:
The effectiveness of gzip-based compression distance in k-NN text classification has recently garnered attention.
In this note, we show that simpler means can also be effective and that compression may not be needed. Indeed, bag-of-words matching can achieve similar or better results while being more efficient.
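
To make the comparison concrete, here is a minimal sketch of the two kinds of distance plugged into the same k-NN classifier. It is illustrative only and not the note's actual implementation: the function names (`ncd`, `bow_distance`, `knn_predict`), the whitespace tokenizer, and the choice of Jaccard distance as the bag-of-words measure are all assumptions made for this example.

```python
import gzip
from collections import Counter


def ncd(a: str, b: str) -> float:
    """Normalized compression distance, using gzip as the compressor."""
    ca = len(gzip.compress(a.encode("utf-8")))
    cb = len(gzip.compress(b.encode("utf-8")))
    cab = len(gzip.compress((a + " " + b).encode("utf-8")))
    return (cab - min(ca, cb)) / max(ca, cb)


def bow_distance(a: str, b: str) -> float:
    """Jaccard distance between sets of lowercased whitespace tokens.

    Assumption: this is one simple bag-of-words measure chosen for
    illustration; the note may use a different one.
    """
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def knn_predict(query, train, distance, k=3):
    """Majority vote over the k training texts closest to the query.

    `train` is a list of (text, label) pairs; `distance` is any
    function taking two strings and returning a float.
    """
    nearest = sorted(train, key=lambda pair: distance(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Either distance can be passed as the `distance` argument, for example `knn_predict(text, train, bow_distance, k=3)` or `knn_predict(text, train, ncd, k=3)`. The bag-of-words version needs only set operations per pair, whereas the gzip version must compress a concatenated string for every query-training pair, which is one plausible source of the efficiency gap the note mentions.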