“InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, 2023-01-08:
We carried out a reproducibility study of the InPars recipe for unsupervised training of neural rankers. As a by-product of this study, we developed a simple-yet-effective modification of InPars, which we called InPars-light. Unlike InPars, InPars-light uses only the freely available BLOOM language model and 7-100× smaller ranking models.
On all 5 English retrieval collections (used in the original InPars study) we obtained substantial (7-30%) and statistically significant improvements over BM25 in nDCG or MRR using only a 30M parameter six-layer MiniLM ranker. In contrast, in the InPars study only a 100× larger MonoT5-3B model consistently outperformed BM25, whereas their smaller MonoT5-220M model (which is still 7× larger than our MiniLM ranker) outperformed BM25 only on MS MARCO and TREC DL 2020. In a purely unsupervised setting, our 435M parameter DeBERTa v3 ranker was roughly on par with the 7× larger MonoT5-3B: in fact, on 3 of the 5 datasets, it slightly outperformed MonoT5-3B. Finally, these good results were achieved by re-ranking only 100 candidate documents, compared to the 1,000 used in InPars.
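To illustrate the re-ranking setup described above, here is a minimal sketch of the candidate re-ranking step: a first-stage retriever (e.g., BM25) supplies a candidate list, and only the top-k candidates (k=100 in InPars-light, vs. 1,000 in InPars) are re-scored by the neural ranker. The `score_pair` function below is a hypothetical stand-in for a real cross-encoder such as the MiniLM ranker; the actual model, names, and API are assumptions, not the paper's code.

```python
def score_pair(query: str, doc: str) -> float:
    # Toy lexical-overlap score standing in for a neural cross-encoder ranker.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) + 1e-9)

def rerank(query: str, bm25_candidates: list[str], k: int = 100) -> list[str]:
    """Re-score only the top-k first-stage candidates with the (stub) ranker;
    documents beyond rank k keep their original BM25 order."""
    top_k = bm25_candidates[:k]
    rescored = sorted(top_k, key=lambda doc: score_pair(query, doc), reverse=True)
    return rescored + bm25_candidates[k:]

docs = [
    "neural rankers for retrieval",
    "cooking pasta at home",
    "unsupervised training of rankers",
]
print(rerank("unsupervised neural rankers", docs))
```

Limiting re-ranking to 100 candidates cuts inference cost by roughly 10× relative to re-ranking 1,000, which is part of what makes the recipe cost-effective at deployment time.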
We believe that InPars-light is the first truly cost-effective prompt-based unsupervised recipe to train and deploy neural ranking models that outperform BM25.