“Large Dual Encoders Are Generalizable Retrievers”, 2021-12-15:
It has been shown that dual encoders trained on one domain often fail to generalize to other domains for retrieval tasks. One widespread belief is that the bottleneck layer of a dual encoder, where the final score is simply a dot product between a query vector and a passage vector, is too limited to make dual encoders an effective retrieval model for out-of-domain generalization.
In this paper, we challenge this belief by scaling up the size of the dual encoder model while keeping the bottleneck embedding size fixed. With multi-stage training, surprisingly, scaling up the model size brings substantial improvement on a variety of retrieval tasks, especially for out-of-domain generalization.
Experimental results show that our dual encoders, Generalizable T5-based dense Retrievers (GTR), outperform existing sparse and dense retrievers on the BEIR dataset ( et al 2021) substantially. Most surprisingly, our ablation study finds that GTR is very data efficient, as it only needs 10% of MS MARCO supervised data to achieve the best out-of-domain performance. All the GTR models are released.
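To make the "bottleneck" mentioned in the abstract concrete, here is a minimal sketch of dual-encoder scoring: the query and each passage are embedded independently, and relevance is just the dot product of the two fixed-size vectors. The embeddings below are made-up toy numbers, and the helper names `dot` and `rank_passages` are hypothetical; this is not the GTR code, only an illustration of the scoring step.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("query and passage embeddings must share a dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_passages(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs, highest score first.

    The only interaction between query and passage is the dot product,
    which is the bottleneck the abstract refers to.
    """
    scores = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional embeddings (assumed values, for illustration only).
# In a real system these would come from the query and passage encoders.
query = [1.0, 0.0, 2.0]
passages = [
    [0.5, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 2.0, 0.5],
]

ranking = rank_passages(query, passages)
print(ranking)
```

In this picture, scaling up the model changes only how the vectors are produced; the embedding dimension and the dot-product scoring stay the same.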