https://openai.com/blog/introducing-text-and-code-embeddings/
https://platform.openai.com/docs/guides/embeddings/use-cases
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Natural Questions: A Benchmark for Question Answering Research
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
SentEval: An Evaluation Toolkit for Universal Sentence Representations
DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
GraphCodeBERT: Pre-training Code Representations with Data Flow
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Contriever: Towards Unsupervised Dense Information Retrieval with Contrastive Learning
ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Muppet: Massive Multi-task Representations with Pre-Finetuning
Can Pre-trained Language Models be Used to Resolve Textual and Semantic Merge Conflicts?