https://openai.com/blog/introducing-text-and-code-embeddings/
https://platform.openai.com/docs/guides/embeddings/use-cases
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Natural Questions: A Benchmark for Question Answering Research
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
SentEval: An Evaluation Toolkit for Universal Sentence Representations
DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
GraphCodeBERT: Pre-training Code Representations with Data Flow
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Contriever: Towards Unsupervised Dense Information Retrieval with Contrastive Learning
ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Muppet: Massive Multi-task Representations with Pre-Finetuning
Can Pre-trained Language Models be Used to Resolve Textual and Semantic Merge Conflicts?