Bibliography (28):

  1. https://openai.com/blog/introducing-text-and-code-embeddings/

  2. https://platform.openai.com/docs/guides/embeddings/use-cases

  3. https://x.com/arvind_io/status/1488257004783112192

  4. https://www.askviable.com/blog/why-we-chose-gpt-3-embeddings-for-the-clustering-behind-our-feedback-reports

  5. Contrastive Representation Learning: A Framework and Review

  6. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

  7. Natural Questions: A Benchmark for Question Answering Research

  8. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

  9. GPT-3: Language Models are Few-Shot Learners

  10. Evaluating Large Language Models Trained on Code

  11. SentEval: An Evaluation Toolkit for Universal Sentence Representations

  12. DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations

  13. SimCSE: Simple Contrastive Learning of Sentence Embeddings

  14. CodeSearchNet Challenge: Evaluating the State of Semantic Code Search

  15. GraphCodeBERT: Pre-training Code Representations with Data Flow

  16. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

  17. BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

  18. Contriever: Towards Unsupervised Dense Information Retrieval with Contrastive Learning

  19. ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

  20. SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval

  21. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

  22. Image GPT (iGPT): Generative Pretraining from Pixels

  23. Muppet: Massive Multi-task Representations with Pre-Finetuning

  24. Can Pre-trained Language Models be Used to Resolve Textual and Semantic Merge Conflicts?