Bibliography:

  1. ‘neural net’ tag

  2. ‘self-attention’ tag

  3. ‘Sydney (AI)’ tag

  4. Hierarchical Embeddings for Text Search

  5. Absolute Unit NNs: Regression-Based MLPs for Everything

  6. Number Search Engine via NN Embeddings

  7. Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?

  8. Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

  9. HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

  10. Long Context RAG Performance of Large Language Models

  11. Inference Scaling for Long-Context Retrieval Augmented Generation

  12. Contextual Document Embeddings

  13. Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes?

  14. Masked Mixers for Language Generation and Retrieval

  15. Hermes 3 Technical Report

  16. Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

  17. OpenAI’s Colin Jarvis predicts "exponential" advancements in large language model capabilities during AI Summit London keynote

  18. State Soup: In-Context Skill Learning, Retrieval and Mixing

  19. Retrieval Head Mechanistically Explains Long-Context Factuality

  20. Aligning LLM Agents by Learning Latent Preference from User Edits

  21. Towards Generated Image Provenance Analysis Via Conceptual-Similar-Guided-SLIP Retrieval

  22. FABLES: Evaluating faithfulness and content selection in book-length summarization

  23. Long-form factuality in large language models

  24. Online Adaptation of Language Models with a Memory of Amortized Contexts (MAC)

  25. RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

  26. Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations (HSTU)

  27. RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

  28. RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture

  29. Improving Text Embeddings with Large Language Models

  30. ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

  31. Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models

  32. Retrieving Conditions from Reference Images for Diffusion Models

  33. Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine

  34. PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers

  35. ChipNeMo: Domain-Adapted LLMs for Chip Design

  36. In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries

  37. SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

  38. Text Embeddings Reveal (Almost) As Much As Text

  39. FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

  40. ExpeL: LLM Agents Are Experiential Learners

  41. RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models

  42. Gzip versus bag-of-words for text classification with k-NN

  43. Copy Is All You Need

  44. Lost in the Middle: How Language Models Use Long Contexts

  45. LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

  46. Voice Conversion With Just Nearest Neighbors

  47. TTT-NN: Test-Time Training on Nearest Neighbors for Large Language Models

  48. Landmark Attention: Random-Access Infinite Context Length for Transformers

  49. WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

  50. Long-Term Value of Exploration: Measurements, Findings and Algorithms

  51. ImageBind: One Embedding Space To Bind Them All

  52. Unlimiformer: Long-Range Transformers with Unlimited Length Input

  53. q2d: Turning Questions into Dialogs to Teach Models How to Search

  54. CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval

  55. Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes

  56. Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study

  57. MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

  58. Mitigating YouTube Recommendation Polarity using BERT and K-Means Clustering

  59. Tag2Text: Guiding Vision-Language Model via Image Tagging

  60. ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics

  61. Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

  62. How Does In-Context Learning Help Prompt Tuning?

  63. Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models

  64. Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning

  65. In-Context Retrieval-Augmented Language Models

  66. Crawling the Internal Knowledge-Base of Language Models

  67. InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers

  68. Why do Nearest Neighbor Language Models Work?

  69. Precise Zero-Shot Dense Retrieval without Relevance Labels

  70. Less is More: Parameter-Free Text Classification with Gzip

  71. One Embedder, Any Task: Instruction-Finetuned Text Embeddings (INSTRUCTOR)

  72. Text Embeddings by Weakly-Supervised Contrastive Pre-training

  73. NPM: Nonparametric Masked Language Modeling

  74. Retrieval-Augmented Multimodal Language Modeling

  75. GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation

  76. TART: Task-aware Retrieval with Instructions

  77. Large Language Models Struggle to Learn Long-Tail Knowledge

  78. RARR: Attributed Text Generation via Post-hoc Research and Revision

  79. Noise-Robust De-Duplication at Scale

  80. Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)

  81. ReAct: Synergizing Reasoning and Acting in Language Models

  82. FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation

  83. Sparrow: Improving alignment of dialogue agents via targeted human judgements

  84. Generate rather than Retrieve (GenRead): Large Language Models are Strong Context Generators

  85. Vote-K: Selective Annotation Makes Language Models Better Few-Shot Learners

  86. Nearest Neighbor Non-autoregressive Text Generation

  87. Understanding Scaling Laws for Recommendation Models

  88. CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks

  89. RealTime QA: What’s the Answer Right Now?

  90. Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

  91. NewsStories: Illustrating articles with visual summaries

  92. Re2G: Retrieve, Rerank, Generate

  93. Large-Scale Retrieval for Reinforcement Learning

  94. A Neural Corpus Indexer for Document Retrieval

  95. Boosting Search Engines with Interactive Agents

  96. Hopular: Modern Hopfield Networks for Tabular Data

  97. NaturalProver: Grounded Mathematical Proof Generation with Language Models

  98. Down and Across: Introducing Crossword-Solving as a New NLP Benchmark

  99. PLAID: An Efficient Engine for Late Interaction Retrieval

  100. RankGen: Improving Text Generation with Large Ranking Models

  101. Unifying Language Learning Paradigms

  102. Retrieval-Augmented Diffusion Models: Semi-Parametric Neural Image Synthesis

  103. KNN-Diffusion: Image Generation via Large-Scale Retrieval

  104. Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

  105. Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

  106. Retrieval Augmented Classification for Long-Tail Visual Recognition

  107. Retrieval-Augmented Reinforcement Learning

  108. Transformer Memory as a Differentiable Search Index

  109. InPars: Data Augmentation for Information Retrieval using Large Language Models

  110. Text and Code Embeddings by Contrastive Pre-Training

  111. LaMDA: Language Models for Dialog Applications

  112. Memory-assisted prompt editing to improve GPT-3 after deployment

  113. A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering

  114. Learning To Retrieve Prompts for In-Context Learning

  115. Contriever: Towards Unsupervised Dense Information Retrieval with Contrastive Learning

  116. WebGPT: Browser-assisted question-answering with human feedback

  117. WebGPT: Improving the factual accuracy of language models through web browsing

  118. Large Dual Encoders Are Generalizable Retrievers

  119. You Only Need One Model for Open-domain Question Answering

  120. Spider: Learning to Retrieve Passages without Supervision

  121. Boosted Dense Retriever

  122. Improving language models by retrieving from trillions of tokens

  123. Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

  124. Florence: A New Foundation Model for Computer Vision

  125. LiT: Zero-Shot Transfer with Locked-image Text Tuning

  126. Scaling Law for Recommendation Models: Towards General-purpose User Representations

  127. SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

  128. HTCN: Harmonious Text Colorization Network for Visual-Textual Presentation Design

  129. Memorizing Transformers

  130. CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

  131. One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective

  132. SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval

  133. MeLT: Message-Level Transformer with Masked Document Representations as Pre-Training for Stance Detection

  134. EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling

  135. Contrastive Language-Image Pre-training for the Italian Language

  136. Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models

  137. Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations

  138. MuSiQue: Multi-hop Questions via Single-hop Question Composition

  139. Internet-Augmented Dialogue Generation

  140. SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

  141. CLIP2Video: Mastering Video-Text Retrieval via Image CLIP

  142. A Multi-Level Attention Model for Evidence-Based Fact Checking

  143. Towards mental time travel: a hierarchical memory for reinforcement learning agents

  144. RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling

  145. Not All Memories are Created Equal: Learning to Forget by Expiring

  146. Rethinking Search: Making Domain Experts out of Dilettantes

  147. SimCSE: Simple Contrastive Learning of Sentence Embeddings

  148. BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

  149. Retrieval Augmentation Reduces Hallucination in Conversation

  150. TSDAE: Using Transformer-based Sequential Denoising Autoencoder for Unsupervised Sentence Embedding Learning

  151. NaturalProofs: Mathematical Theorem Proving in Natural Language

  152. China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) releases Wu Dao 1.0, China’s first large-scale pretraining model.

  153. Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence (VitaminC)

  154. ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

  155. Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers

  156. Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup

  157. Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps

  158. Current Limitations of Language Models: What You Need is Retrieval

  159. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

  160. Pre-training via Paraphrasing

  161. Memory Transformer

  162. M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training

  163. System for searching illustrations of anime characters focusing on degrees of character attributes

  164. Open-Retrieval Conversational Question Answering

  165. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

  166. Dense Passage Retrieval for Open-Domain Question Answering

  167. Learning to Scale Multilingual Representations for Vision-Language Tasks

  168. REALM: Retrieval-Augmented Language Model Pre-Training

  169. How Much Knowledge Can You Pack Into the Parameters of a Language Model?

  170. REALM: Integrating Retrieval into Language Representation Models

  171. The Importance of Deconstruction

  172. SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning

  173. Generalization through Memorization: Nearest Neighbor Language Models

  174. OHAC: Online Hierarchical Clustering Approximations

  175. MULE: Multimodal Universal Language Embedding

  176. Language Models as Knowledge Bases?

  177. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

  178. Metalearned Neural Memory

  179. ELI5: Long Form Question Answering

  180. Large Memory Layers with Product Keys

  181. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

  182. Dynamic Evaluation of Transformer Language Models

  183. LIGHT: Learning to Speak and Act in a Fantasy Text Adventure Game

  184. Top-K Off-Policy Correction for a REINFORCE Recommender System

  185. FEVER: a large-scale dataset for Fact Extraction and VERification

  186. Towards Deep Modeling of Music Semantics using EEG Regularizers

  187. Learning to Organize Knowledge and Answer Questions with N-Gram Machines

  188. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning

  189. Bolt: Accelerated Data Mining with Fast Vector Compression

  190. Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

  191. Get To The Point: Summarization with Pointer-Generator Networks

  192. Neural Episodic Control

  193. Improving Neural Language Models with a Continuous Cache

  194. Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes

  195. Deep Neural Networks for YouTube Recommendations

  196. One-shot Learning with Memory-Augmented Neural Networks

  197. Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning

  198. PlaNet—Photo Geolocation with Convolutional Neural Networks

  199. Illustration2Vec: a semantic vector representation of illustrations

  200. Neural Turing Machines

  201. Learning to Win by Reading Manuals in a Monte-Carlo Framework

  202. Ukiyo-e Search

  203. SimHash: Hash-based Similarity Detection

  204. This Week’s Citation Classic: Nearest Neighbor Pattern Classification

  205. Nearest neighbor pattern classification

  206. RETRO Is Blazingly Fast

  208. ANN-Benchmarks: a benchmarking environment for approximate nearest neighbor search algorithms, with current benchmarking results and an overview of evaluated datasets and algorithms at https://github.com/erikbern/ann-benchmarks/

  210. Find Anything Blazingly Fast With Google's Vector Search Technology

  212. This Anime Does Not Exist, Search: a notebook using the precomputed CLIP feature vectors for 100k images from TADNE

  213. Differentiable Neural Computers

  215. Binary Vector Embeddings Are so Cool

  216. Understanding the BM25 Full Text Search Algorithm

  217. PaddlePaddle/RocketQA: 🚀 RocketQA, Dense Retrieval for Information Retrieval and Question Answering, including Both Chinese and English State-Of-The-Art Models.

  218. Building a Vector Database in 2GB for 36 Million Wikipedia Passages

  219. The Illustrated Retrieval Transformer

  221. 100M Token Context Windows

  222. The Super Effectiveness of Pokémon Embeddings Using Only Raw JSON and Images

  223. Same Energy

  224. Embedding Paragraphs from My Blog With E5-Large-V2

  225. European Parliament Revolutionizes Archive Access With Claude AI

  226. WikiCrow

  228. Azure AI Milestone: Microsoft KEAR Surpasses Human Performance on CommonsenseQA Benchmark

  230. Turing Bletchley: A Universal Image Language Representation Model by Microsoft

  232. Here Are 120K w Samples from @AydaoAI’s Large Anime Model (aka TADNE) Clustered into a Set of 256 Centroids. watch it shine

  233. design#future-tag-features

  234. Girdhar et al 2023, Figure 1: ImageBind’s joint embedding space enables emergent multimodal capabilities like embedding arithmetic or audio-to-image generation (figure image)

  235. Girdhar et al 2023, Figure 5: object detection in images with audio queries in ImageBind, requiring no retraining (figure image)

  236. Gao et al 2022, Figure 1: HyDE architecture diagram of hallucinating an answer and then looking up similar documents to use to generate a real answer (figure image)

  237. Press et al 2022, Figure 5: Self-Ask plus Google search engine — inner monologue for searching the Internet to answer multi-hop questions (figure image)

  238. Press et al 2022, Table 1: Self-Ask plus Google search engine benchmark performance (table image)

  239. https://about.sourcegraph.com/blog/cheating-is-all-you-need

  240. https://ai.meta.com/blog/next-generation-meta-training-inference-accelerator-AI-MTIA/

  242. https://aimd.app/blog/2024-01-16-using-ai-to-overengineer-404-pages

  243. https://applied-llms.org/

  244. https://ashvardanian.com/posts/abusing-vector-search/

  246. https://ashvardanian.com/posts/gcc-12-vs-avx512fp16/

  248. https://ashvardanian.com/posts/python-c-assembly-comparison/

  250. https://blog.helix.ml/p/how-we-got-fine-tuning-mistral-7b

  251. https://blog.pgvecto.rs/my-binary-vector-search-is-better-than-your-fp32-vectors

  253. https://cookbook.openai.com/examples/tag_caption_images_with_gpt4v

  254. https://docs.sweep.dev/blogs/sweeps-core-algo

  255. https://economistwritingeveryday.com/2024/01/07/using-phind-for-academic-references/

  257. https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/

  259. https://every.to/chain-of-thought/gpt-4-is-a-reasoning-engine

  261. https://every.to/chain-of-thought/i-spent-a-week-with-gemini-pro-1-5-it-s-fantastic

  263. https://github.com/MaartenGr/Concept

  264. https://github.com/freedmand/semantra

  265. https://github.com/gh18l/CrawlGPT

  266. https://github.com/nyu-mll/quality

  267. https://github.com/run-llama/llama_index

  268. https://huggingface.co/blog/embedding-quantization

  270. https://kagi.com/summarizer/api.html

  272. https://kenschutte.com/gzip-knn-paper2/

  273. https://news.ycombinator.com/item?id=36616237

  275. https://news.ycombinator.com/item?id=38703943

  277. https://openai.com/blog/chatgpt-plugins

  278. https://openai.com/blog/introducing-text-and-code-embeddings/

  279. https://openai.com/index/new-embedding-models-and-api-updates/

  280. https://platform.openai.com/docs/gptbot

  282. https://platform.openai.com/docs/guides/embeddings/use-cases

  284. https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/multi_vector/

  286. https://research.google/blog/kelm-integrating-knowledge-graphs-with-language-model-pre-training-corpora/

  287. https://rom1504.github.io/clip-retrieval/

  288. https://searchengineland.com/how-google-search-ranking-works-pandu-nayak-435395#h-navboost-system-a-k-a-glue

  289. https://sigmoidprime.com/post/searchthearxiv/

  291. https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/

  292. https://til.simonwillison.net/llms/claude-hacker-news-themes

  294. https://tomhazledine.com/llm-related-posts/

  296. https://txt.cohere.com/int8-binary-embeddings/

  298. https://tyleransom.substack.com/p/using-llms-to-fuzzy-merge

  299. https://www.anthropic.com/news/claude-2-1-prompting

  300. https://www.askviable.com/blog/why-we-chose-gpt-3-embeddings-for-the-clustering-behind-our-feedback-reports

  302. https://www.buildt.ai/blog/viral-ripout

  304. https://www.oreilly.com/radar/what-we-learned-from-a-year-of-building-with-llms-part-i/

  305. https://www.reddit.com/r/ChatGPT/comments/12a0ajb/i_gave_gpt4_persistent_memory_and_the_ability_to/

  306. https://www.reddit.com/r/MachineLearning/comments/117yw1w/d_maybe_a_new_prompt_injection_method_against/

  307. https://www.reddit.com/r/MachineLearning/comments/1fyb9jj/p_model2vec_distill_a_small_fast_model_from_any/

  308. https://www.youtube.com/watch?v=QRJW5jT5VRA

  309. https://www.youtube.com/watch?v=hhiLw5Q_UFg&t=1098s

  310. https://x.com/Afinetheorem/status/1634516697515261953

  311. https://x.com/BlackHC/status/1678881236582912000

  312. https://x.com/D_Rod_Tweets/status/1628030272745746432

  313. https://x.com/D_Rod_Tweets/status/1628449917898264576

  314. https://x.com/IntuitMachine/status/1722727424947859896

  315. https://x.com/OpenAI/status/1676072388436594688

  316. https://x.com/aibreakfast/status/1621668590738079744

  317. https://x.com/andrewwhite01/status/1616933106786738176

  318. https://x.com/arvind_io/status/1488257004783112192

  319. https://x.com/bentossell/status/1598673037976543240

  320. https://x.com/dust4ai/status/1587104029712203778

  321. https://x.com/emollick/status/1625701942574960646

  322. https://x.com/emollick/status/1630472741127180288

  323. https://x.com/emollick/status/1766864861928001617

  324. https://x.com/gdb/status/1707082027584106669

  325. https://x.com/gfodor/status/1626270272314839041

  326. https://x.com/gstsdn/status/1570489762489958406

  327. https://x.com/jdjkelly/status/1617381388977831936

  328. https://x.com/jheitzeb/status/1612130278293803009

  329. https://x.com/justindsmith/status/1681166014711746564

  330. https://x.com/mathemagic1an/status/1595410144522813440

  331. https://x.com/metzlerd/status/1614029603471003648

  332. https://x.com/natfriedman/status/1575631194032549888

  333. https://x.com/pixeljets/status/1643609901833371652

  334. https://x.com/qdrant_engine/status/1721097971830260030

  335. https://x.com/repligate/status/1630593115407937536
