Bibliography:

  1. ‘AI’ tag

  2. ‘Danbooru AI’ tag

  3. ‘data-augmented GANs’ tag

  4. ‘GAN’ tag

  5. ‘instruct-tuning LLMs’ tag

  6. ‘RL exploration’ tag

  7. Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?

  8. BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

  9. HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

  10. Centaur: a foundation model of human cognition

  11. SimpleStrat: Diversifying Language Model Generation with Stratification

  12. MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

  13. Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

  14. Seeing Faces in Things: A Model and Dataset for Pareidolia

  15. H-ARC: A Robust Estimate of Human Performance on the Abstraction and Reasoning Corpus Benchmark

  16. How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

  17. To Code, or Not To Code? Exploring Impact of Code in Pre-training

  18. Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names

  19. ImagiNet: A Multi-Content Dataset for Generalizable Synthetic Image Detection via Contrastive Learning

  20. Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

  21. Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs

  22. Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets

  23. APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

  24. Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

  25. OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

  26. DataComp-LM: In search of the next generation of training sets for language models

  27. GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

  28. Newswire: A Large-Scale Structured Database of a Century of Historical News

  29. Are We Done with MMLU?

  30. MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

  31. LLMs achieve adult human performance on higher-order theory of mind tasks

  32. DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

  33. Sakuga-42M Dataset: Scaling Up Cartoon Research

  34. Can Language Models Explain Their Own Classification Behavior?

  35. Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models

  36. ImageInWords: Unlocking Hyper-Detailed Image Descriptions

  37. GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic

  38. Building a Large Japanese Web Corpus for Large Language Models

  39. CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack of) Multicultural Knowledge

  40. VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

  41. Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

  42. How Tech Giants Cut Corners to Harvest Data for AI: OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems

  43. Vulnerability Detection with Code Language Models: How Far Are We?

  44. Long-form factuality in large language models

  45. COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning

  46. RewardBench: Evaluating Reward Models for Language Modeling

  47. Evaluating Text to Image Synthesis: Survey and Taxonomy of Image Quality Metrics

  48. Hierarchical Feature Warping and Blending for Talking Head Animation

  49. Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models

  50. ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

  51. Investigating Continual Pretraining in Large Language Models: Insights and Implications

  52. Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models

  53. ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

  54. DE-COP: Detecting Copyrighted Content in Language Models Training Data

  55. I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench

  56. Can AI Assistants Know What They Don’t Know?

  57. AnimeDiffusion: Anime Diffusion Colorization

  58. I am a Strange Dataset: Metalinguistic Tests for Language Models

  59. Generative AI for Math: Part I—MathPile: A Billion-Token-Scale Pretraining Corpus for Math

  60. WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation

  61. Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

  62. StarVector: Generating Scalable Vector Graphics Code from Images

  63. Rich Human Feedback for Text-to-Image Generation

  64. TinyGSM: achieving >80% on GSM8k with small language models

  65. EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models

  66. Retrieving Conditions from Reference Images for Diffusion Models

  67. Sequential Modeling Enables Scalable Learning for Large Vision Models

  68. BioCLIP: A Vision Foundation Model for the Tree of Life

  69. Efficient Transformer Knowledge Distillation: A Performance Review

  70. GPQA: A Graduate-Level Google-Proof Q&A Benchmark

  71. Dazed & Confused: A Large-Scale Real-World User Study of reCAPTCHAv2

  72. Instruction-Following Evaluation for Large Language Models

  73. In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search

  74. AnyText: Multilingual Visual Text Generation And Editing

  75. GLaMM: Pixel Grounding Large Multimodal Model

  76. Don’t Make Your LLM an Evaluation Benchmark Cheater

  77. CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images

  78. FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

  79. MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

  80. Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition

  81. Llemma: An Open Language Model For Mathematics

  82. From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions

  83. TabLib: A Dataset of 627M Tables with Context

  84. SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

  85. OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text

  86. FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

  87. UltraFeedback: Boosting Language Models with High-quality Feedback

  88. MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book

  89. Demystifying CLIP Data

  90. The Cambridge Law Corpus: A Corpus for Legal AI Research

  91. MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

  92. LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

  93. SlimPajama-DC: Understanding Data Combinations for LLM Training

  94. MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

  95. GoodWiki

  96. From Sparse to Dense: GPT-4 Summarization with Chain of Density (CoD) Prompting

  97. FIMO: A Challenge Formal Dataset for Automated Theorem Proving

  98. American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

  99. LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

  100. The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain

  101. Android in the Wild: A Large-Scale Dataset for Android Device Control

  102. DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

  103. AlpaGasus: Training A Better Alpaca with Fewer Data

  104. InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

  105. Instruction Mining: High-Quality Instruction Data Selection for Large Language Models

  106. Test-Time Training on Video Streams

  107. HEADLINES: A Massive Scale Semantic Similarity Dataset of Historical English

  108. LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

  109. SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

  110. ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews

  111. Understanding Social Reasoning in Language Models with Language Models

  112. OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

  113. AI Is a Lot of Work: As the technology becomes ubiquitous, a vast tasker underclass is emerging—and not going anywhere

  114. Anime Character Identification and Tag Prediction by Multimodality Modeling: Dataset and Model

  115. ChessGPT: Bridging Policy Learning and Language Modeling

  116. Why YouTube Could Give Google an Edge in AI

  117. Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks

  118. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

  119. Let’s Verify Step by Step

  120. WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

  121. SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models

  122. C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

  123. TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

  124. Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

  125. LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

  126. Multi-Party Chat (MultiLIGHT): Conversational Agents in Group Settings with Humans and Models

  127. ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification

  128. Parsing-Conditioned Anime Translation: A New Dataset and Method

  129. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

  130. Abstraction-Perception Preserving Cartoon Face Synthesis

  131. How well do Large Language Models perform in Arithmetic tasks?

  132. The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

  133. Large Language Models Are State-of-the-Art Evaluators of Translation Quality

  134. Benchmarks for Automated Commonsense Reasoning: A Survey

  135. Data Selection for Language Models via Importance Resampling

  136. Off-the-Grid MARL (OG-MARL): Datasets with Baselines for Offline Multi-Agent Reinforcement Learning

  137. The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

  138. The Semantic Scholar Open Data Platform

  139. Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation with Interaction

  140. How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection

  141. Med-PaLM: Large Language Models Encode Clinical Knowledge

  142. Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

  143. HALIE: Evaluating Human-Language Model Interaction

  144. A Whack-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

  145. Text Embeddings by Weakly-Supervised Contrastive Pre-training

  146. The Stack: 3 TB of permissively licensed source code

  147. UniSumm: Unified Few-shot Summarization with Multi-Task Pre-Training and Prefix-Tuning

  148. A Creative Industry Image Generation Dataset Based on Captions

  149. AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities

  150. AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies

  151. MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation

  152. BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning

  153. Dungeons and Data: A Large-Scale NetHack Dataset

  154. Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

  155. Large Language Models Can Self-Improve

  156. CARP: Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning

  157. MTEB: Massive Text Embedding Benchmark

  158. Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio

  159. Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)

  160. Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

  161. Brain Imaging Generation with Latent Diffusion Models

  162. PaLI: A Jointly-Scaled Multilingual Language-Image Model

  163. FOLIO: Natural Language Reasoning with First-Order Logic

  164. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

  165. Bugs in the Data: How ImageNet Misrepresents Biodiversity

  166. Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning

  167. Benchmarking Compositionality with Formal Languages

  168. Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP

  169. Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter

  170. Few-shot Adaptation Works with UnpredicTable Data

  171. Language Models Can Teach Themselves to Program Better

  172. RealTime QA: What’s the Answer Right Now?

  173. NewsStories: Illustrating articles with visual summaries

  174. CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

  175. Why do tree-based models still outperform deep learning on tabular data?

  176. Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

  177. Forecasting Future World Events with Neural Networks

  178. RST: reStructured Pre-training

  179. Learning to Generate Artistic Character Line Drawing

  180. Dataset Condensation via Efficient Synthetic-Data Parameterization

  181. Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions

  182. Fine-grained Image Captioning with CLIP Reward

  183. FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

  184. InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning

  185. Learning to Model Editing Processes

  186. Flexible Diffusion Modeling of Long Videos

  187. Housekeep: Tidying Virtual Households using Commonsense Reasoning

  188. Instruction Induction: From Few Examples to Natural Language Task Descriptions

  189. Down and Across: Introducing Crossword-Solving as a New NLP Benchmark

  190. Automated Crossword Solving

  191. Dialog Inpainting: Turning Documents into Dialogues

  192. SymphonyNet: Symphony Generation with Permutation Invariant Language Model

  193. Building Machine Translation Systems for the Next Thousand Languages

  194. When does dough become a bagel? Analyzing the remaining mistakes on ImageNet

  195. Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)

  196. A Challenging Benchmark of Anime Style Recognition

  197. Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

  198. Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

  199. KNN-Diffusion: Image Generation via Large-Scale Retrieval

  200. ByT5 model for massively multilingual grapheme-to-phoneme conversion

  201. STaR: Bootstrapping Reasoning With Reasoning

  202. CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning

  203. Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy

  204. Self-Distilled StyleGAN: Towards Generation from Internet Photos

  205. RuCLIP—new models and experiments: a technical report

  206. Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework

  207. ROME: Locating and Editing Factual Associations in GPT

  208. DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers

  209. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

  210. StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

  211. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

  212. Can Wikipedia Help Offline Reinforcement Learning?

  213. SWAG: Revisiting Weakly Supervised Pre-Training of Visual Perception Models

  214. CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities

  215. WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation

  216. SynthBio: A Case Study in Faster Curation of Text Datasets

  217. BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations

  218. ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation

  219. A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision

  220. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

  221. QuALITY: Question Answering with Long Input Texts, Yes!

  222. FRUIT: Faithfully Reflecting Updated Information in Text

  223. Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants

  224. WebGPT: Browser-assisted question-answering with human feedback

  225. GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

  226. MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

  227. BASIC: Combined Scaling for Open-Vocabulary Image Classification

  228. It’s About Time: Analog Clock Reading in the Wild

  229. Solving Probability and Statistics Problems by Program Synthesis

  230. Few-Shot Self-Rationalization with Natural Language Prompts

  231. AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment

  232. RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning

  233. An Explanation of In-context Learning as Implicit Bayesian Inference

  234. LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

  235. Training Verifiers to Solve Math Word Problems

  236. A connectome of the Drosophila central complex reveals network motifs suitable for flexible navigation and context-dependent action selection

  237. HTCN: Harmonious Text Colorization Network for Visual-Textual Presentation Design

  238. T0: Multitask Prompted Training Enables Zero-Shot Task Generalization

  239. Can Machines Learn Morality? The Delphi Experiment

  240. Situated Dialogue Learning through Procedural Environment Generation

  241. MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

  242. TruthfulQA: Measuring How Models Mimic Human Falsehoods

  243. MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

  244. LAION-400-Million Open Dataset

  245. Transfer Learning for Pose Estimation of Illustrated Characters

  246. MuSiQue: Multi-hop Questions via Single-hop Question Composition

  247. Scaling Vision Transformers

  248. QASPER: A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers

  249. XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond

  250. BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

  251. SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network

  252. Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks

  253. NaturalProofs: Mathematical Theorem Proving in Natural Language

  254. Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence (VitaminC)

  255. Are NLP Models really able to Solve Simple Math Word Problems?

  256. Measuring Mathematical Problem Solving With the MATH Dataset

  257. WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning

  258. A massive 7T fMRI dataset to bridge cognitive and computational neuroscience

  259. Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

  260. ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

  261. Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling

  262. Scaling Laws for Transfer

  263. Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning

  264. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

  265. CLIP: Learning Transferable Visual Models From Natural Language Supervision

  266. CLIP: Connecting Text and Images: We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the ‘zero-shot’ capabilities of GPT-2 and GPT-3

  267. The Pile: An 800GB Dataset of Diverse Text for Language Modeling

  268. Selective Eye-gaze Augmentation To Enhance Imitation Learning In Atari Games

  269. VoxLingua107: a Dataset for Spoken Language Recognition

  270. MoGaze: A Dataset of Full-Body Motions that Includes Workspace Geometry and Eye-Gaze

  271. End-to-End Chinese Landscape Painting Creation Using Generative Adversarial Networks

  272. Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

  273. Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps

  274. Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus

  275. Open-Domain Question Answering Goes Conversational via Question Rewriting

  276. Digital Voicing of Silent Speech

  277. A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries

  278. MMLU: Measuring Massive Multitask Language Understanding

  279. ETHICS: Aligning AI With Shared Human Values

  280. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

  281. CoVoST 2 and Massively Multilingual Speech-to-Text Translation

  282. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

  283. The NetHack Learning Environment

  284. Anime Crop Datasets: Faces, Figures, & Hands

  285. ForecastQA: A Question Answering Challenge for Event Forecasting with Temporal Text Data

  286. Shortcut Learning in Deep Neural Networks

  287. D4RL: Datasets for Deep Data-Driven Reinforcement Learning

  288. TyDiQA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

  289. SAYCam: A large, longitudinal audiovisual dataset recorded from the infant’s perspective

  290. ImageNet-A: Natural Adversarial Examples

  291. Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

  292. Libri-Light: A Benchmark for ASR with Limited or No Supervision

  293. How Can We Know What Language Models Know?

  294. SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling

  295. How Machine Learning Can Help Unlock the World of Ancient Japan

  296. Compressive Transformers for Long-Range Sequence Modeling

  297. CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

  298. CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

  299. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  300. Restoring ancient text using deep learning (Pythia): a case study on Greek epigraphy

  301. CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

  302. PubMedQA: A Dataset for Biomedical Research Question Answering

  303. ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models

  304. No Press Diplomacy: Modeling Multi-Agent Gameplay

  305. Language Modeling State-of-the-art leaderboards

  306. LVIS: A Dataset for Large Vocabulary Instance Segmentation

  307. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

  308. A large single-participant fMRI dataset for probing brain responses to naturalistic stimuli in space and time

  309. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

  310. ImageNet-Sketch: Learning Robust Global Representations by Penalizing Local Predictive Power

  311. Cold Case: The Lost MNIST Digits

  312. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

  313. The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors

  314. ProductNet: a Collection of High-Quality Datasets for Product Representation Learning

  315. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations

  316. Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset

  317. LIGHT: Learning to Speak and Act in a Fantasy Text Adventure Game

  318. DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

  319. A Replication Study: Machine Learning Models Are Capable of Predicting Sexual Orientation From Facial Images

  320. Language Models are Unsupervised Multitask Learners

  321. The Omniglot challenge: a 3-year progress report

  322. Do We Train on Test Data? Purging CIFAR of Near-Duplicates

  323. The RobotriX: An eXtremely Photorealistic and Very-Large-Scale Indoor Dataset of Sequences with Robot Trajectories and Interactions

  324. FIGR: Few-shot Image Generation with Reptile

  325. Natural Questions: A Benchmark for Question Answering Research

  326. A Style-Based Generator Architecture for Generative Adversarial Networks

  327. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness

  328. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

  329. The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale

  330. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

  331. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

  332. CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images

  333. A Short Note about Kinetics-600

  334. Cartoon Set

  335. Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations

  336. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

  337. Know What You Don’t Know: Unanswerable Questions for SQuAD

  338. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning

  339. Exploring the Limits of Weakly Supervised Pretraining

  340. Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies

  341. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

  342. The Sound of Pixels

  343. FEVER: a large-scale dataset for Fact Extraction and VERification

  344. Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

  345. SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction

  346. 11K Hands: Gender recognition and biometric identification using a large dataset of hand images

  347. Progressive Growing of GANs for Improved Quality, Stability, and Variation

  348. OpenML Benchmarking Suites

  349. WebVision Database: Visual Learning and Understanding from Web Data

  350. A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets

  351. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

  352. Driver Identification Using Automobile Sensor Data from a Single Turn

  353. StreetStyle: Exploring world-wide clothing styles from millions of photos

  354. The Kinetics Human Action Video Dataset

  355. WebVision Challenge: Visual Learning and Understanding With Web Data

  356. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

  357. Dense-Captioning Events in Videos

  358. BAM! The Behance Artistic Media Dataset for Recognition Beyond Photography

  359. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

  360. RACE: Large-scale ReAding Comprehension Dataset From Examinations

  361. NewsQA: A Machine Comprehension Dataset

  362. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

  363. Lip Reading Sentences in the Wild

  364. Pointer Sentinel Mixture Models

  365. Deep Learning the City: Quantifying Urban Perception At A Global Scale

  366. Solving General Arithmetic Word Problems

  367. The LAMBADA dataset: Word prediction requiring a broad discourse context

  368. SQuAD: 100,000+ Questions for Machine Comprehension of Text

  369. Matching Networks for One Shot Learning

  370. Convolutional Sketch Inversion

  371. The MovieLens Datasets: History and Context

  372. Neural Module Networks

  373. Sketch-based Manga Retrieval using Manga109 Dataset

  374. Amazon Reviews: Image-based Recommendations on Styles and Substitutes

  375. Teaching Machines to Read and Comprehend

  376. LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

  377. VQA: Visual Question Answering

  378. YFCC100M: The New Data in Multimedia Research

  379. ImageNet Large Scale Visual Recognition Challenge

  380. Microsoft COCO: Common Objects in Context

  381. N-gram Counts and Language Models from the Common Crawl

  382. Ukiyo-e Search

  383. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

  384. The Caltech-UCSD Birds-200-2011 Dataset

  385. Unbiased look at dataset bias

  386. Caltech-UCSD Birds 200

  387. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments

  388. Building a Large Annotated Corpus of English: The Penn Treebank

  389. About the Test Data

  390. DataGemma: AI Open Models Connecting LLMs to Google’s Data Commons

  391. 412e052c87cdece32165dd01da74a55852ab5107.html

  392. Scale AI Secures $1B Funding at $14B Valuation As Its CEO Predicts Big Revenue Growth and Profitability by Year-End [On Very High Quality Data]

  393. No Robots: Look Ma, an instruction dataset that wasn’t generated by GPTs!

  394. Psych-101 Dataset [For Centaur]

  395. FineWeb: Decanting the Web for the Finest Text Data at Scale

  396. Solving Math Word Problems: We’ve Trained a System That Solves Grade School Math Problems With Nearly Twice the Accuracy of a Fine-Tuned GPT-3 Model. It Solves about 90% As Many Problems As Real Kids: a Small Sample of 9-12 Year Olds Scored 60% on a Test from Our Dataset, While Our System Scored 55% on Those Same Problems. This Is Important Because Today’s AI Is Still Quite Weak at Commonsense Multistep Reasoning, Which Is Easy Even for Grade School Kids. We Achieved These Results by Training Our Model to Recognize Its Mistakes, so That It Can Try Repeatedly Until It Finds a Solution That Works

  397. Lip Reading Sentences in the Wild [Video]

  398. design#future-tag-features

  399. 2023-pilaut-figure1-interactivechainpromptingforqaabouttranslationambiguities.jpg

  400. 2020-caswell-table2-examplesofmisleadingtextlanguageassociations.png

  401. 2008-sandhaus.pdf

  402. http://cl-informatik.uibk.ac.at/cek/holstep/ckfccs-holstep-submitted.pdf

  403. e0d6678fd4d64cea7e57de4e2de167be8e89e13a.pdf

  404. http://millionsongdataset.com/

  405. f84bcb50febe16269d8a130bff0f4a28ab04f186.html

  406. http://www.j-archive.com/

  407. 64e5b8dffdffc96e8a84f580a979a0f7a2386487.html

  408. https://aclanthology.org/2020.wmt-1.1.pdf

  409. e0ceee45b691fe5fa5c00ccba35a78fae6f334dd.pdf

  410. https://ai.facebook.com/research/publications/ego4d-unscripted-first-person-video-from-around-the-world-and-a-benchmark-suite-for-egocentric-perception

  411. https://annas-blog.org/duxiu-exclusive.html

  412. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=12d941c445ec477501f78b15dcf84f98173121cf

  413. https://commonvoice.mozilla.org/en

  414. https://crfm.stanford.edu/2023/03/13/alpaca.html

  415. https://danielpovey.com/files/2015_icassp_librispeech.pdf

  416. https://github.com/Farama-Foundation/D4RL

  417. https://github.com/lucidrains/openproteinset

  418. https://github.com/mbzuai-nlp/LaMini-LM

  419. https://github.com/nomic-ai/gpt4all

  420. https://github.com/openai/prm800k

  421. https://github.com/xq-meng/AnimeDiffusion

  422. https://huggingface.co/datasets/HuggingFaceFW/fineweb

  423. c8d4bc552e347626b3fe0fef3133895ebbd68965.html

  424. https://huggingface.co/datasets/PleIAs/YouTube-Commons

  425. 476e7773cf7fb95df4716d901b8d4a9ad8cd1ce6.html

  426. https://karpathy.github.io/2011/04/27/manually-classifying-cifar10/

  427. 80484448fd5f4d489296bcddc89c9c6c2636ea8b.html

  428. https://keithito.com/LJ-Speech-Dataset/

  429. https://laion.ai/blog/strategic-game-dataset/

  430. https://mattmahoney.net/dc/textdata.html

  431. https://openaccess.thecvf.com/content_cvpr_2014/papers/Andriluka_2D_Human_Pose_2014_CVPR_paper.pdf

  432. e7289c173c9284b1bea58b33f2154e8146f74429.pdf

  433. https://openai.com/index/mle-bench/

  434. https://paperswithcode.com/dataset/flickr30k

  435. https://paperswithcode.com/datasets

  436. https://parti.research.google/

  437. https://rom1504.github.io/clip-retrieval/

  438. https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37648.pdf

  439. https://together.ai/blog/redpajama-data-v2

  440. https://www.bloomberg.com/news/features/2023-04-24/a-high-school-teacher-s-free-image-database-powers-ai-unicorns

  441. https://www.brown.edu/news/2023-04-25/open-web-text

  442. c0bcbe0d3fa93bef8abde06c137af0115131a2a8.html

  443. https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajamacr

  444. 8268afb70530ea03cb37dd29acb46077f8683241.html

  445. https://www.wired.com/story/battle-over-books3/

  446. 4866e6a99d9b852aca1de4265043c630dc67cb9a.html

  447. https://x.com/felix_red_panda/status/1723324786808692887

  448. BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

  449. https%253A%252F%252Farxiv.org%252Fabs%252F2411.13543.html

  450. MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

  451. Lil'Log

  452. Homepage: Aleksander Mądry

  453. https%253A%252F%252Farxiv.org%252Fabs%252F2410.07095%2523openai.html

  454. ImagiNet: A Multi-Content Dataset for Generalizable Synthetic Image Detection via Contrastive Learning

  455. https%253A%252F%252Farxiv.org%252Fabs%252F2407.20020.html

  456. Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

  457. Owain Evans, AI Alignment Researcher

  458. https%253A%252F%252Farxiv.org%252Fabs%252F2407.04694.html

  459. Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs

  460. Sam Bowman

  461. https%253A%252F%252Farxiv.org%252Fabs%252F2407.04108.html

  462. Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets

  463. https%253A%252F%252Farxiv.org%252Fabs%252F2406.18906.html

  464. APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

  465. Caiming Xiong—Home Page

  466. https%253A%252F%252Farxiv.org%252Fabs%252F2406.18518%2523salesforce.html

  467. Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

  468. https%253A%252F%252Farxiv.org%252Fabs%252F2406.13121%2523google.html

  469. DataComp-LM: In search of the next generation of training sets for language models

  470. Luke Zettlemoyer

  471. https%253A%252F%252Farxiv.org%252Fabs%252F2406.11794.html

  472. LLMs achieve adult human performance on higher-order theory of mind tasks

  473. https%253A%252F%252Farxiv.org%252Fabs%252F2405.18870%2523google.html

  474. DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

  475. https%253A%252F%252Farxiv.org%252Fabs%252F2405.15306.html

  476. ImageInWords: Unlocking Hyper-Detailed Image Descriptions

  477. https%253A%252F%252Farxiv.org%252Fabs%252F2405.02793%2523google.html

  478. GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic

  479. https%253A%252F%252Farxiv.org%252Fabs%252F2405.00332%2523scale.html

  480. CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack of) Multicultural Knowledge

  481. https%253A%252F%252Farxiv.org%252Fabs%252F2404.06664.html

  482. VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

  483. https%253A%252F%252Farxiv.org%252Fabs%252F2404.05955.html

  484. How Tech Giants Cut Corners to Harvest Data for AI: OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems

  485. https://x.com/cademetz

  486. https%253A%252F%252Fwww.nytimes.com%252F2024%252F04%252F06%252Ftechnology%252Ftech-giants-harvest-data-artificial-intelligence.html.html

  487. Vulnerability Detection with Code Language Models: How Far Are We?

  488. https%253A%252F%252Farxiv.org%252Fabs%252F2403.18624.html

  489. Long-form factuality in large language models

  490. https%253A%252F%252Farxiv.org%252Fabs%252F2403.18802%2523deepmind.html

  491. ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

  492. https%253A%252F%252Farxiv.org%252Fabs%252F2402.11753.html

  493. StarVector: Generating Scalable Vector Graphics Code from Images

  494. https%253A%252F%252Farxiv.org%252Fabs%252F2312.11556.html

  495. EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models

  496. https%253A%252F%252Farxiv.org%252Fabs%252F2312.06281.html

  497. Efficient Transformer Knowledge Distillation: A Performance Review

  498. https%253A%252F%252Farxiv.org%252Fabs%252F2311.13657.html

  499. CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images

  500. Jonathan Frankle—Chief Neural Network Scientist at Databricks

  501. https%253A%252F%252Farxiv.org%252Fabs%252F2310.16825.html

  502. OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text

  503. https%253A%252F%252Farxiv.org%252Fabs%252F2310.06786.html

  504. FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

  505. Jason Wei

  506. https%253A%252F%252Farxiv.org%252Fabs%252F2310.03214%2523google.html

  507. UltraFeedback: Boosting Language Models with High-quality Feedback

  508. Ning Ding

  509. https%253A%252F%252Farxiv.org%252Fabs%252F2310.01377.html

  510. Demystifying CLIP Data

  511. Luke Zettlemoyer

  512. https%253A%252F%252Farxiv.org%252Fabs%252F2309.16671.html

  513. The Cambridge Law Corpus: A Corpus for Legal AI Research

  514. https%253A%252F%252Farxiv.org%252Fabs%252F2309.12269.html

  515. MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

  516. https%253A%252F%252Farxiv.org%252Fabs%252F2309.12284.html

  517. SlimPajama-DC: Understanding Data Combinations for LLM Training

  518. https%253A%252F%252Farxiv.org%252Fabs%252F2309.10818%2523cerebras.html

  519. From Sparse to Dense: GPT-4 Summarization with Chain of Density (CoD) Prompting

  520. https%253A%252F%252Farxiv.org%252Fabs%252F2309.04269.html

  521. American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

  522. https%253A%252F%252Farxiv.org%252Fabs%252F2308.12477.html

  523. AlpaGasus: Training A Better Alpaca with Fewer Data

  524. https%253A%252F%252Farxiv.org%252Fabs%252F2307.08701%2523samsung.html

  525. Test-Time Training on Video Streams

  526. Yu Sun

  527. https%253A%252F%252Farxiv.org%252Fabs%252F2307.05014.html

  528. LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

  529. https%253A%252F%252Farxiv.org%252Fabs%252F2306.15626.html

  530. ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews

  531. https%253A%252F%252Farxiv.org%252Fabs%252F2306.12587.html

  532. Understanding Social Reasoning in Language Models with Language Models

  533. https%253A%252F%252Farxiv.org%252Fabs%252F2306.15448.html

  534. AI Is a Lot of Work: As the technology becomes ubiquitous, a vast tasker underclass is emerging—and not going anywhere

  535. https%253A%252F%252Fwww.theverge.com%252Ffeatures%252F23764584%252Fai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots.html

  536. Anime Character Identification and Tag Prediction by Multimodality Modeling: Dataset and Model

  537. %252Fdoc%252Fai%252Fanime%252Fdanbooru%252F2023-yi.pdf.html

  538. Why YouTube Could Give Google an Edge in AI

  539. https%253A%252F%252Fwww.theinformation.com%252Farticles%252Fwhy-youtube-could-give-google-an-edge-in-ai.html

  540. Let’s Verify Step by Step

  541. Jan Leike

  542. John Schulman’s Homepage

  543. https%253A%252F%252Farxiv.org%252Fabs%252F2305.20050%2523openai.html

  544. SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models

  545. https%253A%252F%252Farxiv.org%252Fabs%252F2305.11840%2523google.html

  546. TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

  547. https%253A%252F%252Farxiv.org%252Fabs%252F2305.07759%2523microsoft.html

  548. Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

  549. Omer Levy

  550. https%253A%252F%252Farxiv.org%252Fabs%252F2305.01569.html

  551. ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification

  552. https%253A%252F%252Farxiv.org%252Fabs%252F2304.05538.html

  553. How well do Large Language Models perform in Arithmetic tasks?

  554. https%253A%252F%252Farxiv.org%252Fabs%252F2304.02015%2523alibaba.html

  555. Large Language Models Are State-of-the-Art Evaluators of Translation Quality

  556. https%253A%252F%252Farxiv.org%252Fabs%252F2302.14520.html

  557. Data Selection for Language Models via Importance Resampling

  558. Percy Liang

  559. https%253A%252F%252Farxiv.org%252Fabs%252F2302.03169.html

  560. Med-PaLM: Large Language Models Encode Clinical Knowledge

  561. Jason Wei

  562. https%253A%252F%252Farxiv.org%252Fabs%252F2212.13138%2523google.html

  563. Text Embeddings by Weakly-Supervised Contrastive Pre-training

  564. Furu Wei

  565. https%253A%252F%252Farxiv.org%252Fabs%252F2212.03533%2523microsoft.html

  566. The Stack: 3 TB of permissively licensed source code

  567. Thomas Wolf

  568. Dzmitry Bahdanau

  569. https%253A%252F%252Farxiv.org%252Fabs%252F2211.15533.html

  570. AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities

  571. https%253A%252F%252Farxiv.org%252Fabs%252F2211.06679%2523baai.html

  572. BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning

  573. Thomas Wang

  574. Stella Biderman

  575. Teven Le Scao

  576. Sheng Shen’s Homepage

  577. Colin Raffel

  578. https%253A%252F%252Farxiv.org%252Fabs%252F2211.01786.html

  579. Large Language Models Can Self-Improve

  580. https%253A%252F%252Farxiv.org%252Fabs%252F2210.11610%2523google.html

  581. CARP: Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning

  582. https%253A%252F%252Farxiv.org%252Fabs%252F2210.07792%2523eleutherai.html

  583. Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio

  584. https%253A%252F%252Faclanthology.org%252F2022.cai-1.2.pdf.html

  585. Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)

  586. Noah A. Smith

  587. Mike Lewis

  588. https%253A%252F%252Farxiv.org%252Fabs%252F2210.03350%2523allen.html

  589. FOLIO: Natural Language Reasoning with First-Order Logic

  590. Caiming Xiong—Home Page

  591. https%253A%252F%252Farxiv.org%252Fabs%252F2209.00840.html

  592. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

  593. About Me

  594. Saurav Kadavath

  595. Andy Jones

  596. Sam Bowman

  597. Sam McCandlish

  598. Jared Kaplan

  599. https://jack-clark.net/about/

  600. https%253A%252F%252Fwww.anthropic.com%252Fred_teaming.pdf.html

  601. Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning

  602. https%253A%252F%252Farxiv.org%252Fabs%252F2208.08831%2523deepmind.html

  603. Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP

  604. https%253A%252F%252Farxiv.org%252Fabs%252F2208.05516.html

  605. NewsStories: Illustrating articles with visual summaries

  606. https%253A%252F%252Farxiv.org%252Fabs%252F2207.13061.html

  607. Forecasting Future World Events with Neural Networks

  608. Andy Zou

  609. Mantas Mazeika

  610. Jacob Steinhardt

  611. Owain Evans, AI Alignment Researcher

  612. https://people.eecs.berkeley.edu/~hendrycks/

  613. https%253A%252F%252Farxiv.org%252Fabs%252F2206.15474.html

  614. Automated Crossword Solving

  615. https%253A%252F%252Farxiv.org%252Fabs%252F2205.09665%2523bair.html

  616. Dialog Inpainting: Turning Documents into Dialogues

  617. https%253A%252F%252Farxiv.org%252Fabs%252F2205.09073%2523google.html

  618. Building Machine Translation Systems for the Next Thousand Languages

  619. https%253A%252F%252Farxiv.org%252Fabs%252F2205.03983%2523google.html

  620. When does dough become a bagel? Analyzing the remaining mistakes on ImageNet

  621. https%253A%252F%252Farxiv.org%252Fabs%252F2205.04596%2523google.html

  622. Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)

  623. https%253A%252F%252Farxiv.org%252Fabs%252F2205.01397.html

  624. Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

  625. Yizhong Wang—University of Washington

  626. Noah A. Smith

  627. Hannaneh Hajishirzi—University of Washington

  628. https%253A%252F%252Farxiv.org%252Fabs%252F2204.07705.html

  629. ByT5 model for massively multilingual grapheme-to-phoneme conversion

  630. https%253A%252F%252Farxiv.org%252Fabs%252F2204.03067.html

  631. CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning

  632. https%253A%252F%252Farxiv.org%252Fabs%252F2203.11096.html

  633. Self-Distilled StyleGAN: Towards Generation from Internet Photos

  634. https%253A%252F%252Farxiv.org%252Fabs%252F2202.12211%2523google.html

  635. Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework

  636. https%253A%252F%252Farxiv.org%252Fabs%252F2202.06767%2523huawei.html

  637. StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

  638. https%253A%252F%252Farxiv.org%252Fabs%252F2202.00273.html

  639. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

  640. Caiming Xiong—Home Page

  641. https%253A%252F%252Farxiv.org%252Fabs%252F2201.12086%2523salesforce.html

  642. SWAG: Revisiting Weakly Supervised Pre-Training of Visual Perception Models

  643. Ross Girshick

  644. Laurens Van Der Maaten

  645. https%253A%252F%252Farxiv.org%252Fabs%252F2201.08371%2523facebook.html

  646. WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation

  647. Noah A. Smith

  648. https%253A%252F%252Fswabhs.com%252Fassets%252Fpdf%252Fwanli.pdf%2523allen.html

  649. BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations

  650. Antonio Torralba—MIT-IBM Watson AI Lab

  651. https%253A%252F%252Farxiv.org%252Fabs%252F2201.04684.html

  652. ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation

  653. Yu Sun

  654. https%253A%252F%252Farxiv.org%252Fabs%252F2112.15283%2523baidu.html

  655. WebGPT: Browser-assisted question-answering with human feedback

  656. Jacob Hilton's Homepage

  657. Gretchen Krueger

  658. John Schulman’s Homepage

  659. https%253A%252F%252Farxiv.org%252Fabs%252F2112.09332%2523openai.html

  660. BASIC: Combined Scaling for Open-Vocabulary Image Classification

  661. Zihang Dai

  662. https%253A%252F%252Farxiv.org%252Fabs%252F2111.10050%2523google.html

  663. It’s About Time: Analog Clock Reading in the Wild

  664. https%253A%252F%252Farxiv.org%252Fabs%252F2111.09162.html

  665. Solving Probability and Statistics Problems by Program Synthesis

  666. https%253A%252F%252Farxiv.org%252Fabs%252F2111.08267.html

  667. LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

  668. https%253A%252F%252Farxiv.org%252Fabs%252F2111.02114%2523laion.html

  669. Training Verifiers to Solve Math Word Problems

  670. Jacob Hilton's Homepage

  671. John Schulman’s Homepage

  672. https%253A%252F%252Farxiv.org%252Fabs%252F2110.14168%2523openai.html

  673. A connectome of the Drosophila central complex reveals network motifs suitable for flexible navigation and context-dependent action selection

  674. https%253A%252F%252Felifesciences.org%252Farticles%252F66039.html

  675. TruthfulQA: Measuring How Models Mimic Human Falsehoods

  676. Jacob Hilton's Homepage

  677. Owain Evans, AI Alignment Researcher

  678. https%253A%252F%252Farxiv.org%252Fabs%252F2109.07958.html

  679. LAION-400-Million Open Dataset

  680. https%253A%252F%252Flaion.ai%252Fblog%252Flaion-400-open-dataset%252F.html

  681. Scaling Vision Transformers

  682. Neil Houlsby

  683. Lucas Beyer

  684. https%253A%252F%252Farxiv.org%252Fabs%252F2106.04560%2523google.html

  685. Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks

  686. https%253A%252F%252Farxiv.org%252Fabs%252F2103.14749.html

  687. ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

  688. https%253A%252F%252Farxiv.org%252Fabs%252F2102.05918%2523google.html

  689. Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling

  690. https%253A%252F%252Farxiv.org%252Fabs%252F2102.01951%2523scaling%2526org%253Ddeepmind.html

  691. CLIP: Learning Transferable Visual Models From Natural Language Supervision

  692. Alec Radford

  693. Jong Wook Kim

  694. Aditya A. Ramesh

  695. Sandhini Agarwal

  696. About Me

  697. https://jack-clark.net/about/

  698. Gretchen Krueger

  699. https%253A%252F%252Fcdn.openai.com%252Fpapers%252FLearning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf.html

  700. CLIP: Connecting Text and Images: We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the ‘zero-shot’ capabilities of GPT-2 and GPT-3

  701. Alec Radford

  702. Jong Wook Kim

  703. Gretchen Krueger

  704. Sandhini Agarwal

  705. https%253A%252F%252Fopenai.com%252Findex%252Fclip%252F.html

  706. The Pile: An 800GB Dataset of Diverse Text for Language Modeling

  707. Leo Gao

  708. Stella Biderman

  709. https://x.com/NoaNabeshima

  710. https://x.com/theshawwn

  711. https%253A%252F%252Farxiv.org%252Fabs%252F2101.00027%2523eleutherai.html

  712. Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus

  713. https%253A%252F%252Farxiv.org%252Fabs%252F2010.14571%2523google.html

  714. MMLU: Measuring Massive Multitask Language Understanding

  715. https://people.eecs.berkeley.edu/~hendrycks/

  716. Steven's Web Thoughts

  717. Andy Zou

  718. Mantas Mazeika

  719. Jacob Steinhardt

  720. https%253A%252F%252Farxiv.org%252Fabs%252F2009.03300.html

  721. Anime Crop Datasets: Faces, Figures, & Hands

  722. Gwern.net Homepage

  723. https://x.com/arfafax

  724. https://x.com/theshawwn

  725. %252Fcrop.html

  726. Compressive Transformers for Long-Range Sequence Modeling

  727. https%253A%252F%252Farxiv.org%252Fabs%252F1911.05507%2523deepmind.html

  728. Language Modeling State-of-the-art leaderboards

  729. https%253A%252F%252Fpaperswithcode.com%252Ftask%252Flanguage-modelling.html

  730. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

  731. Alex Wang—Personal Site

  732. Nikita Nangia

  733. Amanpreet Singh

  734. Julian Michael

  735. Language Understanding Grounded in Perception and Action

  736. Omer Levy

  737. Sam Bowman

  738. https%253A%252F%252Farxiv.org%252Fabs%252F1905.00537.html

  739. CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images

  740. https%253A%252F%252Farxiv.org%252Fabs%252F1808.01097.html

  741. A Short Note about Kinetics-600

  742. https%253A%252F%252Farxiv.org%252Fabs%252F1808.01340%2523deepmind.html

  743. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

  744. %252Fdoc%252Fai%252Fnn%252Fdiffusion%252F2018-sharma.pdf%2523google.html

  745. Exploring the Limits of Weakly Supervised Pretraining

  746. Ross Girshick

  747. Laurens Van Der Maaten

  748. https%253A%252F%252Farxiv.org%252Fabs%252F1805.00932%2523facebook.html

  749. A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets

  750. Ilya Loshchilov

  751. Profile – Machine Learning Lab

  752. https%253A%252F%252Farxiv.org%252Fabs%252F1707.08819.html

  753. WebVision Challenge: Visual Learning and Understanding With Web Data

  754. https%253A%252F%252Farxiv.org%252Fabs%252F1705.05640.html

  755. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

  756. Kyunghyun Cho

  757. https%253A%252F%252Farxiv.org%252Fabs%252F1704.05179.html

  758. N-gram Counts and Language Models from the Common Crawl

  759. http%253A%252F%252Fwww.lrec-conf.org%252Fproceedings%252Flrec2014%252Fpdf%252F1097_Paper.pdf.html

  760. Unbiased look at dataset bias

  761. Antonio Torralba—MIT-IBM Watson AI Lab

  762. %252Fdoc%252Fai%252Fdataset%252F2011-torralba.pdf.html