- See Also
- Links
- “GPQA: A Graduate-Level Google-Proof Q&A Benchmark”, Rein et al 2023
- “GLaMM: Pixel Grounding Large Multimodal Model”, Rasheed et al 2023
- “Don’t Make Your LLM an Evaluation Benchmark Cheater”, Zhou et al 2023
- “From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions”, Lai et al 2023
- “OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”, Paster et al 2023
- “FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”, Vu et al 2023
- “Demystifying CLIP Data”, Xu et al 2023
- “LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models”, Chen et al 2023
- “The Cambridge Law Corpus: A Corpus for Legal AI Research”, Östling et al 2023
- “GoodWiki”, Choi 2023
- “MADLAD-400: A Multilingual And Document-Level Large Audited Dataset”, Kudugunta et al 2023
- “From Sparse to Dense: GPT-4 Summarization With Chain of Density (CoD) Prompting”, Adams et al 2023
- “American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers”, Dell et al 2023
- “The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain”, Moskvichev et al 2023
- “DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI”, Zhang et al 2023
- “Android in the Wild: A Large-Scale Dataset for Android Device Control”, Rawles et al 2023
- “AlpaGasus: Training A Better Alpaca With Fewer Data”, Chen et al 2023
- “InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation”, Wang et al 2023
- “Instruction Mining: High-Quality Instruction Data Selection for Large Language Models”, Cao et al 2023
- “Test-Time Training on Video Streams”, Wang et al 2023
- “HEADLINES: A Massive Scale Semantic Similarity Dataset of Historical English”, Silcock & Dell 2023
- “LeanDojo: Theorem Proving With Retrieval-Augmented Language Models”, Yang et al 2023
- “SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality”, Hsieh et al 2023
- “OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents”, Laurençon et al 2023
- “ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews”, D’Arcy et al 2023
- “AI Is a Lot of Work: As the Technology Becomes Ubiquitous, a Vast Tasker Underclass Is Emerging—and Not Going Anywhere”, Dzieza 2023
- “Anime Character Identification and Tag Prediction by Multimodality Modeling: Dataset and Model”, Yi et al 2023
- “ChessGPT: Bridging Policy Learning and Language Modeling”, Feng et al 2023
- “Why YouTube Could Give Google an Edge in AI”, Victor 2023
- “Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks”, Veselovsky et al 2023
- “The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora With Web Data, and Web Data Only”, Penedo et al 2023
- “Let’s Verify Step by Step”, Lightman et al 2023
- “SeeGULL: A Stereotype Benchmark With Broad Geo-Cultural Coverage Leveraging Generative Models”, Jha et al 2023
- “C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models”, Huang et al 2023
- “TinyStories: How Small Can Language Models Be and Still Speak Coherent English?”, Eldan & Li 2023
- “Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation”, Kirstain et al 2023
- “LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, Wu et al 2023
- “Multi-Party Chat (MultiLIGHT): Conversational Agents in Group Settings With Humans and Models”, Wei et al 2023
- “ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification”, Taesiri et al 2023
- “Parsing-Conditioned Anime Translation: A New Dataset and Method”, Li et al 2023c
- “Abstraction-Perception Preserving Cartoon Face Synthesis”, Ho et al 2023
- “AnimeDiffusion: Anime Face Line Drawing Colorization via Diffusion Models”, Cao et al 2023
- “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
- “Benchmarks for Automated Commonsense Reasoning: A Survey”, Davis 2023
- “The BabyLM Challenge: Sample-efficient Pretraining on a Developmentally Plausible Corpus”, Warstadt et al 2023
- “Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation With Interaction”, Pilault et al 2023
- “The Semantic Scholar Open Data Platform”, Kinney et al 2023
- “How Close Is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection”, Guo et al 2023
- “Med-PaLM: Large Language Models Encode Clinical Knowledge”, Singhal et al 2022
- “Unnatural Instructions: Tuning Language Models With (Almost) No Human Labor”, Honovich et al 2022
- “A Whack-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others”, Li et al 2022
- “Text Embeddings by Weakly-Supervised Contrastive Pre-training”, Wang et al 2022
- “The Stack: 3 TB of Permissively Licensed Source Code”, Kocetkov et al 2022
- “UniSumm: Unified Few-shot Summarization With Multi-Task Pre-Training and Prefix-Tuning”, Chen et al 2022
- “A Creative Industry Image Generation Dataset Based on Captions”, Yuejia et al 2022
- “AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities”, Chen et al 2022
- “MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation”, Feng et al 2022
- “AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies”, Siyao et al 2022
- “BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning”, Muennighoff et al 2022
- “Dungeons and Data: A Large-Scale NetHack Dataset”, Hambro et al 2022
- “Will We Run out of Data? An Analysis of the Limits of Scaling Datasets in Machine Learning”, Villalobos et al 2022
- “Large Language Models Can Self-Improve”, Huang et al 2022
- “CARP: Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning”, Castricato et al 2022
- “MTEB: Massive Text Embedding Benchmark”, Muennighoff et al 2022
- “Most Language Models Can Be Poets Too: An AI Writing Assistant and Constrained Text Generation Studio”, Roush et al 2022
- “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022
- “Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning”, Lu et al 2022
- “Brain Imaging Generation With Latent Diffusion Models”, Pinaya et al 2022
- “PaLI: A Jointly-Scaled Multilingual Language-Image Model”, Chen et al 2022
- “FOLIO: Natural Language Reasoning With First-Order Logic”, Han et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- “Bugs in the Data: How ImageNet Misrepresents Biodiversity”, Luccioni & Rolnick 2022
- “Discovering Bugs in Vision Models Using Off-the-shelf Image Generation and Captioning”, Wiles et al 2022
- “Benchmarking Compositionality With Formal Languages”, Valvoda et al 2022
- “Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP”, Nguyen et al 2022
- “Learning to Generalize With Object-centric Agents in the Open World Survival Game Crafter”, Stanić et al 2022
- “Few-shot Adaptation Works With UnpredicTable Data”, Chan et al 2022
- “Language Models Can Teach Themselves to Program Better”, Haluptzok et al 2022
- “RealTime QA: What’s the Answer Right Now?”, Kasai et al 2022
- “NewsStories: Illustrating Articles With Visual Summaries”, Tan et al 2022
- “CelebV-HQ: A Large-Scale Video Facial Attributes Dataset”, Zhu et al 2022
- “Why Do Tree-based Models Still Outperform Deep Learning on Tabular Data?”, Grinsztajn et al 2022
- “Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset”, Henderson et al 2022
- “Forecasting Future World Events With Neural Networks”, Zou et al 2022
- “RST: ReStructured Pre-training”, Yuan & Liu 2022
- “Learning to Generate Artistic Character Line Drawing”, Fang et al 2022
- “Dataset Condensation via Efficient Synthetic-Data Parameterization”, Kim et al 2022
- “Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions”, Jiang et al 2022
- “Fine-grained Image Captioning With CLIP Reward”, Cho et al 2022
- “InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning”, Gupta et al 2022
- “Learning to Model Editing Processes”, Reid & Neubig 2022
- “Flexible Diffusion Modeling of Long Videos”, Harvey et al 2022
- “Instruction Induction: From Few Examples to Natural Language Task Descriptions”, Honovich et al 2022
- “Housekeep: Tidying Virtual Households Using Commonsense Reasoning”, Kant et al 2022
- “Down and Across: Introducing Crossword-Solving As a New NLP Benchmark”, Kulshreshtha et al 2022
- “Automated Crossword Solving”, Wallace et al 2022
- “Dialog Inpainting: Turning Documents into Dialogues”, Dai et al 2022
- “SymphonyNet: Symphony Generation With Permutation Invariant Language Model”, Liu et al 2022
- “When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet”, Vasudevan et al 2022
- “Building Machine Translation Systems for the Next Thousand Languages”, Bapna et al 2022
- “Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)”, Fang et al 2022
- “A Challenging Benchmark of Anime Style Recognition”, Li et al 2022
- “Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks”, Wang et al 2022
- “Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality”, Thrush et al 2022
- “ByT5 Model for Massively Multilingual Grapheme-to-phoneme Conversion”, Zhu et al 2022
- “KNN-Diffusion: Image Generation via Large-Scale Retrieval”, Ashual et al 2022
- “STaR: Bootstrapping Reasoning With Reasoning”, Zelikman et al 2022
- “CLIP Meets GamePhysics: Towards Bug Identification in Gameplay Videos Using Zero-shot Transfer Learning”, Taesiri et al 2022
- “Bamboo: Building Mega-Scale Vision Dataset Continually With Human-Machine Synergy”, Zhang et al 2022
- “Self-Distilled StyleGAN: Towards Generation from Internet Photos”, Mokady et al 2022
- “RuCLIP—new Models and Experiments: a Technical Report”, Shonenkov et al 2022
- “Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework”, Gu et al 2022
- “ROME: Locating and Editing Factual Associations in GPT”, Meng et al 2022
- “DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers”, Cho et al 2022
- “PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts”, Bach et al 2022
- “StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets”, Sauer et al 2022
- “Can Wikipedia Help Offline Reinforcement Learning?”, Reid et al 2022
- “BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation”, Li et al 2022
- “SWAG: Revisiting Weakly Supervised Pre-Training of Visual Perception Models”, Singh et al 2022
- “CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities”, Lee et al 2022
- “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Liu et al 2022
- “SynthBio: A Case Study in Faster Curation of Text Datasets”, Yuan et al 2022
- “BigDatasetGAN: Synthesizing ImageNet With Pixel-wise Annotations”, Li et al 2022
- “ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation”, Zhang et al 2021
- “A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision”, Tejankar et al 2021
- “GLIDE: Towards Photorealistic Image Generation and Editing With Text-Guided Diffusion Models”, Nichol et al 2021
- “WebGPT: Browser-assisted Question-answering With Human Feedback”, Nakano et al 2021
- “Models in the Loop: Aiding Crowdworkers With Generative Annotation Assistants”, Bartolo et al 2021
- “FRUIT: Faithfully Reflecting Updated Information in Text”, Logan et al 2021
- “GLaM: Efficient Scaling of Language Models With Mixture-of-Experts”, Du et al 2021
- “MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions”, Soldan et al 2021
- “BASIC: Combined Scaling for Open-Vocabulary Image Classification”, Pham et al 2021
- “Few-Shot Self-Rationalization With Natural Language Prompts”, Marasović et al 2021
- “Solving Probability and Statistics Problems by Program Synthesis”, Tang et al 2021
- “AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment”, Kim et al 2021
- “RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning”, Ramos et al 2021
- “LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs”, Schuhmann et al 2021
- “An Explanation of In-context Learning As Implicit Bayesian Inference”, Xie et al 2021
- “Training Verifiers to Solve Math Word Problems”, Cobbe et al 2021
- “A Connectome of the Drosophila Central Complex Reveals Network Motifs Suitable for Flexible Navigation and Context-dependent Action Selection”, Hulse et al 2021
- “HTCN: Harmonious Text Colorization Network for Visual-Textual Presentation Design”, Yang et al 2021c
- “T0: Multitask Prompted Training Enables Zero-Shot Task Generalization”, Sanh et al 2021
- “Can Machines Learn Morality? The Delphi Experiment”, Jiang et al 2021
- “Situated Dialogue Learning through Procedural Environment Generation”, Ammanabrolu et al 2021
- “MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research”, Samvelyan et al 2021
- “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Lin et al 2021
- “MiniF2F: a Cross-system Benchmark for Formal Olympiad-level Mathematics”, Zheng et al 2021
- “LAION-400-Million Open Dataset”, Schuhmann 2021
- “Transfer Learning for Pose Estimation of Illustrated Characters”, Chen & Zwicker 2021
- “MuSiQue: Multi-hop Questions via Single-hop Question Composition”, Trivedi et al 2021
- “Scaling Vision Transformers”, Zhai et al 2021
- “XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond”, Barbieri et al 2021
- “SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network”, Chan et al 2021
- “Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks”, Northcutt et al 2021
- “NaturalProofs: Mathematical Theorem Proving in Natural Language”, Welleck et al 2021
- “Get Your Vitamin C! Robust Fact Verification With Contrastive Evidence (VitaminC)”, Schuster et al 2021
- “Are NLP Models Really Able to Solve Simple Math Word Problems?”, Patel et al 2021
- “Measuring Mathematical Problem Solving With the MATH Dataset”, Hendrycks et al 2021
- “WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning”, Srinivasan et al 2021
- “A Massive 7T FMRI Dataset to Bridge Cognitive and Computational Neuroscience”, Allen et al 2021
- “Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts”, Changpinyo et al 2021
- “ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision”, Jia et al 2021
- “Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling”, Lazaridou et al 2021
- “Scaling Laws for Transfer”, Hernandez et al 2021
- “MSR-VTT: A Large Video Description Dataset for Bridging Video and Language”, Xu et al 2016
- “CLIP: Connecting Text and Images: We’re Introducing a Neural Network Called CLIP Which Efficiently Learns Visual Concepts from Natural Language Supervision. CLIP Can Be Applied to Any Visual Classification Benchmark by Simply Providing the Names of the Visual Categories to Be Recognized, Similar to the ‘zero-shot’ Capabilities of GPT-2 and GPT-3”, Radford et al 2021
- “CLIP: Learning Transferable Visual Models From Natural Language Supervision”, Radford et al 2021
- “The Pile: An 800GB Dataset of Diverse Text for Language Modeling”, Gao et al 2021
- “MoGaze: A Dataset of Full-Body Motions That Includes Workspace Geometry and Eye-Gaze”, Kratzer et al 2020
- “End-to-End Chinese Landscape Painting Creation Using Generative Adversarial Networks”, Xue 2020
- “Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps”, Ho et al 2020
- “Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus”, Caswell et al 2020
- “Open-Domain Question Answering Goes Conversational via Question Rewriting”, Anantha et al 2020
- “Digital Voicing of Silent Speech”, Gaddy & Klein 2020
- “Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing”, Gu et al 2020
- “The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization”, Hendrycks et al 2020
- “The NetHack Learning Environment”, Küttler et al 2020
- “Anime Crop Datasets: Faces, Figures, & Hands”, Branwen et al 2020
- “ForecastQA: A Question Answering Challenge for Event Forecasting With Temporal Text Data”, Jin et al 2020
- “Shortcut Learning in Deep Neural Networks”, Geirhos et al 2020
- “D4RL: Datasets for Deep Data-Driven Reinforcement Learning”, Fu et al 2020
- “TyDiQA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages”, Clark et al 2020
- “SAYCam: A Large, Longitudinal Audiovisual Dataset Recorded from the Infant’s Perspective”, Sullivan et al 2020
- “ImageNet-A: Natural Adversarial Examples”, Hendrycks et al 2020
- “Measuring Compositional Generalization: A Comprehensive Method on Realistic Data”, Keysers et al 2019
- “Libri-Light: A Benchmark for ASR With Limited or No Supervision”, Kahn et al 2019
- “How Can We Know What Language Models Know?”, Jiang et al 2019
- “SimpleBooks: Long-term Dependency Book Dataset With Simplified English Vocabulary for Word-level Language Modeling”, Nguyen 2019
- “Compressive Transformers for Long-Range Sequence Modelling”, Rae et al 2019
- “CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019
- “CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data”, Wenzek et al 2019
- “T5: Exploring the Limits of Transfer Learning With a Unified Text-to-Text Transformer”, Raffel et al 2019
- “Restoring Ancient Text Using Deep Learning (Pythia): a Case Study on Greek Epigraphy”, Assael et al 2019
- “CATER: A Diagnostic Dataset for Compositional Actions and TEmporal Reasoning”, Girdhar & Ramanan 2019
- “PubMedQA: A Dataset for Biomedical Research Question Answering”, Jin et al 2019
- “ObjectNet: A Large-scale Bias-controlled Dataset for Pushing the Limits of Object Recognition Models”, Barbu et al 2019
- “No Press Diplomacy: Modeling Multi-Agent Gameplay”, Paquette et al 2019
- “LVIS: A Dataset for Large Vocabulary Instance Segmentation”, Gupta et al 2019
- “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank”, Socher et al 2013
- “A Large Single-participant FMRI Dataset for Probing Brain Responses to Naturalistic Stimuli in Space and Time”, Seeliger et al 2019
- “OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge”, Marino et al 2019
- “ImageNet-Sketch: Learning Robust Global Representations by Penalizing Local Predictive Power”, Wang et al 2019
- “SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems”, Wang et al 2019
- “The MineRL 2019 Competition on Sample Efficient Reinforcement Learning Using Human Priors”, Guss et al 2019
- “ProductNet: a Collection of High-Quality Datasets for Product Representation Learning”, Wang et al 2019
- “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations”, Hendrycks & Dietterich 2019
- “LIGHT: Learning to Speak and Act in a Fantasy Text Adventure Game”, Urbanek et al 2019
- “A Replication Study: Machine Learning Models Are Capable of Predicting Sexual Orientation From Facial Images”, Leuner 2019
- “Language Models Are Unsupervised Multitask Learners”, Radford et al 2019
- “The Omniglot Challenge: a 3-year Progress Report”, Lake et al 2019
- “Do We Train on Test Data? Purging CIFAR of Near-Duplicates”, Barz & Denzler 2019
- “FIGR: Few-shot Image Generation With Reptile”, Clouâtre & Demers 2019
- “Natural Questions: A Benchmark for Question Answering Research”, Kwiatkowski et al 2019
- “A Style-Based Generator Architecture for Generative Adversarial Networks”, Karras et al 2018
- “ImageNet-trained CNNs Are Biased towards Texture; Increasing Shape Bias Improves Accuracy and Robustness”, Geirhos et al 2018
- “The Open Images Dataset V4: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale”, Kuznetsova et al 2018
- “CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge”, Talmor et al 2018
- “HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering”, Yang et al 2018
- “Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization”, Narayan et al 2018
- “A Short Note about Kinetics-600”, Carreira et al 2018
- “CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images”, Guo et al 2018
- “Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations”, Hendrycks & Dietterich 2018
- “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning”, Sharma et al 2018
- “Know What You Don’t Know: Unanswerable Questions for SQuAD”, Rajpurkar et al 2018
- “BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning”, Yu et al 2018
- “Exploring the Limits of Weakly Supervised Pretraining”, Mahajan et al 2018
- “Newsroom: A Dataset of 1.3 Million Summaries With Diverse Extractive Strategies”, Grusky et al 2018
- “GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding”, Wang et al 2018
- “The Sound of Pixels”, Zhao et al 2018
- “Think You Have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge”, Clark et al 2018
- “FEVER: a Large-scale Dataset for Fact Extraction and VERification”, Thorne et al 2018
- “SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction”, Liang et al 2018
- “11K Hands: Gender Recognition and Biometric Identification Using a Large Dataset of Hand Images”, Afifi 2017
- “WebVision Database: Visual Learning and Understanding from Web Data”, Li et al 2017
- “A Downsampled Variant of ImageNet As an Alternative to the CIFAR Datasets”, Chrabaszcz et al 2017
- “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”, Sun et al 2017
- “Driver Identification Using Automobile Sensor Data from a Single Turn”, Hallac et al 2017
- “The Kinetics Human Action Video Dataset”, Kay et al 2017
- “TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension”, Joshi et al 2017
- “Dense-Captioning Events in Videos”, Krishna et al 2017
- “BAM! The Behance Artistic Media Dataset for Recognition Beyond Photography”, Wilber et al 2017
- “SearchQA: A New Q&A Dataset Augmented With Context from a Search Engine”, Dunn et al 2017
- “RACE: Large-scale ReAding Comprehension Dataset From Examinations”, Lai et al 2017
- “NewsQA: A Machine Comprehension Dataset”, Trischler et al 2016
- “MS MARCO: A Human Generated MAchine Reading COmprehension Dataset”, Bajaj et al 2016
- “Pointer Sentinel Mixture Models”, Merity et al 2016
- “Solving General Arithmetic Word Problems”, Roy & Roth 2016
- “The LAMBADA Dataset: Word Prediction Requiring a Broad Discourse Context”, Paperno et al 2016
- “SQuAD: 100,000+ Questions for Machine Comprehension of Text”, Rajpurkar et al 2016
- “Convolutional Sketch Inversion”, Güçlütürk et al 2016
- “Danbooru2021: A Large-Scale Crowdsourced and Tagged Anime Illustration Dataset”, Gwern 2015
- “Neural Module Networks”, Andreas et al 2015
- “Sketch-based Manga Retrieval Using Manga109 Dataset”, Matsui et al 2015
- “Amazon Reviews: Image-based Recommendations on Styles and Substitutes”, McAuley et al 2015
- “LSUN: Construction of a Large-scale Image Dataset Using Deep Learning With Humans in the Loop”, Yu et al 2015
- “Teaching Machines to Read and Comprehend”, Hermann et al 2015
- “YFCC100M: The New Data in Multimedia Research”, Thomee et al 2015
- “ImageNet Large Scale Visual Recognition Challenge”, Russakovsky et al 2014
- “Microsoft COCO: Common Objects in Context”, Lin et al 2014
- “N-gram Counts and Language Models from the Common Crawl”, Buck et al 2014
- “UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild”, Soomro et al 2012
- “The Caltech-UCSD Birds-200-2011 Dataset”, Wah et al 2011
- “Unbiased Look at Dataset Bias”, Torralba & Efros 2011
- “Caltech-UCSD Birds 200”, Welinder et al 2010
- “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments”, Huang et al 2008
- “Solving Math Word Problems: We’ve Trained a System That Solves Grade School Math Problems With Nearly Twice the Accuracy of a Fine-tuned GPT-3 Model. It Solves about 90% As Many Problems As Real Kids: a Small Sample of 9-12 Year Olds Scored 60% on a Test from Our Dataset, While Our System Scored 55% on Those Same Problems. This Is Important Because Today’s AI Is Still Quite Weak at Commonsense Multistep Reasoning, Which Is Easy Even for Grade School Kids. We Achieved These Results by Training Our Model to Recognize Its Mistakes, so That It Can Try Repeatedly Until It Finds a Solution That Works”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Links
“GPQA: A Graduate-Level Google-Proof Q&A Benchmark”, Rein et al 2023
“GLaMM: Pixel Grounding Large Multimodal Model”, Rasheed et al 2023
“Don’t Make Your LLM an Evaluation Benchmark Cheater”, Zhou et al 2023
“From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions”, Lai et al 2023
“From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions”
“OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”, Paster et al 2023
“OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”
“FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”, Vu et al 2023
“FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation”
“Demystifying CLIP Data”, Xu et al 2023
“LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models”, Chen et al 2023
“LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models”
“The Cambridge Law Corpus: A Corpus for Legal AI Research”, Östling et al 2023
“GoodWiki”, Choi 2023
“MADLAD-400: A Multilingual And Document-Level Large Audited Dataset”, Kudugunta et al 2023
“MADLAD-400: A Multilingual And Document-Level Large Audited Dataset”
“From Sparse to Dense: GPT-4 Summarization With Chain of Density (CoD) Prompting”, Adams et al 2023
“From Sparse to Dense: GPT-4 Summarization with Chain of Density (CoD) Prompting”
“American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers”, Dell et al 2023
“American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers”
“The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain”, Moskvichev et al 2023
“The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain”
“DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI”, Zhang et al 2023
“DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI”
“Android in the Wild: A Large-Scale Dataset for Android Device Control”, Rawles et al 2023
“Android in the Wild: A Large-Scale Dataset for Android Device Control”
“AlpaGasus: Training A Better Alpaca With Fewer Data”, Chen et al 2023
“InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation”, Wang et al 2023
“InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation”
“Instruction Mining: High-Quality Instruction Data Selection for Large Language Models”, Cao et al 2023
“Instruction Mining: High-Quality Instruction Data Selection for Large Language Models”
“Test-Time Training on Video Streams”, Wang et al 2023
“HEADLINES: A Massive Scale Semantic Similarity Dataset of Historical English”, Silcock & Dell 2023
“HEADLINES: A Massive Scale Semantic Similarity Dataset of Historical English”
“LeanDojo: Theorem Proving With Retrieval-Augmented Language Models”, Yang et al 2023
“LeanDojo: Theorem Proving with Retrieval-Augmented Language Models”
“SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality”, Hsieh et al 2023
“SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality”
“OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents”, Laurençon et al 2023
“OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents”
“ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews”, D’Arcy et al 2023
“ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews”
“AI Is a Lot of Work: As the Technology Becomes Ubiquitous, a Vast Tasker Underclass Is Emerging—and Not Going Anywhere”, Dzieza 2023
“Anime Character Identification and Tag Prediction by Multimodality Modeling: Dataset and Model”, Yi et al 2023
“Anime Character Identification and Tag Prediction by Multimodality Modeling: Dataset and Model”
“ChessGPT: Bridging Policy Learning and Language Modeling”, Feng et al 2023
“Why YouTube Could Give Google an Edge in AI”, Victor 2023
“Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks”, Veselovsky et al 2023
“The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora With Web Data, and Web Data Only”, Penedo et al 2023
“Let’s Verify Step by Step”, Lightman et al 2023
“SeeGULL: A Stereotype Benchmark With Broad Geo-Cultural Coverage Leveraging Generative Models”, Jha et al 2023
“SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models”
“C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models”, Huang et al 2023
“C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models”
“TinyStories: How Small Can Language Models Be and Still Speak Coherent English?”, Eldan & Li 2023
“TinyStories: How Small Can Language Models Be and Still Speak Coherent English?”
“Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation”, Kirstain et al 2023
“Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation”
“LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, Wu et al 2023
“LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”
“Multi-Party Chat (MultiLIGHT): Conversational Agents in Group Settings With Humans and Models”, Wei et al 2023
“Multi-Party Chat (MultiLIGHT): Conversational Agents in Group Settings with Humans and Models”
“ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification”, Taesiri et al 2023
“Parsing-Conditioned Anime Translation: A New Dataset and Method”, Li et al 2023c
“Parsing-Conditioned Anime Translation: A New Dataset and Method”
“Abstraction-Perception Preserving Cartoon Face Synthesis”, Ho et al 2023
“AnimeDiffusion: Anime Face Line Drawing Colorization via Diffusion Models”, Cao et al 2023
“AnimeDiffusion: Anime Face Line Drawing Colorization via Diffusion Models”
“How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
“How well do Large Language Models perform in Arithmetic tasks?”
“Benchmarks for Automated Commonsense Reasoning: A Survey”, Davis 2023
“The BabyLM Challenge: Sample-efficient Pretraining on a Developmentally Plausible Corpus”, Warstadt et al 2023
“The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus”
“Interactive-Chain-Prompting (INTERCPT): Ambiguity Resolution for Crosslingual Conditional Generation With Interaction”, Pilault et al 2023
“The Semantic Scholar Open Data Platform”, Kinney et al 2023
“How Close Is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection”, Guo et al 2023
“Med-PaLM: Large Language Models Encode Clinical Knowledge”, Singhal et al 2022
“Unnatural Instructions: Tuning Language Models With (Almost) No Human Labor”, Honovich et al 2022
“A Whack-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others”, Li et al 2022
“Text Embeddings by Weakly-Supervised Contrastive Pre-training”, Wang et al 2022
“The Stack: 3 TB of Permissively Licensed Source Code”, Kocetkov et al 2022
“UniSumm: Unified Few-shot Summarization With Multi-Task Pre-Training and Prefix-Tuning”, Chen et al 2022
“A Creative Industry Image Generation Dataset Based on Captions”, Yuejia et al 2022
“AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities”, Chen et al 2022
“MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation”, Feng et al 2022
“AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies”, Siyao et al 2022
“BLOOMZ/mT0: Crosslingual Generalization through Multitask Finetuning”, Muennighoff et al 2022
“Dungeons and Data: A Large-Scale NetHack Dataset”, Hambro et al 2022
“Will We Run out of Data? An Analysis of the Limits of Scaling Datasets in Machine Learning”, Villalobos et al 2022
“Large Language Models Can Self-Improve”, Huang et al 2022
“CARP: Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning”, Castricato et al 2022
“MTEB: Massive Text Embedding Benchmark”, Muennighoff et al 2022
“Most Language Models Can Be Poets Too: An AI Writing Assistant and Constrained Text Generation Studio”, Roush et al 2022
“Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022
“Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning”, Lu et al 2022
“Brain Imaging Generation With Latent Diffusion Models”, Pinaya et al 2022
“PaLI: A Jointly-Scaled Multilingual Language-Image Model”, Chen et al 2022
“FOLIO: Natural Language Reasoning With First-Order Logic”, Han et al 2022
“Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
“Bugs in the Data: How ImageNet Misrepresents Biodiversity”, Luccioni & Rolnick 2022
“Discovering Bugs in Vision Models Using Off-the-shelf Image Generation and Captioning”, Wiles et al 2022
“Benchmarking Compositionality With Formal Languages”, Valvoda et al 2022
“Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP”, Nguyen et al 2022
“Learning to Generalize With Object-centric Agents in the Open World Survival Game Crafter”, Stanić et al 2022
“Few-shot Adaptation Works With UnpredicTable Data”, Chan et al 2022
“Language Models Can Teach Themselves to Program Better”, Haluptzok et al 2022
“RealTime QA: What’s the Answer Right Now?”, Kasai et al 2022
“NewsStories: Illustrating Articles With Visual Summaries”, Tan et al 2022
“CelebV-HQ: A Large-Scale Video Facial Attributes Dataset”, Zhu et al 2022
“Why Do Tree-based Models Still Outperform Deep Learning on Tabular Data?”, Grinsztajn et al 2022
“Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset”, Henderson et al 2022
“Forecasting Future World Events With Neural Networks”, Zou et al 2022
“RST: ReStructured Pre-training”, Yuan & Liu 2022
“Learning to Generate Artistic Character Line Drawing”, Fang et al 2022
“Dataset Condensation via Efficient Synthetic-Data Parameterization”, Kim et al 2022
“Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions”, Jiang et al 2022
“Fine-grained Image Captioning With CLIP Reward”, Cho et al 2022
“InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning”, Gupta et al 2022
“Learning to Model Editing Processes”, Reid & Neubig 2022
“Flexible Diffusion Modeling of Long Videos”, Harvey et al 2022
“Instruction Induction: From Few Examples to Natural Language Task Descriptions”, Honovich et al 2022
“Housekeep: Tidying Virtual Households Using Commonsense Reasoning”, Kant et al 2022
“Down and Across: Introducing Crossword-Solving As a New NLP Benchmark”, Kulshreshtha et al 2022
“Automated Crossword Solving”, Wallace et al 2022
“Dialog Inpainting: Turning Documents into Dialogues”, Dai et al 2022
“SymphonyNet: Symphony Generation With Permutation Invariant Language Model”, Liu et al 2022
“When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet”, Vasudevan et al 2022
“Building Machine Translation Systems for the Next Thousand Languages”, Bapna et al 2022
“Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)”, Fang et al 2022
“A Challenging Benchmark of Anime Style Recognition”, Li et al 2022
“Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks”, Wang et al 2022
“Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality”, Thrush et al 2022
“ByT5 Model for Massively Multilingual Grapheme-to-phoneme Conversion”, Zhu et al 2022
“KNN-Diffusion: Image Generation via Large-Scale Retrieval”, Ashual et al 2022
“STaR: Bootstrapping Reasoning With Reasoning”, Zelikman et al 2022
“CLIP Meets GamePhysics: Towards Bug Identification in Gameplay Videos Using Zero-shot Transfer Learning”, Taesiri et al 2022
“Bamboo: Building Mega-Scale Vision Dataset Continually With Human-Machine Synergy”, Zhang et al 2022
“Self-Distilled StyleGAN: Towards Generation from Internet Photos”, Mokady et al 2022
“RuCLIP—New Models and Experiments: A Technical Report”, Shonenkov et al 2022
“Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework”, Gu et al 2022
“ROME: Locating and Editing Factual Associations in GPT”, Meng et al 2022
“DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers”, Cho et al 2022
“PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts”, Bach et al 2022
“StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets”, Sauer et al 2022
“Can Wikipedia Help Offline Reinforcement Learning?”, Reid et al 2022
“BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation”, Li et al 2022
“SWAG: Revisiting Weakly Supervised Pre-Training of Visual Perception Models”, Singh et al 2022
“CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities”, Lee et al 2022
“WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Liu et al 2022
“SynthBio: A Case Study in Faster Curation of Text Datasets”, Yuan et al 2022
“BigDatasetGAN: Synthesizing ImageNet With Pixel-wise Annotations”, Li et al 2022
“ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation”, Zhang et al 2021
“A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision”, Tejankar et al 2021
“GLIDE: Towards Photorealistic Image Generation and Editing With Text-Guided Diffusion Models”, Nichol et al 2021
“WebGPT: Browser-assisted Question-answering With Human Feedback”, Nakano et al 2021
“Models in the Loop: Aiding Crowdworkers With Generative Annotation Assistants”, Bartolo et al 2021
“FRUIT: Faithfully Reflecting Updated Information in Text”, Logan et al 2021
“GLaM: Efficient Scaling of Language Models With Mixture-of-Experts”, Du et al 2021
“MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions”, Soldan et al 2021
“BASIC: Combined Scaling for Open-Vocabulary Image Classification”, Pham et al 2021
“Few-Shot Self-Rationalization With Natural Language Prompts”, Marasović et al 2021
“Solving Probability and Statistics Problems by Program Synthesis”, Tang et al 2021
“AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment”, Kim et al 2021
“RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning”, Ramos et al 2021
“LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs”, Schuhmann et al 2021
“An Explanation of In-context Learning As Implicit Bayesian Inference”, Xie et al 2021
“Training Verifiers to Solve Math Word Problems”, Cobbe et al 2021
“A Connectome of the Drosophila Central Complex Reveals Network Motifs Suitable for Flexible Navigation and Context-dependent Action Selection”, Hulse et al 2021
“HTCN: Harmonious Text Colorization Network for Visual-Textual Presentation Design”, Yang et al 2021c
“T0: Multitask Prompted Training Enables Zero-Shot Task Generalization”, Sanh et al 2021
“Can Machines Learn Morality? The Delphi Experiment”, Jiang et al 2021
“Situated Dialogue Learning through Procedural Environment Generation”, Ammanabrolu et al 2021
“MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research”, Samvelyan et al 2021
“TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Lin et al 2021
“MiniF2F: a Cross-system Benchmark for Formal Olympiad-level Mathematics”, Zheng et al 2021
“LAION-400-Million Open Dataset”, Schuhmann 2021
“Transfer Learning for Pose Estimation of Illustrated Characters”, Chen & Zwicker 2021
“MuSiQue: Multi-hop Questions via Single-hop Question Composition”, Trivedi et al 2021
“Scaling Vision Transformers”, Zhai et al 2021
“XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond”, Barbieri et al 2021
“SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network”, Chan et al 2021
“Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks”, Northcutt et al 2021
“NaturalProofs: Mathematical Theorem Proving in Natural Language”, Welleck et al 2021
“Get Your Vitamin C! Robust Fact Verification With Contrastive Evidence (VitaminC)”, Schuster et al 2021
“Are NLP Models Really Able to Solve Simple Math Word Problems?”, Patel et al 2021
“Measuring Mathematical Problem Solving With the MATH Dataset”, Hendrycks et al 2021
“WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning”, Srinivasan et al 2021
“A Massive 7T FMRI Dataset to Bridge Cognitive and Computational Neuroscience”, Allen et al 2021
“Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts”, Changpinyo et al 2021
“ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision”, Jia et al 2021
“Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling”, Lazaridou et al 2021
“Scaling Laws for Transfer”, Hernandez et al 2021
“MSR-VTT: A Large Video Description Dataset for Bridging Video and Language”, Xu et al 2021
“CLIP: Connecting Text and Images: We’re Introducing a Neural Network Called CLIP Which Efficiently Learns Visual Concepts from Natural Language Supervision. CLIP Can Be Applied to Any Visual Classification Benchmark by Simply Providing the Names of the Visual Categories to Be Recognized, Similar to the ‘zero-shot’ Capabilities of GPT-2 and GPT-3”, Radford et al 2021
“CLIP: Learning Transferable Visual Models From Natural Language Supervision”, Radford et al 2021
“The Pile: An 800GB Dataset of Diverse Text for Language Modeling”, Gao et al 2021
“MoGaze: A Dataset of Full-Body Motions That Includes Workspace Geometry and Eye-Gaze”, Kratzer et al 2020
“End-to-End Chinese Landscape Painting Creation Using Generative Adversarial Networks”, Xue 2020
“Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps”, Ho et al 2020
“Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus”, Caswell et al 2020
“Open-Domain Question Answering Goes Conversational via Question Rewriting”, Anantha et al 2020
“Digital Voicing of Silent Speech”, Gaddy & Klein 2020
“Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing”, Gu et al 2020
“The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization”, Hendrycks et al 2020
“The NetHack Learning Environment”, Küttler et al 2020
“Anime Crop Datasets: Faces, Figures, & Hands”, Branwen et al 2020
“ForecastQA: A Question Answering Challenge for Event Forecasting With Temporal Text Data”, Jin et al 2020
“Shortcut Learning in Deep Neural Networks”, Geirhos et al 2020
“D4RL: Datasets for Deep Data-Driven Reinforcement Learning”, Fu et al 2020
“TyDiQA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages”, Clark et al 2020
“SAYCam: A Large, Longitudinal Audiovisual Dataset Recorded from the Infant’s Perspective”, Sullivan et al 2020
“ImageNet-A: Natural Adversarial Examples”, Hendrycks et al 2020
“Measuring Compositional Generalization: A Comprehensive Method on Realistic Data”, Keysers et al 2019
“Libri-Light: A Benchmark for ASR With Limited or No Supervision”, Kahn et al 2019
“How Can We Know What Language Models Know?”, Jiang et al 2019
“SimpleBooks: Long-term Dependency Book Dataset With Simplified English Vocabulary for Word-level Language Modeling”, Nguyen 2019
“Compressive Transformers for Long-Range Sequence Modelling”, Rae et al 2019
“CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019
“CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data”, Wenzek et al 2019
“T5: Exploring the Limits of Transfer Learning With a Unified Text-to-Text Transformer”, Raffel et al 2019
“Restoring Ancient Text Using Deep Learning (Pythia): a Case Study on Greek Epigraphy”, Assael et al 2019
“CATER: A Diagnostic Dataset for Compositional Actions and TEmporal Reasoning”, Girdhar & Ramanan 2019
“PubMedQA: A Dataset for Biomedical Research Question Answering”, Jin et al 2019
“ObjectNet: A Large-scale Bias-controlled Dataset for Pushing the Limits of Object Recognition Models”, Barbu et al 2019
“No Press Diplomacy: Modeling Multi-Agent Gameplay”, Paquette et al 2019
“LVIS: A Dataset for Large Vocabulary Instance Segmentation”, Gupta et al 2019
“Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank”, Socher et al 2019
“A Large Single-participant FMRI Dataset for Probing Brain Responses to Naturalistic Stimuli in Space and Time”, Seeliger et al 2019
“OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge”, Marino et al 2019
“ImageNet-Sketch: Learning Robust Global Representations by Penalizing Local Predictive Power”, Wang et al 2019
“SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems”, Wang et al 2019
“The MineRL 2019 Competition on Sample Efficient Reinforcement Learning Using Human Priors”, Guss et al 2019
“ProductNet: a Collection of High-Quality Datasets for Product Representation Learning”, Wang et al 2019
“Benchmarking Neural Network Robustness to Common Corruptions and Perturbations”, Hendrycks & Dietterich 2019
“LIGHT: Learning to Speak and Act in a Fantasy Text Adventure Game”, Urbanek et al 2019
“A Replication Study: Machine Learning Models Are Capable of Predicting Sexual Orientation From Facial Images”, Leuner 2019
“Language Models Are Unsupervised Multitask Learners”, Radford et al 2019
“The Omniglot Challenge: a 3-year Progress Report”, Lake et al 2019
“Do We Train on Test Data? Purging CIFAR of Near-Duplicates”, Barz & Denzler 2019
“FIGR: Few-shot Image Generation With Reptile”, Clouâtre & Demers 2019
“Natural Questions: A Benchmark for Question Answering Research”, Kwiatkowski et al 2019
“A Style-Based Generator Architecture for Generative Adversarial Networks”, Karras et al 2018
“ImageNet-trained CNNs Are Biased towards Texture; Increasing Shape Bias Improves Accuracy and Robustness”, Geirhos et al 2018
“The Open Images Dataset V4: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale”, Kuznetsova et al 2018
“CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge”, Talmor et al 2018
“HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering”, Yang et al 2018
“Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization”, Narayan et al 2018
“A Short Note about Kinetics-600”, Carreira et al 2018
“CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images”, Guo et al 2018
“Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations”, Hendrycks & Dietterich 2018
“Know What You Don’t Know: Unanswerable Questions for SQuAD”, Rajpurkar et al 2018
“BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning”, Yu et al 2018
“Exploring the Limits of Weakly Supervised Pretraining”, Mahajan et al 2018
“Newsroom: A Dataset of 1.3 Million Summaries With Diverse Extractive Strategies”, Grusky et al 2018
“GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding”, Wang et al 2018
“The Sound of Pixels”, Zhao et al 2018
“Think You Have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge”, Clark et al 2018
“FEVER: a Large-scale Dataset for Fact Extraction and VERification”, Thorne et al 2018
“SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction”, Liang et al 2018
“11K Hands: Gender Recognition and Biometric Identification Using a Large Dataset of Hand Images”, Afifi 2017
“WebVision Database: Visual Learning and Understanding from Web Data”, Li et al 2017
“A Downsampled Variant of ImageNet As an Alternative to the CIFAR Datasets”, Chrabaszcz et al 2017
“Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”, Sun et al 2017
“Driver Identification Using Automobile Sensor Data from a Single Turn”, Hallac et al 2017
“The Kinetics Human Action Video Dataset”, Kay et al 2017
“TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension”, Joshi et al 2017
“Dense-Captioning Events in Videos”, Krishna et al 2017
“BAM! The Behance Artistic Media Dataset for Recognition Beyond Photography”, Wilber et al 2017
“SearchQA: A New Q&A Dataset Augmented With Context from a Search Engine”, Dunn et al 2017
“RACE: Large-scale ReAding Comprehension Dataset From Examinations”, Lai et al 2017
“NewsQA: A Machine Comprehension Dataset”, Trischler et al 2016
“MS MARCO: A Human Generated MAchine Reading COmprehension Dataset”, Bajaj et al 2016
“Pointer Sentinel Mixture Models”, Merity et al 2016
“Solving General Arithmetic Word Problems”, Roy & Roth 2016
“The LAMBADA Dataset: Word Prediction Requiring a Broad Discourse Context”, Paperno et al 2016
“SQuAD: 100,000+ Questions for Machine Comprehension of Text”, Rajpurkar et al 2016
“Convolutional Sketch Inversion”, Güçlütürk et al 2016
“Danbooru2021: A Large-Scale Crowdsourced and Tagged Anime Illustration Dataset”, Gwern 2015
“Neural Module Networks”, Andreas et al 2015
“Sketch-based Manga Retrieval Using Manga109 Dataset”, Matsui et al 2015
“Amazon Reviews: Image-based Recommendations on Styles and Substitutes”, McAuley et al 2015
“LSUN: Construction of a Large-scale Image Dataset Using Deep Learning With Humans in the Loop”, Yu et al 2015
“Teaching Machines to Read and Comprehend”, Hermann et al 2015
“YFCC100M: The New Data in Multimedia Research”, Thomee et al 2015
“ImageNet Large Scale Visual Recognition Challenge”, Russakovsky et al 2014
“Microsoft COCO: Common Objects in Context”, Lin et al 2014
“N-gram Counts and Language Models from the Common Crawl”, Buck et al 2014
“UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild”, Soomro et al 2012
“The Caltech-UCSD Birds-200-2011 Dataset”, Wah et al 2011
“Unbiased Look at Dataset Bias”, Torralba & Efros 2011
“Caltech-UCSD Birds 200”, Welinder et al 2010
“Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments”, Huang et al 2008
“Solving Math Word Problems: We’ve Trained a System That Solves Grade School Math Problems With Nearly Twice the Accuracy of a Fine-tuned GPT-3 Model. It Solves about 90% As Many Problems As Real Kids: a Small Sample of 9-12 Year Olds Scored 60% on a Test from Our Dataset, While Our System Scored 55% on Those Same Problems. This Is Important Because Today’s AI Is Still Quite Weak at Commonsense Multistep Reasoning, Which Is Easy Even for Grade School Kids. We Achieved These Results by Training Our Model to Recognize Its Mistakes, so That It Can Try Repeatedly Until It Finds a Solution That Works”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
nlp
multimodal
datalarge
Wikipedia
Miscellaneous
- /doc/ai/dataset/2023-pilaut-figure1-interactivechainpromptingforqaabouttranslationambiguities.png
- /doc/ai/dataset/2020-caswell-table2-examplesofmisleadingtextlanguageassociations.png
- http://cl-informatik.uibk.ac.at/cek/holstep/ckfccs-holstep-submitted.pdf
- https://openaccess.thecvf.com/content_cvpr_2014/papers/Andriluka_2D_Human_Pose_2014_CVPR_paper.pdf
- https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37648.pdf
- https://twitter.com/felix_red_panda/status/1723324786808692887
Link Bibliography
- https://arxiv.org/abs/2310.06786: “OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”, Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba
- https://arxiv.org/abs/2310.03214#google: “FreshLLMs: Refreshing Large Language Models With Search Engine Augmentation”
- https://arxiv.org/abs/2309.16671: “Demystifying CLIP Data”
- https://arxiv.org/abs/2309.12269: “The Cambridge Law Corpus: A Corpus for Legal AI Research”
- https://arxiv.org/abs/2309.04269: “From Sparse to Dense: GPT-4 Summarization With Chain of Density (CoD) Prompting”, Griffin Adams, Alexander Fabbri, Faisal Ladhak, Eric Lehman, Noémie Elhadad
- https://arxiv.org/abs/2308.12477: “American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers”
- https://arxiv.org/abs/2307.08701#samsung: “AlpaGasus: Training A Better Alpaca With Fewer Data”
- https://arxiv.org/abs/2307.05014: “Test-Time Training on Video Streams”, Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang
- https://arxiv.org/abs/2306.12587: “ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews”, Mike D’Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg, Tom Hope, Doug Downey
- https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots: “AI Is a Lot of Work: As the Technology Becomes Ubiquitous, a Vast Tasker Underclass Is Emerging—and Not Going Anywhere”, Josh Dzieza
- 2023-yi.pdf: “Anime Character Identification and Tag Prediction by Multimodality Modeling: Dataset and Model”, Fan Yi, Jiaxiang Wu, Minyi Zhao, Shuigeng Zhou
- https://www.theinformation.com/articles/why-youtube-could-give-google-an-edge-in-ai: “Why YouTube Could Give Google an Edge in AI”, Jon Victor
- https://arxiv.org/abs/2305.20050#openai: “Let’s Verify Step by Step”
- https://arxiv.org/abs/2305.07759#microsoft: “TinyStories: How Small Can Language Models Be and Still Speak Coherent English?”, Ronen Eldan, Yuanzhi Li
- https://arxiv.org/abs/2305.01569: “Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation”, Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, Omer Levy
- https://arxiv.org/abs/2304.05538: “ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification”, Mohammad Reza Taesiri, Giang Nguyen, Sarra Habchi, Cor-Paul Bezemer, Anh Nguyen
- https://arxiv.org/abs/2304.02015#alibaba: “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang
- https://arxiv.org/abs/2212.13138#google: “Med-PaLM: Large Language Models Encode Clinical Knowledge”
- https://arxiv.org/abs/2212.03533#microsoft: “Text Embeddings by Weakly-Supervised Contrastive Pre-training”, Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei
- https://arxiv.org/abs/2211.15533: “The Stack: 3 TB of Permissively Licensed Source Code”
- https://arxiv.org/abs/2211.06679#baai: “AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities”, Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu
- https://arxiv.org/abs/2210.11610#google: “Large Language Models Can Self-Improve”, Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han
- https://arxiv.org/abs/2210.07792#eleutherai: “CARP: Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning”
- https://aclanthology.org/2022.cai-1.2.pdf: “Most Language Models Can Be Poets Too: An AI Writing Assistant and Constrained Text Generation Studio”, Allen Roush, Sanjay Basu, Akshay Moorthy, Dmitry Dubovoy
- https://arxiv.org/abs/2210.03350#allen: “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis
- https://arxiv.org/abs/2209.00840: “FOLIO: Natural Language Reasoning With First-Order Logic”
- https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”
- https://arxiv.org/abs/2208.08831#deepmind: “Discovering Bugs in Vision Models Using Off-the-shelf Image Generation and Captioning”, Olivia Wiles, Isabela Albuquerque, Sven Gowal
- https://arxiv.org/abs/2208.05516: “Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP”, Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, Ludwig Schmidt
- https://arxiv.org/abs/2207.13061: “NewsStories: Illustrating Articles With Visual Summaries”, Reuben Tan, Bryan A. Plummer, Kate Saenko, J. P. Lewis, Avneesh Sud, Thomas Leung
- https://arxiv.org/abs/2206.15474: “Forecasting Future World Events With Neural Networks”
- https://arxiv.org/abs/2205.09665#bair: “Automated Crossword Solving”, Eric Wallace, Nicholas Tomlin, Albert Xu, Kevin Yang, Eshaan Pathak, Matthew Ginsberg, Dan Klein
- https://arxiv.org/abs/2205.09073#google: “Dialog Inpainting: Turning Documents into Dialogues”, Zhuyun Dai, Arun Tejasvi Chaganty, Vincent Zhao, Aida Amini, Qazi Mamunur Rashid, Mike Green, Kelvin Guu
- https://arxiv.org/abs/2205.04596#google: “When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet”, Vijay Vasudevan, Benjamin Caine, Raphael Gontijo-Lopes, Sara Fridovich-Keil, Rebecca Roelofs
- https://arxiv.org/abs/2205.03983#google: “Building Machine Translation Systems for the Next Thousand Languages”
- https://arxiv.org/abs/2205.01397: “Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)”, Alex Fang, Gabriel Ilharco, Mitchell Wortsman, Yuhao Wan, Vaishaal Shankar, Achal Dave, Ludwig Schmidt
- https://arxiv.org/abs/2204.07705: “Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks”
- https://arxiv.org/abs/2204.03067: “ByT5 Model for Massively Multilingual Grapheme-to-phoneme Conversion”, Jian Zhu, Cong Zhang, David Jurgens
- https://arxiv.org/abs/2203.11096: “CLIP Meets GamePhysics: Towards Bug Identification in Gameplay Videos Using Zero-shot Transfer Learning”, Mohammad Reza Taesiri, Finlay Macklon, Cor-Paul Bezemer
- https://arxiv.org/abs/2202.12211#google: “Self-Distilled StyleGAN: Towards Generation from Internet Photos”, Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani, Inbar Mosseri
- https://arxiv.org/abs/2202.06767#huawei: “Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework”
- https://arxiv.org/abs/2202.00273: “StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets”, Axel Sauer, Katja Schwarz, Andreas Geiger
- https://arxiv.org/abs/2201.12086#salesforce: “BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation”, Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi
- https://arxiv.org/abs/2201.08371#facebook: “SWAG: Revisiting Weakly Supervised Pre-Training of Visual Perception Models”
- https://swabhs.com/assets/pdf/wanli.pdf#allen: “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
- https://arxiv.org/abs/2201.04684: “BigDatasetGAN: Synthesizing ImageNet With Pixel-wise Annotations”, Daiqing Li, Huan Ling, Seung Wook Kim, Karsten Kreis, Adela Barriuso, Sanja Fidler, Antonio Torralba
- https://arxiv.org/abs/2112.15283#baidu: “ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation”, Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
- https://arxiv.org/abs/2112.09332#openai: “WebGPT: Browser-assisted Question-answering With Human Feedback”
- https://arxiv.org/abs/2111.10050#google: “BASIC: Combined Scaling for Open-Vocabulary Image Classification”
- https://arxiv.org/abs/2111.08267: “Solving Probability and Statistics Problems by Program Synthesis”, Leonard Tang, Elizabeth Ke, Nikhil Singh, Nakul Verma, Iddo Drori
- https://arxiv.org/abs/2111.02114#laion: “LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs”
- https://arxiv.org/abs/2110.14168#openai: “Training Verifiers to Solve Math Word Problems”, Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman
- https://elifesciences.org/articles/66039: “A Connectome of the Drosophila Central Complex Reveals Network Motifs Suitable for Flexible Navigation and Context-dependent Action Selection”
- https://arxiv.org/abs/2109.07958: “TruthfulQA: Measuring How Models Mimic Human Falsehoods”, Stephanie Lin, Jacob Hilton, Owain Evans
- https://laion.ai/blog/laion-400-open-dataset/: “LAION-400-Million Open Dataset”, Christoph Schuhmann
- https://arxiv.org/abs/2106.04560#google: “Scaling Vision Transformers”, Xiaohua Zhai, Alexander Kolesnikov, Neil Houlsby, Lucas Beyer
- https://arxiv.org/abs/2103.14749: “Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks”, Curtis G. Northcutt, Anish Athalye, Jonas Mueller
- https://arxiv.org/abs/2102.05918#google: “ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision”
- https://arxiv.org/abs/2102.01951#scaling&org=deepmind: “Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling”
- https://openai.com/research/clip: “CLIP: Connecting Text and Images: We’re Introducing a Neural Network Called CLIP Which Efficiently Learns Visual Concepts from Natural Language Supervision. CLIP Can Be Applied to Any Visual Classification Benchmark by Simply Providing the Names of the Visual Categories to Be Recognized, Similar to the ‘zero-shot’ Capabilities of GPT-2 and GPT-3”, Alec Radford, Ilya Sutskever, Jong Wook Kim, Gretchen Krueger, Sandhini Agarwal
- https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language_Supervision.pdf: “CLIP: Learning Transferable Visual Models From Natural Language Supervision”
- https://arxiv.org/abs/2101.00027#eleutherai: “The Pile: An 800GB Dataset of Diverse Text for Language Modeling”
- https://arxiv.org/abs/2010.14571#google: “Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus”, Isaac Caswell, Theresa Breiner, Daan van Esch, Ankur Bapna
- crop: “Anime Crop Datasets: Faces, Figures, & Hands”, Gwern Branwen, Arfafax, Shawn Presser, Anonymous, Danbooru Community
- https://arxiv.org/abs/1911.05507#deepmind: “Compressive Transformers for Long-Range Sequence Modelling”, Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap
- https://arxiv.org/abs/1905.00537: “SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems”
- https://arxiv.org/abs/1808.01340#deepmind: “A Short Note about Kinetics-600”, Joao Carreira, Eric Noland, Andras Banki-Horvath, Chloe Hillier, Andrew Zisserman
- https://arxiv.org/abs/1808.01097: “CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images”, Sheng Guo, Weilin Huang, Haozhi Zhang, Chenfan Zhuang, Dengke Dong, Matthew R. Scott, Dinglong Huang
- 2018-sharma.pdf#google: “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning”, Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut
- https://arxiv.org/abs/1805.00932#facebook: “Exploring the Limits of Weakly Supervised Pretraining”
- https://arxiv.org/abs/1707.08819: “A Downsampled Variant of ImageNet As an Alternative to the CIFAR Datasets”, Patryk Chrabaszcz, Ilya Loshchilov, Frank Hutter
- https://arxiv.org/abs/1704.05179: “SearchQA: A New Q&A Dataset Augmented With Context from a Search Engine”, Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, Kyunghyun Cho
- danbooru2021: “Danbooru2021: A Large-Scale Crowdsourced and Tagged Anime Illustration Dataset”, Gwern
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/1097_Paper.pdf: “N-gram Counts and Language Models from the Common Crawl”, Christian Buck, Kenneth Heafield, Bas van Ooyen
- 2011-torralba.pdf: “Unbiased Look at Dataset Bias”, Antonio Torralba, Alexei A. Efros