CLIP: Connecting Text and Images: We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the ‘zero-shot’ capabilities of GPT-2 and GPT-3
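To make the zero-shot recipe concrete: a minimal sketch using the openai/CLIP reference package (`pip install git+https://github.com/openai/CLIP.git`); the checkpoint name, image filename, and label prompts below are illustrative assumptions, not anything prescribed by the announcement.

```python
# Minimal sketch of CLIP zero-shot classification, assuming the
# openai/CLIP reference package and a local image `cat.jpg` (hypothetical).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "Providing the names of the visual categories" is just tokenizing text prompts:
labels = ["a photo of a cat", "a photo of a dog", "a photo of a bird"]
image = preprocess(Image.open("cat.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    # CLIP scores the image against each caption; softmax gives class probabilities.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```

Swapping in a different benchmark requires only changing the `labels` list, which is the sense in which CLIP is "zero-shot": no gradient updates or task-specific training data are needed.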
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Contrastive Representation Learning: A Framework and Review
Multi-trait analysis of genome-wide association summary statistics using MTAG
Assessing the Big Five personality traits using real-life static facial images
https://www.lesswrong.com/posts/K7AyY8LMrcKhwfbyj/no-really-attention-is-all-you-need-attention-can-do
Chinchilla: Training Compute-Optimal Large Language Models
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
Abandoning Objectives: Evolution Through the Search for Novelty Alone
SDXL § Micro-Conditioning: Conditioning the Model on Image Size
GPT-2 Preference Learning for Music Generation § Optimization by Backprop, Not Blackbox
Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction
Progressive Growing of GANs for Improved Quality, Stability, and Variation
not-so-BigGAN: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Image GPT (iGPT): We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples
dynamic-evaluation#scaling-laws
Virtual comments: idea for LLM support for writing LessWrong posts
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Co-Writing Screenplays and Theatre Scripts with Language Models (Dramatron): An Evaluation by Industry Professionals
Calculating The Gaussian Expected Maximum § Probability of Bivariate Maximum
The Relationship of Validity Coefficients to the Practical Effectiveness of Tests in Selection: Discussion and Tables
Chaff Bugs: Deterring Attackers by Making Software Buggier
Activation Addition: Steering Language Models Without Optimization
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
The Art of the Shadow: How Painters Have Gotten It Wrong for Centuries [From The Visual World of Shadows]
Analytic and Algorithmic Solution of Random Satisfiability Problems
DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise
gpt-2-preference-learning#differentiable-sorting
Unsupervised Neural Machine Translation with Generative Language Models Only
https://www.crosslabs.org/blog/diffusion-with-offset-noise
Progressive Distillation for Fast Sampling of Diffusion Models
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Rectified Flow: A Marginal Preserving Approach to Optimal Transport
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Absolute Unit NNs: Regression-Based MLPs for Everything § Memorize All The Things
DALL·E 1: Creating Images from Text: We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
A Simple Framework for Contrastive Learning of Visual Representations
Training GANs with Stronger Augmentations via Contrastive Discriminator (ContraD)
Self-conditioned Image Generation via Generating Representations
Stochastic Weight Averaging and the Ornstein-Uhlenbeck Process
Connecting Generative Adversarial Networks and Actor-Critic Methods
Making Anime Faces With StyleGAN § Reversing StyleGAN To Control & Modify Images
Policy Learning and Evaluation with Randomized Quasi-Monte Carlo
Top-K Training of GANs: Improving GAN Performance by Throwing Away Bad Samples
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
MSG-GAN: Multi-Scale Gradients for Generative Adversarial Networks
Generator Knows What Discriminator Should Learn in Unconditional GANs
Simple statistical gradient-following algorithms for connectionist reinforcement learning
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks
Sem-GAN: Semantically-Consistent Image-to-Image Translation
Improving Shape Deformation in Unsupervised Image-to-Image Translation
A U-Net Based Discriminator for Generative Adversarial Networks
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
[D] RL: GANs As MCTS Environment Simulator for Deep Model-Based Planning?
The Shattered Gradients Problem: If resnets are the answer, then what is the question?
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
Data-dependent Initializations of Convolutional Neural Networks
Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
SMASH: One-Shot Model Architecture Search through HyperNetworks
https://www.lesswrong.com/posts/2JJtxitp6nqu6ffak/basic-facts-about-language-models-during-training-1#M3wsmwiGBCxd4dHHW
GPT-2 Preference Learning for Music Generation § Bradley-Terry Preference Learning
GPT-2 Preference Learning for Music Generation § Decision Transformers: Preference Learning As Simple As Possible
Diversifying AI: Towards Creative Chess with AlphaZero (AZdb)
XLNet: Generalized Autoregressive Pretraining for Language Understanding
https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation
SolidGoldMagikarp II: Technical Details and More Recent Findings
https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology
scaling-hypothesis#blessings-of-scale
Computer Optimization: Your Computer Is Faster Than You Think § DL
Motion Planning for Dynamic Knotting of a Flexible Rope with a High-speed Robot Arm
Motion Planning for Dynamic Folding of a Cloth with Two High-Speed Robot Hands and Two High-Speed Sliders
The Surprising Number of American Adults Who Think Chocolate Milk Comes from Brown Cows
https://www.juliansanchez.com/2009/12/08/the-redactors-dilemma/
Formal Theory of Creativity & Fun & Intrinsic Motivation (1990–2010)
Magic, Explanations, and Evil: The Origins and Design of Witches and Sorcerers [and replies]