Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%
Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
Attention versus Contrastive Learning of Tabular Data—A Data-centric Benchmarking
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
Sequential Modeling Enables Scalable Learning for Large Vision Models
DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation
Finding Neurons in a Haystack: Case Studies with Sparse Probing
TANGO: Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
ACT: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
Bridging Discrete and Backpropagation: Straight-Through and Beyond
Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder
Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Closing the gap: Exact maximum likelihood training of generative autoencoders using invertible layers (AEF)
AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
AdaVAE: Exploring Adaptive GPT-2s in Variational Autoencoders for Language Modeling
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
TATS: Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values
Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Autoencoders
Design Guidelines for Prompt Engineering Text-to-Image Generative Models
DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
High-Resolution Image Synthesis with Latent Diffusion Models
VQ-DDM: Global Context with Discrete Diffusion in Vector Quantized Modeling for Image Generation
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons
MeLT: Message-Level Transformer with Masked Document Representations as Pre-Training for Stance Detection
NWT: Towards natural audio-to-video generation with representation learning
TSDAE: Using Transformer-based Sequential Denoising Autoencoder for Unsupervised Sentence Embedding Learning
Deep Generative Modeling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models
Greedy Hierarchical Variational Autoencoders (GHVAEs) for Large-Scale Video Prediction
DALL·E 1: Creating Images from Text: We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language.
VQ-GAN: Taming Transformers for High-Resolution Image Synthesis
Multimodal dynamics modeling for off-road autonomous vehicles
Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
Jukebox: We’re introducing Jukebox, a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles. We’re releasing the model weights and code, along with a tool to explore the generated samples.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Bayesian parameter estimation using conditional variational autoencoders for gravitational-wave astronomy
In-field whole plant maize architecture characterized by Latent Space Phenotyping
Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables
Hierarchical Autoregressive Image Models with Auxiliary Decoders
Practical Lossless Compression with Latent Variables using Bits Back Coding
Neural probabilistic motor primitives for humanoid control
IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis
InfoNCE: Representation Learning with Contrastive Predictive Coding (CPC)
The challenge of realistic music generation: modeling raw audio at scale
GANomaly: Semi-Supervised Anomaly Detection via Adversarial Training
XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings
Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration
β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Discovering objects and their relations from entangled scene representations
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
Improving Sampling from Generative Autoencoders with Markov Chains
Language as a Latent Variable: Discrete Generative Models for Sentence Compression
Neural Photo Editing with Introspective Adversarial Networks
Early Visual Concept Learning with Unsupervised Deep Learning
Improving Variational Inference with Inverse Autoregressive Flow
How far can we go without convolution: Improving fully-connected networks
Stochastic Backpropagation and Approximate Inference in Deep Generative Models
Building high-level features using large scale unsupervised learning
A Connection Between Score Matching and Denoising Autoencoders
Randomly Traversing the Manifold of Faces (2): Dataset: Labeled Faces in the Wild (LFW); Model: Variational Autoencoder (VAE) / Deep Latent Gaussian Model (DLGM).
https://transformer-circuits.pub/2024/jan-update/index.html#dict-learning-resampling
https://www.lesswrong.com/posts/bD4B2MF7nsGAfH9fj/basic-mathematics-of-predictive-coding
https://www.lesswrong.com/posts/wqRqb7h6ZC48iDgfK/tentatively-found-600-monosemantic-features-in-a-small-lm
https://arxiv.org/abs/2110.04627#google
https://arxiv.org/abs/2201.07520#facebook
https://arxiv.org/abs/2106.04615#deepmind
https://arxiv.org/abs/2007.03898#nvidia
https://openai.com/research/how-ai-training-scales