See Also
Gwern
“Anime Neural Net Graveyard”, Gwern 2019
Links
“Scaling the Codebook Size of VQGAN to 100,000 With a Utilization Rate of 99%”, Zhu et al 2024
“Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction”, Tian et al 2024
“Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data”, Gerstgrasser et al 2024
“Attention versus Contrastive Learning of Tabular Data—A Data-Centric Benchmarking”, Rabbani et al 2024
“GIVT: Generative Infinite-Vocabulary Transformers”, Tschannen et al 2023
“Sequential Modeling Enables Scalable Learning for Large Vision Models”, Bai et al 2023
“Finite Scalar Quantization (FSQ): VQ-VAE Made Simple”, Mentzer et al 2023
“DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation”, Duan et al 2023
“Finding Neurons in a Haystack: Case Studies With Sparse Probing”, Gurnee et al 2023
“TANGO: Text-To-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”, Ghosal et al 2023
“ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware”, Zhao et al 2023
“Bridging Discrete and Backpropagation: Straight-Through and Beyond”, Liu et al 2023
“Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder”, Valin et al 2022
“IRIS: Transformers Are Sample-Efficient World Models”, Micheli et al 2022
“Understanding Diffusion Models: A Unified Perspective”, Luo 2022
“Vector Quantized Image-To-Image Translation”, Chen et al 2022
“Draft-And-Revise: Effective Image Generation With Contextual RQ-Transformer”, Lee et al 2022
“UViM: A Unified Modeling Approach for Vision With Learned Guiding Codes”, Kolesnikov et al 2022
“Closing the Gap: Exact Maximum Likelihood Training of Generative Autoencoders Using Invertible Layers (AEF)”, Silvestri et al 2022
“AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars”, Hong et al 2022
“AdaVAE: Exploring Adaptive GPT-2s in Variational Autoencoders for Language Modeling”, Tu et al 2022
“NaturalSpeech: End-To-End Text to Speech Synthesis With Human-Level Quality”, Tan et al 2022
“VQGAN-CLIP: Open Domain Image Generation and Editing With Natural Language Guidance”, Crowson et al 2022
“TATS: Long Video Generation With Time-Agnostic VQGAN and Time-Sensitive Transformer”, Ge et al 2022
“Diffusion Probabilistic Modeling for Video Generation”, Yang et al 2022
“Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values”, Humayun et al 2022
“Vector-Quantized Image Modeling With Improved VQGAN”, Yu et al 2022
“Variational Autoencoders Without the Variation”, Daly et al 2022
“Truncated Diffusion Probabilistic Models and Diffusion-Based Adversarial Autoencoders”, Zheng et al 2022
“MLR: A Model of Working Memory for Latent Representations”, Hedayati et al 2022
“CM3: A Causal Masked Multimodal Model of the Internet”, Aghajanyan et al 2022
“Design Guidelines for Prompt Engineering Text-To-Image Generative Models”, Liu & Chilton 2022b
“DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents”, Pandey et al 2022
“ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation”, Zhang et al 2021
“High-Resolution Image Synthesis With Latent Diffusion Models”, Rombach et al 2021
“Discovering State Variables Hidden in Experimental Data”, Chen et al 2021
“VQ-DDM: Global Context With Discrete Diffusion in Vector Quantized Modeling for Image Generation”, Hu et al 2021
“Vector Quantized Diffusion Model for Text-To-Image Synthesis”, Gu et al 2021
“Passive Non-Line-Of-Sight Imaging Using Optimal Transport”, Geng et al 2021
“L-Verse: Bidirectional Generation Between Image and Text”, Kim et al 2021
“Unsupervised Deep Learning Identifies Semantic Disentanglement in Single Inferotemporal Face Patch Neurons”, Higgins et al 2021
“Telling Creative Stories Using Generative Visual Aids”, Ali & Parikh 2021
“Illiterate DALL·E Learns to Compose”, Singh et al 2021
“MeLT: Message-Level Transformer With Masked Document Representations As Pre-Training for Stance Detection”, Matero et al 2021
“Score-Based Generative Modeling in Latent Space”, Vahdat et al 2021
“NWT: Towards Natural Audio-To-Video Generation With Representation Learning”, Mama et al 2021
“Vector Quantized Models for Planning”, Ozair et al 2021
“VideoGPT: Video Generation Using VQ-VAE and Transformers”, Yan et al 2021
“TSDAE: Using Transformer-Based Sequential Denoising Autoencoder for Unsupervised Sentence Embedding Learning”, Wang et al 2021
“Symbolic Music Generation With Diffusion Models”, Mittal et al 2021
“Deep Generative Modeling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models”, Bond-Taylor et al 2021
“Greedy Hierarchical Variational Autoencoders (GHVAEs) for Large-Scale Video Prediction”, Wu et al 2021
“CW-VAE: Clockwork Variational Autoencoders”, Saxena et al 2021
“Denoising Diffusion Implicit Models”, Song et al 2021
“DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”, Ramesh et al 2021
“VQ-GAN: Taming Transformers for High-Resolution Image Synthesis”, Esser et al 2020
“Multimodal Dynamics Modeling for Off-Road Autonomous Vehicles”, Tremblay et al 2020
“Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images”, Child 2020
“NVAE: A Deep Hierarchical Variational Autoencoder”, Vahdat & Kautz 2020
“Jukebox: A Generative Model for Music”, Dhariwal et al 2020
“Jukebox: We’re Introducing Jukebox, a Neural Net That Generates Music, including Rudimentary Singing, As Raw Audio in a Variety of Genres and Artist Styles. We’re Releasing the Model Weights and Code, along With a Tool to Explore the Generated Samples.”, Dhariwal et al 2020
“RL Agents Implicitly Learning Human Preferences”, Wichers 2020
“Encoding Musical Style With Transformer Autoencoders”, Choi et al 2019
“Generating Furry Face Art from Sketches Using a GAN”, Yu 2019
“BART: Denoising Sequence-To-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension”, Lewis et al 2019
“Bayesian Parameter Estimation Using Conditional Variational Autoencoders for Gravitational-Wave Astronomy”, Gabbard et al 2019
“In-Field Whole Plant Maize Architecture Characterized by Latent Space Phenotyping”, Gage et al 2019
“Generating Diverse High-Fidelity Images With VQ-VAE-2”, Razavi et al 2019
“Bit-Swap: Recursive Bits-Back Coding for Lossless Compression With Hierarchical Latent Variables”, Kingma et al 2019
“Hierarchical Autoregressive Image Models With Auxiliary Decoders”, Fauw et al 2019
“Practical Lossless Compression With Latent Variables Using Bits Back Coding”, Townsend et al 2019
“An Empirical Model of Large-Batch Training”, McCandlish et al 2018
“How AI Training Scales”, McCandlish et al 2018
“Neural Probabilistic Motor Primitives for Humanoid Control”, Merel et al 2018
“Piano Genie”, Donahue et al 2018
“IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis”, Huang et al 2018
“InfoNCE: Representation Learning With Contrastive Predictive Coding (CPC)”, Oord et al 2018
“The Challenge of Realistic Music Generation: Modeling Raw Audio at Scale”, Dieleman et al 2018
“Self-Net: Lifelong Learning via Continual Self-Modeling”, Camp et al 2018
“GANomaly: Semi-Supervised Anomaly Detection via Adversarial Training”, Akcay et al 2018
“XGAN: Unsupervised Image-To-Image Translation for Many-To-Many Mappings”, Royer et al 2017
“VQ-VAE: Neural Discrete Representation Learning”, Oord et al 2017
“Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration”, Rahmatizadeh et al 2017
“β-VAE: Learning Basic Visual Concepts With a Constrained Variational Framework”, Higgins et al 2017
“Neural Audio Synthesis of Musical Notes With WaveNet Autoencoders”, Engel et al 2017
“Prediction and Control With Temporal Segment Models”, Mishra et al 2017
“Discovering Objects and Their Relations from Entangled Scene Representations”, Raposo et al 2017
“Categorical Reparameterization With Gumbel-Softmax”, Jang et al 2016
“The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables”, Maddison et al 2016
“Improving Sampling from Generative Autoencoders With Markov Chains”, Creswell et al 2016
“Language As a Latent Variable: Discrete Generative Models for Sentence Compression”, Miao & Blunsom 2016
“Neural Photo Editing With Introspective Adversarial Networks”, Brock et al 2016
“Early Visual Concept Learning With Unsupervised Deep Learning”, Higgins et al 2016
“Improving Variational Inference With Inverse Autoregressive Flow”, Kingma et al 2016
“How Far Can We Go without Convolution: Improving Fully-Connected Networks”, Lin et al 2015
“Semi-Supervised Sequence Learning”, Dai & Le 2015
“MADE: Masked Autoencoder for Distribution Estimation”, Germain et al 2015
“Analyzing Noise in Autoencoders and Deep Networks”, Poole et al 2014
“Stochastic Backpropagation and Approximate Inference in Deep Generative Models”, Rezende et al 2014
“Auto-Encoding Variational Bayes”, Kingma & Welling 2013
“Building High-Level Features Using Large Scale Unsupervised Learning”, Le et al 2011
“A Connection Between Score Matching and Denoising Autoencoders”, Vincent 2011
“Reducing the Dimensionality of Data With Neural Networks”, Hinton & Salakhutdinov 2006
“Generating Large Images from Latent Vectors”, Ha 2016
https://blog.otoro.net/2016/04/01/generating-large-images-from-latent-vectors/
“Transformers As Variational Autoencoders”
“Randomly Traversing the Manifold of Faces (2): Dataset: Labeled Faces in the Wild (LFW); Model: Variational Autoencoder (VAE) / Deep Latent Gaussian Model (DLGM).”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
latent-phenotyping
multimodal-training
autoregressive-modeling
generative-models music-image fusion autoencoding crafts image-synthesis visual-storytelling generative-models
Wikipedia
Miscellaneous
Bibliography
- https://arxiv.org/abs/2406.11837: “Scaling the Codebook Size of VQGAN to 100,000 With a Utilization Rate of 99%”
- https://arxiv.org/abs/2312.02116: “GIVT: Generative Infinite-Vocabulary Transformers”
- https://arxiv.org/abs/2309.15505: “Finite Scalar Quantization (FSQ): VQ-VAE Made Simple”
- https://arxiv.org/abs/2304.13731: “TANGO: Text-To-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”
- https://arxiv.org/abs/2304.13705: “ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware”
- https://arxiv.org/abs/2209.00588: “IRIS: Transformers Are Sample-Efficient World Models”
- https://arxiv.org/abs/2205.08535: “AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars”
- https://arxiv.org/abs/2205.04421#microsoft: “NaturalSpeech: End-To-End Text to Speech Synthesis With Human-Level Quality”
- https://arxiv.org/abs/2204.03638#facebook: “TATS: Long Video Generation With Time-Agnostic VQGAN and Time-Sensitive Transformer”
- https://arxiv.org/abs/2203.01993: “Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values”
- https://arxiv.org/abs/2110.04627#google: “Vector-Quantized Image Modeling With Improved VQGAN”
- https://arxiv.org/abs/2201.07520#facebook: “CM3: A Causal Masked Multimodal Model of the Internet”
- 2022-liu-2.pdf: “Design Guidelines for Prompt Engineering Text-To-Image Generative Models”
- https://arxiv.org/abs/2112.15283#baidu: “ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation”
- https://arxiv.org/abs/2112.10752: “High-Resolution Image Synthesis With Latent Diffusion Models”
- https://arxiv.org/abs/2111.11133: “L-Verse: Bidirectional Generation Between Image and Text”
- https://arxiv.org/abs/2106.04615#deepmind: “Vector Quantized Models for Planning”
- https://arxiv.org/abs/2104.10157: “VideoGPT: Video Generation Using VQ-VAE and Transformers”
- https://openai.com/research/dall-e: “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”
- https://arxiv.org/abs/2011.10650#openai: “Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images”
- https://arxiv.org/abs/2007.03898#nvidia: “NVAE: A Deep Hierarchical Variational Autoencoder”
- https://cdn.openai.com/papers/jukebox.pdf: “Jukebox: A Generative Model for Music”
- https://openai.com/research/jukebox: “Jukebox: We’re Introducing Jukebox, a Neural Net That Generates Music, including Rudimentary Singing, As Raw Audio in a Variety of Genres and Artist Styles. We’re Releasing the Model Weights and Code, along With a Tool to Explore the Generated Samples.”
- https://arxiv.org/abs/1910.13461#facebook: “BART: Denoising Sequence-To-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension”
- https://openai.com/research/how-ai-training-scales: “How AI Training Scales”
- 2011-vincent.pdf: “A Connection Between Score Matching and Denoising Autoencoders”