Bibliography:

  1. ‘neural net’ tag

  2. ‘masked autoencoder’ tag

  3. ‘GAN’ tag

  4. ‘DALL·E’ tag

  5. ‘Jukebox’ tag

  6. Anime Neural Net Graveyard

  7. Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%

  8. Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction

  9. Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

  10. Neural Network Parameter Diffusion

  11. Attention versus Contrastive Learning of Tabular Data—A Data-centric Benchmarking

  12. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

  13. GIVT: Generative Infinite-Vocabulary Transformers

  14. Sequential Modeling Enables Scalable Learning for Large Vision Models

  15. Finite Scalar Quantization (FSQ): VQ-VAE Made Simple

  16. DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation

  17. Finding Neurons in a Haystack: Case Studies with Sparse Probing

  18. TANGO: Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

  19. ACT: Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

  20. Bridging Discrete and Backpropagation: Straight-Through and Beyond

  21. Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder

  22. IRIS: Transformers are Sample-Efficient World Models

  23. Understanding Diffusion Models: A Unified Perspective

  24. Vector Quantized Image-to-Image Translation

  25. Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer

  26. UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

  27. Closing the gap: Exact maximum likelihood training of generative autoencoders using invertible layers (AEF)

  28. AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

  29. AdaVAE: Exploring Adaptive GPT-2s in Variational Autoencoders for Language Modeling

  30. NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

  31. VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance

  32. TATS: Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer

  33. Diffusion Probabilistic Modeling for Video Generation

  34. Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values

  35. Vector-quantized Image Modeling with Improved VQGAN

  36. Variational Autoencoders Without the Variation

  37. Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Autoencoders

  38. MLR: A model of working memory for latent representations

  39. CM3: A Causal Masked Multimodal Model of the Internet

  40. Design Guidelines for Prompt Engineering Text-to-Image Generative Models

  41. DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents

  42. ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation

  43. High-Resolution Image Synthesis with Latent Diffusion Models

  44. Discovering State Variables Hidden in Experimental Data

  45. VQ-DDM: Global Context with Discrete Diffusion in Vector Quantized Modeling for Image Generation

  46. Vector Quantized Diffusion Model for Text-to-Image Synthesis

  47. Passive Non-Line-of-Sight Imaging Using Optimal Transport

  48. L-Verse: Bidirectional Generation Between Image and Text

  49. Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons

  50. Telling Creative Stories Using Generative Visual Aids

  51. Illiterate DALL·E Learns to Compose

  52. MeLT: Message-Level Transformer with Masked Document Representations as Pre-Training for Stance Detection

  53. Score-based Generative Modeling in Latent Space

  54. NWT: Towards natural audio-to-video generation with representation learning

  55. Vector Quantized Models for Planning

  56. VideoGPT: Video Generation using VQ-VAE and Transformers

  57. TSDAE: Using Transformer-based Sequential Denoising Autoencoder for Unsupervised Sentence Embedding Learning

  58. Symbolic Music Generation with Diffusion Models

  59. Deep Generative Modeling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models

  60. Greedy Hierarchical Variational Autoencoders (GHVAEs) for Large-Scale Video Prediction

  61. CW-VAE: Clockwork Variational Autoencoders

  62. Denoising Diffusion Implicit Models

  63. DALL·E 1: Creating Images from Text: We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language

  64. VQ-GAN: Taming Transformers for High-Resolution Image Synthesis

  65. Multimodal dynamics modeling for off-road autonomous vehicles

  66. Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images

  67. NVAE: A Deep Hierarchical Variational Autoencoder

  68. Jukebox: A Generative Model for Music

  69. Jukebox: We’re introducing Jukebox, a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles. We’re releasing the model weights and code, along with a tool to explore the generated samples.

  70. RL agents Implicitly Learning Human Preferences

  71. Encoding Musical Style with Transformer Autoencoders

  72. Generating Furry Face Art from Sketches using a GAN

  73. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

  74. Bayesian parameter estimation using conditional variational autoencoders for gravitational-wave astronomy

  75. In-field whole plant maize architecture characterized by Latent Space Phenotyping

  76. Generating Diverse High-Fidelity Images with VQ-VAE-2

  77. Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables

  78. Hierarchical Autoregressive Image Models with Auxiliary Decoders

  79. Practical Lossless Compression with Latent Variables using Bits Back Coding

  80. An Empirical Model of Large-Batch Training

  81. How AI Training Scales

  82. Neural probabilistic motor primitives for humanoid control

  83. Piano Genie

  84. IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis

  85. InfoNCE: Representation Learning with Contrastive Predictive Coding (CPC)

  86. The challenge of realistic music generation: modeling raw audio at scale

  87. Self-Net: Lifelong Learning via Continual Self-Modeling

  88. GANomaly: Semi-Supervised Anomaly Detection via Adversarial Training

  89. XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings

  90. VQ-VAE: Neural Discrete Representation Learning

  91. Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration

  92. β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework

  93. Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

  94. Prediction and Control with Temporal Segment Models

  95. Discovering objects and their relations from entangled scene representations

  96. Categorical Reparameterization with Gumbel-Softmax

  97. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

  98. Improving Sampling from Generative Autoencoders with Markov Chains

  99. Language as a Latent Variable: Discrete Generative Models for Sentence Compression

  100. Neural Photo Editing with Introspective Adversarial Networks

  101. Early Visual Concept Learning with Unsupervised Deep Learning

  102. Improving Variational Inference with Inverse Autoregressive Flow

  103. How far can we go without convolution: Improving fully-connected networks

  104. Semi-supervised Sequence Learning

  105. MADE: Masked Autoencoder for Distribution Estimation

  106. Analyzing noise in autoencoders and deep networks

  107. Stochastic Backpropagation and Approximate Inference in Deep Generative Models

  108. Auto-Encoding Variational Bayes

  109. Building high-level features using large scale unsupervised learning

  110. A Connection Between Score Matching and Denoising Autoencoders

  111. Reducing the Dimensionality of Data with Neural Networks

  112. Generating Large Images from Latent Vectors

  113. Transformers As Variational Autoencoders

  114. Randomly Traversing the Manifold of Faces (2): Dataset: Labeled Faces in the Wild (LFW); Model: Variational Autoencoder (VAE) / Deep Latent Gaussian Model (DLGM).


  117. https://euclaise.xyz/vq-is-mlp

  118. https://opus-codec.org/demo/opus-1.5/


  120. https://sander.ai/2023/07/20/perspectives.html

  121. https://transformer-circuits.pub/2024/jan-update/index.html#dict-learning-resampling

  123. https://www.lesswrong.com/posts/bD4B2MF7nsGAfH9fj/basic-mathematics-of-predictive-coding

  124. https://www.lesswrong.com/posts/wqRqb7h6ZC48iDgfK/tentatively-found-600-monosemantic-features-in-a-small-lm

  125. https://x.com/haruu1367/status/1579286947519864833