‘autoencoder NN’ directory

Annotations sorted by machine learning into ⁠inferred 'tags'⁠. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.

Wikipedia

Miscellaneous

Bibliography

https://arxiv.org/abs/2503.17074: “Emuru: Zero-Shot Styled Text Image Generation, but Make It Autoregressive ”⁠, Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli …, Alessio Tonioni, Rita Cucchiara
link-bibliography⁠
https://arxiv.org/abs/2502.10248#stepfun: “Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model ”⁠, Guoqing Ma, Haoyang Huang, Kun Yan …, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou⁠, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun⁠, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, Bizhu Huang, Bo Wang, Brian Li, Changxing Miao, Chen Xu, Chenfei Wu, Chenguang Yu, Dapeng Shi, Dingyuan Hu, Enle Liu, Gang Yu, Ge Yang⁠, Guanzhe Huang, Gulin Yan, Haiyang Feng, Hao Nie, Haonan Jia, Hanpeng Hu, Hanqi Chen, Haolong Yan, Heng Wang, Hongcheng Guo, Huilin Xiong, Huixin Xiong, Jiahao Gong, Jianchang Wu, Jiaoren Wu, Jie Wu⁠, Jie Yang⁠, Jiashuai Liu, Jiashuo Li, Jingyang Zhang, Junjing Guo, Junzhe Lin, Kaixiang Li, Lei Liu, Lei Xia, Liang Zhao, Liguo Tan, Liwen Huang, Liying Shi, Ming Li⁠, Mingliang Li, Muhua Cheng, Na Wang, Qiaohui Chen, Qinglin He, Qiuyan Liang, Quan Sun, Ran Sun, Rui Wang, Shaoliang Pang, Shiliang Yang, Sitong Liu, Siqi Liu, Shuli Gao, Tiancheng Cao, Tianyu Wang, Weipeng Ming, Wenqing He, Xu Zhao, Xuelin Zhang, Xianfang Zeng, Xiaojia Liu, Xuan Yang, Yaqi Dai, Yanbo Yu, Yang Li, Yineng Deng, Yingming Wang, Yilei Wang, Yuanwei Lu, Yu Chen, Yu Luo, Yuchu Luo, Yuhe Yin, Yuheng Feng, Yuxiang Yang, Zecheng Tang, Zekai Zhang, Zidong Yang, Binxing Jiao, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu, Heung-Yeung Shum, Daxin Jiang
link-bibliography⁠
https://arxiv.org/abs/2406.11837: “Scaling the Codebook Size of VQGAN to 100,000 With a Utilization Rate of 99% ”⁠, Lei Zhu, Fangyun Wei, Yanye Lu, Dong Chen⁠
link-bibliography⁠
https://arxiv.org/abs/2404.02905#bytedance: “Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction ”⁠, Keyu Tian, Yi Jiang⁠, Zehuan Yuan …, Bingyue Peng, Liwei Wang
link-bibliography⁠
https://arxiv.org/abs/2312.02116: “GIVT: Generative Infinite-Vocabulary Transformers ”⁠, ⁠Michael Tschannen, Cian Eastwood, Fabian Mentzer
link-bibliography⁠
https://arxiv.org/abs/2309.15505: “Finite Scalar Quantization (FSQ): VQ-VAE Made Simple ”⁠, Fabian Mentzer, David Minnen, Eirikur Agustsson, ⁠Michael Tschannen
link-bibliography⁠
https://arxiv.org/abs/2304.13731: “TANGO: Text-To-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model ”⁠, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria
link-bibliography⁠
https://arxiv.org/abs/2304.13705: “ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware ”⁠, Tony Z. Zhao, Vikash Kumar⁠, Sergey Levine⁠, Chelsea Finn⁠
link-bibliography⁠
https://arxiv.org/abs/2209.00588: “IRIS: Transformers Are Sample-Efficient World Models ”⁠, Vincent Micheli, Eloi Alonso, François Fleuret
link-bibliography⁠
https://arxiv.org/abs/2205.08535: “AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars ”⁠, Fangzhou Hong, Mingyuan Zhang, Liang Pan …, Zhongang Cai, Lei Yang⁠, Ziwei Liu
link-bibliography⁠
https://arxiv.org/abs/2205.04421#microsoft: “NaturalSpeech: End-To-End Text to Speech Synthesis With Human-Level Quality ”⁠, Xu Tan, Jiawei Chen, Haohe Liu …, Jian Cong, Chen Zhang⁠, Yanqing Liu, Xi Wang⁠, Yichong Leng, Yuanhao Yi, Lei He, Frank Soong, Tao Qin⁠, Sheng Zhao, Tie-Yan Liu⁠
link-bibliography⁠
https://arxiv.org/abs/2204.03638#facebook: “TATS: Long Video Generation With Time-Agnostic VQGAN and Time-Sensitive Transformer ”⁠, Songwei Ge, Thomas Hayes, Harry Yang …, Xi Yin⁠, Guan Pang, David Jacobs, Jia-Bin Huang, Devi Parikh⁠
link-bibliography⁠
https://arxiv.org/abs/2203.01993: “Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values ”⁠, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk⁠
link-bibliography⁠
https://arxiv.org/abs/2110.04627#google: “Vector-Quantized Image Modeling With Improved VQGAN ”⁠, Jiahui Yu, Xin Li, Jing Yu Koh …, Han Zhang⁠, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu⁠
link-bibliography⁠
https://arxiv.org/abs/2201.07520#facebook: “CM3: A Causal Masked Multimodal Model of the Internet ”⁠, Armen Aghajanyan, Bernie Huang, Candace Ross …, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis⁠, Luke Zettlemoyer⁠
link-bibliography⁠
2022-liu-2.pdf: “Design Guidelines for Prompt Engineering Text-To-Image Generative Models ”⁠, Vivian Liu, Lydia B. Chilton
link-bibliography⁠
https://arxiv.org/abs/2112.15283#baidu: “ERNIE-ViLG: Unified Generative Pre-Training for Bidirectional Vision-Language Generation ”⁠, Han Zhang⁠, Weichong Yin, Yewei Fang …, Lanxin Li, Boqiang Duan, Zhihua Wu, ⁠Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
link-bibliography⁠
https://arxiv.org/abs/2112.10752: “High-Resolution Image Synthesis With Latent Diffusion Models ”⁠, Robin Rombach⁠, Andreas Blattmann, Dominik Lorenz …, Patrick Esser⁠, ⁠Björn Ommer
link-bibliography⁠
https://arxiv.org/abs/2111.11133: “L-Verse: Bidirectional Generation Between Image and Text ”⁠, Taehoon Kim, Gwangmo Song, Sihaeng Lee …, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae
link-bibliography⁠
https://arxiv.org/abs/2106.04615#deepmind: “Vector Quantized Models for Planning ”⁠, Sherjil Ozair, Yazhe Li, Ali Razavi …, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals⁠
link-bibliography⁠
https://arxiv.org/abs/2104.10157: “VideoGPT: Video Generation Using VQ-VAE and Transformers ”⁠, Wilson Yan, Yunzhi Zhang, Pieter Abbeel⁠, Aravind Srinivas⁠
link-bibliography⁠
https://openai.com/index/dall-e/: “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language ”⁠, Aditya A. Ramesh⁠, Mikhail Pavlov, Gabriel Goh⁠ …, Scott Gray⁠, ⁠Mark Chen, ⁠Rewon Child, Vedant Misra, Pamela Mishkin⁠, Gretchen Krueger⁠, Sandhini Agarwal⁠, Ilya Sutskever⁠
link-bibliography⁠
https://arxiv.org/abs/2011.10650#openai: “Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images ”⁠, ⁠Rewon Child
link-bibliography⁠
https://arxiv.org/abs/2007.03898#nvidia: “NVAE: A Deep Hierarchical Variational Autoencoder ”⁠, Arash Vahdat, Jan Kautz
link-bibliography⁠
https://cdn.openai.com/papers/jukebox.pdf: “Jukebox: A Generative Model for Music ”⁠, ⁠Prafulla Dhariwal, Heewoo Jun, Christine Payne⁠ …, ⁠Jong Wook Kim, Alec Radford⁠, Ilya Sutskever⁠
link-bibliography⁠
https://openai.com/research/jukebox: “Jukebox: We’re Introducing Jukebox, a Neural Net That Generates Music, including Rudimentary Singing, As Raw Audio in a Variety of Genres and Artist Styles. We’re Releasing the Model Weights and Code, along With a Tool to Explore the Generated Samples. ”⁠, ⁠Prafulla Dhariwal, Heewoo Jun, Christine Payne⁠ …, ⁠Jong Wook Kim, Alec Radford⁠, Ilya Sutskever⁠
link-bibliography⁠
https://arxiv.org/abs/1910.13461#facebook: “BART: Denoising Sequence-To-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension ”⁠, Mike Lewis⁠, Yinhan Liu, Naman Goyal …, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy⁠, Ves Stoyanov, Luke Zettlemoyer⁠
link-bibliography⁠
https://openai.com/research/how-ai-training-scales: “How AI Training Scales ”⁠, Sam McCandlish⁠, Jared Kaplan, Dario Amodei⁠
link-bibliography⁠
https://openreview.net/forum?id=Sy2fzU9gl#deepmind: “Β-VAE: Learning Basic Visual Concepts With a Constrained Variational Framework ”⁠, Irina Higgins, Loic Matthey, Arka Pal …, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner
link-bibliography⁠
2011-vincent.pdf: “A Connection Between Score Matching and Denoising Autoencoders ”⁠, Pascal Vincent⁠
link-bibliography⁠