‘masked autoencoder’ directory

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: in topic order rather than date order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, the sort uses the embedding of each annotation to build a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
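The nearest-neighbor ordering described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual implementation: it assumes cosine similarity as the distance measure and a simple greedy chain (always hop to the most similar unvisited annotation), neither of which is stated on this page. The names `cosine` and `sort_by_similarity` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sort_by_similarity(embeddings, start=0):
    """Greedy nearest-neighbor chain over a list of embedding vectors.

    `start` is the index of the seed item (e.g. the newest annotation).
    Returns a list of indices: each next item is the unvisited one most
    similar to the item placed just before it.
    """
    if not embeddings:
        return []
    remaining = set(range(len(embeddings)))
    remaining.discard(start)
    order = [start]
    while remaining:
        current = embeddings[order[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining), key=lambda i: cosine(current, embeddings[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy 2-D "embeddings": items 0 and 2 point one way, items 1 and 3 another.
toy = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(sort_by_similarity(toy))
```

With the toy vectors, the chain moves from item 0 to its close neighbor 2 before drifting toward 3 and then 1, which is the "progression of topics" effect: similar annotations end up adjacent, and the topic shifts gradually along the list.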

Miscellaneous

Bibliography

https://arxiv.org/abs/2410.18514: “Scaling up Masked Diffusion Models on Text”, Shen Nie, Fengqi Zhu, Chao Du …, Tianyu Pang, Qian Liu, Guangtao Zeng, Min Lin, Chongxuan Li
https://arxiv.org/abs/2410.03755#meituan: “Denoising With a Joint-Embedding Predictive Architecture”, Dengsheng Chen, Jie Hu, Xiaoming Wei, Enhua Wu
https://arxiv.org/abs/2409.16211#bytedance: “MaskBit: Embedding-Free Image Generation via Bit Tokens”, Mark Weber, Lijun Yu, Qihang Yu …, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen
https://arxiv.org/abs/2406.04329#deepmind: “Simplified and Generalized Masked Diffusion for Discrete Data”, Jiaxin Shi, Kehang Han, Zhe Wang …, Arnaud Doucet, Michalis K. Titsias
https://arxiv.org/abs/2401.14391: “Rethinking Patch Dependence for Masked Autoencoders”, Letian Fu, Long Lian, Renhao Wang …, Baifeng Shi, Xudong Wang, Adam Yala, Trevor Darrell, Alexei A. Efros, Ken Goldberg
https://arxiv.org/abs/2311.01017: “Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion”, Lunjun Zhang, Yuwen Xiong, Ze Yang …, Sergio Casas, Rui Hu, Raquel Urtasun
https://arxiv.org/abs/2307.05014: “Test-Time Training on Video Streams”, Renhao Wang, Yu Sun, Yossi Gandelsman …, Xinlei Chen, Alexei A. Efros, Xiaolong Wang
https://arxiv.org/abs/2306.09346: “Rosetta Neurons: Mining the Common Units in a Model Zoo”, Amil Dravid, Yossi Gandelsman, Alexei A. Efros, Assaf Shocher
https://arxiv.org/abs/2305.09636#google: “SoundStorm: Efficient Parallel Audio Generation”, Zalán Borsos, Matt Sharifi, Damien Vincent …, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
https://arxiv.org/abs/2303.14389: “Masked Diffusion Transformer Is a Strong Image Synthesizer”, Shanghua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan
https://arxiv.org/abs/2301.07088#bytedance: “MUG: Vision Learners Meet Web Image-Text Pairs”, Bingchen Zhao, Quan Cui, Hao Wu …, Osamu Yoshie, Cheng Yang
https://arxiv.org/abs/2301.01296#microsoft: “TinyMIM: An Empirical Study of Distilling MIM Pre-Trained Models”, Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu
https://arxiv.org/abs/2301.00704#google: “Muse: Text-To-Image Generation via Masked Generative Transformers”, Huiwen Chang, Han Zhang, Jarred Barber …, A. J. Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan
https://arxiv.org/abs/2212.05199#google: “MAGVIT: Masked Generative Video Transformer”, Lijun Yu, Yong Cheng, Kihyuk Sohn …, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang
https://openreview.net/forum?id=wmGlMhaBe0: “MaskDistill: A Unified View of Masked Image Modeling”, Anonymous
https://arxiv.org/abs/2211.09117#google: “MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis”, Tianhong Li, Huiwen Chang, Shlok Kumar Mishra …, Han Zhang, Dina Katabi, Dilip Krishnan
https://arxiv.org/abs/2211.07292: “Paella: Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces”, Dominic Rampas, Pablo Pernias, Elea Zhong, Marc Aubreville
https://arxiv.org/abs/2211.07636#baai: “EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, Yuxin Fang, Wen Wang, Binhui Xie …, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao
https://arxiv.org/abs/2209.14156: “TVLT: Textless Vision-Language Transformer”, Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal
https://arxiv.org/abs/2207.13532#bytedance: “CMAE: Contrastive Masked Autoencoders Are Stronger Vision Learners”, Zhicheng Huang, Xiaojie Jin, Chengze Lu …, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng
https://arxiv.org/abs/2207.06991: “PIXEL: Language Modeling With Pixels”, Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello …, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott
https://arxiv.org/abs/2207.06405#facebook: “Masked Autoencoders That Listen”, Po-Yao Huang, Hu Xu …, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer
https://arxiv.org/abs/2206.08356#facebook: “OmniMAE: Single Model Masked Pretraining on Images and Videos”, Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh …, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
https://arxiv.org/abs/2205.14204#google: “M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, Xinyang Geng, Hao Liu, Lisa Lee …, Dale Schuurmans, Sergey Levine, Pieter Abbeel
https://arxiv.org/abs/2205.09113#facebook: “Masked Autoencoders As Spatiotemporal Learners”, Christoph Feichtenhofer, Haoqi Fan, Yanghao Li, Kaiming He
https://arxiv.org/abs/2204.14217#baai: “CogView2: Faster and Better Text-To-Image Generation via Hierarchical Transformers”, Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang
https://arxiv.org/abs/2111.09886#microsoft: “SimMIM: A Simple Framework for Masked Image Modeling”, Zhenda Xie, Zheng Zhang, Yue Cao …, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu
https://arxiv.org/abs/2111.06377#facebook: “MAE: Masked Autoencoders Are Scalable Vision Learners”, Kaiming He, Xinlei Chen, Saining Xie …, Yanghao Li, Piotr Dollár, Ross Girshick
https://arxiv.org/abs/2110.15349: “Learning to Ground Multi-Agent Communication With Autoencoders”, Toru Lin, Minyoung Huh, Chris Stauffer …, Ser-Nam Lim, Phillip Isola