- See Also
- Links
- “Masked Diffusion Transformer Is a Strong Image Synthesizer”, Gao Et Al 2023
- “PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling”, Liu Et Al 2023
- “MUG: Vision Learners Meet Web Image-Text Pairs”, Zhao Et Al 2023
- “TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models”, Ren Et Al 2023
- “Muse: Text-To-Image Generation via Masked Generative Transformers”, Chang Et Al 2023
- “MAGVIT: Masked Generative Video Transformer”, Yu Et Al 2022
- “Scaling Language-Image Pre-training via Masking”, Li Et Al 2022
- “MaskDistill: A Unified View of Masked Image Modeling”, 2022
- “EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, Fang Et Al 2022
- “Paella: Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces”, Rampas Et Al 2022
- “Exploring Long-Sequence Masked Autoencoders”, Hu Et Al 2022
- “TVLT: Textless Vision-Language Transformer”, Tang Et Al 2022
- “CMAE: Contrastive Masked Autoencoders Are Stronger Vision Learners”, Huang Et Al 2022
- “PIXEL: Language Modelling With Pixels”, Rust Et Al 2022
- “Masked Autoencoders That Listen”, Huang Et Al 2022
- “OmniMAE: Single Model Masked Pretraining on Images and Videos”, Girdhar Et Al 2022
- “M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, Geng Et Al 2022
- “Masked Autoencoders As Spatiotemporal Learners”, Feichtenhofer Et Al 2022
- “CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers”, Ding Et Al 2022
- “MaskGIT: Masked Generative Image Transformer”, Chang Et Al 2022
- “MAE: Masked Autoencoders Are Scalable Vision Learners”, He Et Al 2021
- Miscellaneous
- Link Bibliography
See Also
Links
“Masked Diffusion Transformer Is a Strong Image Synthesizer”, Gao Et Al 2023
“Masked Diffusion Transformer is a Strong Image Synthesizer”, 2023-03-25 (similar; bibliography)
“PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling”, Liu Et Al 2023
“PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling”, 2023-03-04 (similar)
“MUG: Vision Learners Meet Web Image-Text Pairs”, Zhao Et Al 2023
“MUG: Vision Learners Meet Web Image-Text Pairs”, 2023-01-17 (similar; bibliography)
“TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models”, Ren Et Al 2023
“TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models”, 2023-01-03 (similar; bibliography)
“Muse: Text-To-Image Generation via Masked Generative Transformers”, Chang Et Al 2023
“Muse: Text-To-Image Generation via Masked Generative Transformers”, 2023-01-02 (similar; bibliography)
“MAGVIT: Masked Generative Video Transformer”, Yu Et Al 2022
“MAGVIT: Masked Generative Video Transformer”, 2022-12-10 (similar; bibliography)
“Scaling Language-Image Pre-training via Masking”, Li Et Al 2022
“Scaling Language-Image Pre-training via Masking”, 2022-12-01 (similar)
“MaskDistill: A Unified View of Masked Image Modeling”, 2022
“MaskDistill: A Unified View of Masked Image Modeling”, 2022-11-17 (similar; bibliography)
“EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, Fang Et Al 2022
“EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, 2022-11-14 (similar; bibliography)
“Paella: Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces”, Rampas Et Al 2022
“Paella: Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces”, 2022-11-14 (backlinks; similar; bibliography)
“Exploring Long-Sequence Masked Autoencoders”, Hu Et Al 2022
“Exploring Long-Sequence Masked Autoencoders”, 2022-10-13 (similar)
“TVLT: Textless Vision-Language Transformer”, Tang Et Al 2022
“TVLT: Textless Vision-Language Transformer”, 2022-09-28 (similar; bibliography)
“CMAE: Contrastive Masked Autoencoders Are Stronger Vision Learners”, Huang Et Al 2022
“CMAE: Contrastive Masked Autoencoders are Stronger Vision Learners”, 2022-07-27 (similar; bibliography)
“PIXEL: Language Modelling With Pixels”, Rust Et Al 2022
“PIXEL: Language Modelling with Pixels”, 2022-07-14 (backlinks; similar; bibliography)
“Masked Autoencoders That Listen”, Huang Et Al 2022
“Masked Autoencoders that Listen”, 2022-07-13 (similar; bibliography)
“OmniMAE: Single Model Masked Pretraining on Images and Videos”, Girdhar Et Al 2022
“OmniMAE: Single Model Masked Pretraining on Images and Videos”, 2022-06-16 (similar; bibliography)
“M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, Geng Et Al 2022
“M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, 2022-05-27 (similar; bibliography)
“Masked Autoencoders As Spatiotemporal Learners”, Feichtenhofer Et Al 2022
“Masked Autoencoders As Spatiotemporal Learners”, 2022-05-18 (similar; bibliography)
“CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers”, Ding Et Al 2022
“CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers”, 2022-04-28 (similar; bibliography)
“MaskGIT: Masked Generative Image Transformer”, Chang Et Al 2022
“MaskGIT: Masked Generative Image Transformer”, 2022-02-08 (similar)
“MAE: Masked Autoencoders Are Scalable Vision Learners”, He Et Al 2021
“MAE: Masked Autoencoders Are Scalable Vision Learners”, 2021-11-11 (similar; bibliography)
Miscellaneous
Link Bibliography
https://arxiv.org/abs/2303.14389
: “Masked Diffusion Transformer Is a Strong Image Synthesizer”, Shanghua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan
https://arxiv.org/abs/2301.07088#bytedance
: “MUG: Vision Learners Meet Web Image-Text Pairs”, Bingchen Zhao, Quan Cui, Hao Wu, Osamu Yoshie, Cheng Yang
https://arxiv.org/abs/2301.01296#microsoft
: “TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models”, Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu
https://arxiv.org/abs/2301.00704#google
: “Muse: Text-To-Image Generation via Masked Generative Transformers”
https://arxiv.org/abs/2212.05199
: “MAGVIT: Masked Generative Video Transformer”
https://openreview.net/forum?id=wmGlMhaBe0
: “MaskDistill: A Unified View of Masked Image Modeling”, Anonymous
https://arxiv.org/abs/2211.07636#baai
: “EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, Yuxin Fang, Wen Wang, Binhui Xie, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao
https://arxiv.org/abs/2211.07292
: “Paella: Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces”, Dominic Rampas, Pablo Pernias, Elea Zhong, Marc Aubreville
https://arxiv.org/abs/2209.14156
: “TVLT: Textless Vision-Language Transformer”, Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal
https://arxiv.org/abs/2207.13532#bytedance
: “CMAE: Contrastive Masked Autoencoders Are Stronger Vision Learners”, Zhicheng Huang, Xiaojie Jin, Chengze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng
https://arxiv.org/abs/2207.06991
: “PIXEL: Language Modelling With Pixels”, Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott
https://arxiv.org/abs/2207.06405#facebook
: “Masked Autoencoders That Listen”, Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer
https://arxiv.org/abs/2206.08356#facebook
: “OmniMAE: Single Model Masked Pretraining on Images and Videos”, Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
https://arxiv.org/abs/2205.14204#google
: “M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel
https://arxiv.org/abs/2205.09113#facebook
: “Masked Autoencoders As Spatiotemporal Learners”, Christoph Feichtenhofer, Haoqi Fan, Yanghao Li, Kaiming He
https://arxiv.org/abs/2204.14217#baai
: “CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers”, Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang
https://arxiv.org/abs/2111.06377#facebook
: “MAE: Masked Autoencoders Are Scalable Vision Learners”, Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick