- See Also
- Links
- “Diffusion Models Beat GANs on Image Classification”, Mukhopadhyay et al 2023
- “Test-Time Training on Video Streams”, Wang et al 2023
- “Rosetta Neurons: Mining the Common Units in a Model Zoo”, Dravid et al 2023
- “Exposing Flaws of Generative Model Evaluation Metrics and Their Unfair Treatment of Diffusion Models”, Stein et al 2023
- “Generalizable Synthetic Image Detection via Language-guided Contrastive Learning”, Wu et al 2023
- “SoundStorm: Efficient Parallel Audio Generation”, Borsos et al 2023
- “A Cookbook of Self-Supervised Learning”, Balestriero et al 2023
- “CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval”, Wu et al 2023
- “Masked Diffusion Transformer Is a Strong Image Synthesizer”, Gao et al 2023
- “PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling”, Liu et al 2023
- “MUG: Vision Learners Meet Web Image-Text Pairs”, Zhao et al 2023
- “TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models”, Ren et al 2023
- “Muse: Text-To-Image Generation via Masked Generative Transformers”, Chang et al 2023
- “MAGVIT: Masked Generative Video Transformer”, Yu et al 2022
- “Scaling Language-Image Pre-training via Masking”, Li et al 2022
- “MaskDistill: A Unified View of Masked Image Modeling”, Anonymous 2022
- “MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis”, Li et al 2022
- “EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, Fang et al 2022
- “Paella: Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces”, Rampas et al 2022
- “Exploring Long-Sequence Masked Autoencoders”, Hu et al 2022
- “TVLT: Textless Vision-Language Transformer”, Tang et al 2022
- “PatchDropout: Economizing Vision Transformers Using Patch Dropout”, Liu et al 2022
- “CMAE: Contrastive Masked Autoencoders Are Stronger Vision Learners”, Huang et al 2022
- “PIXEL: Language Modelling With Pixels”, Rust et al 2022
- “Masked Autoencoders That Listen”, Huang et al 2022
- “OmniMAE: Single Model Masked Pretraining on Images and Videos”, Girdhar et al 2022
- “M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, Geng et al 2022
- “Masked Autoencoders As Spatiotemporal Learners”, Feichtenhofer et al 2022
- “CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers”, Ding et al 2022
- “Should You Mask 15% in Masked Language Modeling?”, Wettig et al 2022
- “MaskGIT: Masked Generative Image Transformer”, Chang et al 2022
- “SimMIM: A Simple Framework for Masked Image Modeling”, Xie et al 2021
- “MAE: Masked Autoencoders Are Scalable Vision Learners”, He et al 2021
- Sort By Magic
- Miscellaneous
- Link Bibliography
See Also
Links
“Diffusion Models Beat GANs on Image Classification”, Mukhopadhyay et al 2023
“Test-Time Training on Video Streams”, Wang et al 2023
“Rosetta Neurons: Mining the Common Units in a Model Zoo”, Dravid et al 2023
“Exposing Flaws of Generative Model Evaluation Metrics and Their Unfair Treatment of Diffusion Models”, Stein et al 2023
“Generalizable Synthetic Image Detection via Language-guided Contrastive Learning”, Wu et al 2023
“SoundStorm: Efficient Parallel Audio Generation”, Borsos et al 2023
“A Cookbook of Self-Supervised Learning”, Balestriero et al 2023
“CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval”, Wu et al 2023
“Masked Diffusion Transformer Is a Strong Image Synthesizer”, Gao et al 2023
“PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling”, Liu et al 2023
“MUG: Vision Learners Meet Web Image-Text Pairs”, Zhao et al 2023
“TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models”, Ren et al 2023
“Muse: Text-To-Image Generation via Masked Generative Transformers”, Chang et al 2023
“MAGVIT: Masked Generative Video Transformer”, Yu et al 2022
“Scaling Language-Image Pre-training via Masking”, Li et al 2022
“MaskDistill: A Unified View of Masked Image Modeling”, Anonymous 2022
“MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis”, Li et al 2022
“EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, Fang et al 2022
“Paella: Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces”, Rampas et al 2022
“Exploring Long-Sequence Masked Autoencoders”, Hu et al 2022
“TVLT: Textless Vision-Language Transformer”, Tang et al 2022
“PatchDropout: Economizing Vision Transformers Using Patch Dropout”, Liu et al 2022
“CMAE: Contrastive Masked Autoencoders Are Stronger Vision Learners”, Huang et al 2022
“PIXEL: Language Modelling With Pixels”, Rust et al 2022
“Masked Autoencoders That Listen”, Huang et al 2022
“OmniMAE: Single Model Masked Pretraining on Images and Videos”, Girdhar et al 2022
“M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, Geng et al 2022
“Masked Autoencoders As Spatiotemporal Learners”, Feichtenhofer et al 2022
“CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers”, Ding et al 2022
“Should You Mask 15% in Masked Language Modeling?”, Wettig et al 2022
“MaskGIT: Masked Generative Image Transformer”, Chang et al 2022
“SimMIM: A Simple Framework for Masked Image Modeling”, Xie et al 2021
“MAE: Masked Autoencoders Are Scalable Vision Learners”, He et al 2021
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
masked-learning
vision-autoencoder
pretraining
masked-modeling
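As a rough illustration of the greedy embedding-sort described above, here is a minimal sketch: starting from the newest annotation, it repeatedly appends the unvisited annotation whose embedding is the nearest neighbor (by cosine similarity) of the current one, producing a progression of topics. The function name `sort_by_magic` and the choice of cosine similarity over unit-normalized embeddings are illustrative assumptions, not the site's actual implementation.

```python
import numpy as np

def sort_by_magic(embeddings: np.ndarray) -> list[int]:
    """Greedy nearest-neighbor ordering of annotation embeddings (sketch).

    embeddings: (n, d) array; row 0 is assumed to be the newest annotation.
    Returns a permutation of indices forming a topic progression.
    """
    n = embeddings.shape[0]
    # Unit-normalize so a dot product equals cosine similarity (assumption).
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order, visited = [0], {0}
    while len(order) < n:
        # Similarity of every annotation to the most recently placed one.
        sims = unit @ unit[order[-1]]
        # Exclude annotations already placed in the ordering.
        sims[list(visited)] = -np.inf
        nxt = int(np.argmax(sims))
        order.append(nxt)
        visited.add(nxt)
    return order
```

The resulting ordering can then be cut into contiguous runs and auto-labeled to yield the tag sections listed above.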
Miscellaneous
Link Bibliography
- https://arxiv.org/abs/2307.05014: “Test-Time Training on Video Streams”, Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang
- https://arxiv.org/abs/2306.09346: “Rosetta Neurons: Mining the Common Units in a Model Zoo”, Amil Dravid, Yossi Gandelsman, Alexei A. Efros, Assaf Shocher
- https://arxiv.org/abs/2305.09636#google: “SoundStorm: Efficient Parallel Audio Generation”, Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
- https://arxiv.org/abs/2303.14389: “Masked Diffusion Transformer Is a Strong Image Synthesizer”, Shanghua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan
- https://arxiv.org/abs/2301.07088#bytedance: “MUG: Vision Learners Meet Web Image-Text Pairs”, Bingchen Zhao, Quan Cui, Hao Wu, Osamu Yoshie, Cheng Yang
- https://arxiv.org/abs/2301.01296#microsoft: “TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models”, Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu
- https://arxiv.org/abs/2301.00704#google: “Muse: Text-To-Image Generation via Masked Generative Transformers”, Chang et al 2023
- https://arxiv.org/abs/2212.05199#google: “MAGVIT: Masked Generative Video Transformer”, Yu et al 2022
- https://openreview.net/forum?id=wmGlMhaBe0: “MaskDistill: A Unified View of Masked Image Modeling”, Anonymous
- https://arxiv.org/abs/2211.09117#google: “MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis”, Tianhong Li, Huiwen Chang, Shlok Kumar Mishra, Han Zhang, Dina Katabi, Dilip Krishnan
- https://arxiv.org/abs/2211.07636#baai: “EVA: Exploring the Limits of Masked Visual Representation Learning at Scale”, Yuxin Fang, Wen Wang, Binhui Xie, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao
- https://arxiv.org/abs/2211.07292: “Paella: Fast Text-Conditional Discrete Denoising on Vector-Quantized Latent Spaces”, Dominic Rampas, Pablo Pernias, Elea Zhong, Marc Aubreville
- https://arxiv.org/abs/2209.14156: “TVLT: Textless Vision-Language Transformer”, Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal
- https://arxiv.org/abs/2207.13532#bytedance: “CMAE: Contrastive Masked Autoencoders Are Stronger Vision Learners”, Zhicheng Huang, Xiaojie Jin, Chengze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng
- https://arxiv.org/abs/2207.06991: “PIXEL: Language Modelling With Pixels”, Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott
- https://arxiv.org/abs/2207.06405#facebook: “Masked Autoencoders That Listen”, Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer
- https://arxiv.org/abs/2206.08356#facebook: “OmniMAE: Single Model Masked Pretraining on Images and Videos”, Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
- https://arxiv.org/abs/2205.14204#google: “M3AE: Multimodal Masked Autoencoders Learn Transferable Representations”, Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel
- https://arxiv.org/abs/2205.09113#facebook: “Masked Autoencoders As Spatiotemporal Learners”, Christoph Feichtenhofer, Haoqi Fan, Yanghao Li, Kaiming He
- https://arxiv.org/abs/2204.14217#baai: “CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers”, Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang
- https://arxiv.org/abs/2111.09886#microsoft: “SimMIM: A Simple Framework for Masked Image Modeling”, Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu
- https://arxiv.org/abs/2111.06377#facebook: “MAE: Masked Autoencoders Are Scalable Vision Learners”, Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick