CogView: Mastering Text-to-Image Generation via Transformers
https://agc.platform.baai.ac.cn/CogView/index.html
https://model.baai.ac.cn/model-detail/100041
MAE: Masked Autoencoders Are Scalable Vision Learners
DALL-E 2: Hierarchical Text-Conditional Image Generation with CLIP Latents
https://github.com/THUDM/CogView2