- CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
- Attention Is All You Need
- Generating Diverse High-Fidelity Images with VQ-VAE-2
- Microsoft COCO: Common Objects in Context
- DALL·E 1: Creating Images from Text: "We've trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language."
- M6: A Chinese Multimodal Pretrainer
- https://x.com/ak92501/status/1398079706180763649
- Text-to-image generation: the repository for the NeurIPS 2021 paper "CogView: Mastering Text-To-Image Generation via Transformers"
- WuDaoMM: A Large-Scale Multi-Modal Dataset for Pre-training Models
- WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training
- CogView: text-to-image generation (以文生图)
- Controllable Generation from Pre-trained Language Models via Inverse Prompting
- Self-distillation: Born Again Neural Networks
- CLIP: Connecting Text and Images: "We're introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the 'zero-shot' capabilities of GPT-2 and GPT-3."