Zero-Shot Text-to-Image Generation
borisdayma/dalle-mini: DALL·E Mini
https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA
CogView: Mastering Text-to-Image Generation via Transformers
China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) releases Wu Dao 1.0, China’s first large-scale pretraining model.
https://openai.com/dall-e-2
GPT-3: Language Models are Few-Shot Learners
Image GPT (iGPT): We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples.
Generating Diverse High-Fidelity Images with VQ-VAE-2
BPEs: Neural Machine Translation of Rare Words with Subword Units
GPT-3 Creative Fiction § BPEs
VQ-VAE: Neural Discrete Representation Learning
SGVB: Auto-Encoding Variational Bayes
Stochastic Backpropagation and Approximate Inference in Deep Generative Models
Categorical Reparameterization with Gumbel-Softmax
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
The Unusual Effectiveness of Averaging in GAN Training
CLIP: Learning Transferable Visual Models From Natural Language Supervision
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
https://arxiv.org/pdf/2105.13290.pdf#page=8
CLIP: Connecting Text and Images: We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the ‘zero-shot’ capabilities of GPT-2 and GPT-3.
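The zero-shot recipe described above (embed each candidate category name as a text prompt, embed the image, and pick the label whose text embedding is most similar to the image embedding) can be sketched in pure Python, with toy fixed vectors standing in for CLIP’s actual text and image encoders (a hypothetical illustration of the procedure, not the real model):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy stand-ins for CLIP's text encoder output (hypothetical values).
TOY_TEXT_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.2],
    "a photo of a cat": [0.1, 0.9, 0.2],
}

def zero_shot_classify(image_embedding, labels, embed_text):
    """Turn each label into a prompt, score it against the image
    embedding, and return the best-matching label."""
    prompts = {label: embed_text(f"a photo of a {label}") for label in labels}
    return max(prompts, key=lambda lbl: cosine(image_embedding, prompts[lbl]))

image_emb = [0.85, 0.15, 0.25]  # toy "dog-like" image embedding
best = zero_shot_classify(
    image_emb,
    ["dog", "cat"],
    embed_text=lambda p: TOY_TEXT_EMBEDDINGS[p],
)
print(best)  # → dog
```

In the real model the two stand-in encoders are CLIP’s jointly trained text and image towers, and the similarity scores are typically softmaxed into class probabilities; the control flow, however, is exactly this argmax over prompt similarities.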