- Zero-Shot Text-to-Image Generation
- borisdayma/dalle-mini: DALL·E Mini
- https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA
- CogView: Mastering Text-to-Image Generation via Transformers
- China's GPT-3? BAAI Introduces Superscale Intelligence Model 'Wu Dao 1.0': the Beijing Academy of Artificial Intelligence (BAAI) releases Wu Dao 1.0, China's first large-scale pretraining model.
- https://openai.com/dall-e-2
- GPT-3: Language Models are Few-Shot Learners
- Image GPT (iGPT): "We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples."
- Generating Diverse High-Fidelity Images with VQ-VAE-2
- BPEs: Neural Machine Translation of Rare Words with Subword Units
- GPT-3 Creative Fiction § BPEs
- VQ-VAE: Neural Discrete Representation Learning
- Auto-Encoding Variational Bayes
- Stochastic Backpropagation and Approximate Inference in Deep Generative Models
- Categorical Reparameterization with Gumbel-Softmax
- The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
- The Unusual Effectiveness of Averaging in GAN Training
- CLIP: Learning Transferable Visual Models From Natural Language Supervision
- CLIPScore: A Reference-free Evaluation Metric for Image Captioning
- https://arxiv.org/pdf/2105.13290.pdf#page=8
- CLIP: Connecting Text and Images: "We're introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the 'zero-shot' capabilities of GPT-2 and GPT-3."