Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
MAR: Autoregressive Image Generation without Vector Quantization
STAR: Scale-wise Text-to-image generation via Auto-Regressive representations
Visual Autoregressive Modeling (VAR): Scalable Image Generation via Next-Scale Prediction
https://arxiv.org/abs/2404.02905
IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers
/doc/ai/nn/transformer/gpt/dall-e/1/2023-wu-2.pdf
Rejuvenating image-GPT as Strong Visual Representation Learners
Artificial intelligence and art: Identifying the esthetic judgment factors that distinguish human & machine-generated artwork
/doc/ai/nn/transformer/gpt/dall-e/1/2023-samo.pdf
VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
https://arxiv.org/abs/2301.02111
Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
https://arxiv.org/abs/2204.14217
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
https://arxiv.org/abs/2112.15283
Emojich—zero-shot emoji generation using Russian language: a technical report
LAFITE: Towards Language-Free Training for Text-to-Image Generation
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Unifying Multimodal Transformer for Bi-directional Image and Text Generation
What Users Want? WARHOL: A Generative Model for Recommendation
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation
Chinese AI lab challenges Google, OpenAI with a model of 1.75 trillion parameters
https://en.pingwest.com/a/8693
M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis
https://arxiv.org/abs/2105.14211
CogView: Mastering Text-to-Image Generation via Transformers
https://arxiv.org/abs/2105.13290
GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) releases Wu Dao 1.0, China’s first large-scale pretraining model.
https://syncedreview.com/2021/03/23/chinas-gpt-3-baai-introduces-superscale-intelligence-model-wu-dao-1-0/
DALL·E 1: Creating Images from Text: We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language
https://openai.com/research/dall-e
Text-to-Image Generation Grounded by Fine-Grained User Attention
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
https://arxiv.org/abs/2009.11278
Image GPT (iGPT): We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples
https://openai.com/index/image-gpt/
/doc/ai/nn/transformer/gpt/dall-e/1/2020-chen-2.pdf
The messy, secretive reality behind OpenAI’s bid to save the world: The AI moonshot was founded in the spirit of transparency. This is the inside story of how competitive pressure eroded that idealism
https://www.technologyreview.com/2020/02/17/844721/ai-openai-moonshot-elon-musk-sam-altman-greg-brockman-messy-secretive-reality/
Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
/doc/ai/nn/diffusion/2018-sharma.pdf
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
Kingnobro/IconShop: (SIGGRAPH Asia 2023) Code of "IconShop: Text-Guided Vector Icon Synthesis With Autoregressive Transformers"
https://github.com/Kingnobro/IconShop
The Little Red Boat Story (Make-A-Scene): Our Own Model Was Used to Generate All the Images in the Story, by Providing a Text and Simple Sketch Input
DALL·E mini (Weights & Biases report)
https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA
Chameleon: Mixed-Modal Early-Fusion Foundation Models
https://arxiv.org/abs/2405.09818
Retrieval-Augmented Multimodal Language Modeling
https://arxiv.org/abs/2211.12561
CM3: A Causal Masked Multimodal Model of the Internet
https://arxiv.org/abs/2201.07520