See Also

Links
- “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Wang et al 2023
- “Retrieval-Augmented Multimodal Language Modeling”, Yasunaga et al 2022
- “Draft-and-Revise: Effective Image Generation With Contextual RQ-Transformer”, Lee et al 2022
- “CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers”, Hong et al 2022
- “CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers”, Ding et al 2022
- “MaskGIT: Masked Generative Image Transformer”, Chang et al 2022
- “CM3: A Causal Masked Multimodal Model of the Internet”, Aghajanyan et al 2022
- “ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation”, Zhang et al 2021
- “Emojich—Zero-Shot Emoji Generation Using Russian Language: A Technical Report”, Shonenkov et al 2021
- “LAFITE: Towards Language-Free Training for Text-to-Image Generation”, Zhou et al 2021
- “NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion”, Wu et al 2021
- “L-Verse: Bidirectional Generation Between Image and Text”, Kim et al 2021
- “Telling Creative Stories Using Generative Visual Aids”, 2021
- “Unifying Multimodal Transformer for Bi-directional Image and Text Generation”, Huang et al 2021
- “Illiterate DALL·E Learns to Compose”, Singh et al 2021
- “What Users Want? WARHOL: A Generative Model for Recommendation”, et al 2021
- “ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation”, et al 2021
- “Chinese AI Lab Challenges Google, OpenAI With a Model of 1.75 Trillion Parameters”, Du 2021
- “M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis”, Zhang et al 2021
- “CogView: Mastering Text-to-Image Generation via Transformers”, Ding et al 2021
- “GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions”, Wu et al 2021
- “VideoGPT: Video Generation Using VQ-VAE and Transformers”, Yan et al 2021
- “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-scale Pretraining Model”, Synced 2021
- “Paint by Word”, Bau et al 2021
- “Generating Images With Sparse Representations”, Nash et al 2021
- “M6: A Chinese Multimodal Pretrainer”, Lin et al 2021
- “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”, Ramesh et al 2021
- “Taming Transformers for High-Resolution Image Synthesis”, Esser et al 2020
- “Text-to-Image Generation Grounded by Fine-Grained User Attention”, Koh et al 2020
- “X-LXMERT: Paint, Caption and Answer Questions With Multi-Modal Transformers”, Cho et al 2020
- “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, Chen et al 2020
- “iGPT: Generative Pretraining from Pixels”, Chen et al 2020
- “The Messy, Secretive Reality Behind OpenAI’s Bid to Save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the Inside Story of How Competitive Pressure Eroded That Idealism”, Hao 2020
- “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning”, Sharma et al 2018
- “Image Transformer”, Parmar et al 2018
- “VQ-VAE: Neural Discrete Representation Learning”, van den Oord et al 2017
- “Categorical Reparameterization With Gumbel-Softmax”, Jang et al 2016
- “The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables”, Maddison et al 2016
- “The Little Red Boat Story (Make-A-Scene): Our Own Model Was Used to Generate All the Images in the Story, by Providing a Text and Simple Sketch Input”
Miscellaneous

Link Bibliography
- https://arxiv.org/abs/2301.02111#microsoft: “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”
- https://arxiv.org/abs/2211.12561#facebook: “Retrieval-Augmented Multimodal Language Modeling”
- https://arxiv.org/abs/2205.15868: “CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers”, Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang
- https://arxiv.org/abs/2204.14217#baai: “CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers”, Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang
- https://arxiv.org/abs/2112.15283#baidu: “ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation”, Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
- https://arxiv.org/abs/2111.11133: “L-Verse: Bidirectional Generation Between Image and Text”
- https://en.pingwest.com/a/8693#baai: “Chinese AI Lab Challenges Google, OpenAI With a Model of 1.75 Trillion Parameters”, Chen Du
- https://arxiv.org/abs/2105.14211#alibaba: “M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis”, Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang
- https://arxiv.org/abs/2105.13290#baai: “CogView: Mastering Text-to-Image Generation via Transformers”
- https://arxiv.org/abs/2104.10157: “VideoGPT: Video Generation Using VQ-VAE and Transformers”, Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas
- https://syncedreview.com/2021/03/23/chinas-gpt-3-baai-introduces-superscale-intelligence-model-wu-dao-1-0/#baai: “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-scale Pretraining Model”, Synced
- https://openai.com/research/dall-e: “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”
- https://openai.com/research/image-gpt: “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, Mark Chen, Alec Radford, Ilya Sutskever
- 2020-chen-2.pdf#openai: “iGPT: Generative Pretraining from Pixels”, Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
- https://www.technologyreview.com/2020/02/17/844721/ai-openai-moonshot-elon-musk-sam-altman-greg-brockman-messy-secretive-reality/: “The Messy, Secretive Reality Behind OpenAI’s Bid to Save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the Inside Story of How Competitive Pressure Eroded That Idealism”, Karen Hao
- 2018-sharma.pdf#google: “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning”, Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut