- See Also
- Links
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Links
- “VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers”, et al 2023 (2023-01-05)
- “Retrieval-Augmented Multimodal Language Modeling”, et al 2022 (2022-11-22)
- “3DALL·E: Integrating Text-to-Image AI in 3D Design Workflows”, et al 2022 (2022-10-20)
- “DALL·E-Bot: Introducing Web-Scale Diffusion Models to Robotics”, et al 2022 (2022-10-05)
- “DALL·E Now Available Without Waitlist”, OpenAI 2022 (2022-09-28)
- “Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning”, et al 2022 (2022-08-18)
- “Adversarial Attacks on Image Generation With Made-Up Words”, 2022 (2022-08-04)
- “Vector Quantized Image-to-Image Translation”, et al 2022 (2022-07-27)
- “NUWA-∞: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis”, et al 2022 (2022-07-20)
- “Training Transformers Together”, et al 2022 (2022-07-07)
- “Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer”, et al 2022 (2022-06-09)
- “Compositional Visual Generation with Composable Diffusion Models”, et al 2022 (2022-06-03)
- “CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers”, et al 2022 (2022-05-29)
- “Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding”, et al 2022 (2022-05-23)
- “CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers”, et al 2022 (2022-04-28)
- “DALL·E 2: Hierarchical Text-Conditional Image Generation with CLIP Latents § 7. Limitations and Risks”, et al 2022 (2022-04-13; page 16)
- “Hierarchical Text-Conditional Image Generation with CLIP Latents”, et al 2022 (2022-04-13)
- “Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors”, et al 2022 (2022-03-24)
- “MaskGIT: Masked Generative Image Transformer”, et al 2022 (2022-02-08)
- “DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers”, et al 2022 (2022-02-08)
- “CM3: A Causal Masked Multimodal Model of the Internet”, et al 2022 (2022-01-19)
- “Medical domain knowledge in domain-agnostic generative AI”, et al 2022 (2022-01-11)
- “ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation”, et al 2021 (2021-12-31)
- “GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models”, et al 2021 (2021-12-20)
- “Emojich—zero-shot emoji generation using Russian language: a technical report”, et al 2021 (2021-12-04)
- “LAFITE: Towards Language-Free Training for Text-to-Image Generation”, et al 2021 (2021-11-27)
- “NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion”, et al 2021 (2021-11-24)
- “L-Verse: Bidirectional Generation Between Image and Text”, et al 2021 (2021-11-22)
- “LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs”, et al 2021 (2021-11-03)
- “Telling Creative Stories Using Generative Visual Aids”, 2021 (2021-10-27)
- “Unifying Multimodal Transformer for Bi-directional Image and Text Generation”, et al 2021 (2021-10-19)
- “Illiterate DALL·E Learns to Compose”, et al 2021 (2021-10-17)
- “What Users Want? WARHOL: A Generative Model for Recommendation”, et al 2021 (2021-09-02)
- “On the Opportunities and Risks of Foundation Models”, et al 2021 (2021-08-16)
- “ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation”, et al 2021 (2021-06-10)
- “Decision Transformer: Reinforcement Learning via Sequence Modeling”, et al 2021 (2021-06-02)
- “Chinese AI lab challenges Google, OpenAI with a model of 1.75 trillion parameters”, 2021 (2021-06-01)
- “M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis”, et al 2021 (2021-05-29)
- “CogView: Mastering Text-to-Image Generation via Transformers”, et al 2021 (2021-05-26)
- “GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions”, et al 2021 (2021-04-30)
- “VideoGPT: Video Generation using VQ-VAE and Transformers”, et al 2021 (2021-04-20)
- “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) releases Wu Dao 1.0, China’s first large-scale pretraining model”, 2021 (2021-03-23)
- “Paint by Word”, et al 2021 (2021-03-19)
- “Generating Images with Sparse Representations”, et al 2021 (2021-03-05)
- “M6: A Chinese Multimodal Pretrainer”, et al 2021 (2021-03-01)
- “Zero-Shot Text-to-Image Generation”, et al 2021 (2021-02-24)
- “DALL·E 1: Creating Images from Text: We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language”, et al 2021 (2021-01-05)
- “Taming Transformers for High-Resolution Image Synthesis”, et al 2020 (2020-12-17)
- “Text-to-Image Generation Grounded by Fine-Grained User Attention”, et al 2020 (2020-11-07)
- “X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers”, et al 2020 (2020-09-23)
- “Image GPT (iGPT): We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples”, et al 2020 (2020-06-17)
- “iGPT: Generative Pretraining from Pixels”, et al 2020 (2020-06-17)
- “The messy, secretive reality behind OpenAI’s bid to save the world: The AI moonshot was founded in the spirit of transparency. This is the inside story of how competitive pressure eroded that idealism”, 2020 (2020-02-17)
- “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning”, et al 2018
- “Image Transformer”, et al 2018 (2018-02-15)
- “VQ-VAE: Neural Discrete Representation Learning”, et al 2017 (2017-11-02)
- “Categorical Reparameterization with Gumbel-Softmax”, et al 2016 (2016-11-03)
- “The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables”, et al 2016 (2016-11-02)
- “The Little Red Boat Story (Make-A-Scene): Our Own Model Was Used to Generate All the Images in the Story, by Providing a Text and Simple Sketch Input”
Wikipedia
Miscellaneous
- “/idea”
- http://dallery.gallery/wp-content/uploads/2022/07/The-DALL%C2%B7E-2-prompt-book.pdf
- https://astralcodexten.substack.com/p/a-guide-to-asking-robots-to-design
- https://bakztfuture.substack.com/p/dall-e-2-emerging-content-category
- https://bakztfuture.substack.com/p/dall-e-2-recombinant-art-and-design
- https://bakztfuture.substack.com/p/dall-e-2-unofficial-natural-language-b14
- https://colab.research.google.com/drive/1Gg7-c7LrUTNfQ-Fk-BVNCe9kvedZZsAh
- https://colab.research.google.com/drive/1Tb7J4PvvegWOybPfUubl5O7m5I24CBg5
- https://colab.research.google.com/github/ouhenio/minDALL-E_notebook/blob/main/minDALLE.ipynb
- https://docs.google.com/document/d/11WlzjBT0xRpQhP9tFMtxzd0q6ANIdHPUBkMV-YB043U/edit
- https://jacobmartins.com/posts/how-i-used-dalle2-to-generate-the-logo-for-octosql/
- https://medium.com/@liam.d.eloie/pok%C3%A9mon-ai-gotta-create-em-all-3e92915fa3ad
- https://mirror.xyz/herndondryhurst.eth/eZG6mucl9fqU897XvJs0vUUMnm5OITpSWN8S-6KWamY
- https://nitter.moomoo.me/CRUORMOR/status/1468449934668177411
- https://nitter.moomoo.me/MichaelFriese10/status/1464488317479636997
- https://nitter.moomoo.me/MichaelFriese10/status/1478574471950733314
- https://nitter.moomoo.me/NerdyRodent/status/1456659357941305348
- https://nitter.moomoo.me/NerdyRodent/status/1456735116651335680
- https://nitter.moomoo.me/NerdyRodent/status/1456746809154494478
- https://nitter.moomoo.me/NerdyRodent/status/1456922505411710985
- https://nitter.moomoo.me/NerdyRodent/status/1456984909873139720
- https://nitter.moomoo.me/NerdyRodent/status/1457045379732721669
- https://nitter.moomoo.me/NerdyRodent/status/1459298859717386245
- https://nitter.moomoo.me/NerdyRodent/status/1463164183877349376
- https://nitter.moomoo.me/SatsumaAudio/status/1550472950847098885
- https://nitter.moomoo.me/apeoffire/status/1467432320193994756
- https://nitter.moomoo.me/apeoffire/status/1467481904169553925
- https://nitter.moomoo.me/bakztfuture/status/1543992740207136768
- https://nitter.moomoo.me/benjamin_hilton/status/1529510695452164097
- https://nitter.moomoo.me/bneyshabur/status/1529506103708602369
- https://nitter.moomoo.me/borisdayma/status/1523777264517001216
- https://nitter.moomoo.me/dbonneville/status/1522453742095900672
- https://nitter.moomoo.me/mattgroh/status/1513837678172778498
- https://nitter.moomoo.me/nickcammarata/status/1511861061988892675
- https://nitter.moomoo.me/paultrillo/status/1547274303552438274
- https://nitter.moomoo.me/paultrillo/status/1550551780408209408
- https://nitter.moomoo.me/paultrillo/status/1564738511932076033
- https://nitter.moomoo.me/raphaelmilliere/status/1529101851915952128
- https://old.reddit.com/r/MachineLearning/comments/qmzy8a/rudalle_model_is_opensource_p/
- https://old.reddit.com/r/MachineLearning/comments/vx89nj/p_dalle_mini_mega_demo_and_production_api/
- https://old.reddit.com/r/MediaSynthesis/comments/rc9ft8/nsfw_rudalle_texttoimage_model_finetuned_to/
- https://old.reddit.com/r/MediaSynthesis/comments/rxpz4d/an_experiment_with_openais_glide_apples_in/
- https://old.reddit.com/r/bigsleep/comments/ql9n81/new_texttoimage_ai_models_rudalle_example_from/
- https://old.reddit.com/r/dalle2/comments/u79ut4/david_schnurr_dschnurr_inpainting_with_dalle_2_is/
- https://old.reddit.com/r/dalle2/comments/ub0sfg/dalle_2_imitation_game_results_check_sticky_for/
- https://old.reddit.com/r/dalle2/comments/ueizwz/i_printed_a_dalle_generated_childrens_book_about/
- https://old.reddit.com/r/dalle2/comments/uwb3cz/the_first_image_in_this_video_was_created_from/
- https://old.reddit.com/r/dalle2/comments/uxpl87/a_photograph_of_a_street_sign_that_warns_drivers/
- https://old.reddit.com/r/dalle2/comments/v1swsh/how_to_use_edit_mode_aka_inpainting_to_change/
- https://old.reddit.com/r/pokemon/comments/rgmyxp/i_trained_an_ai_on_all_the_official_pokemon/
- https://openai.com/blog/dall-e-api-now-available-in-public-beta/
- https://www.lesswrong.com/posts/uKp6tBFStnsvrot5t/what-dall-e-2-can-and-cannot-do
- https://www.mattbell.us/my-fake-dall-e-2-vacation-photos-passed-the-turing-test/
Link Bibliography
- https://arxiv.org/abs/2301.02111#microsoft: “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”
- https://arxiv.org/abs/2211.12561#facebook: “Retrieval-Augmented Multimodal Language Modeling”
- https://openai.com/blog/dall-e-now-available-without-waitlist/: “DALL·E Now Available Without Waitlist”, OpenAI
- https://arxiv.org/abs/2208.08831#deepmind: “Discovering Bugs in Vision Models Using Off-the-shelf Image Generation and Captioning”, Olivia Wiles, Isabela Albuquerque, Sven Gowal
- https://arxiv.org/abs/2207.09814#microsoft: “NUWA-∞: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis”, Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
- https://arxiv.org/abs/2205.15868: “CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers”, Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang
- https://arxiv.org/abs/2205.11487#google: “Imagen: Photorealistic Text-to-Image Diffusion Models With Deep Language Understanding”
- https://arxiv.org/abs/2204.14217#baai: “CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers”, Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang
- https://arxiv.org/pdf/2204.06125.pdf#page=16&org=openai: “DALL·E 2: Hierarchical Text-Conditional Image Generation With CLIP Latents § 7. Limitations and Risks”, Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
- https://arxiv.org/abs/2203.13131#facebook: “Make-A-Scene: Scene-Based Text-to-Image Generation With Human Priors”, Oran Gafni, Adam Polyak, Oron Ashual, Shelly Sheynin, Devi Parikh, Yaniv Taigman
- https://arxiv.org/abs/2112.15283#baidu: “ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation”, Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
- https://arxiv.org/abs/2111.11133: “L-Verse: Bidirectional Generation Between Image and Text”
- https://arxiv.org/abs/2111.02114#laion: “LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs”
- https://arxiv.org/abs/2108.07258: “On the Opportunities and Risks of Foundation Models”
- https://sites.google.com/berkeley.edu/decision-transformer: “Decision Transformer: Reinforcement Learning via Sequence Modeling”
- https://en.pingwest.com/a/8693#baai: “Chinese AI Lab Challenges Google, OpenAI With a Model of 1.75 Trillion Parameters”, Chen Du
- https://arxiv.org/abs/2105.14211#alibaba: “M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis”, Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang
- https://arxiv.org/abs/2105.13290#baai: “CogView: Mastering Text-to-Image Generation via Transformers”
- https://arxiv.org/abs/2104.10157: “VideoGPT: Video Generation Using VQ-VAE and Transformers”, Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas
- https://syncedreview.com/2021/03/23/chinas-gpt-3-baai-introduces-superscale-intelligence-model-wu-dao-1-0/#baai: “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-scale Pretraining Model”, Synced
- https://openai.com/blog/dall-e/: “DALL·E 1: Creating Images from Text: We’ve Trained a Neural Network Called DALL·E That Creates Images from Text Captions for a Wide Range of Concepts Expressible in Natural Language”
- https://openai.com/blog/image-gpt/: “Image GPT (iGPT): We Find That, Just As a Large Transformer Model Trained on Language Can Generate Coherent Text, the Same Exact Model Trained on Pixel Sequences Can Generate Coherent Image Completions and Samples”, Mark Chen, Alec Radford, Ilya Sutskever
- 2020-chen.pdf#openai: “iGPT: Generative Pretraining from Pixels”, Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
- https://www.technologyreview.com/2020/02/17/844721/ai-openai-moonshot-elon-musk-sam-altman-greg-brockman-messy-secretive-reality/: “The Messy, Secretive Reality behind OpenAI’s Bid to Save the World: The AI Moonshot Was Founded in the Spirit of Transparency. This Is the Inside Story of How Competitive Pressure Eroded That Idealism”, Karen Hao
- 2018-sharma.pdf#google: “Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning”, Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut