‘video generation’ directory

Gwern

Links

“One-Minute Video Generation With Test-Time Training”, Dalal et al 2025

“Video-T1: Test-Time Scaling for Video Generation”, Liu et al 2025

“Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model”, Ma et al 2025

“kudzueye/boreal-hl-v1: Boring Reality Hunyuan LoRA [de-tuning]”, kudzueye 2025

“Do Generative Video Models Learn Physical Principles from Watching Videos?”, Motamed et al 2025

“AniDoc: Animation Creation Made Easier”, Meng et al 2024

“AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era”, Jiang et al 2024

“How Far Is Video Generation from World Model: A Physical Law Perspective”, Kang et al 2024

“Diffusion Forcing: Next-Token Prediction Meets Full-Sequence Diffusion”, Chen et al 2024

“SF-V: Single Forward Video Generation Model”, Zhang et al 2024

“ShareGPT4Video: Improving Video Understanding and Generation With Better Captions”, Chen et al 2024

“ToonCrafter: Generative Cartoon Interpolation”, Xing et al 2024

“Sakuga-42M Dataset: Scaling Up Cartoon Research”, Pan et al 2024

“VideoGigaGAN: Towards Detail-Rich Video Super-Resolution”, Xu et al 2024

“Dynamic Typography: Bringing Text to Life via Video Diffusion Prior”, Liu et al 2024

“VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time”, Xu et al 2024

“CMD: Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition”, Yu et al 2024

“ZigMa: Zigzag Mamba Diffusion Model”, Hu et al 2024

“Sora Generates Videos With Stunning Geometrical Consistency”, Li et al 2024

“TF-T2V: A Recipe for Scaling up Text-To-Video Generation With Text-Free Videos”, Wang et al 2023

“W.A.L.T: Photorealistic Video Generation With Diffusion Models”, Gupta et al 2023

“StyleCrafter: Enhancing Stylized Text-To-Video Generation With Style Adapter”, Liu et al 2023

“MicroCinema: A Divide-And-Conquer Approach for Text-To-Video Generation”, Wang et al 2023

“Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets”, Blattmann et al 2023

“I2VGen-XL: High-Quality Image-To-Video Synthesis via Cascaded Diffusion Models”, Zhang et al 2023

“Anime Rock, Paper, Scissors 2”, Digital 2023

“Where Memory Ends and Generative AI Begins: New Photo Manipulation Tools from Google and Adobe Are Blurring the Lines between Real Memories and Those Dreamed up by AI”, Goode 2023

“Parsing-Conditioned Anime Translation: A New Dataset and Method”, Li et al 2023c

“Animators React 11: Mulan, Aladdin, ‘Anime Rock Paper Scissors’”, Digital 2023

“Anime Rock, Paper, Scissors”, Digital 2023

“Did We Just Change Animation Forever? § Making Of”, Digital 2023

“Dreamix: Video Diffusion Models Are General Video Editors”, Molad et al 2023

“OpenAI CEO Sam Altman on GPT-4: ‘People Are Begging to Be Disappointed and They Will Be’”, Vincent 2023

“Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-To-Video Generation”, Wu et al 2022

“MAGVIT: Masked Generative Video Transformer”, Yu et al 2022

“Latent Video Diffusion Models for High-Fidelity Video Generation With Arbitrary Lengths”, He et al 2022

“AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies”, Siyao et al 2022

“Imagen Video: High Definition Video Generation With Diffusion Models”, Ho et al 2022

“Phenaki: Variable Length Video Generation From Open Domain Textual Description”, Villegas et al 2022

“Make-A-Video: Text-To-Video Generation without Text-Video Data”, Singer et al 2022

“CelebV-HQ: A Large-Scale Video Facial Attributes Dataset”, Zhu et al 2022

“InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images”, Li et al 2022

“NUWA-∞: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis”, Wu et al 2022

“OmniMAE: Single Model Masked Pretraining on Images and Videos”, Girdhar et al 2022

“Cascaded Video Generation for Videos In-The-Wild”, Castrejon et al 2022

“CogVideo: Large-Scale Pretraining for Text-To-Video Generation via Transformers”, Hong et al 2022

“Flexible Diffusion Modeling of Long Videos”, Harvey et al 2022

“Ethan Caballero on Private Scaling Progress”, Caballero & Trazzi 2022

“Video Diffusion Models”, Ho et al 2022

“TATS: Long Video Generation With Time-Agnostic VQGAN and Time-Sensitive Transformer”, Ge et al 2022

“Reinforcement Learning With Action-Free Pre-Training from Videos”, Seo et al 2022

“Transframer: Arbitrary Frame Prediction With Generative Models”, Nash et al 2022

“Diffusion Probabilistic Modeling for Video Generation”, Yang et al 2022

“General-Purpose, Long-Context Autoregressive Modeling With Perceiver AR”, Hawthorne et al 2022

“Microdosing: Knowledge Distillation for GAN Based Compression”, Helminger et al 2022

“StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN-2”, Skorokhodov et al 2021

“U.S. vs. China Rivalry Boosts Tech—And Tensions: Militarized AI Threatens a New Arms Race”, Smith 2021

“NÜWA: Visual Synthesis Pre-Training for Neural VisUal World CreAtion”, Wu et al 2021

“Advances in Neural Rendering”, Tewari et al 2021

“Learning a Perceptual Manifold With Deep Features for Animation Video Resequencing”, Morace et al 2021

“Autoregressive Latent Video Prediction With High-Fidelity Image Generator”, Seo et al 2021

“FitVid: Overfitting in Pixel-Level Video Prediction”, Babaeizadeh et al 2021

“Alias-Free Generative Adversarial Networks”, Karras et al 2021

“GANs N’ Roses: Stable, Controllable, Diverse Image to Image Translation (Works for Videos Too!)”, Chong & Forsyth 2021

“NWT: Towards Natural Audio-To-Video Generation With Representation Learning”, Mama et al 2021

“Vector Quantized Models for Planning”, Ozair et al 2021

“GODIVA: Generating Open-DomaIn Videos from NAtural Descriptions”, Wu et al 2021

“VideoGPT: Video Generation Using VQ-VAE and Transformers”, Yan et al 2021

“China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-Scale Pretraining Model.”, Synced 2021

“Greedy Hierarchical Variational Autoencoders (GHVAEs) for Large-Scale Video Prediction”, Wu et al 2021

“CW-VAE: Clockwork Variational Autoencoders”, Saxena et al 2021

“Scaling Laws for Autoregressive Generative Modeling”, Henighan et al 2020

“SIREN: Implicit Neural Representations With Periodic Activation Functions”, Sitzmann et al 2020

“NeRF: Representing Scenes As Neural Radiance Fields for View Synthesis”, Mildenhall et al 2020

“High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks”, Villegas et al 2019

“Learning to Predict Without Looking Ahead: World Models Without Forward Prediction”, Freeman et al 2019

“Learning to Predict Without Looking Ahead: World Models Without Forward Prediction [Blog]”, Freeman et al 2019

“Scaling Autoregressive Video Models”, Weissenborn et al 2019

“NoGAN: Decrappification, DeOldification, and Super Resolution”, Antic et al 2019

“Model-Based Reinforcement Learning for Atari”, Kaiser et al 2019

“Parallel Multiscale Autoregressive Density Estimation”, Reed et al 2017

“VPN: Video Pixel Networks”, Kalchbrenner et al 2016

“THUDM/CogVideo: Text-to-video generation. The repo for ICLR2023 paper ‘CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers’”

“PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings”

“Flexible Diffusion Modeling of Long Videos”

/doc/www/wandb.ai/bdfdfcaa29b374da7fcca5ac091961f3538fb23f.html: “Text2Bricks: Fine-Tuning Open-Sora in 1,000 GPU-Hours”

https://www.lesswrong.com/posts/mRwJce3npmzbKfxws/efficientzero-how-it-works: “EfficientZero: How It Works”

https://www.wired.com/story/scammers-are-creating-fake-news-videos-to-blackmail-victims/: “Scammers Are Creating Fake News Videos to Blackmail Victims”

Wikipedia

https://en.wikipedia.org/wiki/Viroid: “Viroid”

Bibliography

https://arxiv.org/abs/2504.05298#nvidia: “One-Minute Video Generation With Test-Time Training”, Karan Dalal, Daniel Koceja, Gashon Hussein …, Jiarui Xu, Yue Zhao, Youjin Song, Shihao Han, Ka Chun Cheung, Jan Kautz, Carlos Guestrin, Tatsunori Hashimoto, Sanmi Koyejo, Yejin Choi, Yu Sun, Xiaolong Wang
https://arxiv.org/abs/2502.10248#stepfun: “Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model”, Guoqing Ma, Haoyang Huang, Kun Yan …, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, Bizhu Huang, Bo Wang, Brian Li, Changxing Miao, Chen Xu, Chenfei Wu, Chenguang Yu, Dapeng Shi, Dingyuan Hu, Enle Liu, Gang Yu, Ge Yang, Guanzhe Huang, Gulin Yan, Haiyang Feng, Hao Nie, Haonan Jia, Hanpeng Hu, Hanqi Chen, Haolong Yan, Heng Wang, Hongcheng Guo, Huilin Xiong, Huixin Xiong, Jiahao Gong, Jianchang Wu, Jiaoren Wu, Jie Wu, Jie Yang, Jiashuai Liu, Jiashuo Li, Jingyang Zhang, Junjing Guo, Junzhe Lin, Kaixiang Li, Lei Liu, Lei Xia, Liang Zhao, Liguo Tan, Liwen Huang, Liying Shi, Ming Li, Mingliang Li, Muhua Cheng, Na Wang, Qiaohui Chen, Qinglin He, Qiuyan Liang, Quan Sun, Ran Sun, Rui Wang, Shaoliang Pang, Shiliang Yang, Sitong Liu, Siqi Liu, Shuli Gao, Tiancheng Cao, Tianyu Wang, Weipeng Ming, Wenqing He, Xu Zhao, Xuelin Zhang, Xianfang Zeng, Xiaojia Liu, Xuan Yang, Yaqi Dai, Yanbo Yu, Yang Li, Yineng Deng, Yingming Wang, Yilei Wang, Yuanwei Lu, Yu Chen, Yu Luo, Yuchu Luo, Yuhe Yin, Yuheng Feng, Yuxiang Yang, Zecheng Tang, Zekai Zhang, Zidong Yang, Binxing Jiao, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu, Heung-Yeung Shum, Daxin Jiang
https://arxiv.org/abs/2501.09038#deepmind: “Do Generative Video Models Learn Physical Principles from Watching Videos?”, Saman Motamed, Laura Culp, Kevin Swersky …, Priyank Jaini, Robert Geirhos
https://arxiv.org/abs/2405.07425: “Sakuga-42M Dataset: Scaling Up Cartoon Research”, Zhenglin Pan, Yu Zhu, Yuxuan Mu
https://arxiv.org/abs/2403.13802: “ZigMa: Zigzag Mamba Diffusion Model”, Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui …, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Bjorn Ommer
https://arxiv.org/abs/2312.15770#alibaba: “TF-T2V: A Recipe for Scaling up Text-To-Video Generation With Text-Free Videos”, Xiang Wang, Shiwei Zhang, Hangjie Yuan …, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang
https://arxiv.org/abs/2311.18829#microsoft: “MicroCinema: A Divide-And-Conquer Approach for Text-To-Video Generation”, Yanhui Wang, Jianmin Bao, Wenming Weng …, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Xiaoyan Sun, Chong Luo, Baining Guo
https://arxiv.org/abs/2311.04145#alibaba: “I2VGen-XL: High-Quality Image-To-Video Synthesis via Cascaded Diffusion Models”, Shiwei Zhang, Jiayu Wang, Yingya Zhang …, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, Jingren Zhou
https://arxiv.org/abs/2302.01329#google: “Dreamix: Video Diffusion Models Are General Video Editors”, Eyal Molad, Eliahu Horwitz, Dani Valevski …, Alex Rav Acha, Yossi Matias, Yael Pritch, Yaniv Leviathan, Yedid Hoshen
https://www.theverge.com/23560328/openai-gpt-4-rumor-release-date-sam-altman-interview: “OpenAI CEO Sam Altman on GPT-4: ‘People Are Begging to Be Disappointed and They Will Be’”, James Vincent
https://arxiv.org/abs/2212.05199#google: “MAGVIT: Masked Generative Video Transformer”, Lijun Yu, Yong Cheng, Kihyuk Sohn …, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang
https://arxiv.org/abs/2207.09814#microsoft: “NUWA-∞: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis”, Chenfei Wu, Jian Liang, Xiaowei Hu …, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
https://arxiv.org/abs/2206.08356#facebook: “OmniMAE: Single Model Masked Pretraining on Images and Videos”, Rohit Girdhar, Alaaeldin El-Nouby, Mannat Singh …, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
https://arxiv.org/abs/2205.15868: “CogVideo: Large-Scale Pretraining for Text-To-Video Generation via Transformers”, Wenyi Hong, Ming Ding, Wendi Zheng …, Xinghan Liu, Jie Tang
https://theinsideview.ai/ethan: “Ethan Caballero on Private Scaling Progress”, Ethan Caballero, Michaël Trazzi
https://arxiv.org/abs/2204.03638#facebook: “TATS: Long Video Generation With Time-Agnostic VQGAN and Time-Sensitive Transformer”, Songwei Ge, Thomas Hayes, Harry Yang …, Xi Yin, Guan Pang, David Jacobs, Jia-Bin Huang, Devi Parikh
https://arxiv.org/abs/2202.07765#deepmind: “General-Purpose, Long-Context Autoregressive Modeling With Perceiver AR”, Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea …, Sebastian Borgeaud, Charlie Nash, Mateusz Malinowski, Sander Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Simon, Hannah Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel
https://arxiv.org/abs/2112.14683: “StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN-2”, Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny
https://spectrum.ieee.org/china-us-militarized-ai: “U.S. vs. China Rivalry Boosts Tech—And Tensions: Militarized AI Threatens a New Arms Race”, Craig S. Smith
https://arxiv.org/abs/2106.04615#deepmind: “Vector Quantized Models for Planning”, Sherjil Ozair, Yazhe Li, Ali Razavi …, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals
https://arxiv.org/abs/2104.10157: “VideoGPT: Video Generation Using VQ-VAE and Transformers”, Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas
https://syncedreview.com/2021/03/23/chinas-gpt-3-baai-introduces-superscale-intelligence-model-wu-dao-1-0/#baai: “China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-Scale Pretraining Model.”, Synced
https://arxiv.org/abs/2010.14701#openai: “Scaling Laws for Autoregressive Generative Modeling”, Tom Henighan, Jared Kaplan, Mor Katz …, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya A. Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish