“NUWA-∞: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis”, Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan2022-07-20 (, , )⁠:

In this paper, we present NUWA-∞, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos.

An autoregressive over autoregressive generation mechanism is proposed to deal with this variable-size generation task, where a global patch-level autoregressive model considers the dependencies between patches, and a local token-level autoregressive model considers dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache-related patches already generated as the context for the current patch being generated, which can save computation costs without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) is used to decide suitable generation orders for different visual synthesis tasks and learn order-aware positional embeddings.

Compared to DALL·E, Imagen and Parti, NUWA-∞ can generate high-resolution images with arbitrary sizes and support long-duration video generation additionally. Compared to NUWA, which also covers images and videos, NUWA-∞ has superior visual synthesis capabilities in terms of resolution and variable-size generation.

The GitHub link is NUWA. The homepage link is https://nuwa-infinity.microsoft.com/.