“NUWA-∞: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis”, 2022-07-20:
In this paper, we present NUWA-∞, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos.
An autoregressive over autoregressive generation mechanism is proposed to deal with this variable-size generation task, where a global patch-level autoregressive model considers the dependencies between patches, and a local token-level autoregressive model considers dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache related patches that have already been generated, which serve as the context for the patch currently being generated; this saves computation without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) is used to decide suitable generation orders for different visual synthesis tasks and to learn order-aware positional embeddings.
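The two nested loops and the context cache described above can be illustrated with a toy sketch. This is not the paper's implementation: `toy_next_token` is a hypothetical stand-in for a trained token-level model, the raster-scan order is assumed in place of an order chosen by the ADC, and the `radius` neighborhood and row-based eviction rule are illustrative assumptions about how a nearby context pool might be bounded.

```python
import random
from typing import Dict, List, Tuple

Patch = Tuple[int, int]  # (row, col) position of a patch in the grid


def nearby(p: Patch, radius: int) -> List[Patch]:
    """Coordinates within `radius` of p (Chebyshev distance), excluding p."""
    r, c = p
    return [(r + dr, c + dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if (dr, dc) != (0, 0)]


def toy_next_token(context: List[int], prefix: List[int],
                   vocab: int, rng: random.Random) -> int:
    """Hypothetical stand-in for the local model; NOT a trained network."""
    return (sum(context) + sum(prefix) + rng.randrange(vocab)) % vocab


def generate(rows: int, cols: int, tokens_per_patch: int = 4,
             radius: int = 1, vocab: int = 16, seed: int = 0):
    rng = random.Random(seed)
    pool: Dict[Patch, List[int]] = {}    # cache of nearby generated patches
    output: Dict[Patch, List[int]] = {}  # every patch generated so far
    peak = 0                             # largest pool size observed
    # Global autoregression: one patch at a time (assumed raster order).
    for r in range(rows):
        for c in range(cols):
            # Context = tokens of cached patches near the current patch.
            context = [t for q in nearby((r, c), radius)
                       if q in pool for t in pool[q]]
            # Local autoregression: one token at a time within the patch.
            tokens: List[int] = []
            for _ in range(tokens_per_patch):
                tokens.append(toy_next_token(context, tokens, vocab, rng))
            output[(r, c)] = tokens
            pool[(r, c)] = tokens
            # Assumed eviction rule: drop rows no later patch can reach.
            for q in [q for q in pool if q[0] < r - radius]:
                del pool[q]
            peak = max(peak, len(pool))
    return output, peak
```

Under these assumptions the pool never holds more than `(radius + 1) * cols` patches, so the context available to each patch stays bounded no matter how many rows are generated. That bound is the kind of saving the NCP is meant to provide.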
Compared to DALL·E, Imagen and Parti, NUWA-∞ can generate high-resolution images of arbitrary size and additionally supports long-duration video generation. Compared to NUWA, which also covers images and videos, NUWA-∞ has superior visual synthesis capabilities in terms of resolution and variable-size generation.
The GitHub link is NUWA. The homepage link is https://nuwa-infinity.microsoft.com/.