“Image Generators With Conditionally-Independent Pixel Synthesis”, Ivan Anokhin, Kirill Demochkin, Taras Khakhulin, Gleb Sterkin, Victor Lempitsky, Denis Korzhenkov, 2020-11-27:

[cf. Stanley2007/Mordvintsev et al 2018/Ha2016 on CPPNs, COCO-GAN, NeRF] Existing image generator networks rely heavily on spatial convolutions and, optionally, self-attention blocks in order to gradually synthesize images in a coarse-to-fine manner.

Here, we present a new architecture for image generators [Conditionally-Independent Pixel Synthesis (CIPS)], where the color value at each pixel is computed independently given the value of a random latent vector and the coordinate of that pixel. [A single-pixel GAN!] No spatial convolutions or similar operations that propagate information across pixels are involved during the synthesis…In total, its backbone contains 15 fully-connected layers [45.9M parameters]. [Otherwise StyleGAN2-like; note that StyleGAN was already unusual for its deep stack of MLP layers processing the z → w embedding.]
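
[As a rough illustration of what "computed independently given the latent vector and the pixel coordinate" means, here is a minimal pure-Python sketch. It is a toy under stated assumptions, not the authors' network: the weights are random and untrained, the coordinate encoding is a fixed Fourier-feature map, and the latent enters through a simple multiplicative gate. The names `ToyPixelGenerator`, `encode_coord`, and `pixel_center` are hypothetical.]

```python
import math
import random


def make_layer(rng, n_in, n_out):
    """Random weights for one fully-connected layer (untrained, illustrative)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, vec):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def encode_coord(x, y, n_freqs=4):
    """Assumed encoding: fixed Fourier features of a coordinate in [-1, 1]^2."""
    feats = []
    for k in range(n_freqs):
        f = (2.0 ** k) * math.pi
        feats += [math.sin(f * x), math.cos(f * x), math.sin(f * y), math.cos(f * y)]
    return feats


def pixel_center(index, size):
    """Map a pixel index to the normalized coordinate of its center."""
    return 2.0 * (index + 0.5) / size - 1.0


class ToyPixelGenerator:
    """Toy generator: one small MLP, shared by all pixels, gated by the latent."""

    def __init__(self, latent_dim=8, hidden=16, seed=0):
        rng = random.Random(seed)
        n_feats = len(encode_coord(0.0, 0.0))
        self.w_in = make_layer(rng, n_feats, hidden)
        self.w_style = make_layer(rng, latent_dim, hidden)
        self.w_out = make_layer(rng, hidden, 3)

    def pixel(self, x, y, z):
        """RGB of one pixel, using only (x, y) and z; no other pixel is read."""
        h = linear(self.w_in, encode_coord(x, y))
        style = linear(self.w_style, z)
        h = [max(0.0, a * (1.0 + s)) for a, s in zip(h, style)]  # gated ReLU
        return [1.0 / (1.0 + math.exp(-v)) for v in linear(self.w_out, h)]

    def image(self, z, height, width):
        """An image is just the per-pixel function evaluated on a grid."""
        return [
            [self.pixel(pixel_center(j, width), pixel_center(i, height), z)
             for j in range(width)]
            for i in range(height)
        ]
```

[Because `image` only loops over `pixel`, any single pixel can be recomputed in isolation and matches the corresponding entry of the full image; all spatial coherence has to come from the shared latent and the coordinate encoding.]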

We analyze the modeling capabilities of such generators when trained in an adversarial fashion, and observe the new generators to achieve similar generation quality to state-of-the-art convolutional generators.

We also investigate several interesting properties unique to the new architecture. [It can handle arbitrarily-shaped images such as cylinders and generate arbitrary-sized or interpolated images, while using fixed memory per pixel in both training and sampling, and it can return anytime (‘foveated’) results.]
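
[The resolution and memory properties follow from the image being a function of coordinates. A minimal sketch, assuming normalized coordinates in [-1, 1]: `toy_pixel_fn` is a hypothetical stand-in for a trained generator with a fixed latent, and `render_in_chunks` is an illustrative helper, not something from the paper.]

```python
import math


def pixel_centers(n):
    """Normalized coordinates of n pixel centers spanning [-1, 1]."""
    return [2.0 * (i + 0.5) / n - 1.0 for i in range(n)]


def toy_pixel_fn(x, y):
    """Hypothetical stand-in for a trained per-pixel generator (latent fixed)."""
    return (0.5 + 0.5 * math.sin(3.0 * x),
            0.5 + 0.5 * math.cos(2.0 * y),
            0.5 * (x * y + 1.0))


def render(pixel_fn, height, width):
    """Evaluate the same function on a grid of any size."""
    return [[pixel_fn(x, y) for x in pixel_centers(width)]
            for y in pixel_centers(height)]


def render_in_chunks(pixel_fn, height, width, chunk=5):
    """Evaluate at most `chunk` pixels at a time, then reassemble the rows."""
    coords = [(x, y) for y in pixel_centers(height) for x in pixel_centers(width)]
    flat = []
    for start in range(0, len(coords), chunk):
        flat.extend(pixel_fn(x, y) for x, y in coords[start:start + chunk])
    return [flat[r * width:(r + 1) * width] for r in range(height)]
```

[Calling `render(toy_pixel_fn, 4, 6)` and `render(toy_pixel_fn, 8, 12)` samples the same underlying function at two resolutions, and the chunked version produces the same pixels as the full render, so the number of pixels evaluated at once can be capped independently of image size. Evaluating only a subset of coordinates first would give a partial, 'anytime' result in the same way.]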

Figure 2: The Conditionally-Independent Pixel Synthesis (CIPS) generator architecture. Top: the generation pipeline, in which the coordinates (x, y) of each pixel are encoded (yellow) and processed by a fully-connected (FC) network with weights, modulated with a latent vector w, shared for all pixels. The network returns the RGB value of that pixel. Bottom: The architecture of a modulated fully-connected layer (ModFC). Note: our default configuration also includes skip connections to the output (not shown here).
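
[A minimal sketch of what a modulated fully-connected layer could look like, assuming StyleGAN2-style weight modulation and demodulation: each input column of the weight matrix is scaled by a style value, then each output row is renormalized to unit length. The caption only says the weights are modulated with w, so the exact normalization, bias, and activation here are assumptions, and `style_from_latent` and `mod_fc` are hypothetical names.]

```python
import math


def style_from_latent(affine_weight, affine_bias, w):
    """Assumed mapping: a learned affine transform turns w into per-input scales."""
    return [sum(a * v for a, v in zip(row, w)) + b
            for row, b in zip(affine_weight, affine_bias)]


def mod_fc(weight, style, x, bias=None, eps=1e-8):
    """Modulated FC layer for one pixel.

    weight: list of rows (n_out x n_in), shared by all pixels
    style:  n_in scales derived from the latent
    x:      n_in input features of a single pixel
    """
    out = []
    for i, row in enumerate(weight):
        scaled = [w_ij * s_j for w_ij, s_j in zip(row, style)]    # modulate
        norm = math.sqrt(sum(v * v for v in scaled) + eps)        # demodulate
        y = sum(v * x_j for v, x_j in zip(scaled, x)) / norm
        out.append(y + (bias[i] if bias is not None else 0.0))
    return out
```

[For example, `mod_fc([[3.0, 4.0]], [1.0, 1.0], [1.0, 0.0])` gives approximately `[0.6]`, and multiplying every style value by 10 leaves the output unchanged: with demodulation, only the relative pattern of the style matters, not its overall scale.]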