[cf. Stanley 2007/Mordvintsev et al 2018/Ha 2016 on CPPNs, COCO-GAN, NeRF] Existing image generator networks rely heavily on spatial convolutions and, optionally, self-attention blocks in order to gradually synthesize images in a coarse-to-fine manner.
Here, we present a new architecture for image generators [Conditionally-Independent Pixel Synthesis (CIPS)], where the color value at each pixel is computed independently given the value of a random latent vector and the coordinate of that pixel. [A single-pixel GAN!] No spatial convolutions or similar operations that propagate information across pixels are involved during the synthesis… In total, its backbone contains 15 fully-connected layers [45.9M parameters]. [Otherwise StyleGAN2-like; note StyleGAN was already unusual for its deep stack of MLP layers processing the z → w embedding.]
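The core idea is simple enough to sketch in a few lines: a minimal NumPy toy, not the paper's implementation. The Fourier coordinate encoding and the elementwise "modulation" by w are simplified stand-ins for what CIPS actually uses; all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(coords, n_freq=8):
    # Sine/cosine encoding of (x, y) coordinates in [0, 1]^2, a simplified
    # stand-in for the coordinate encoding CIPS applies to each pixel.
    freqs = 2.0 ** np.arange(n_freq)
    angles = 2 * np.pi * coords[..., None, :] * freqs[:, None]   # (..., n_freq, 2)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*coords.shape[:-1], -1)                 # (..., 4 * n_freq)

def generate(w, height, width, params):
    # Each pixel's RGB is computed from its own coordinate and the shared
    # latent w; no information is propagated between pixels.
    ys, xs = np.meshgrid(np.linspace(0, 1, height),
                         np.linspace(0, 1, width), indexing="ij")
    h = fourier_features(np.stack([xs, ys], axis=-1))            # (H, W, 32)
    for W_l, b_l in params:
        h = h @ (W_l * (1 + w[: W_l.shape[1]])) + b_l            # toy modulation by w
        h = np.maximum(h, 0.2 * h)                               # leaky ReLU
    return h                                                     # (H, W, 3)

# Tiny 2-layer stack (the real CIPS backbone has 15 such layers).
params = [(0.1 * rng.normal(size=(32, 16)), np.zeros(16)),
          (0.1 * rng.normal(size=(16, 3)), np.zeros(3))]
w = 0.1 * rng.normal(size=16)
image = generate(w, 64, 64, params)
```

Because each pixel depends only on its coordinate and w, rendering the same w at two different resolutions produces identical values wherever the coordinate grids coincide (e.g. the corners).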
We analyze the modeling capabilities of such generators when trained in an adversarial fashion, and observe the new generators to achieve similar generation quality to state-of-the-art convolutional generators.
We also investigate several interesting properties unique to the new architecture. [It can take in arbitrary-shaped images like cylinders & generate arbitrary-sized or interpolated images, while using fixed memory per pixel in both training/sampling, and yields anytime ("foveated") results.]
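The "anytime/foveated" property follows directly from per-pixel independence: any subset of pixels can be evaluated, in any order, with memory proportional only to the pixels rendered so far. A small sketch under stated assumptions (the `pixel_fn` here is a trivial placeholder for a trained per-pixel generator):

```python
import numpy as np

rng = np.random.default_rng(0)

def pixel_fn(coords):
    # Placeholder for any per-pixel generator: maps (N, 2) coordinates in
    # [0, 1]^2 to (N, 3) RGB values. A real CIPS model would be the MLP here.
    return np.stack([coords[:, 0], coords[:, 1],
                     0.5 * (coords[:, 0] + coords[:, 1])], axis=1)

def foveated_render(pixel_fn, height, width, budget):
    # "Anytime" rendering: evaluate only `budget` pixels, chosen here at
    # random; no other pixel needs to exist for these values to be correct.
    canvas = np.zeros((height, width, 3))
    chosen = rng.permutation(height * width)[:budget]
    ys, xs = np.unravel_index(chosen, (height, width))
    coords = np.stack([xs / (width - 1), ys / (height - 1)], axis=1)
    canvas[ys, xs] = pixel_fn(coords)
    return canvas, chosen.size
```

With a full budget this fills the image exactly; with a partial budget it yields a progressively refinable preview, which a convolutional generator (whose pixels are entangled through the receptive field) cannot offer.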
Figure 2: The Conditionally-Independent Pixel Synthesis (CIPS) generator architecture. Top: the generation pipeline, in which the coordinates (x, y) of each pixel are encoded (yellow) and processed by a fully-connected (FC) network whose weights, modulated by a latent vector w, are shared across all pixels. The network returns the RGB value of that pixel.
Bottom: The architecture of a modulated fully-connected layer (ModFC). Note: our default configuration also includes skip connections to the output (not shown here).
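A modulated FC layer can be sketched in the spirit of StyleGAN2's weight modulation/demodulation, which CIPS adapts; this is a simplified NumPy illustration, not the paper's exact ModFC.

```python
import numpy as np

def modfc(x, weight, style):
    # Modulated fully-connected layer, StyleGAN2-style: each input channel
    # of the weight matrix is scaled by the style vector (an affine
    # projection of the latent w), then the weight is demodulated so each
    # output channel keeps roughly unit variance.
    w_mod = weight * style[:, None]                        # scale input channels
    demod = 1.0 / np.sqrt(np.sum(w_mod ** 2, axis=0) + 1e-8)
    return x @ (w_mod * demod[None, :])                    # (batch, out)
```

One consequence of demodulation is that the layer is invariant to a uniform rescaling of the style vector: only the relative per-channel scales matter.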