Jun 30, 2020 · 7:54 AM UTC

One limitation of StyleGAN is that it generates a "pyramid" of images. The first layer makes a 4x4 image, which is upscaled and passed through the next layer (8x8), and so on, until out pops the final 1024x1024. It's limiting because by the time you reach 32x32, the overall structure of the object is established (is this a face? is it a dog?), yet only the first 4 layers of the model were allowed to contribute to that decision! For a 1024x1024 model, that means 6 out of 10 layers of weights are irrelevant to the overall structure.
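A quick sketch of the resolution schedule, to make the layer counting concrete (the function name here is mine, not StyleGAN's):

```python
# Hypothetical sketch of StyleGAN's "pyramid": each synthesis block
# doubles the resolution, starting from a learned 4x4 constant.
def pyramid_resolutions(final_res=1024, base_res=4):
    """Return the list of resolutions the generator passes through."""
    res = base_res
    out = []
    while res <= final_res:
        out.append(res)
        res *= 2
    return out

# For a 1024x1024 model the blocks run 4, 8, 16, ..., 1024.
# Only the blocks up to 32x32 touch the image before coarse
# structure (face vs. dog) is already fixed.
resolutions = pyramid_resolutions(1024)
blocks_before_32 = resolutions.index(32) + 1  # → 4
```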
This may be why BigGAN is superior to StyleGAN at modeling complex objects: BigGAN doesn't use a pyramid at all. I think the pyramid is worth keeping, but I propose a modification: have every slice of the pyramid pass through the entire model. Expensive, but maybe better.
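A toy sketch of one reading of that proposal, with stand-in elementwise "blocks" rather than real conv layers (all names here are illustrative, not StyleGAN2's actual modules):

```python
import numpy as np

def block(x, weight):
    # Placeholder for a synthesis block; a real one would be a
    # styled convolution, not an elementwise op.
    return np.tanh(weight * x)

def upscale(x):
    # Nearest-neighbour 2x upsample.
    return np.kron(x, np.ones((2, 2)))

def generate(weights, base=None, full_depth=True):
    # Standard pyramid: each slice visits exactly one block.
    # Proposed variant: every slice passes through the entire stack,
    # so B blocks cost B times as many block evaluations.
    x = np.ones((4, 4)) if base is None else base
    for w in weights:            # one iteration per resolution
        x = upscale(x)
        if full_depth:
            for w2 in weights:   # the whole model touches every slice
                x = block(x, w2)
        else:
            x = block(x, w)      # the usual one-block-per-resolution
    return x
```

With B blocks, the full-depth variant does B times the per-slice work of the standard pyramid, which is where the "expensive" comes from.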
Datapoint: Simply turning off the pyramid resulted in no noticeable improvement. (Danbooru2019, 512x512 StyleGAN2 config-f with generator in "resnet" mode.) It's still early in training, but I'm not sure how much more converged it can get.
Replying to @theshawwn
You make a good point. May I know what tool/technique you used to visualize the intermediate layers?
My branch of stylegan2 outputs the layers to TensorBoard. It's at https://github.com/shawwn/stylegan2 on the "working" branch. You can see an example at krillin.shawwn.com:1078/%23i… (though it's taking a couple of minutes to load).