Jun 30, 2020 · 7:54 AM UTC

One limitation of StyleGAN is that it generates a "pyramid" of images. The first layer makes a 4x4 image, which is upscaled and passed through the next layer (8x8), and so on, until out pops the final 1024x1024. It's limiting because by the time you reach 32x32, the overall structure of the object is established (is this a face? is it a dog?), yet only the first 4 layers of the model were allowed to contribute to that decision! For a 1024x1024 model, that means 6 out of 10 layers of weights are irrelevant to the overall structure.
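A quick sketch of the resolution schedule, to make the layer counting concrete (the function name here is mine, not StyleGAN's):

```python
# Hypothetical sketch of StyleGAN's "pyramid": each synthesis block
# doubles the resolution, starting from a learned 4x4 constant.
def pyramid_resolutions(final_res=1024, base_res=4):
    """Return the list of resolutions the generator passes through."""
    res = base_res
    out = []
    while res <= final_res:
        out.append(res)
        res *= 2
    return out

# For a 1024x1024 model the blocks run 4, 8, 16, ..., 1024.
# Only the blocks up to 32x32 touch the image before coarse
# structure (face vs. dog) is already fixed.
resolutions = pyramid_resolutions(1024)
blocks_before_32 = resolutions.index(32) + 1  # → 4
```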
This may be why BigGAN is superior to StyleGAN at modeling complex objects: BigGAN doesn't use a pyramid at all. I think the pyramid is worth keeping, but I propose a modification: have every slice of the pyramid pass through the entire model. Expensive, but maybe better.
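A toy sketch of one reading of that proposal, with stand-in elementwise "blocks" rather than real conv layers (all names here are illustrative, not StyleGAN2's actual modules):

```python
import numpy as np

def block(x, weight):
    # Placeholder for a synthesis block; a real one would be a
    # styled convolution, not an elementwise op.
    return np.tanh(weight * x)

def upscale(x):
    # Nearest-neighbour 2x upsample.
    return np.kron(x, np.ones((2, 2)))

def generate(weights, base=None, full_depth=True):
    # Standard pyramid: each slice visits exactly one block.
    # Proposed variant: every slice passes through the entire stack,
    # so B blocks cost B times as many block evaluations.
    x = np.ones((4, 4)) if base is None else base
    for w in weights:            # one iteration per resolution
        x = upscale(x)
        if full_depth:
            for w2 in weights:   # the whole model touches every slice
                x = block(x, w2)
        else:
            x = block(x, w)      # the usual one-block-per-resolution
    return x
```

With B blocks, the full-depth variant does B times the per-slice work of the standard pyramid, which is where the "expensive" comes from.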
Datapoint: Simply turning off the pyramid resulted in no noticeable improvement. (Danbooru2019, 512x512 StyleGAN2 config-f with generator in "resnet" mode.) It's still early in training, but I'm not sure how much more converged it can get.
Replying to @theshawwn
You make a good point. May I know what tool/technique you used to visualize the intermediate layers?
My branch of stylegan2 outputs the layers to TensorBoard. It's at https://github.com/shawwn/stylegan2 on the "working" branch. You can see an example at krillin.shawwn.com:1078/%23i… (though it's taking a couple of minutes to load).