“Optimal Transport-Based Unsupervised Semantic Disentanglement: A Novel Approach for Efficient Image Editing in GANs”, 2023-12:
The latent space of pre-trained generative adversarial networks (GANs) is rich in semantic information, but that information is often highly entangled. Identifying semantic directions within this latent space is crucial, because these directions correlate with image attributes and are vital for image editing tasks. Existing methods for semantic discovery usually involve labor-intensive procedures such as manual labeling and training attribute classifiers, which limits their practicality.
In response to this issue, the paper proposes the Optimal Transport-based Unsupervised Semantic Disentanglement (OTUSD) algorithm. This novel method efficiently uncovers semantic directions in the latent space of GANs by using the concepts of manifold learning and optimal transport (OT) theory. OTUSD applies singular value decomposition (SVD) to the OT matrix that links latent codes to generated images. This process yields singular vectors that correspond to semantically meaningful directions.
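The SVD step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the OT matrix is constructed, so a random matrix `M` stands in for it here, and the helper names `semantic_directions` and `edit_latent` are hypothetical. The sketch only shows how the right singular vectors of a matrix mapping latent codes to outputs can serve as candidate editing directions.

```python
import numpy as np

def semantic_directions(M, k):
    """Return the top-k right singular vectors of M (as rows) and their singular values."""
    # M has shape (output_dim, latent_dim), so the rows of Vt live in latent space.
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:k], s[:k]

def edit_latent(z, direction, alpha):
    """Shift latent code z by alpha along a unit-norm direction."""
    return z + alpha * direction

rng = np.random.default_rng(0)
latent_dim, output_dim = 8, 32

# Assumption: a random matrix used as a stand-in for the OT matrix.
M = rng.normal(size=(output_dim, latent_dim))

directions, strengths = semantic_directions(M, k=3)

# Move a latent code along the strongest direction; a real pipeline would
# then pass z_edited through the generator to obtain the edited image.
z = rng.normal(size=latent_dim)
z_edited = edit_latent(z, directions[0], alpha=2.0)
```

The singular values give a natural ordering: a unit step along `directions[i]` changes the output `M @ z` by exactly `strengths[i]` in norm, so the leading directions are the ones the linear map is most sensitive to.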
Unlike traditional methods, OTUSD bypasses the need for time-consuming labeling and training processes, thus enhancing efficiency and revealing a wider array of semantically meaningful directions.
Experimental results demonstrate the effectiveness of OTUSD in discovering semantic directions from several state-of-the-art GAN models, including StyleGAN, StyleGAN2, and BigGAN.
These results highlight the potential of OTUSD for image editing and related tasks, and show the value of exploiting the manifold learning and OT mapping capabilities inherent in GANs for semantic disentanglement. The implementation code is available on GitHub.
See Also:
Unsupervised Discovery of Disentangled Interpretable Directions for Layer-Wise GAN
Unsupervised Discovery of Interpretable Directions in the GAN Latent Space
AdvStyle: Discovering Interpretable Latent Space Directions of GANs Beyond Binary Attributes
Interpreting the Latent Space of GANs for Semantic Face Editing
Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
User-Controllable Latent Transformer for StyleGAN Image Layout Editing
Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation