“Evaluating Loss Functions for Illustration Super-Resolution Neural Networks”, 2021-10-18:
As display technologies evolve and high-resolution screens become more widely available, demand grows for images and videos of high perceptual quality that can properly exploit such advances. At the same time, the market for illustrated media, such as animations and comics, has grown steadily over the past years.
Based on these observations, we were motivated to explore the super-resolution task in the niche of drawings. In the absence of original high-resolution imagery, approximate methods, such as interpolation algorithms, must be used to enhance low-resolution media. Such methods, however, can produce undesirable artifacts in the reconstructed images, such as blurring and edge distortions. Recent works have successfully applied deep learning to this task, but such efforts are often aimed at real-world photographs and do not take into account the specifics of illustrations: these emphasize lines and employ simplified patterns rather than complex textures, which in turn makes visual artifacts introduced by algorithms easier to spot.
With these differences in mind, we evaluated the effect of the choice of loss function on an ESPCN-based network, aiming for accurate and perceptually pleasing results in the super-resolution task for comics, cartoons, and other illustrations.
Experimental evaluation showed that, among the evaluated functions, a loss function based on edge detection performs best in this context, though it still leaves room for further improvement.
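The paper's abstract does not specify the exact form of the edge-detection-based loss. As an illustrative sketch only (the kernel choice, the L1 distance, and the averaging over both gradient directions are assumptions, not the authors' stated formulation), an edge loss could compare Sobel edge maps of the prediction and the ground truth:

```python
import numpy as np

# Standard Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def conv2d(img, kernel):
    """Valid-mode 2-D cross-correlation (as in DL frameworks), no external deps."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def edge_loss(pred, target):
    """Mean absolute difference between the Sobel edge maps of two grayscale images.

    This is a hypothetical formulation for illustration; the paper's actual
    edge-based loss may differ in kernel, norm, or weighting.
    """
    loss = 0.0
    for k in (SOBEL_X, SOBEL_Y):
        loss += np.mean(np.abs(conv2d(pred, k) - conv2d(target, k)))
    return loss / 2.0
```

Such a loss penalizes discrepancies along lines and contours more heavily than a plain pixel-wise loss, which matches the observation that line fidelity dominates perceived quality in illustrations.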
…C. Datasets: To replicate the wide spectrum of illustrations, the following three datasets were used in this work.
- Danbooru2020: A collection of ~4 million crowdsourced illustrations of varying characteristics, ranging from line art to highly textured pictures. A subset of 40,000 randomly sampled images was selected for use during the training phase, of which 8,000 were used for validation at the end of each epoch. A second subset of 10,000 images was used for testing. Due to hardware constraints during training, we used 96×96px central patches as the ground truth images.
- Manga109: A collection of ~10,000 comic pages drawn by professional manga artists in Japan [26], used as a benchmark for SISR tasks [14], [17]. This dataset is characterized by having mostly grayscale images with finer details such as text. We used 288×288px central patches from this dataset as the ground truth images. A larger patch size was used in order to capture a meaningful section of the illustration images present in this dataset.
- SYNLA: In order to further evaluate the generalization capabilities of the networks and find potential pathological cases, we also included a collection of synthetic line art images. The dataset is available in two versions, each with roughly 2,000 images, both of which were used: one in grayscale, the other in color. As the original image sizes were smaller than the 288×288px patch size specified in §IV-C2, and not an exact multiple of our scale factor, we used 192×192px central patches from this dataset as the ground truth images.
Danbooru2020 was used for both training and testing: its wide range of illustrations allows training networks that generalize over style characteristics. Manga109 and SYNLA were used solely for testing.
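Each dataset description above relies on extracting a fixed-size central patch as the ground truth. A minimal sketch of that operation (the function name and array-based interface are assumptions; the paper does not publish its preprocessing code) could look like:

```python
import numpy as np

def center_patch(img, size):
    """Extract a size x size central patch from an H x W (x C) image array.

    Hypothetical helper mirroring the patch extraction described in the
    datasets section (96, 192, or 288 px depending on the dataset).
    """
    h, w = img.shape[:2]
    if h < size or w < size:
        raise ValueError("image smaller than requested patch size")
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]
```

For Danbooru2020 this would be called with `size=96`, for Manga109 with `size=288`, and for SYNLA with `size=192`, per the patch sizes stated above.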