“GCN-Based Multi-Modal Multi-Label Attribute Classification in Anime Illustration Using Domain-Specific Semantic Features”, 2022-10-16:
This paper presents a multi-modal multi-label attribute classification model for anime illustrations based on Graph Convolutional Networks (GCN) using domain-specific semantic features. Because creators in animation production often intentionally highlight subtle characteristics of characters and objects when drawing anime illustrations, we focus on the task of multi-label attribute classification.
To capture the relationships between attributes, we construct a multi-modal GCN model that can incorporate semantic features specific to anime illustrations. To generate the domain-specific semantic features representing the semantic content of anime illustrations, we construct a new captioning framework for anime illustrations that combines real images with their style-transferred counterparts. The contributions of the proposed method are: (1) more comprehensive relationships between attributes are captured by introducing a GCN with semantic features into the multi-label attribute classification task for anime illustrations; (2) more accurate captions for anime illustrations can be generated by a model trained using only real-world images.
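To illustrate the core mechanism, here is a minimal sketch of one graph-convolution step over an attribute co-occurrence graph, the kind of propagation a GCN-based multi-label classifier uses to let related attributes inform each other. The toy graph, feature values, and function name are all illustrative assumptions, not the paper's actual implementation:

```python
# Sketch of one GCN propagation step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).
# Pure Python for clarity; real systems would use a tensor library.
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution step over an attribute graph."""
    n = len(adj)
    # Add self-loops: A_hat = A + I
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    # Symmetric degree normalization: D^-1/2 A_hat D^-1/2
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbor features: M = norm @ feats
    f_dim = len(feats[0])
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n))
            for d in range(f_dim)] for i in range(n)]
    # Linear transform + ReLU: H' = max(0, M @ W)
    w_out = len(weight[0])
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(f_dim)))
             for o in range(w_out)] for i in range(n)]

# Toy attribute graph: attributes 0 and 1 co-occur; attribute 2 is isolated.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
feats = [[1.0, 0.0],   # word-embedding-like feature per attribute node
         [0.0, 1.0],
         [1.0, 1.0]]
weight = [[1.0], [1.0]]  # 2-dim -> 1-dim projection
out = gcn_layer(adj, feats, weight)  # connected nodes mix features; node 2 keeps its own
```

After one step, nodes 0 and 1 each hold an average of their shared neighborhood, while the isolated node 2 only transforms its own features, which is how co-occurrence structure propagates label evidence.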
To the best of our knowledge, this is the first work dealing with multi-label attribute classification in anime illustration.
The experimental results show the effectiveness of the proposed method by comparison with existing methods, including state-of-the-art methods.
[Keywords: anime illustration, graph convolutional networks, semantic feature, multi-modal classification, image captioning]
…3.2. Training of Whole Multi-label Classification Model: In our experiments, we used the Danbooru2020 dataset for training the whole multi-label attribute classification model. Danbooru2020 is a large anime-illustration dataset with over 4.2 million images and over 130 million tags. From the dataset, we extracted about 25,000 anime illustrations covering 100 common attribute classes; each illustration contains an average of 6.3 attribute labels. We used 75% of the 25,000 images as the training set and the remaining 25% as the validation set.
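A quick back-of-the-envelope check of the split described above (the percentages and counts come from the text; the variable names are illustrative):

```python
# Dataset split arithmetic for the ~25,000-image subset of Danbooru2020.
total = 25_000
train = int(total * 0.75)   # 75% training split -> 18,750 images
val = total - train         # remaining 25% -> 6,250 images

# With an average of 6.3 attribute labels per illustration,
# the subset carries roughly this many label instances overall.
avg_labels = 6.3
approx_label_instances = round(total * avg_labels)  # ~157,500
```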