Choosing text colors is a time-consuming yet important step in designing visual-textual presentation layouts.
In this paper, we propose a novel deep neural network architecture for predicting text color in visual-textual presentation layout design. The proposed architecture consists of a text colorization network, a color harmony scoring network, and a text readability scoring network. The color harmony scoring network is trained on color theme data with esthetic scores, and the text readability scoring network is trained on design works. Finally, the text colorization network predicts text colors by maximizing both color harmony and text readability while also learning from designers’ color choices.
In addition, this paper compares the proposed method with approaches based on random generation, color theory rules, and similar-feature retrieval.
Both quantitative and qualitative evaluation results demonstrate that the proposed method outperforms these alternatives.
[Keywords: text colorization, color harmonization, text readability, visual-textual presentation design]
…4.1 Datasets:
Color Combination Esthetics Score Dataset: We obtained the public Mechanical Turk dataset,14 which consists of 10,743 carefully selected color themes created by users on Adobe Kuler,1 covering a wide range of highly and poorly rated color themes; each theme was initially rated by at least 3 random users on a 1–5 scale. The Mechanical Turk dataset then uses Amazon Mechanical Turk1 to collect additional user ratings for the selected themes, so that each theme is rated by 40 users. Finally, the average rating for each theme was taken as its final score.
Visual-Textual Design Works Dataset: We constructed a visual-textual design dataset called VTDSet (Visual-Textual Design Set), in which 10 designers each selected text colors for 5 to 7 areas on each of 1,226 images, resulting in 77,038 designed text colors and their corresponding information. We randomly selected 10,000 design results associated with 1,000 background images as the training set, and the 2,260 design results associated with the remaining 226 background images as the testing set.
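The image-level split described above can be sketched as follows. This is a hypothetical illustration, not the authors' code; the `image_id` field and the `split_by_image` helper are assumed names, and the key point is that design results are grouped by background image so no image appears in both splits.

```python
import random

def split_by_image(design_results, n_train_images=1000, seed=0):
    """Split design results so that all results for one background image
    land in the same split. `design_results` is assumed to be a list of
    dicts with an 'image_id' key (assumed schema)."""
    image_ids = sorted({d["image_id"] for d in design_results})
    rng = random.Random(seed)
    rng.shuffle(image_ids)
    train_ids = set(image_ids[:n_train_images])
    train = [d for d in design_results if d["image_id"] in train_ids]
    test = [d for d in design_results if d["image_id"] not in train_ids]
    return train, test
```

Splitting by image rather than by individual design result prevents the test set from sharing backgrounds with the training set, which would otherwise leak information through near-duplicate inputs.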
…4.4 Comparison with Other Methods: We compare the text colorization network HTCN proposed in this paper with the following three approaches:
Random Text Colorization (“Random”). A color is drawn uniformly at random from the RGB color space. This baseline tests whether arbitrary color choices suffice for text in visual-textual presentation layouts.
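The “Random” baseline amounts to a one-line sampler; a minimal sketch (function name is our own):

```python
import random

def random_text_color(rng=random):
    """'Random' baseline: draw a text color uniformly from the RGB cube."""
    return tuple(rng.randrange(256) for _ in range(3))
```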
Text Colorization Based on Matsuda Color Wheel Theory (“Matsuda CW”). This text colorization method is based on color wheel theory and is also adopted in the work of Yang et al. (2016).18 We reproduce the method as follows: first, principal component analysis is performed on the image to obtain its color theme, and the color with the largest proportion is taken as the base color Cd of the image; then the minimum harmonic color wheel distance between Cd and the esthetic template color set is computed under the constraints defined by Matsuda, yielding the optimal hue of the text color Cr. Finally, the color mean μh,s,v of the image region covered by the text is computed, and the optimal text color is obtained by maximizing the distance between μh,s,v and Cr in the saturation-value (s, v) space.
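A heavily simplified sketch of this pipeline is given below. It is not the reproduced method itself: only the complementary “I”-type hue relation is illustrated (Matsuda defines several sector templates, omitted here), and the saturation/value step is replaced by a crude assumed heuristic that pushes (s, v) away from the region mean.

```python
import colorsys

def harmonic_text_hue(base_hue_deg):
    """Illustrative 'I'-template relation: pick the hue opposite the base
    color on the wheel. The full method searches over all Matsuda templates."""
    return (base_hue_deg + 180) % 360

def text_color(base_hue_deg, mean_s, mean_v):
    """Combine the harmonic hue with (s, v) pushed away from the mean of the
    text-covered region. The 0.5 thresholds are assumptions for illustration,
    not values from the paper."""
    h = harmonic_text_hue(base_hue_deg) / 360.0
    s = 1.0 if mean_s < 0.5 else 0.2
    v = 1.0 if mean_v < 0.5 else 0.15
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r, g, b))
```

For example, over a bright, desaturated orange-ish region the sketch returns a dark, saturated blue-ish text color, matching the intuition of maximizing contrast in (s, v) while keeping the hue harmonic.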
Text Colorization Based on Image Feature Retrieval (“Retrieval”). A retrieval-based strategy is frequently used in design, i.e., seeking reference among solutions to similar problems. For the text colorization problem, the original designer’s color can serve as the recommended color when the background image and the text area are similar. We therefore concatenate the global features of the image with the local features of the text-covered region, and retrieve the K nearest neighbors for the current text coloring by cosine distance. We use the VGG-16 network15 pretrained on the ImageNet dataset and take the output of the fc6 layer as the image features. The combined feature of the text region image Itext on the global image I is f = ⟨VGG(I), VGG(Itext)⟩. The text color corresponding to the most similar feature in the design library is selected for colorization.
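The nearest-neighbor lookup can be sketched as below, assuming the concatenated fc6 features have already been extracted for every library design (the feature extraction itself is omitted; `recommend_colors` is our own name):

```python
import numpy as np

def recommend_colors(query_feat, library_feats, library_colors, k=1):
    """Return the k library text colors whose features are closest to the
    query under cosine similarity (equivalently, smallest cosine distance)."""
    q = query_feat / (np.linalg.norm(query_feat) + 1e-8)
    L = library_feats / (np.linalg.norm(library_feats, axis=1, keepdims=True) + 1e-8)
    sims = L @ q                   # cosine similarity to every library design
    top = np.argsort(-sims)[:k]    # indices of the k most similar designs
    return [library_colors[i] for i in top]
```

With k=1 this reduces to picking the single most similar historical design, which matches the baseline's final selection rule.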
Figure 3: Comparison of the actual effect of text colorization under various algorithms: (a) random generation of text colors, (b) method based on the Matsuda color wheel theory, (c) retrieval-based method that directly obtains color recommendations from historically similar design examples, (d) the HTCN network proposed in this paper, and (e) the designer’s original work.