ABSTRACT
Despite advances in machine learning and deep neural networks, there is still a large gap between machine and human image understanding. One of the causes is the annotation process used to label training images. In most image categorization tasks, some image categories are fundamentally ambiguous, and the underlying class probability varies between obvious cases and ambiguous ones. However, current machine learning systems and applications usually rely on discrete annotation processes, so the training labels do not reflect this ambiguity. To address this issue, we propose a new image annotation framework in which labeling incorporates human gaze behavior. In this framework, gaze behavior is used to predict how difficult each image is to label. The image classifier is then trained with sample weights defined by the predicted difficulty. We demonstrate the effectiveness of our approach on four-class image classification tasks.
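The idea of training with difficulty-based sample weights can be illustrated with a minimal sketch. The abstract does not state how the weights are derived from predicted difficulty, so everything below is an assumption made for illustration: the function names `difficulty_to_weight` and `weighted_cross_entropy`, the linear mapping, the `floor` parameter, and the choice to down-weight ambiguous samples rather than up-weight them. The difficulty values are made up for the toy batch and are not predicted from gaze data.

```python
import math


def difficulty_to_weight(difficulty, floor=0.1):
    """Map a predicted labeling difficulty in [0, 1] to a sample weight.

    Assumption: ambiguous (difficult) samples get a lower weight, down to
    `floor`. The source does not specify this mapping.
    """
    d = min(max(difficulty, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 - (1.0 - floor) * d


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(batch_logits, labels, weights):
    """Weighted mean of the per-sample cross-entropy losses."""
    total_weight = sum(weights)
    loss = 0.0
    for logits, label, weight in zip(batch_logits, labels, weights):
        probs = softmax(logits)
        loss -= weight * math.log(probs[label])
    return loss / total_weight


# Toy four-class batch: one obvious image and one ambiguous image.
batch_logits = [[2.0, 0.1, 0.1, 0.1], [0.5, 0.4, 0.3, 0.2]]
labels = [0, 1]
difficulties = [0.0, 1.0]  # hypothetical values, not gaze-derived
weights = [difficulty_to_weight(d) for d in difficulties]
loss = weighted_cross_entropy(batch_logits, labels, weights)
```

With these weights, the ambiguous image contributes far less to the loss than the obvious one, so a hard label on an ambiguous image pulls the classifier less strongly. Setting all weights equal recovers the ordinary mean cross-entropy.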
Index Terms
Gaze-guided Image Classification for Reflecting Perceptual Class Ambiguity