{"title":"Deep Active Learning with Contaminated Tags for Image Aesthetics Assessment.","authors":"Zhenguang Liu, Zepeng Wang, Yiyang Yao, Luming Zhang, Ling Shao","doi":"10.1109/TIP.2018.2828326","DOIUrl":null,"url":null,"abstract":"<p><p>Image aesthetic quality assessment has becoming an indispensable technique that facilitates a variety of image applications, e.g., photo retargeting and non-realistic rendering. Conventional approaches suffer from the following limitations: 1) the inefficiency of semantically describing images due to the inherent tag noise and incompletion, 2) the difficulty of accurately reflecting how humans actively perceive various regions inside each image, and 3) the challenge of incorporating the aesthetic experiences of multiple users. To solve these problems, we propose a novel semi-supervised deep active learning (SDAL) algorithm, which discovers how humans perceive semantically important regions from a large quantity of images partially assigned with contaminated tags. More specifically, as humans usually attend to the foreground objects before understanding them, we extract a succinct set of BING (binarized normed gradients) [60]-based object patches from each image. To simulate human visual perception, we propose SDAL which hierarchically learns human gaze shifting path (GSP) by sequentially linking semantically important object patches from each scenery. Noticeably, SDLA unifies the semantically important regions discovery and deep GSP feature learning into a principled framework, wherein only a small proportion of tagged images are adopted. Moreover, based on the sparsity penalty, SDLA can optimally abandon the noisy or redundant low-level image features. Finally, by leveraging the deeply-learned GSP features, a probabilistic model is developed for image aesthetics assessment, where the experience of multiple professional photographers can be encoded. Besides, auxiliary quality-related features can be conveniently integrated into our probabilistic model. Comprehensive experiments on a series of benchmark image sets have demonstrated the superiority of our method. As a byproduct, eye tracking experiments have shown that GSPs generated by our SDAL are about 93% consistent with real human gaze shifting paths.</p>","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":" ","pages":""},"PeriodicalIF":10.8000,"publicationDate":"2018-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Image Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/TIP.2018.2828326","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Image aesthetic quality assessment has becoming an indispensable technique that facilitates a variety of image applications, e.g., photo retargeting and non-realistic rendering. Conventional approaches suffer from the following limitations: 1) the inefficiency of semantically describing images due to the inherent tag noise and incompletion, 2) the difficulty of accurately reflecting how humans actively perceive various regions inside each image, and 3) the challenge of incorporating the aesthetic experiences of multiple users. To solve these problems, we propose a novel semi-supervised deep active learning (SDAL) algorithm, which discovers how humans perceive semantically important regions from a large quantity of images partially assigned with contaminated tags. More specifically, as humans usually attend to the foreground objects before understanding them, we extract a succinct set of BING (binarized normed gradients) [60]-based object patches from each image. To simulate human visual perception, we propose SDAL which hierarchically learns human gaze shifting path (GSP) by sequentially linking semantically important object patches from each scenery. Noticeably, SDLA unifies the semantically important regions discovery and deep GSP feature learning into a principled framework, wherein only a small proportion of tagged images are adopted. Moreover, based on the sparsity penalty, SDLA can optimally abandon the noisy or redundant low-level image features. Finally, by leveraging the deeply-learned GSP features, a probabilistic model is developed for image aesthetics assessment, where the experience of multiple professional photographers can be encoded. Besides, auxiliary quality-related features can be conveniently integrated into our probabilistic model. Comprehensive experiments on a series of benchmark image sets have demonstrated the superiority of our method. As a byproduct, eye tracking experiments have shown that GSPs generated by our SDAL are about 93% consistent with real human gaze shifting paths.
期刊介绍:
The IEEE Transactions on Image Processing delves into groundbreaking theories, algorithms, and structures concerning the generation, acquisition, manipulation, transmission, scrutiny, and presentation of images, video, and multidimensional signals across diverse applications. Topics span mathematical, statistical, and perceptual aspects, encompassing modeling, representation, formation, coding, filtering, enhancement, restoration, rendering, halftoning, search, and analysis of images, video, and multidimensional signals. Pertinent applications range from image and video communications to electronic imaging, biomedical imaging, image and video systems, and remote sensing.