Fine-grained regression for image aesthetic scoring
Xin Jin, Qiang Deng, Hao Lou, Xiqiao Li, Chaoen Xiao
Cognitive Robotics, Volume 2, Pages 202-210, 2022
DOI: 10.1016/j.cogr.2022.07.003
https://www.sciencedirect.com/science/article/pii/S2667241322000167
Citation count: 0
Abstract
Image aesthetic assessment covers many tasks, including aesthetic classification, scoring, score distribution prediction, and captioning. Because the distribution of aesthetic scores is unbalanced, assessment models tend to output scores near the mean. In this paper, we propose a fine-grained regression method for aesthetic score regression and combine position and channel attention mechanisms to enhance aesthetic feature fusion. By training the regression network separately from the classification network, we make the classification task a complement to the regression task. In addition, researchers commonly use Mean Square Error (MSE) as the main evaluation metric, which is inadequate for measuring the error within each score interval. To fully account for images in every aesthetic score segment, rather than focusing on the intermediate segments that dominate imbalanced aesthetic datasets, we propose a new evaluation metric called Segmented Mean Square Errors (SMSE) to demonstrate the advantages of the model. We divide the entire AADB dataset into 10 equal parts based on aesthetic score and carry out experiments on each segment, so that images in each aesthetic score segment are considered fairly. The experimental results show that our method outperforms all the state-of-the-art methods on both MSE and SMSE. The dual attention modules of position and channel also make the activation maps more reasonable. Our method helps move aesthetic scoring beyond the laboratory into real-life applications. Computational visual aesthetics is an interesting and challenging task in computer vision, which is one of the key focus areas of this journal, so the proposed method is closely related to the journal's scope.
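The idea behind a segmented error metric can be illustrated with a minimal sketch. The abstract does not give the exact SMSE formula, so the code below rests on two assumptions: the 10 segments are equal-width intervals of the ground-truth score over [0, 1], and the final value is the unweighted mean of the per-segment MSEs. The function name `segmented_mse` and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np


def segmented_mse(y_true, y_pred, n_segments=10, lo=0.0, hi=1.0):
    """Per-segment MSE and its unweighted mean (an assumed reading of SMSE).

    Segments are equal-width intervals of the ground-truth score in [lo, hi].
    Empty segments are reported as NaN and skipped when averaging.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Interior edges only, so np.digitize yields indices 0..n_segments-1.
    edges = np.linspace(lo, hi, n_segments + 1)
    segment = np.digitize(y_true, edges[1:-1])

    per_segment = np.full(n_segments, np.nan)
    for k in range(n_segments):
        mask = segment == k
        if mask.any():
            per_segment[k] = np.mean((y_true[mask] - y_pred[mask]) ** 2)

    return per_segment, float(np.nanmean(per_segment))


# Toy data: most scores sit mid-range, and the model always predicts 0.55.
scores = np.array([0.05, 0.55, 0.55, 0.55, 0.95])
preds = np.full_like(scores, 0.55)

plain_mse = float(np.mean((scores - preds) ** 2))
per_segment, smse = segmented_mse(scores, preds)
print("MSE:", plain_mse)
print("SMSE (assumed definition):", smse)
```

On this toy data the plain MSE is dominated by the well-predicted mid-range samples, while the segmented average gives the two poorly predicted extreme segments equal weight. This is the kind of imbalance effect the abstract describes, under the assumptions stated above.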