Title: MFPENet: multistage foreground-perception enhancement network for remote-sensing scene classification
Authors: Junding Sun, Chenxu Wang, Haifeng Sima, Xiaosheng Wu, Shuihua Wang, Yudong Zhang
Journal: The Visual Computer
Published: 2024-08-13
DOI: https://doi.org/10.1007/s00371-024-03587-w
Citations: 0
Abstract
Scene classification plays a vital role in remote sensing (RS). However, RS images combine complex scene content with large variations in spatial scale, and they exhibit high inter-class similarity alongside significant intra-class differences, all of which makes scene classification challenging. To address these issues, we propose a multistage foreground-perception enhancement network (MFPENet) that strengthens the perception of foreground features and thereby improves classification accuracy. First, to enrich the scene semantics of the features, a multiscale feature aggregation module built on dilated convolution takes the features from different stages of the backbone network as input and produces enhanced multiscale features. Second, a novel foreground-perception enhancement module captures foreground information. Unlike previous methods, we separate foreground features by designing feature masks and then explore the symbiotic relationship between foreground features and scene features to further improve the recognition of foreground features. Finally, a hierarchical attention module reduces the interference of redundant background details with classification. By embedding the dependence between features at adjacent levels into the attention mechanism, the model attends more accurately to key information, which reduces redundancy while minimizing the loss of useful information. Experiments on three public RS scene classification datasets (UC-Merced, Aerial Image Dataset, and NWPU-RESISC45) show that our method achieves highly competitive results. Future work will focus on using the background features that lie outside the effective foreground features as a decision aid to improve the distinguishability of similar scenes. 
The source code of our proposed algorithm and the related datasets are available at https://github.com/Hpu-wcx/MFPENet.
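
The multiscale aggregation step mentioned in the abstract relies on dilated convolution, which widens the receptive field without adding kernel weights. The following is a minimal pure-Python sketch of that general idea only, not the authors' implementation: the function names `dilated_conv2d`, `aggregate_multiscale`, and `receptive_field`, the dilation rates (1, 2, 3), the zero padding, and the fusion by averaging are all assumptions made for illustration, since the abstract does not specify them.

```python
def dilated_conv2d(image, kernel, dilation):
    """Zero-padded, same-size 2D dilated convolution (cross-correlation).

    `image` and `kernel` are nested lists; the kernel is assumed to be
    square with an odd side length. Hypothetical helper, for illustration.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    yy = y + (i - half) * dilation
                    xx = x + (j - half) * dilation
                    # Positions outside the image count as zeros.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def aggregate_multiscale(image, kernel, dilations=(1, 2, 3)):
    """Average the responses of one kernel at several dilation rates.

    Averaging is an assumed fusion rule chosen for simplicity.
    """
    responses = [dilated_conv2d(image, kernel, d) for d in dilations]
    h, w = len(image), len(image[0])
    return [
        [sum(r[y][x] for r in responses) / len(responses) for x in range(w)]
        for y in range(h)
    ]


def receptive_field(kernel_size, dilation):
    """Side length of the region covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


image = [[1.0] * 7 for _ in range(7)]   # toy 7x7 single-channel feature map
box = [[1.0] * 3 for _ in range(3)]     # 3x3 kernel of ones
fused = aggregate_multiscale(image, box)
print(fused[3][3])            # 9.0: every tap lands inside the map
print(fused[0][0])            # 4.0: five of the nine taps fall in the padding
print(receptive_field(3, 3))  # 7
```

With a 3x3 kernel, dilation rates of 1, 2, and 3 cover 3x3, 5x5, and 7x7 regions while keeping nine weights each, which is why dilated convolution is a common choice for gathering context at several scales.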