MFPENet: multistage foreground-perception enhancement network for remote-sensing scene classification

Junding Sun, Chenxu Wang, Haifeng Sima, Xiaosheng Wu, Shuihua Wang, Yudong Zhang
{"title":"Mfpenet: multistage foreground-perception enhancement network for remote-sensing scene classification","authors":"Junding Sun, Chenxu Wang, Haifeng Sima, Xiaosheng Wu, Shuihua Wang, Yudong Zhang","doi":"10.1007/s00371-024-03587-w","DOIUrl":null,"url":null,"abstract":"<p>Scene classification plays a vital role in the field of remote-sensing (RS). However, remote-sensing images have the essential properties of complex scene information and large-scale spatial changes, as well as the high similarity between various classes and the significant differences within the same class, which brings great challenges to scene classification. To address these issues, a multistage foreground-perception enhancement network (MFPENet) is proposed to enhance the ability to perceive foreground features, thereby improving classification accuracy. Firstly, to enrich the scene semantics of feature information, a multi-scale feature aggregation module is specifically designed using dilated convolution, which takes the features of different stages of the backbone network as input data to obtain enhanced multiscale features. Then, a novel foreground-perception enhancement module is designed to capture foreground information. Unlike the previous methods, we separate foreground features by designing feature masks and then innovatively explore the symbiotic relationship between foreground features and scene features to improve the recognition ability of foreground features further. Finally, a hierarchical attention module is designed to reduce the interference of redundant background details on classification. By embedding the dependence between adjacent level features into the attention mechanism, the model can pay more accurate attention to the key information. Redundancy is reduced, and the loss of useful information is minimized. Experiments on three public RS scene classification datasets [UC-Merced, Aerial Image Dataset, and NWPU-RESISC45] show that our method achieves highly competitive results. Future work will focus on utilizing the background features outside the effective foreground features in the image as a decision aid to improve the distinguishability between similar scenes. The source code of our proposed algorithm and the related datasets are available at https://github.com/Hpu-wcx/MFPENet.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03587-w","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Scene classification plays a vital role in the field of remote sensing (RS). However, remote-sensing images are characterized by complex scene information and large-scale spatial variation, together with high similarity between different classes and significant differences within the same class, which poses great challenges for scene classification. To address these issues, a multistage foreground-perception enhancement network (MFPENet) is proposed to enhance the ability to perceive foreground features and thereby improve classification accuracy. First, to enrich the scene semantics of the feature information, a multi-scale feature aggregation module is designed using dilated convolution; it takes features from different stages of the backbone network as input and produces enhanced multi-scale features. Then, a novel foreground-perception enhancement module is designed to capture foreground information. Unlike previous methods, we separate foreground features by designing feature masks and then explore the symbiotic relationship between foreground features and scene features to further improve the recognition of foreground features. Finally, a hierarchical attention module is designed to reduce the interference of redundant background details in classification. By embedding the dependence between adjacent-level features into the attention mechanism, the model attends more accurately to key information: redundancy is reduced and the loss of useful information is minimized. Experiments on three public RS scene classification datasets (UC-Merced, Aerial Image Dataset, and NWPU-RESISC45) show that our method achieves highly competitive results. Future work will focus on exploiting the background features outside the effective foreground features as a decision aid to improve the distinguishability of similar scenes. The source code of the proposed algorithm and the related datasets are available at https://github.com/Hpu-wcx/MFPENet.
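The abstract only sketches the architecture, so as a purely illustrative aid the following PyTorch snippet shows one way a multi-scale feature aggregation module built on dilated convolutions could look. The dilation rates, channel widths, and concatenation-plus-1x1 fusion below are assumptions made for the sketch, not the authors' actual design; the official implementation is available at the GitHub link above.

# Illustrative sketch only: the abstract does not specify the exact design of the
# multi-scale feature aggregation module, so the dilation rates, channel sizes, and
# concatenation-based fusion here are assumptions, not the authors' implementation
# (see https://github.com/Hpu-wcx/MFPENet for the official code).
import torch
import torch.nn as nn


class MultiScaleAggregation(nn.Module):
    """Aggregates a backbone feature map over several dilation rates."""

    def __init__(self, in_channels: int, out_channels: int, dilations=(1, 2, 4)):
        super().__init__()
        # One dilated 3x3 branch per dilation rate; padding = dilation keeps spatial size.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # 1x1 convolution fuses the concatenated multi-scale responses.
        self.fuse = nn.Conv2d(out_channels * len(dilations), out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(multi_scale)


if __name__ == "__main__":
    # Hypothetical stage-3 feature map from a typical CNN backbone.
    feat = torch.randn(2, 512, 28, 28)
    print(MultiScaleAggregation(512, 256)(feat).shape)  # torch.Size([2, 256, 28, 28])

In the paper, features from several backbone stages would each pass through such a module before the foreground-perception enhancement and hierarchical attention steps; the sketch covers only the dilated-convolution aggregation idea.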

