DKETFormer: Salient object detection in optical remote sensing images based on discriminative knowledge extraction and transfer

Neurocomputing 625 (2025), Article 129558 · IF 6.5 · JCR Q1 (Computer Science, Artificial Intelligence) · CAS Tier 2 (Computer Science) · Pub Date: 2025-04-07 · Epub Date: 2025-02-04 · DOI: 10.1016/j.neucom.2025.129558
Yuze Sun, Hongwei Zhao, Jianhang Zhou
Citations: 0

Abstract

Generally, most methods for salient object detection in optical remote sensing images (ORSI-SOD) are based on convolutional neural networks (CNNs). However, owing to their architectural characteristics, CNNs encode only local semantic information, and therefore underexplore discriminative features at large scales. To encode long-range dependencies within the input image, enhance the extraction of discriminative knowledge, and transfer it across multiple scales, we introduce a Transformer architecture called DKETFormer. Specifically, DKETFormer uses a Transformer backbone to obtain multi-scale feature maps that encode long-range dependencies. It then constructs a decoder from the Cross-spatial Knowledge Extraction Module (CKEM) and the Inter-layer Feature Transfer Module (IFTM). The CKEM extracts discriminative information across receptive fields while preserving knowledge from each channel; it also uses global information encoding to calibrate channel weights, improving knowledge aggregation and the capture of pixel-level pairwise relationships. The IFTM takes the encoded and extracted information from the backbone and the CKEM and employs a self-attention mechanism with cosine-similarity knowledge to model and propagate discriminative features. Finally, we generate the final detection map using a salient object detector. The results of comparative experiments and ablation experiments demonstrate the effectiveness of the proposed DKETFormer and its internal modules.
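Two of the mechanisms described in the abstract lend themselves to short illustrations. Below is a minimal NumPy sketch of (a) calibrating per-channel weights from globally pooled information, in the spirit of the CKEM description, and (b) self-attention whose logits are cosine similarities, in the spirit of the IFTM description. All names, shapes, the sigmoid gate, and the fixed temperature are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def channel_calibrate(feat):
    """Illustrative CKEM-style step (assumed): use globally pooled (spatially
    averaged) information to gate each channel. feat has shape (C, H, W)."""
    g = feat.mean(axis=(1, 2))                 # global average pooling -> (C,)
    w = 1.0 / (1.0 + np.exp(-g))               # sigmoid gate (no learned FC here)
    return feat * w[:, None, None]             # reweighted feature map

def cosine_attention(q, k, v, tau=0.1):
    """Illustrative IFTM-style step (assumed): self-attention whose logits are
    cosine similarities between query and key rows, scaled by temperature tau."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)   # unit-norm rows
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    logits = qn @ kn.T / tau                   # (N, N) cosine-similarity logits
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ v                         # (N, d) aggregated features

# Toy usage: 4 tokens with 8-dim features attending to themselves,
# and a 16-channel 7x7 feature map being channel-calibrated.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = cosine_attention(x, x, x)                # shape (4, 8)
feat = rng.standard_normal((16, 7, 7))
cal = channel_calibrate(feat)                  # shape (16, 7, 7)
```

Because the logits are bounded cosine similarities, the temperature controls how peaked the attention distribution is; the paper's actual modules may parameterize this differently.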
Source journal
Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
Aims and scope: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics covered.