DKETFormer: Salient object detection in optical remote sensing images based on discriminative knowledge extraction and transfer

Neurocomputing 625 (2025), Article 129558 · IF 6.5 · JCR Q1 (Computer Science, Artificial Intelligence) · CAS Tier 2 (Computer Science) · Pub Date: 2025-04-07 · Epub Date: 2025-02-04 · DOI: 10.1016/j.neucom.2025.129558
Yuze Sun, Hongwei Zhao, Jianhang Zhou
Citations: 0

Abstract

Generally, most methods for salient object detection in optical remote sensing images (ORSI-SOD) are based on convolutional neural networks (CNNs). However, owing to their architectural characteristics, CNNs encode only local semantic information, and therefore underexplore discriminative features at large scales. To encode long-range dependencies within the input image, enhance the extraction of discriminative knowledge, and transfer it across multiple scales, we introduce a Transformer architecture called DKETFormer. Specifically, DKETFormer uses a Transformer backbone to obtain multi-scale feature maps that encode long-range dependencies. It then constructs a decoder from the Cross-spatial Knowledge Extraction Module (CKEM) and the Inter-layer Feature Transfer Module (IFTM). The CKEM extracts discriminative information across receptive fields while preserving knowledge from each channel; it also uses global information encoding to calibrate channel weights, improving knowledge aggregation and the capture of pixel-level pairwise relationships. The IFTM takes the encoded and extracted information from the backbone and the CKEM and employs a self-attention mechanism with cosine-similarity knowledge to model and propagate discriminative features. Finally, we generate the final detection map using a salient object detector. The results of comparative experiments and ablation experiments demonstrate the effectiveness of the proposed DKETFormer and its internal modules.
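Two of the mechanisms described in the abstract lend themselves to short illustrations. Below is a minimal NumPy sketch of (a) calibrating per-channel weights from globally pooled information, in the spirit of the CKEM description, and (b) self-attention whose logits are cosine similarities, in the spirit of the IFTM description. All names, shapes, the sigmoid gate, and the fixed temperature are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def channel_calibrate(feat):
    """Illustrative CKEM-style step (assumed): use globally pooled (spatially
    averaged) information to gate each channel. feat has shape (C, H, W)."""
    g = feat.mean(axis=(1, 2))                 # global average pooling -> (C,)
    w = 1.0 / (1.0 + np.exp(-g))               # sigmoid gate (no learned FC here)
    return feat * w[:, None, None]             # reweighted feature map

def cosine_attention(q, k, v, tau=0.1):
    """Illustrative IFTM-style step (assumed): self-attention whose logits are
    cosine similarities between query and key rows, scaled by temperature tau."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)   # unit-norm rows
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    logits = qn @ kn.T / tau                   # (N, N) cosine-similarity logits
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ v                         # (N, d) aggregated features

# Toy usage: 4 tokens with 8-dim features attending to themselves,
# and a 16-channel 7x7 feature map being channel-calibrated.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = cosine_attention(x, x, x)                # shape (4, 8)
feat = rng.standard_normal((16, 7, 7))
cal = channel_calibrate(feat)                  # shape (16, 7, 7)
```

Because the logits are bounded cosine similarities, the temperature controls how peaked the attention distribution is; the paper's actual modules may parameterize this differently.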
Source journal
Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
Aims and scope: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics covered.