Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label for Salient Object Detection in Optical Remote Sensing Images

IF 8.4 1区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS IEEE Transactions on Multimedia Pub Date : 2024-06-14 DOI:10.1109/TMM.2024.3414669

Yu Qiu;Yuhang Sun;Jie Mei;Jing Xu

{"title":"Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label for Salient Object Detection in Optical Remote Sensing Images","authors":"Yu Qiu;Yuhang Sun;Jie Mei;Jing Xu","doi":"10.1109/TMM.2024.3414669","DOIUrl":null,"url":null,"abstract":"Salient object detection in natural scene images (NSI-SOD) has undergone remarkable advancements in recent years. However, compared to those of natural images, the properties of remote sensing images (ORSIs), such as diverse spatial resolutions, complex background structures, and varying visual attributes of objects, are more complicated. Hence, how to explore the multiscale structural perceptual information of ORSIs to accurately detect salient objects is more challenging. In this paper, inspired by the superiority of contrastive learning, we propose a novel training paradigm for ORSI-SOD, named Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label (DHCont), to force the network to extract rich structural perceptual information and further learn the better-structured feature embedding spaces. Specifically, DHCont first splits the ORSI into several local subregions composed of color- and texture-similar pixels, which act as semantic pseudo-labels. This strategy can effectively explore the underdeveloped semantic categories in ORSI-SOD. To delve deeper into multiscale structure-aware optimization, DHCont incorporates a hybrid contrast strategy that integrates “pixel-to-pixel”, “region-to-region”, “pixel-to-region”, and “region-to-pixel” contrasts at multiple scales. Additionally, to enhance the edge details of salient regions, we develop a hard edge contrast strategy that focuses on improving the detection accuracy of hard pixels near the object boundary. Moreover, we introduce a deep contrast algorithm that adds additional deep-level constraints to the feature spaces of multiple stages. Extensive experiments on two popular ORSI-SOD datasets demonstrate that simply integrating our DHCont into the existing ORSI-SOD models can significantly improve the performance.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10892-10907"},"PeriodicalIF":8.4000,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10557726/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Salient object detection in natural scene images (NSI-SOD) has undergone remarkable advancements in recent years. However, compared to those of natural images, the properties of remote sensing images (ORSIs), such as diverse spatial resolutions, complex background structures, and varying visual attributes of objects, are more complicated. Hence, how to explore the multiscale structural perceptual information of ORSIs to accurately detect salient objects is more challenging. In this paper, inspired by the superiority of contrastive learning, we propose a novel training paradigm for ORSI-SOD, named Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label (DHCont), to force the network to extract rich structural perceptual information and further learn the better-structured feature embedding spaces. Specifically, DHCont first splits the ORSI into several local subregions composed of color- and texture-similar pixels, which act as semantic pseudo-labels. This strategy can effectively explore the underdeveloped semantic categories in ORSI-SOD. To delve deeper into multiscale structure-aware optimization, DHCont incorporates a hybrid contrast strategy that integrates “pixel-to-pixel”, “region-to-region”, “pixel-to-region”, and “region-to-pixel” contrasts at multiple scales. Additionally, to enhance the edge details of salient regions, we develop a hard edge contrast strategy that focuses on improving the detection accuracy of hard pixels near the object boundary. Moreover, we introduce a deep contrast algorithm that adds additional deep-level constraints to the feature spaces of multiple stages. Extensive experiments on two popular ORSI-SOD datasets demonstrate that simply integrating our DHCont into the existing ORSI-SOD models can significantly improve the performance.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于语义伪标签的深度混合对比学习用于光学遥感图像中的突出物体检测

近年来，自然场景图像中的突出物体检测（NSI-SOD）取得了显著进展。然而，与自然图像相比，遥感图像（ORSI）的空间分辨率多样、背景结构复杂、物体视觉属性各异等特性更为复杂。因此，如何发掘遥感图像的多尺度结构感知信息以准确检测突出物体更具挑战性。本文受对比学习优越性的启发，提出了一种新的 ORSI-SOD 训练范式，即基于语义伪标签的深度混合对比学习（Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label, DHCont），以迫使网络提取丰富的结构感知信息，并进一步学习结构更好的特征嵌入空间。具体来说，DHCont 首先将 ORSI 分割成若干个由颜色和纹理相似像素组成的局部子区域，作为语义伪标签。这种策略可以有效地探索 ORSI-SOD 中未充分开发的语义类别。为了深入研究多尺度结构感知优化，DHCont 采用了混合对比策略，在多个尺度上整合了 "像素到像素"、"区域到区域"、"像素到区域 "和 "区域到像素 "对比。此外，为了增强突出区域的边缘细节，我们开发了一种硬边缘对比策略，重点提高对象边界附近硬像素的检测精度。此外，我们还引入了一种深度对比算法，为多个阶段的特征空间添加了额外的深度约束。在两个流行的 ORSI-SOD 数据集上进行的广泛实验表明，只需将我们的 DHCont 集成到现有的 ORSI-SOD 模型中，就能显著提高性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Transactions on Multimedia 工程技术-电信学

CiteScore

11.70

自引率

11.00%

发文量

576

审稿时长

5.5 months

期刊介绍： The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.