Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label for Salient Object Detection in Optical Remote Sensing Images

IF 8.4 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | IEEE Transactions on Multimedia | Pub Date: 2024-06-14 | DOI: 10.1109/TMM.2024.3414669
Yu Qiu;Yuhang Sun;Jie Mei;Jing Xu
IEEE Transactions on Multimedia, vol. 26, pp. 10892-10907. Journal article. https://ieeexplore.ieee.org/document/10557726/
Citations: 0

Abstract

Salient object detection in natural scene images (NSI-SOD) has advanced remarkably in recent years. However, optical remote sensing images (ORSIs) have more complicated properties than natural images, such as diverse spatial resolutions, complex background structures, and varying visual attributes of objects. Exploiting the multiscale structural perceptual information of ORSIs to detect salient objects accurately is therefore more challenging. In this paper, inspired by the superiority of contrastive learning, we propose a novel training paradigm for ORSI-SOD, named Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label (DHCont), which forces the network to extract rich structural perceptual information and to learn better-structured feature embedding spaces. Specifically, DHCont first splits the ORSI into several local subregions composed of pixels with similar color and texture, which act as semantic pseudo-labels. This strategy can effectively explore the underdeveloped semantic categories in ORSI-SOD. To delve deeper into multiscale structure-aware optimization, DHCont incorporates a hybrid contrast strategy that integrates "pixel-to-pixel", "region-to-region", "pixel-to-region", and "region-to-pixel" contrasts at multiple scales. Additionally, to enhance the edge details of salient regions, we develop a hard edge contrast strategy that focuses on improving the detection accuracy of hard pixels near the object boundary. Moreover, we introduce a deep contrast algorithm that imposes additional deep-level constraints on the feature spaces of multiple stages. Extensive experiments on two popular ORSI-SOD datasets demonstrate that simply integrating DHCont into existing ORSI-SOD models can significantly improve their performance.
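The "pixel-to-region" contrast mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the InfoNCE-style loss, the use of mean embeddings as region prototypes, the temperature value, and all function names here are assumptions made for illustration. The pseudo-labels are assumed to be integer region IDs already produced by some grouping of color- and texture-similar pixels.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (all-zero rows stay zero)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def region_prototypes(features, labels, num_regions):
    """Mean embedding of the pixels in each pseudo-label region.

    features: (N, D) pixel embeddings; labels: (N,) integer region IDs.
    """
    protos = np.zeros((num_regions, features.shape[1]))
    np.add.at(protos, labels, features)  # sum embeddings per region
    counts = np.bincount(labels, minlength=num_regions).astype(float)
    return protos / np.maximum(counts, 1.0)[:, None]


def pixel_to_region_loss(features, labels, num_regions, temperature=0.1):
    """Hypothetical InfoNCE-style loss: each pixel is pulled toward the
    prototype of its own region and pushed away from the other prototypes."""
    f = l2_normalize(features)
    p = l2_normalize(region_prototypes(f, labels, num_regions))
    logits = f @ p.T / temperature                 # (N, num_regions)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())


if __name__ == "__main__":
    # Four toy "pixels" forming two clearly separated groups.
    feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    consistent = np.array([0, 0, 1, 1])    # pseudo-labels match the groups
    inconsistent = np.array([0, 1, 0, 1])  # pseudo-labels cut across them
    print(pixel_to_region_loss(feats, consistent, 2))
    print(pixel_to_region_loss(feats, inconsistent, 2))
```

In this toy setting the loss is lower when the pseudo-labels agree with the structure of the embeddings, which is the behavior a contrastive objective of this kind is meant to reward. The other three contrasts named in the abstract would swap the roles of pixels and prototypes as anchors and targets.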
Source journal
IEEE Transactions on Multimedia (Engineering and Technology: Telecommunications)
CiteScore: 11.70
Self-citation rate: 11.00%
Articles published: 576
Review time: 5.5 months
About the journal: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.