Title: Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label for Salient Object Detection in Optical Remote Sensing Images
Authors: Yu Qiu; Yuhang Sun; Jie Mei; Jing Xu
Journal: IEEE Transactions on Multimedia, vol. 26, pp. 10892-10907
Publication date: 2024-06-14
Publication type: Journal Article
DOI: 10.1109/TMM.2024.3414669
URL: https://ieeexplore.ieee.org/document/10557726/
Impact factor: 8.4 (JCR Q1, Computer Science, Information Systems)
Citations: 0
Abstract
Salient object detection in natural scene images (NSI-SOD) has advanced remarkably in recent years. However, optical remote sensing images (ORSIs) have more complicated properties than natural images, such as diverse spatial resolutions, complex background structures, and varying visual attributes of objects. Exploiting the multiscale structural perceptual information of ORSIs to detect salient objects accurately is therefore more challenging. In this paper, inspired by the success of contrastive learning, we propose a novel training paradigm for ORSI-SOD, named Deeply Hybrid Contrastive Learning Based on Semantic Pseudo-Label (DHCont), which forces the network to extract rich structural perceptual information and to learn better-structured feature embedding spaces. Specifically, DHCont first splits the ORSI into several local subregions composed of pixels with similar color and texture, which act as semantic pseudo-labels. This strategy can effectively exploit the underexplored semantic categories in ORSI-SOD. For multiscale structure-aware optimization, DHCont incorporates a hybrid contrast strategy that integrates “pixel-to-pixel”, “region-to-region”, “pixel-to-region”, and “region-to-pixel” contrasts at multiple scales. Additionally, to enhance the edge details of salient regions, we develop a hard edge contrast strategy that focuses on improving the detection accuracy of hard pixels near the object boundary. Moreover, we introduce a deep contrast algorithm that imposes additional deep-level constraints on the feature spaces of multiple stages. Extensive experiments on two popular ORSI-SOD datasets demonstrate that simply integrating DHCont into existing ORSI-SOD models can significantly improve their performance.
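To make the “pixel-to-region” contrast more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a generic InfoNCE-style loss with cosine similarity, in which each pixel embedding is pulled toward the mean embedding (prototype) of its own pseudo-labelled region and pushed away from the prototypes of all other regions. The function names, the temperature value, and the use of region means as prototypes are all illustrative assumptions; the abstract does not specify the exact loss.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def region_prototypes(features, labels):
    """Hypothetical helper: unit-length mean embedding per pseudo-label region."""
    sums = {}
    counts = defaultdict(int)
    for feat, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(feat)
        sums[lab] = [s + x for s, x in zip(sums[lab], feat)]
        counts[lab] += 1
    return {
        lab: l2_normalize([s / counts[lab] for s in vec])
        for lab, vec in sums.items()
    }


def pixel_to_region_loss(features, labels, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's formula).

    features: list of per-pixel embedding vectors.
    labels:   pseudo-label (region id) of each pixel.
    """
    protos = region_prototypes(features, labels)
    total = 0.0
    for feat, lab in zip(features, labels):
        f = l2_normalize(feat)
        # Cosine similarity to every region prototype, scaled by temperature.
        logits = {
            k: sum(a * b for a, b in zip(f, p)) / temperature
            for k, p in protos.items()
        }
        # Numerically stable log-sum-exp over all prototypes.
        m = max(logits.values())
        log_denominator = m + math.log(
            sum(math.exp(v - m) for v in logits.values())
        )
        # Negative log-probability of the pixel's own region.
        total += log_denominator - logits[lab]
    return total / len(features)


# Toy example: four 2-D pixel embeddings forming two clear clusters.
pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
consistent = pixel_to_region_loss(pixels, [0, 0, 1, 1])
inconsistent = pixel_to_region_loss(pixels, [0, 1, 0, 1])
print(consistent < inconsistent)
```

In this toy setting, pseudo-labels that agree with the embedding clusters give a lower loss than pseudo-labels that cut across them, which is the behavior such a contrast term is meant to reward. The other three contrasts named in the abstract would, under the same assumptions, differ only in which embeddings play the roles of anchor and positive/negative samples.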
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.