Dual cross-enhancement network for highly accurate dichotomous image segmentation

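Computer Vision and Image Understanding, Volume 248, Article 104122 (IF 3.5; Zone 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2024-11-01. Epub Date: 2024-09-02. DOI: 10.1016/j.cviu.2024.104122. Full text: https://www.sciencedirect.com/science/article/pii/S1077314224002030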

Hongbo Bi, Yuyu Tong, Pan Zhang, Jiayuan Zhang, Cong Zhang


Citations: 0

Abstract


Existing image segmentation tasks mainly focus on segmenting objects with specific characteristics, such as salient, camouflaged, and meticulous objects. However, research on highly accurate Dichotomous Image Segmentation (DIS), which combines these tasks, has only just begun and still faces problems such as insufficient information interaction between layers and incomplete integration of high-level semantic information with low-level detail features. This paper proposes a new dual cross-enhancement network (DCENet) for highly accurate DIS, which mainly consists of two new modules: a cross-scaling guidance (CSG) module and a semantic cross-transplantation (SCT) module. Specifically, the CSG module adopts an adjacent-layer cross-scaling guidance method that lets the multi-scale features extracted from adjacent layers interact efficiently, while the SCT module uses dual-branch features that complement each other. Through transplantation, the high-level semantic information of the low-resolution branch guides the low-level detail features of the high-resolution branch, so that the features of the different-resolution branches are effectively fused. Experimental results on the challenging DIS5K benchmark dataset show that the proposed network outperforms nine state-of-the-art (SOTA) networks on five widely used evaluation metrics. Ablation experiments further demonstrate the effectiveness of the CSG and SCT modules.
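The abstract describes the two modules only at a high level, so the following is a toy sketch of the general idea rather than the authors' implementation. It assumes single-channel feature maps stored as nested lists, nearest-neighbour upsampling, average pooling, and a sigmoid gate; the function names (`semantic_guided_fusion`, `cross_scale_exchange`) and the specific fusion formulas are hypothetical choices made for illustration and are not taken from the paper.

```python
import math

Grid = list[list[float]]  # one single-channel feature map (rows x cols)


def upsample_nearest(feat: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling: repeat each value `factor` times per axis."""
    return [
        [value for value in row for _ in range(factor)]
        for row in feat
        for _ in range(factor)
    ]


def downsample_mean(feat: Grid, factor: int) -> Grid:
    """Average pooling over non-overlapping factor x factor windows."""
    rows, cols = len(feat) // factor, len(feat[0]) // factor
    return [
        [
            sum(
                feat[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ) / (factor * factor)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def semantic_guided_fusion(high_res: Grid, low_res: Grid, factor: int) -> Grid:
    """Hypothetical SCT-style step (assumed formula, not from the paper).

    The low-resolution semantic map is upsampled and turned into a gate that
    re-weights the high-resolution detail map:
        out = high_res + high_res * sigmoid(upsample(low_res))
    """
    guide = upsample_nearest(low_res, factor)
    return [
        [h + h * sigmoid(g) for h, g in zip(h_row, g_row)]
        for h_row, g_row in zip(high_res, guide)
    ]


def cross_scale_exchange(fine: Grid, coarse: Grid, factor: int) -> tuple[Grid, Grid]:
    """Hypothetical CSG-style step (assumed formula, not from the paper).

    Two adjacent layers exchange information: each one receives the other's
    features rescaled to its own resolution and adds them element-wise.
    """
    coarse_up = upsample_nearest(coarse, factor)
    fine_down = downsample_mean(fine, factor)
    new_fine = [[f + c for f, c in zip(fr, cr)] for fr, cr in zip(fine, coarse_up)]
    new_coarse = [[c + f for c, f in zip(cr, fr)] for cr, fr in zip(coarse, fine_down)]
    return new_fine, new_coarse
```

In a real network these steps would operate on multi-channel tensors with learned convolutions; the sketch only shows the direction of information flow, from the low-resolution branch to the high-resolution branch in the first function and between adjacent layers in the second.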


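Source journal

Computer Vision and Image Understanding (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore: 7.80

Self-citation rate: 4.40%

Articles published: 112

Review time: 79 days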

Journal description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis, from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research areas include: theory; early vision; data structures and representations; shape; range; motion; matching and recognition; architecture and languages; vision systems.