Semantic segmentation via pixel-to-center similarity calculation

IF 8.4 | Region 2 (Computer Science) | Q1 Computer Science, Artificial Intelligence | CAAI Transactions on Intelligence Technology | Pub Date: 2023-06-07 | DOI: 10.1049/cit2.12245

Dongyue Wu, Zilin Guo, Aoyan Li, Changqian Yu, Nong Sang, Changxin Gao

CAAI Transactions on Intelligence Technology, vol. 9, no. 1, pp. 87-100. Full text: https://onlinelibrary.wiley.com/doi/10.1049/cit2.12245

Citations: 0

Abstract

Since the fully convolutional network achieved great success in semantic segmentation, many methods have been proposed to extract discriminative pixel representations. However, the authors observe that existing methods still face two typical challenges: (i) the intra-class feature variation between different scenes may be large, which makes it difficult to keep same-class pixels from different scenes consistent; (ii) the inter-class feature distinction within the same scene may be small, which limits the ability to distinguish different classes in each scene. The authors first rethink semantic segmentation from the perspective of similarity between pixels and class centers. Each weight vector of the segmentation head represents its corresponding semantic class across the whole dataset and can therefore be regarded as the embedding of that class center. Pixel-wise classification thus amounts to computing the similarity, in the final feature space, between pixels and the class centers. Under this view, the authors propose a Class Center Similarity (CCS) layer that addresses the above challenges by generating adaptive class centers conditioned on each scene and by supervising the similarities between class centers. The CCS layer uses the Adaptive Class Center Module to generate class centers conditioned on each scene, which adapt to the large intra-class variation between different scenes. A specially designed Class Distance Loss (CD Loss) is introduced to control both inter-class and intra-class distances based on the predicted center-to-center and pixel-to-center similarities. Finally, the CCS layer outputs the processed pixel-to-center similarity as the segmentation prediction. Extensive experiments demonstrate that the model performs favourably against state-of-the-art methods.
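The similarity view described in the abstract can be illustrated with a small sketch. It treats each row of a segmentation head's weights as a class center and labels each pixel by its most similar center. The second half, `adapt_centers`, is a hypothetical stand-in for scene-conditioned centers (a soft-assignment-weighted feature average blended with the dataset-level centers); it is not the paper's Adaptive Class Center Module, whose design the abstract does not specify. All function names, the `mix` parameter, and the toy data are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product, used here as the pixel-to-center similarity."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_to_center_similarity(pixels, centers):
    """One row per pixel, one column per class center."""
    return [[dot(p, c) for c in centers] for p in pixels]


def segment(pixels, centers):
    """Label each pixel with the index of its most similar class center."""
    sims = pixel_to_center_similarity(pixels, centers)
    return [max(range(len(row)), key=row.__getitem__) for row in sims]


def adapt_centers(pixels, centers, mix=0.5):
    """Hypothetical scene-conditioned centers (an assumption, not the paper's module).

    Each center is blended with the average of this scene's pixel features,
    weighted by the soft assignment of pixels to that class.
    """
    probs = [softmax(row) for row in pixel_to_center_similarity(pixels, centers)]
    adapted = []
    for k, center in enumerate(centers):
        weight = sum(p[k] for p in probs)  # always > 0 because softmax is positive
        scene_mean = [
            sum(p[k] * px[d] for p, px in zip(probs, pixels)) / weight
            for d in range(len(center))
        ]
        adapted.append([(1 - mix) * c + mix * m for c, m in zip(center, scene_mean)])
    return adapted


# Toy scene: four pixels with 2-D features and two dataset-level class centers.
pixels = [[2.0, 0.1], [1.5, 0.3], [0.2, 1.8], [0.1, 2.2]]
centers = [[1.0, 0.0], [0.0, 1.0]]

labels = segment(pixels, centers)
scene_centers = adapt_centers(pixels, centers)
scene_labels = segment(pixels, scene_centers)
print(labels, scene_labels)
```

With dot-product similarity, `segment` computes the same scores as a bias-free linear (1x1 convolution) classification head followed by an argmax, which is the equivalence the abstract relies on.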


Source journal

CAAI Transactions on Intelligence Technology (Computer Science, Artificial Intelligence)

CiteScore: 11.00

Self-citation rate: 3.90%

Articles published: 134

Review time: 35 weeks

About the journal: CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. It is a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research that is openly accessible to read and share worldwide.