Word vector embedding and self-supplementing network for Generalized Few-shot Semantic Segmentation

Neurocomputing | IF 5.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-21 | DOI: 10.1016/j.neucom.2024.128737
Xiaowei Wang, Qiong Chen, Yong Yang
Abstract

Under the condition of sufficient base-class samples and a few novel-class samples, Generalized Few-shot Semantic Segmentation (GFSS) classifies each pixel in the query image as base class, novel class, or background. A standard GFSS approach involves two training stages: base-class learning and novel-class updating. However, inter-class interference and information loss, which contribute to the poor performance of GFSS, have not been considered jointly. To address this problem, we propose an Embedded Self-Supplementing Network (ESSNet), which uses semantic word embedding and query-set self-supplementing information to improve segmentation accuracy. Specifically, the semantic word embedding module uses the distances between word vectors to help the model learn the distances between class prototypes. To map the semantic word-vector prototypes from the semantic space into the visual embedding space, we design a triplet loss function to supervise the word-vector embedding module, in which the word-vector prototype serves as the anchor and positive and negative samples are drawn from the general features of the support image. To compensate for the information loss caused by representing classes with prototypes, we propose a self-supplementing module that mines the information contained in the query image. This module first makes a preliminary prediction on the query image, then selects high-confidence regions to form pseudo-labels, and finally uses the pseudo-labels to extract query prototypes that supplement the missing information. Extensive experiments on PASCAL-5i and COCO-20i show that ESSNet has superior performance and outperforms state-of-the-art methods in all settings.
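The two mechanisms the abstract describes can be illustrated with a minimal NumPy sketch. This is a hedged illustration, not the authors' implementation: the function names, the margin value, the 0.8 confidence threshold, and the use of masked average pooling to extract query prototypes are assumptions made for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    # Pull the word-vector prototype (anchor) toward the visual prototype
    # of its own class (positive) and push it away from prototypes of
    # other classes (negative), as in a standard triplet margin loss.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

def self_supplement_prototypes(logits, features, threshold=0.8):
    # logits:   (C, H, W) preliminary per-class scores on the query image.
    # features: (D, H, W) query feature map.
    # Returns one prototype per class, pooled only over high-confidence pixels.
    exp = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs = exp / exp.sum(axis=0, keepdims=True)    # softmax over classes
    conf = probs.max(axis=0)                        # per-pixel confidence
    pseudo = probs.argmax(axis=0)                   # pseudo-labels
    prototypes = {}
    for c in np.unique(pseudo):
        mask = (pseudo == c) & (conf >= threshold)  # keep confident pixels only
        if mask.any():
            # masked average pooling over the selected pixels
            prototypes[int(c)] = features[:, mask].mean(axis=1)
    return prototypes
```

The self-supplementing step mirrors the abstract's pipeline: predict, threshold to pseudo-labels, then pool query features per class to recover information a single support prototype misses.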
Citations: 0

Source journal: Neurocomputing (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 13.10 | Self-citation rate: 10.00% | Articles per year: 1382 | Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.