从背景和潜在对象中学习原型,实现少镜头语义分割

IF 7.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Knowledge-Based Systems Pub Date : 2025-04-08 Epub Date: 2025-02-27 DOI:10.1016/j.knosys.2025.113218
Yicong Wang , Rong Huang , Shubo Zhou , Xueqin Jiang , Zhijun Fang
{"title":"从背景和潜在对象中学习原型,实现少镜头语义分割","authors":"Yicong Wang ,&nbsp;Rong Huang ,&nbsp;Shubo Zhou ,&nbsp;Xueqin Jiang ,&nbsp;Zhijun Fang","doi":"10.1016/j.knosys.2025.113218","DOIUrl":null,"url":null,"abstract":"<div><div>Few-shot semantic segmentation (FSS) aims to segment target object within a given image supported by few samples with pixel-level annotations. Existing FSS framework primarily focuses on target area for learning a target-object prototype while directly neglecting non-target clues. As such, the target-object prototype has not only to segment the target object but also to filter out non-target area simultaneously, resulting in numerous false positives. In this paper, we propose a background and latent-object prototype learning network (BLPLNet), which learns prototypes from not only the target area but also the non-target counterpart. From our perspective, the non-target area is delineated into background full of repeated textures and salient objects, refer to as latent objects in this paper. Specifically, a background mining module (BMM) is developed to specially learn a background prototype by episodic learning. The learned background prototype replaces the target-object one for background filtering, reducing the false positives. Moreover, a latent object mining module (LOMM), based on self-attention mechanism, works together with the BMM for learning multiple soft-orthogonal prototypes from latent objects. Then, the learned latent-object prototypes, which condense the general knowledge of objects, are used in a target object enhancement module (TOEM) to enhance the target-object prototype with the guidance of affinity-based scores. Extensive experiments on PASCAL-5<span><math><msup><mrow></mrow><mrow><mi>i</mi></mrow></msup></math></span> and COCO-20<span><math><msup><mrow></mrow><mrow><mi>i</mi></mrow></msup></math></span> datasets demonstrate the superiority of the BLPLNet, which outperforms state-of-the-art methods by an average of 0.60% on PASCAL-5<span><math><msup><mrow></mrow><mrow><mi>i</mi></mrow></msup></math></span>. Ablation studies validate the effectiveness of each component, and visualization results indicate that the learned latent-object prototypes indeed convey the general knowledge of objects.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"314 ","pages":"Article 113218"},"PeriodicalIF":7.6000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning prototypes from background and latent objects for few-shot semantic segmentation\",\"authors\":\"Yicong Wang ,&nbsp;Rong Huang ,&nbsp;Shubo Zhou ,&nbsp;Xueqin Jiang ,&nbsp;Zhijun Fang\",\"doi\":\"10.1016/j.knosys.2025.113218\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Few-shot semantic segmentation (FSS) aims to segment target object within a given image supported by few samples with pixel-level annotations. Existing FSS framework primarily focuses on target area for learning a target-object prototype while directly neglecting non-target clues. As such, the target-object prototype has not only to segment the target object but also to filter out non-target area simultaneously, resulting in numerous false positives. In this paper, we propose a background and latent-object prototype learning network (BLPLNet), which learns prototypes from not only the target area but also the non-target counterpart. From our perspective, the non-target area is delineated into background full of repeated textures and salient objects, refer to as latent objects in this paper. Specifically, a background mining module (BMM) is developed to specially learn a background prototype by episodic learning. The learned background prototype replaces the target-object one for background filtering, reducing the false positives. Moreover, a latent object mining module (LOMM), based on self-attention mechanism, works together with the BMM for learning multiple soft-orthogonal prototypes from latent objects. Then, the learned latent-object prototypes, which condense the general knowledge of objects, are used in a target object enhancement module (TOEM) to enhance the target-object prototype with the guidance of affinity-based scores. Extensive experiments on PASCAL-5<span><math><msup><mrow></mrow><mrow><mi>i</mi></mrow></msup></math></span> and COCO-20<span><math><msup><mrow></mrow><mrow><mi>i</mi></mrow></msup></math></span> datasets demonstrate the superiority of the BLPLNet, which outperforms state-of-the-art methods by an average of 0.60% on PASCAL-5<span><math><msup><mrow></mrow><mrow><mi>i</mi></mrow></msup></math></span>. Ablation studies validate the effectiveness of each component, and visualization results indicate that the learned latent-object prototypes indeed convey the general knowledge of objects.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"314 \",\"pages\":\"Article 113218\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-04-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125002655\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/2/27 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125002655","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/27 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

少镜头语义分割(few -shot semantic segmentation, FSS)的目的是在给定的图像中,利用少量的像素级标注对目标物体进行分割。现有的FSS框架主要关注目标区域来学习目标-对象原型,而直接忽略了非目标线索。因此,目标-对象原型不仅要分割目标对象,同时还要过滤掉非目标区域,从而产生大量的误报。在本文中,我们提出了一种背景和潜在对象原型学习网络(BLPLNet),它不仅可以从目标区域学习原型,还可以从非目标区域学习原型。从我们的角度来看,非目标区域被描绘成充满重复纹理和突出目标的背景,本文将其称为潜在目标。具体来说,开发了一个背景挖掘模块(BMM),专门通过情景学习来学习背景原型。学习后的背景原型代替目标对象原型进行背景滤波,减少了误报。此外,基于自注意机制的潜在目标挖掘模块(LOMM)与BMM协同工作,从潜在目标中学习多个软正交原型。然后,在目标对象增强模块(TOEM)中使用学习到的潜在对象原型,该原型浓缩了对象的一般知识,在基于亲和力分数的指导下对目标对象原型进行增强。在PASCAL-5i和COCO-20i数据集上的大量实验证明了BLPLNet的优越性,其在PASCAL-5i上的平均性能优于最先进的方法0.60%。消融研究验证了每个组件的有效性,可视化结果表明学习的潜在对象原型确实传达了对象的一般知识。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Learning prototypes from background and latent objects for few-shot semantic segmentation
Few-shot semantic segmentation (FSS) aims to segment target object within a given image supported by few samples with pixel-level annotations. Existing FSS framework primarily focuses on target area for learning a target-object prototype while directly neglecting non-target clues. As such, the target-object prototype has not only to segment the target object but also to filter out non-target area simultaneously, resulting in numerous false positives. In this paper, we propose a background and latent-object prototype learning network (BLPLNet), which learns prototypes from not only the target area but also the non-target counterpart. From our perspective, the non-target area is delineated into background full of repeated textures and salient objects, refer to as latent objects in this paper. Specifically, a background mining module (BMM) is developed to specially learn a background prototype by episodic learning. The learned background prototype replaces the target-object one for background filtering, reducing the false positives. Moreover, a latent object mining module (LOMM), based on self-attention mechanism, works together with the BMM for learning multiple soft-orthogonal prototypes from latent objects. Then, the learned latent-object prototypes, which condense the general knowledge of objects, are used in a target object enhancement module (TOEM) to enhance the target-object prototype with the guidance of affinity-based scores. Extensive experiments on PASCAL-5i and COCO-20i datasets demonstrate the superiority of the BLPLNet, which outperforms state-of-the-art methods by an average of 0.60% on PASCAL-5i. Ablation studies validate the effectiveness of each component, and visualization results indicate that the learned latent-object prototypes indeed convey the general knowledge of objects.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Knowledge-Based Systems
Knowledge-Based Systems 工程技术-计算机:人工智能
CiteScore
14.80
自引率
12.50%
发文量
1245
审稿时长
7.8 months
期刊介绍: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.
期刊最新文献
ReSemGS-SLAM: Real-time semantic Gaussian Splatting SLAM based on semantic consistency perception FLEX: Robust client selection for dynamic federated learning environments Predicting student satisfaction in metaverse platform using deep learning Unsupervised video stitching via spatiotemporal coupling with temporal propagation and dynamic memory Grounded multimodal named entity recognition via large multimodal models
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1