Learning prototypes from background and latent objects for few-shot semantic segmentation

Knowledge-Based Systems (IF 7.6; Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence) | Pub Date: 2025-02-27 | DOI: 10.1016/j.knosys.2025.113218
Yicong Wang, Rong Huang, Shubo Zhou, Xueqin Jiang, Zhijun Fang
Volume 314, Article 113218. Journal article; not open access. Source: https://www.sciencedirect.com/science/article/pii/S0950705125002655
Citations: 0

Abstract

Few-shot semantic segmentation (FSS) aims to segment the target object in a given image, supported by a few samples with pixel-level annotations. Existing FSS frameworks primarily focus on the target area to learn a target-object prototype and neglect non-target clues. As a result, the target-object prototype must both segment the target object and filter out the non-target area, which produces numerous false positives. In this paper, we propose a background and latent-object prototype learning network (BLPLNet), which learns prototypes from the non-target area as well as the target area. From our perspective, the non-target area divides into background full of repeated textures and salient objects, referred to as latent objects in this paper. Specifically, a background mining module (BMM) learns a dedicated background prototype through episodic learning. The learned background prototype replaces the target-object prototype for background filtering, reducing false positives. Moreover, a latent object mining module (LOMM), based on a self-attention mechanism, works together with the BMM to learn multiple soft-orthogonal prototypes from latent objects. The learned latent-object prototypes, which condense general knowledge of objects, are then used in a target object enhancement module (TOEM) to enhance the target-object prototype under the guidance of affinity-based scores. Extensive experiments on the PASCAL-5^i and COCO-20^i datasets demonstrate the superiority of BLPLNet, which outperforms state-of-the-art methods by an average of 0.60% on PASCAL-5^i. Ablation studies validate the effectiveness of each component, and visualization results indicate that the learned latent-object prototypes indeed convey general knowledge of objects.
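To make the prototype idea in the abstract more concrete, here is a minimal pure-Python sketch. It is not the authors' BLPLNet implementation. It rests on several assumptions that the abstract does not state: prototypes are obtained by masked average pooling, pixels are labelled by comparing cosine similarity to a target prototype and a background prototype, and "soft-orthogonal" is read as a penalty on pairwise cosine similarity. In the paper the background prototype is learned by episodic learning; pooling it from the support image's non-target area, as done here, is a swapped-in simplification. All function names are hypothetical.

```python
import math


def masked_average(features, mask):
    """Average the feature vectors at positions where mask == 1."""
    selected = [f for f, m in zip(features, mask) if m == 1]
    if not selected:
        raise ValueError("mask selects no positions")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def segment(query_features, target_proto, background_proto):
    """Label a position 1 if it is more similar to the target prototype."""
    return [
        1 if cosine(q, target_proto) > cosine(q, background_proto) else 0
        for q in query_features
    ]


def soft_orthogonality_penalty(prototypes):
    """Sum of squared pairwise cosine similarities (assumed penalty form).

    The value is 0 when all prototypes are mutually orthogonal.
    """
    total = 0.0
    for i in range(len(prototypes)):
        for j in range(i + 1, len(prototypes)):
            total += cosine(prototypes[i], prototypes[j]) ** 2
    return total


# Toy support image flattened to four positions with 2-D features.
support_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
support_mask = [1, 1, 0, 0]  # 1 = target object, 0 = non-target area

target_proto = masked_average(support_features, support_mask)
# Simplification: background prototype pooled from the non-target area.
background_proto = masked_average(support_features, [1 - m for m in support_mask])

query_features = [[0.8, 0.2], [0.2, 0.8]]
prediction = segment(query_features, target_proto, background_proto)
```

In this sketch a dedicated background prototype does the filtering, so the target prototype does not have to reject non-target pixels on its own, which is the motivation the abstract gives for the BMM. The penalty function only illustrates one plausible way to keep several latent-object prototypes from collapsing onto each other.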
Source journal: Knowledge-Based Systems
Category: Engineering & Technology, Computer Science: Artificial Intelligence
CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months
About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.