Bidirectional Mask Selection for Zero-Shot Referring Image Segmentation

IF 11.1 | Zone 1 (Engineering & Technology) | JCR Q1 (Engineering, Electrical & Electronic) | IEEE Transactions on Circuits and Systems for Video Technology | Pub Date: 2024-09-16 | DOI: 10.1109/TCSVT.2024.3460874
Wenhui Li;Chao Pang;Weizhi Nie;Hongshuo Tian;An-An Liu
Vol. 35, No. 1, pp. 911-921. IEEE Xplore: https://ieeexplore.ieee.org/document/10680572/

Abstract

Zero-shot referring image segmentation (RIS) aims to segment the referent described by a natural language expression, without any training. Although existing research has made some progress, the absence of a training process leaves zero-shot methods with insufficient information, which leads to poor segmentation performance. We propose a Bidirectional Mask Selection (BMS) framework, the first work to incorporate negative masks into zero-shot RIS. The idea is to leverage the semantic context that negative masks provide around the target semantics to improve the understanding of fine-grained cross-modal correlation. Further, we propose a novel mask adaptive fusion strategy that combines the complementary information from positive and negative masks without additional training. In experiments, BMS performs strongly on three prominent RIS datasets and surpasses even the most advanced weakly supervised methods on the RefCOCOg dataset. Code will be available at https://github.com/pcc-99/BMS.
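The abstract gives no algorithmic detail, so the following is only a toy illustration of the general idea of selecting a positive and a negative mask and then fusing them. It is not the BMS method. Everything in it is an assumption made for illustration: the function names `cosine`, `select_masks`, and `fuse` are hypothetical, candidate masks and the expression are assumed to arrive as precomputed feature vectors, the negative mask is assumed to be the least text-similar candidate, and the fusion rule (drop positive pixels that the negative mask also covers) is a placeholder for the paper's adaptive fusion strategy.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_masks(mask_feats, text_feat):
    """Score every candidate mask against the text feature.

    Returns (positive_index, negative_index, scores).
    Assumption: positive = most similar, negative = least similar.
    """
    scores = [cosine(f, text_feat) for f in mask_feats]
    indices = range(len(scores))
    pos = max(indices, key=scores.__getitem__)
    neg = min(indices, key=scores.__getitem__)
    return pos, neg, scores


def fuse(pos_mask, neg_mask):
    """Hypothetical fusion rule: keep positive pixels not covered by the negative mask."""
    return [
        [int(bool(p) and not n) for p, n in zip(pos_row, neg_row)]
        for pos_row, neg_row in zip(pos_mask, neg_mask)
    ]


# Usage with made-up features and 2x2 binary masks (illustrative data only).
mask_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_feat = [1.0, 0.1]
masks = [
    [[1, 1], [0, 1]],
    [[0, 1], [0, 0]],
    [[1, 1], [1, 1]],
]
pos, neg, scores = select_masks(mask_feats, text_feat)
fused = fuse(masks[pos], masks[neg])
print(pos, neg, fused)
```

The sketch keeps both selection directions training-free, which matches the zero-shot setting described above, but the actual scoring model and fusion weights would have to come from the paper or its released code.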
Source journal
CiteScore: 13.80
Self-citation rate: 27.40%
Articles published: 660
Review time: 5 months
About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.