Universal Relocalizer for Weakly Supervised Referring Expression Grounding

IF 5.2 · CAS Region 3 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · ACM Transactions on Multimedia Computing, Communications and Applications · Pub Date: 2024-04-04 · DOI: 10.1145/3656045
Panpan Zhang, Meng Liu, Xuemeng Song, Da Cao, Zan Gao, Liqiang Nie
Citations: 0

Abstract

This paper introduces the Universal Relocalizer, a novel approach designed for weakly supervised referring expression grounding. Our method aims to pinpoint the target proposal that corresponds to a given query, eliminating the need for region-level annotations during training. To improve localization precision and enrich the semantic understanding of the target proposal, we devise three key modules: the category module, the color module, and the spatial relationship module. The category and color modules assign category and color labels, respectively, to region proposals, enabling the computation of category and color scores. In parallel, the spatial relationship module integrates spatial cues, yielding a spatial score for each proposal that further improves localization accuracy. By combining the category, color, and spatial scores, we derive a refined grounding score for every proposal. Comprehensive evaluations on the RefCOCO, RefCOCO+, and RefCOCOg datasets demonstrate the effectiveness of the Universal Relocalizer, which achieves strong performance across all benchmarks.
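The abstract describes fusing per-proposal category, color, and spatial scores into a single grounding score, then selecting the best-scoring proposal. The exact fusion rule is not specified in the abstract; the sketch below assumes a simple weighted sum purely for illustration (the weights and function names are hypothetical, not from the paper).

```python
import numpy as np

def grounding_scores(category_scores, color_scores, spatial_scores,
                     weights=(1.0, 1.0, 1.0)):
    """Fuse per-proposal category, color, and spatial scores into one
    grounding score per proposal. A weighted sum is an assumption here;
    the paper's actual fusion rule may differ."""
    w_cat, w_col, w_sp = weights
    return (w_cat * np.asarray(category_scores, dtype=float)
            + w_col * np.asarray(color_scores, dtype=float)
            + w_sp * np.asarray(spatial_scores, dtype=float))

# Illustrative scores for three region proposals (made-up values).
cat = [0.9, 0.2, 0.4]   # category-match scores
col = [0.8, 0.6, 0.1]   # color-match scores
sp  = [0.7, 0.3, 0.9]   # spatial-relationship scores

# The grounding result is the proposal with the highest fused score.
fused = grounding_scores(cat, col, sp)
best = int(np.argmax(fused))
```

In this toy setting the first proposal wins because it scores highest on both category and color cues, even though another proposal has a better spatial score.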

Source Journal
CiteScore: 8.50
Self-citation rate: 5.90%
Articles per year: 285
Review time: 7.5 months
Journal Description: The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The journal consists primarily of research papers. As an archival journal, it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.
Latest Articles in This Journal
TA-Detector: A GNN-based Anomaly Detector via Trust Relationship
KF-VTON: Keypoints-Driven Flow Based Virtual Try-On Network
Unified View Empirical Study for Large Pretrained Model on Cross-Domain Few-Shot Learning
Multimodal Fusion for Talking Face Generation Utilizing Speech-related Facial Action Units
Compressed Point Cloud Quality Index by Combining Global Appearance and Local Details