Scene Graph Prediction with Limited Labels.

Vincent S Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Ré, Li Fei-Fei
{"title":"Scene Graph Prediction with Limited Labels.","authors":"Vincent S Chen,&nbsp;Paroma Varma,&nbsp;Ranjay Krishna,&nbsp;Michael Bernstein,&nbsp;Christopher Ré,&nbsp;Li Fei-Fei","doi":"10.1109/iccv.2019.00267","DOIUrl":null,"url":null,"abstract":"<p><p>Visual knowledge bases such as Visual Genome power numerous applications in computer vision, including visual question answering and captioning, but suffer from sparse, incomplete relationships. All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each. Hiring human annotators is expensive, and using textual knowledge base completion methods are incompatible with visual data. In this paper, we introduce a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using few' labeled examples. We analyze visual relationships to suggest two types of image-agnostic features that are used to generate noisy heuristics, whose outputs are aggregated using a factor graph-based generative model. With as few as 10 labeled examples per relationship, the generative model creates enough training data to train any existing state-of-the-art scene graph model. We demonstrate that our method outperforms all baseline approaches on scene graph prediction by 5.16 recall@ 100 for PREDCLS. In our limited label setting, we define a complexity metric for relationships that serves as an indicator (R<sup>2</sup> = 0.778) for conditions under which our method succeeds over transfer learning, the de-facto approach for training with limited labels.</p>","PeriodicalId":74564,"journal":{"name":"Proceedings. IEEE International Conference on Computer Vision","volume":"2019 ","pages":"2580-2590"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/iccv.2019.00267","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/iccv.2019.00267","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2020/2/27 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Visual knowledge bases such as Visual Genome power numerous applications in computer vision, including visual question answering and captioning, but suffer from sparse, incomplete relationships. All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each. Hiring human annotators is expensive, and textual knowledge base completion methods are incompatible with visual data. In this paper, we introduce a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using a few labeled examples. We analyze visual relationships to suggest two types of image-agnostic features that are used to generate noisy heuristics, whose outputs are aggregated using a factor graph-based generative model. With as few as 10 labeled examples per relationship, the generative model creates enough training data to train any existing state-of-the-art scene graph model. We demonstrate that our method outperforms all baseline approaches on scene graph prediction by 5.16 recall@100 for PREDCLS. In our limited-label setting, we define a complexity metric for relationships that serves as an indicator (R² = 0.778) of the conditions under which our method succeeds over transfer learning, the de facto approach for training with limited labels.
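To make the pipeline the abstract describes concrete, below is a minimal sketch of the weak-supervision pattern: image-agnostic heuristics vote on unlabeled (subject, object) candidate pairs, and their noisy votes are combined into a probabilistic label. This is not the authors' implementation; the heuristic functions (`spatial_heuristic`, `categorical_heuristic`), the example predicate "above", and the simple independent-noise log-odds aggregation standing in for the paper's factor-graph generative model are all illustrative assumptions.

```python
# Sketch only: two image-agnostic heuristics vote on an unlabeled pair,
# and a naive independent-noise model turns the votes into a soft label.
import numpy as np

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def spatial_heuristic(subj_box, obj_box):
    """Vote POSITIVE for 'above' if the subject box center sits above
    the object box center. Boxes are (x1, y1, x2, y2); hypothetical rule."""
    subj_cy = (subj_box[1] + subj_box[3]) / 2.0
    obj_cy = (obj_box[1] + obj_box[3]) / 2.0
    return POSITIVE if subj_cy < obj_cy else NEGATIVE

def categorical_heuristic(subj_cls, obj_cls, plausible_pairs):
    """Vote using category co-occurrence; abstain on unseen pairs."""
    return POSITIVE if (subj_cls, obj_cls) in plausible_pairs else ABSTAIN

def aggregate(votes, accuracies, prior=0.5):
    """Combine noisy votes into P(relationship holds).

    Each heuristic j is assumed correct with probability accuracies[j];
    abstentions contribute nothing. This log-odds update is a stand-in
    for the paper's factor graph-based generative model.
    """
    log_odds = np.log(prior / (1 - prior))
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue
        llr = np.log(acc / (1 - acc))
        log_odds += llr if vote == POSITIVE else -llr
    return 1.0 / (1.0 + np.exp(-log_odds))

# Usage: label one unlabeled candidate pair for the predicate "above".
plausible = {("lamp", "table"), ("cup", "table")}
votes = [
    spatial_heuristic((40, 10, 80, 50), (30, 60, 100, 120)),
    categorical_heuristic("lamp", "table", plausible),
]
prob = aggregate(votes, accuracies=[0.7, 0.8])
print(f"probabilistic label: {prob:.2f}")  # soft label for downstream training
```

The resulting soft labels would then serve as training data for an off-the-shelf scene graph model, which is the role the generative model's output plays in the paper.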
