Scene Graph Prediction with Limited Labels.

Vincent S Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Ré, Li Fei-Fei
{"title":"Scene Graph Prediction with Limited Labels.","authors":"Vincent S Chen,&nbsp;Paroma Varma,&nbsp;Ranjay Krishna,&nbsp;Michael Bernstein,&nbsp;Christopher Ré,&nbsp;Li Fei-Fei","doi":"10.1109/iccv.2019.00267","DOIUrl":null,"url":null,"abstract":"<p><p>Visual knowledge bases such as Visual Genome power numerous applications in computer vision, including visual question answering and captioning, but suffer from sparse, incomplete relationships. All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each. Hiring human annotators is expensive, and using textual knowledge base completion methods are incompatible with visual data. In this paper, we introduce a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using few' labeled examples. We analyze visual relationships to suggest two types of image-agnostic features that are used to generate noisy heuristics, whose outputs are aggregated using a factor graph-based generative model. With as few as 10 labeled examples per relationship, the generative model creates enough training data to train any existing state-of-the-art scene graph model. We demonstrate that our method outperforms all baseline approaches on scene graph prediction by 5.16 recall@ 100 for PREDCLS. In our limited label setting, we define a complexity metric for relationships that serves as an indicator (R<sup>2</sup> = 0.778) for conditions under which our method succeeds over transfer learning, the de-facto approach for training with limited labels.</p>","PeriodicalId":74564,"journal":{"name":"Proceedings. IEEE International Conference on Computer Vision","volume":"2019 ","pages":"2580-2590"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/iccv.2019.00267","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/iccv.2019.00267","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2020/2/27 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Visual knowledge bases such as Visual Genome power numerous applications in computer vision, including visual question answering and captioning, but suffer from sparse, incomplete relationships. All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each. Hiring human annotators is expensive, and textual knowledge base completion methods are incompatible with visual data. In this paper, we introduce a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using a few labeled examples. We analyze visual relationships to suggest two types of image-agnostic features that are used to generate noisy heuristics, whose outputs are aggregated using a factor graph-based generative model. With as few as 10 labeled examples per relationship, the generative model creates enough training data to train any existing state-of-the-art scene graph model. We demonstrate that our method outperforms all baseline approaches on scene graph prediction by 5.16 recall@100 for PREDCLS. In our limited-label setting, we define a complexity metric for relationships that serves as an indicator (R² = 0.778) of the conditions under which our method succeeds over transfer learning, the de facto approach for training with limited labels.
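To make the pipeline the abstract describes concrete, below is a minimal sketch of the weak-supervision pattern: image-agnostic heuristics vote on unlabeled (subject, object) candidate pairs, and their noisy votes are combined into a probabilistic label. This is not the authors' implementation; the heuristic functions (`spatial_heuristic`, `categorical_heuristic`), the example predicate "above", and the simple independent-noise log-odds aggregation standing in for the paper's factor-graph generative model are all illustrative assumptions.

```python
# Sketch only: two image-agnostic heuristics vote on an unlabeled pair,
# and a naive independent-noise model turns the votes into a soft label.
import numpy as np

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def spatial_heuristic(subj_box, obj_box):
    """Vote POSITIVE for 'above' if the subject box center sits above
    the object box center. Boxes are (x1, y1, x2, y2); hypothetical rule."""
    subj_cy = (subj_box[1] + subj_box[3]) / 2.0
    obj_cy = (obj_box[1] + obj_box[3]) / 2.0
    return POSITIVE if subj_cy < obj_cy else NEGATIVE

def categorical_heuristic(subj_cls, obj_cls, plausible_pairs):
    """Vote using category co-occurrence; abstain on unseen pairs."""
    return POSITIVE if (subj_cls, obj_cls) in plausible_pairs else ABSTAIN

def aggregate(votes, accuracies, prior=0.5):
    """Combine noisy votes into P(relationship holds).

    Each heuristic j is assumed correct with probability accuracies[j];
    abstentions contribute nothing. This log-odds update is a stand-in
    for the paper's factor graph-based generative model.
    """
    log_odds = np.log(prior / (1 - prior))
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue
        llr = np.log(acc / (1 - acc))
        log_odds += llr if vote == POSITIVE else -llr
    return 1.0 / (1.0 + np.exp(-log_odds))

# Usage: label one unlabeled candidate pair for the predicate "above".
plausible = {("lamp", "table"), ("cup", "table")}
votes = [
    spatial_heuristic((40, 10, 80, 50), (30, 60, 100, 120)),
    categorical_heuristic("lamp", "table", plausible),
]
prob = aggregate(votes, accuracies=[0.7, 0.8])
print(f"probabilistic label: {prob:.2f}")  # soft label for downstream training
```

The resulting soft labels would then serve as training data for an off-the-shelf scene graph model, which is the role the generative model's output plays in the paper.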
