You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Pub Date: 2019-06-01. DOI: 10.1109/CVPR.2019.00964

Krishna Kumar Singh, Yong Jae Lee

Pages: 9406-9414

Citations: 30

Abstract

We propose a novel way of using videos to obtain high precision object proposals for weakly-supervised object detection. Existing weakly-supervised detection approaches use off-the-shelf proposal methods like edge boxes or selective search to obtain candidate boxes. These methods provide high recall but at the expense of thousands of noisy proposals. Thus, the entire burden of finding the few relevant object regions is left to the ensuing object mining step. To mitigate this issue, we focus instead on improving the precision of the initial candidate object proposals. Since we cannot rely on localization annotations, we turn to video and leverage motion cues to automatically estimate the extent of objects to train a Weakly-supervised Region Proposal Network (W-RPN). We use the W-RPN to generate high precision object proposals, which are in turn used to re-rank high recall proposals like edge boxes or selective search according to their spatial overlap. Our W-RPN proposals lead to significant improvement in performance for state-of-the-art weakly-supervised object detection approaches on PASCAL VOC 2007 and 2012.
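
The re-ranking step described above can be illustrated with a small sketch. The abstract only says that high recall proposals are re-ranked "according to their spatial overlap" with the W-RPN proposals; the specific scoring used here (each candidate's maximum intersection-over-union with any high precision box) is an assumption for illustration, not necessarily the authors' formulation. The names `iou` and `rerank_by_overlap` are hypothetical.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rerank_by_overlap(high_recall: List[Box],
                      high_precision: List[Box]) -> List[Box]:
    """Sort high recall candidates by their best overlap with any
    high precision box (assumed scoring: maximum IoU).

    Candidates with equal scores keep their original relative order,
    because Python's sort is stable.
    """
    scored = [
        (max((iou(c, p) for p in high_precision), default=0.0), c)
        for c in high_recall
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [box for _, box in scored]


if __name__ == "__main__":
    # Stand-ins for edge boxes / selective search output.
    candidates = [(0, 0, 10, 10), (48, 52, 98, 102), (50, 50, 100, 100)]
    # Stand-in for a single W-RPN proposal.
    wrpn_boxes = [(50, 50, 100, 100)]
    for box in rerank_by_overlap(candidates, wrpn_boxes):
        print(box)
```

Under this assumption, candidates that coincide with a high precision box move to the front of the list, while candidates with no overlap fall to the end, so a downstream object mining step that reads only the top-ranked boxes sees fewer noisy proposals.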


