Mixed Supervision for Instance Learning in Object Detection with Few-shot Annotation

Proceedings of the 30th ACM International Conference on Multimedia. Pub Date: 2022-10-10. DOI: 10.1145/3503161.3548242

Yi Zhong, Chengyao Wang, Shiyong Li, Zhuyun Zhou, Yaowei Wang, Weishi Zheng


Citations: 2

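Abstract

Mixed supervision for object detection (MSOD) combines image-level annotations with a small number of instance-level annotations. It has emerged as an efficient approach: it reduces the need for large amounts of costly instance-level annotation, and it adds effective instance supervision to earlier methods that rely on image-level annotations alone. In this work, we introduce mixed supervision instance learning (MSIL), a novel MSOD framework that leverages a handful of instance-level annotations to provide both explicit and implicit supervision. Rather than simply adding the instance-level annotations to the detection loss, we aim to uncover more effective explicit and implicit relations between the two levels of annotation. First, we propose the Instance-Annotation Guided Image Classification strategy, which provides explicit guidance from instance-level annotations by using positional relations to force the image classifier to focus on the proposals that contain the correct object. Second, to exploit more implicit interaction between the mixed annotations, we develop an instance reproduction strategy, guided by the extra instance-level annotations, that generates more accurate pseudo ground truth and yields a more discriminative detector. Finally, a false target instance mining strategy refines the above process by using position and score information to enrich the number and diversity of training instances. Our experiments show that the proposed MSIL framework outperforms recent state-of-the-art mixed-supervised detectors by a large margin on both the Pascal VOC2007 and MS-COCO datasets.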
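The abstract does not say how the "positional relation" between proposals and instance-level annotations is computed. As a minimal sketch, assuming it is measured by intersection-over-union (IoU), the snippet below marks the proposals that overlap an annotated box strongly enough for an image classifier to focus on. The function names `iou` and `guided_proposal_mask` and the `iou_threshold` value of 0.5 are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# A box is (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def guided_proposal_mask(
    proposals: Sequence[Box],
    annotated_boxes: Sequence[Box],
    iou_threshold: float = 0.5,  # assumed value, not from the paper
) -> List[bool]:
    """Return True for each proposal that overlaps any annotated box
    by at least iou_threshold (hypothetical selection rule)."""
    return [
        any(iou(p, g) >= iou_threshold for g in annotated_boxes)
        for p in proposals
    ]


if __name__ == "__main__":
    proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    annotated = [(0, 0, 10, 10)]  # one instance-level annotation
    print(guided_proposal_mask(proposals, annotated))
```

In a setup like this, the mask could be used to restrict or reweight the proposals that contribute to the image-level classification score. How MSIL actually applies the guidance is not described in the abstract.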



