Elaborate Teacher: Improved Semi-Supervised Object Detection With Rich Image Exploiting

Journal: IEEE Transactions on Multimedia, vol. 26, pp. 11345-11357
Publication date: 2024-09-02
DOI: 10.1109/TMM.2024.3453040
URL: https://ieeexplore.ieee.org/document/10663070/
Impact factor: 8.4; Region 1 (Computer Science); JCR Q1 (Computer Science, Information Systems)
Authors: Xi Yang; Qiubai Zhou; Ziyu Wei; Hong Liu; Nannan Wang; Xinbo Gao
Citations: 0

Abstract

Semi-Supervised Object Detection (SSOD) has shown remarkable results by leveraging image pairs within a teacher-student framework. A good strong augmentation method can generate richer images and alleviate the influence of noise in pseudo-labels. However, existing data augmentation methods for SSOD do not consider instance-level information and therefore cannot make full use of unlabeled data. In addition, the current teacher-student framework in SSOD relies solely on pseudo-labeling, which may disregard some uncertain information. In this article, we introduce a new method called Elaborate Teacher, which generates and exploits image pairs in a more refined manner. To enrich strongly augmented images, we propose a novel data augmentation method called Information-Aware Mixup Representation (IAMR). IAMR uses the teacher model's predictions as prior information and considers instance-level information, and it can be seamlessly integrated with existing SSOD data augmentation methods. Furthermore, to fully exploit the information in unlabeled data, we propose Enhanced Scale Consistency Regularization (ESCR), which considers consistency in both semantic space and feature space. Together, the new data augmentation method and the consistency regularization boost the performance of semi-supervised object detectors. Extensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate the effectiveness of our method in leveraging unlabeled image information. Our method consistently outperforms the baseline method and improves mAP by 11.6% and 9.0% relative to the supervised baseline when using 5% and 10% of the labeled data on MS-COCO, respectively.
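
The abstract names two ingredients, a mixup-style augmentation guided by teacher predictions and a consistency term, but gives no formulas for either. The following is a minimal sketch of the general ideas only, not the paper's IAMR or ESCR: it assumes grayscale images stored as nested lists of equal size, teacher boxes given as `(x0, y0, x1, y1, score)` with exclusive upper bounds, and a plain mean-squared-error consistency term. The names `instance_mixup`, `consistency_loss`, `lam`, and `score_thr` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Image = List[List[float]]               # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int, float]  # (x0, y0, x1, y1, teacher_score), x1/y1 exclusive


def instance_mixup(base: Image, donor: Image, donor_boxes: List[Box],
                   lam: float = 0.5, score_thr: float = 0.7) -> Image:
    """Blend donor pixels into base, but only inside confident teacher boxes.

    Hypothetical illustration of instance-aware mixup; the blending rule and
    the score threshold are assumptions, not taken from the paper.
    """
    if len(base) != len(donor) or any(len(b) != len(d) for b, d in zip(base, donor)):
        raise ValueError("base and donor must have the same shape")
    mixed = [row[:] for row in base]  # copy so the input image is not modified
    for x0, y0, x1, y1, score in donor_boxes:
        if score < score_thr:
            continue  # skip low-confidence teacher predictions
        for y in range(max(y0, 0), min(y1, len(mixed))):
            for x in range(max(x0, 0), min(x1, len(mixed[y]))):
                # Always blend from the original base pixel, so overlapping
                # boxes do not compound the mixing.
                mixed[y][x] = lam * base[y][x] + (1.0 - lam) * donor[y][x]
    return mixed


def consistency_loss(p: List[float], q: List[float]) -> float:
    """Mean squared difference between two prediction (or feature) vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)
```

In a teacher-student loop, a function like `instance_mixup` would be applied when building the strongly augmented view for the student, and a term like `consistency_loss` could compare student outputs (class scores or pooled features) computed from two scales of the same unlabeled image. How the paper actually weights and combines these terms is not stated in the abstract.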
Source journal
IEEE Transactions on Multimedia (Engineering and Technology: Telecommunications)
CiteScore: 11.70
Self-citation rate: 11.00%
Articles published: 576
Review time: 5.5 months
About the journal: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.