Elaborate Teacher: Improved Semi-Supervised Object Detection With Rich Image Exploiting

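IEEE Transactions on Multimedia, vol. 26, pp. 11345-11357 | Published 2024-09-02 | DOI: 10.1109/TMM.2024.3453040 | https://ieeexplore.ieee.org/document/10663070/
Impact factor 8.4 | Zone 1 (Computer Science) | JCR Q1 (Computer Science, Information Systems)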

Xi Yang, Qiubai Zhou, Ziyu Wei, Hong Liu, Nannan Wang, Xinbo Gao


Citations: 0

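Abstract

Semi-Supervised Object Detection (SSOD) has shown remarkable results by leveraging image pairs within a teacher-student framework. A well-designed strong augmentation method can generate richer images and alleviate the influence of noise in pseudo-labels. However, existing data augmentation methods for SSOD do not consider instance-level information and therefore cannot make full use of unlabeled data. Moreover, the current teacher-student framework in SSOD relies solely on pseudo-labeling techniques, which may disregard some uncertain information. In this article, we introduce a new method called Elaborate Teacher, which generates and exploits image pairs in a more refined manner. To enrich strongly augmented images, we propose a novel data augmentation method called Information-Aware Mixup Representation (IAMR). IAMR uses the teacher model's predictions as prior information and considers instance-level information, and it can be seamlessly integrated with existing SSOD data augmentation methods. Furthermore, to fully exploit the information in unlabeled data, we propose Enhanced Scale Consistency Regularization (ESCR), which enforces consistency in both the semantic space and the feature space. By combining this new data augmentation method with consistency regularization, Elaborate Teacher boosts the performance of semi-supervised object detectors. Extensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate the effectiveness of our method in leveraging unlabeled image information. Our method consistently outperforms the baseline method and, when using 5% and 10% of labeled data on MS-COCO, improves mAP by 11.6% and 9.0%, respectively, relative to the supervised baseline.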
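The abstract does not spell out how IAMR mixes images, so the sketch below only illustrates the generic ingredient it builds on: blending two unlabeled images (mixup) and labelling the result with the teacher's confidence-filtered predictions as pseudo-labels. The `Box` type, the function names, the 0.7 confidence threshold, and the per-box `weight` convention are all illustrative assumptions; this is a common detection-style mixup, not the paper's IAMR or ESCR.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

# Hypothetical image type: a grayscale H x W image stored as nested lists.
Image = List[List[float]]


@dataclass(frozen=True)
class Box:
    """A teacher prediction used as a pseudo-label (illustrative type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: int
    score: float         # teacher confidence in [0, 1]
    weight: float = 1.0  # per-box loss weight assigned after mixing


def filter_pseudo_labels(preds: List[Box], threshold: float) -> List[Box]:
    """Keep only teacher predictions whose confidence reaches the threshold."""
    return [b for b in preds if b.score >= threshold]


def mixup_with_pseudo_labels(
    img_a: Image, preds_a: List[Box],
    img_b: Image, preds_b: List[Box],
    lam: float = 0.5, threshold: float = 0.7,
) -> Tuple[Image, List[Box]]:
    """Blend two same-size images and merge their filtered pseudo-labels.

    Pixels are mixed as lam * a + (1 - lam) * b. Boxes from image A carry
    weight lam and boxes from image B carry weight 1 - lam, so a detector
    loss could down-weight instances that are faint in the mixed image.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same H x W")

    # Pixel-wise convex combination of the two images.
    mixed = [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]
    # Keep confident boxes from both images, tagged with their blend weight.
    boxes = [replace(b, weight=lam) for b in filter_pseudo_labels(preds_a, threshold)]
    boxes += [replace(b, weight=1.0 - lam) for b in filter_pseudo_labels(preds_b, threshold)]
    return mixed, boxes


if __name__ == "__main__":
    img_a = [[1.0, 1.0], [1.0, 1.0]]
    img_b = [[0.0, 0.0], [0.0, 0.0]]
    preds_a = [Box(0, 0, 1, 1, label=3, score=0.9), Box(0, 0, 2, 2, label=1, score=0.4)]
    preds_b = [Box(1, 1, 2, 2, label=5, score=0.8)]
    mixed, boxes = mixup_with_pseudo_labels(img_a, preds_a, img_b, preds_b, lam=0.6)
    print(mixed)  # [[0.6, 0.6], [0.6, 0.6]]
    print([(b.label, b.weight) for b in boxes])
```

In the usage example the low-confidence box (score 0.4) is dropped, and the two surviving boxes keep their source image's blend weight. Carrying that weight on each box, rather than discarding faint instances, is one simple way to keep instance-level information visible to the loss; how IAMR actually uses the teacher's predictions as a prior is described in the full paper, not here.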




Source journal

IEEE Transactions on Multimedia (Engineering and Technology: Telecommunications)

CiteScore: 11.70
Self-citation rate: 11.00%
Articles published: 576
Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsors, ensuring comprehensive coverage of multimedia research.
