IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors

Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision Pub Date : 2022-10-07 DOI:10.48550/arXiv.2210.03477

Sheng Xu, Yanjing Li, Bo-Wen Zeng, Teli Ma, Baochang Zhang, Xianbin Cao, Penglei Gao, Jinhu Lv


Citations: 7

Abstract

Knowledge distillation (KD) has been proven useful for training compact object detection models. However, we observe that KD is often effective only when the teacher model and its student counterpart share similar proposal information. This explains why existing KD methods are less effective for 1-bit detectors, which suffer from a significant information discrepancy between the real-valued teacher and the 1-bit student. This paper presents an Information Discrepancy-aware strategy (IDa-Det) for distilling 1-bit detectors that can effectively eliminate information discrepancies and significantly reduce the performance gap between a 1-bit detector and its real-valued counterpart. We formulate the distillation process as a bi-level optimization problem. At the inner level, we select the representative proposals with maximum information discrepancy. We then introduce a novel entropy distillation loss to reduce the disparity based on the selected proposals. Extensive experiments demonstrate IDa-Det's superiority over state-of-the-art 1-bit detectors and KD methods on both the PASCAL VOC and COCO datasets. IDa-Det achieves 76.9% mAP for a 1-bit Faster-RCNN with a ResNet-18 backbone. Our code is open-sourced at https://github.com/SteveTsui/IDa-Det.
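The two-step structure described in the abstract (select the proposals where teacher and student disagree most, then distill on those proposals) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the squared Euclidean distance used as the discrepancy score, and the KL divergence used as the loss are all assumptions chosen for illustration, since the abstract does not define either quantity. The feature vectors are made-up toy values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def discrepancy(teacher_feat, student_feat):
    """Stand-in discrepancy score (assumption): squared Euclidean distance."""
    return sum((t - s) ** 2 for t, s in zip(teacher_feat, student_feat))

def select_top_k(teacher_feats, student_feats, k):
    """Inner step: indices of the k proposals with the largest discrepancy."""
    scores = [discrepancy(t, s) for t, s in zip(teacher_feats, student_feats)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def distill_loss(teacher_feats, student_feats, selected):
    """Outer step: stand-in loss (assumption), mean KL(teacher || student)
    over softmax-normalised features of the selected proposals."""
    total = 0.0
    for i in selected:
        p = softmax(teacher_feats[i])
        q = softmax(student_feats[i])
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return total / len(selected)

# Toy per-proposal features (hypothetical values).
teacher = [[1.0, 0.0], [0.5, 0.5], [2.0, -1.0]]
student = [[1.0, 0.0], [0.0, 0.0], [-1.0, 2.0]]

selected = select_top_k(teacher, student, k=2)
loss = distill_loss(teacher, student, selected)
print(selected, loss)
```

In a real training loop the loss would be minimised with respect to the student's parameters while the selection is recomputed as the student changes, which is what makes the problem bi-level; the sketch only evaluates one such step.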


