Bilateral-Head Region-Based Convolutional Neural Networks: A Unified Approach for Incremental Few-Shot Object Detection

Yiting Li, Haiyue Zhu, Sichao Tian, Jun Ma, Cheng Xiang, Prahlad Vadakkepat
{"title":"Bilateral-Head Region-Based Convolutional Neural Networks: A Unified Approach for Incremental Few-Shot Object Detection","authors":"Yiting Li;Haiyue Zhu;Sichao Tian;Jun Ma;Cheng Xiang;Prahlad Vadakkepat","doi":"10.1109/TAI.2024.3381919","DOIUrl":null,"url":null,"abstract":"Practical object detection systems are highly desired to be open-ended for learning on frequently evolved datasets. Moreover, learning with little supervision further adds flexibility for real-world applications such as autonomous driving and robotics, where large-scale datasets could be prohibitive or expensive to obtain. However, continual adaption with small training examples often results in catastrophic forgetting and dramatic overfitting. To address such issues, a compositional learning system is proposed to enable effective incremental object detection from nonstationary and few-shot data streams. First of all, a novel bilateral–head framework is proposed to decouple the representation learning of base (pretrained) and novel (few-shot) classes into separate embedding spaces, which takes care of novel concept integration and base knowledge retention simultaneously. Moreover, to enhance learning stability, a robust parameter updating rule, i.e., recall and progress mechanism, is carried out to constrain the optimization trajectory of sequential model adaption. Beyond that, to enforce intertask class discrimination with little memory burden, we present a between-class regularization method that expands the decision space of few-shot classes for constructing unbiased feature representation. Final, we deeply investigate the incomplete annotation issue considering the realistic scenario of incremental few-shot object detection (iFSOD) and propose a semisupervised object labeling mechanism to accurately recover the missing annotations for previously encountered classes, which further enhances the robustness of the target detector to counteract catastrophic forgetting. Extensive experiments conducted on both Pascal visual object classes dataset (VOC) and microsoft common objects in context dataset (MS-COCO) datasets demonstrate the effectiveness of our method.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10480289/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Practical object detection systems are highly desired to be open-ended so that they can learn from frequently evolving datasets. Moreover, learning with little supervision adds further flexibility for real-world applications such as autonomous driving and robotics, where large-scale datasets can be prohibitively expensive to obtain. However, continual adaptation with few training examples often results in catastrophic forgetting and severe overfitting. To address these issues, a compositional learning system is proposed to enable effective incremental object detection from nonstationary and few-shot data streams. First, a novel bilateral-head framework is proposed to decouple the representation learning of base (pretrained) and novel (few-shot) classes into separate embedding spaces, which handles novel concept integration and base knowledge retention simultaneously. Moreover, to enhance learning stability, a robust parameter updating rule, i.e., a recall-and-progress mechanism, is applied to constrain the optimization trajectory of sequential model adaptation. Beyond that, to enforce intertask class discrimination with little memory burden, we present a between-class regularization method that expands the decision space of few-shot classes for constructing unbiased feature representations. Finally, we investigate the incomplete annotation issue that arises in the realistic scenario of incremental few-shot object detection (iFSOD) and propose a semisupervised object labeling mechanism to accurately recover missing annotations for previously encountered classes, which further enhances the robustness of the detector against catastrophic forgetting. Extensive experiments on the Pascal Visual Object Classes (VOC) and Microsoft Common Objects in Context (MS-COCO) datasets demonstrate the effectiveness of our method.
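To make the bilateral-head idea concrete, the sketch below shows one way the abstract's description could be realized in a Faster R-CNN-style box head: pooled ROI features are routed to two separate classification heads, a frozen head for base (pretrained) classes and a trainable head for novel (few-shot) classes. The module names, class counts, and fusion-by-concatenation are illustrative assumptions for exposition, not the authors' exact architecture.

```python
# Minimal sketch of a bilateral-head box classifier (assumptions noted above).
# Shared ROI features feed two heads so novel concepts are learned in a
# separate embedding space while base-class knowledge is kept frozen.
import torch
import torch.nn as nn


class BilateralBoxHead(nn.Module):
    def __init__(self, feat_dim=1024, num_base=60, num_novel=20):
        super().__init__()
        # Base-class head (+1 for the background class), kept frozen to
        # retain pretrained knowledge.
        self.base_head = nn.Linear(feat_dim, num_base + 1)
        for p in self.base_head.parameters():
            p.requires_grad = False
        # Novel-class head, trained on the few-shot classes only.
        self.novel_head = nn.Linear(feat_dim, num_novel)

    def forward(self, roi_feats):
        # roi_feats: (num_rois, feat_dim) pooled features from the detector.
        base_logits = self.base_head(roi_feats)
        novel_logits = self.novel_head(roi_feats)
        # Fuse both heads into a single score vector over all classes.
        return torch.cat([base_logits, novel_logits], dim=1)


if __name__ == "__main__":
    head = BilateralBoxHead()
    scores = head(torch.randn(8, 1024))  # 8 candidate regions
    print(scores.shape)                  # torch.Size([8, 81])
```

In this reading, freezing the base head is what preserves old-class behavior, while the separate novel head absorbs the few-shot updates; the paper's recall-and-progress rule and between-class regularization would additionally constrain how the trainable parameters move during sequential adaptation.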