Ning Li, Mingliang Wang, Gaochao Yang, Bo Li, Baohua Yuan, Shoukun Xu, Jun Qi
Title: Background suppression and comprehensive prototype pyramid distillation for few-shot object detection
Journal: Robotics and Autonomous Systems, volume 187, Article 104938
Published: 2025-02-08
DOI: 10.1016/j.robot.2025.104938
URL: https://www.sciencedirect.com/science/article/pii/S0921889025000247
Citation count: 0
Abstract
Few-Shot Object Detection (FSOD) methods detect novel classes from only a small number of annotated samples and have received widespread attention in recent years. Meta-learning has proven to be a key technique for few-shot problems. Typically, meta-learning-based methods use an additional support branch to extract class prototypes for the few-shot classes, and the detection head performs classification and detection by measuring the distance between the class prototypes and the query features. Because the input to the support branch is the object image annotated with a bounding box, it often contains a large amount of background information, which degrades the quality of the class prototypes. We observe that the center of the bounding box is often the core feature area of the object. Based on this observation, we design a lightweight Background Suppression (BS) module that suppresses background features by measuring the similarity between the peripheral and central features of the support features, thereby providing higher-quality support features for class prototype extraction. For class prototype extraction itself, we design a more robust Comprehensive Prototype Pyramid Distillation (CPPD) module. This module first captures multi-scale feature information about the object from the background-suppressed support features, and then uses a pyramid structure to hierarchically distill the multi-scale features into more comprehensive and purer class prototypes. Extensive experiments on the PASCAL VOC and COCO datasets show that our method achieves the best results among models that use the same architecture and techniques.
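The background-suppression idea can be illustrated with a toy sketch. This is not the authors' BS module: the abstract states only that background is suppressed by measuring the similarity between peripheral and central support features. Cosine similarity, the central-region fraction `frac`, clamping negative weights to zero, average pooling for the prototype, and all function names below are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def center_feature(fmap, frac=0.5):
    """Mean feature vector over the central `frac` of an H x W x C map (assumed choice)."""
    h, w = len(fmap), len(fmap[0])
    ch, cw = max(1, round(h * frac)), max(1, round(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    cells = [fmap[i][j] for i in range(top, top + ch) for j in range(left, left + cw)]
    return [sum(v[k] for v in cells) / len(cells) for k in range(len(cells[0]))]

def suppress_background(fmap, frac=0.5):
    """Scale each location by its cosine similarity to the central feature, clamped at 0."""
    center = center_feature(fmap, frac)
    return [[[max(0.0, cosine(v, center)) * x for x in v] for v in row] for row in fmap]

def prototype(fmap):
    """Average-pool all locations into a single class prototype vector."""
    cells = [v for row in fmap for v in row]
    return [sum(v[k] for v in cells) / len(cells) for k in range(len(cells[0]))]

# Toy 4 x 4 support map with 2 channels: the central 2 x 2 block is "object"
# ([1, 0]) and the border is "background" ([0, 1]).
raw = [[[1.0, 0.0] if 1 <= i <= 2 and 1 <= j <= 2 else [0.0, 1.0]
        for j in range(4)] for i in range(4)]
clean = suppress_background(raw)

print(prototype(raw))    # [0.25, 0.75]: dominated by the background channel
print(prototype(clean))  # [0.25, 0.0]: background channel removed
```

In this toy case the border locations are orthogonal to the central feature, so their weight is zero and the pooled prototype points purely along the object channel. In a real detector the weights would be soft, and the features would come from a learned backbone.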
About the journal:
Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory based robot control and learning in the context of autonomous systems.
Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.