Yu Qian , Qijin Wang , Changxin Wu , Chao Wang , Long Cheng , Yating Hu , Hongqiang Wang
{"title":"Apply prior feature integration to sparse object detectors","authors":"Yu Qian , Qijin Wang , Changxin Wu , Chao Wang , Long Cheng , Yating Hu , Hongqiang Wang","doi":"10.1016/j.patcog.2024.111103","DOIUrl":null,"url":null,"abstract":"<div><div>Noisy boxes as queries for sparse object detection has become a hot topic of research in recent years. Sparse R-CNN achieves one-to-one prediction from noisy boxes to object boxes, while DiffusionDet transforms the prediction process of Sparse R-CNN into multiple diffusion processes. Especially, algorithms such as Sparse R-CNN and its improved versions all rely on FPN to extract features for ROI Aligning. But the target only matching one feature map in FPN, which is inefficient and resource-consuming. otherwise, these methods like sparse object detection crop regions from noisy boxes for prediction, resulting in boxes failing to capture global features. In this work, we rethink the detection paradigm of sparse object detection and propose two improvements and produce a new object detector, called Prior Sparse R-CNN. Firstly, we replace the original FPN neck with a neck that only outputs one feature map to improve efficiency. Then, we design aggregated encoder after neck to solve the object scale problem through dilated residual blocks and feature aggregation. Another improvement is that we introduce prior knowledge for noisy boxes to enhance their understanding of global representations. Region Generation network (RGN) is designed by us to generate global object information and fuse it with the features of noisy boxes as prior knowledge. Prior Sparse R-CNN reaches the state-of-the-art 47.0 AP on COCO 2017 validation set, surpassing DiffusionDet by 1.5 AP with ResNet-50 backbone. Additionally, our training epoch requires only 3/5 of the time.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111103"},"PeriodicalIF":7.5000,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320324008549","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Noisy boxes as queries for sparse object detection has become a hot topic of research in recent years. Sparse R-CNN achieves one-to-one prediction from noisy boxes to object boxes, while DiffusionDet transforms the prediction process of Sparse R-CNN into multiple diffusion processes. Especially, algorithms such as Sparse R-CNN and its improved versions all rely on FPN to extract features for ROI Aligning. But the target only matching one feature map in FPN, which is inefficient and resource-consuming. otherwise, these methods like sparse object detection crop regions from noisy boxes for prediction, resulting in boxes failing to capture global features. In this work, we rethink the detection paradigm of sparse object detection and propose two improvements and produce a new object detector, called Prior Sparse R-CNN. Firstly, we replace the original FPN neck with a neck that only outputs one feature map to improve efficiency. Then, we design aggregated encoder after neck to solve the object scale problem through dilated residual blocks and feature aggregation. Another improvement is that we introduce prior knowledge for noisy boxes to enhance their understanding of global representations. Region Generation network (RGN) is designed by us to generate global object information and fuse it with the features of noisy boxes as prior knowledge. Prior Sparse R-CNN reaches the state-of-the-art 47.0 AP on COCO 2017 validation set, surpassing DiffusionDet by 1.5 AP with ResNet-50 backbone. Additionally, our training epoch requires only 3/5 of the time.
期刊介绍:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.