Apply prior feature integration to sparse object detectors

Impact factor: 7.5 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · Pattern Recognition · Pub Date: 2024-10-31 · DOI: 10.1016/j.patcog.2024.111103

Yu Qian, Qijin Wang, Changxin Wu, Chao Wang, Long Cheng, Yating Hu, Hongqiang Wang

Pattern Recognition, volume 159, Article 111103. Journal Article. https://www.sciencedirect.com/science/article/pii/S0031320324008549

Citations: 0

Abstract

Using noisy boxes as queries for sparse object detection has become an active research topic in recent years. Sparse R-CNN achieves one-to-one prediction from noisy boxes to object boxes, while DiffusionDet recasts the prediction process of Sparse R-CNN as multiple diffusion processes. Notably, Sparse R-CNN and its improved versions all rely on an FPN to extract features for RoI Align. However, each target matches only one feature map in the FPN, which is inefficient and resource-consuming. Moreover, these sparse detection methods crop regions defined by noisy boxes for prediction, so the boxes fail to capture global features. In this work, we rethink the detection paradigm of sparse object detection and propose two improvements, which together yield a new object detector called Prior Sparse R-CNN. First, we replace the original FPN neck with a neck that outputs only one feature map to improve efficiency, and we design an aggregated encoder after the neck that addresses the object scale problem through dilated residual blocks and feature aggregation. Second, we introduce prior knowledge for the noisy boxes to enhance their understanding of global representations: we design a Region Generation Network (RGN) that generates global object information and fuses it with the features of the noisy boxes as prior knowledge. Prior Sparse R-CNN reaches a state-of-the-art 47.0 AP on the COCO 2017 validation set, surpassing DiffusionDet by 1.5 AP with a ResNet-50 backbone. Additionally, our training epoch requires only 3/5 of the time.
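The abstract attributes the handling of object scale on a single feature map to dilated residual blocks followed by feature aggregation. The sketch below is a minimal 1D illustration of that general idea in plain Python, not the authors' implementation: the dilation rates `[2, 4, 6, 8]`, the 3-tap kernel, the element-wise sum used for aggregation, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import List


def stacked_receptive_fields(dilations: List[int], kernel_size: int = 3) -> List[int]:
    """Receptive field along one axis after each stride-1 dilated conv in a stack."""
    rf = 1
    fields = []
    for d in dilations:
        # Each stride-1 conv widens the receptive field by (kernel_size - 1) * dilation.
        rf += (kernel_size - 1) * d
        fields.append(rf)
    return fields


def dilated_residual_block_1d(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """Residual block y = x + dilated_conv(x), zero-padded so len(y) == len(x).

    Assumes an odd number of kernel taps so the kernel is centred on each position.
    """
    half = (len(weights) - 1) // 2
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i + (j - half) * dilation
            if 0 <= idx < n:  # positions outside the signal act as zero padding
                acc += w * x[idx]
        out.append(x[i] + acc)  # skip connection
    return out


def aggregate(features: List[List[float]]) -> List[float]:
    """Element-wise sum of several feature vectors (one possible aggregation)."""
    return [sum(vals) for vals in zip(*features)]


# Hypothetical dilation rates; the abstract does not state the real ones.
HYPOTHETICAL_DILATIONS = [2, 4, 6, 8]

if __name__ == "__main__":
    print(stacked_receptive_fields(HYPOTHETICAL_DILATIONS))  # [5, 13, 25, 41]

    # Pass an impulse through the stack and aggregate the per-block outputs.
    x = [0.0] * 9
    x[4] = 1.0
    per_block = []
    for d in HYPOTHETICAL_DILATIONS:
        x = dilated_residual_block_1d(x, [1.0, 1.0, 1.0], d)
        per_block.append(x)
    print(aggregate(per_block))
```

Under these assumptions, stacking blocks with growing dilation widens the receptive field quickly without adding feature maps, which is one way a single-output neck could still cover objects of different sizes. Summing the outputs of all blocks keeps both narrow and wide context in the final feature.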



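Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 16.20%

Articles published: 683

Review time: 5.6 months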

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.