Dynamic Anchor: Density Map Guided Small Object Detector for Tiny Persons
Xingzhou Xu, Zhaoyong Mao, Xin Wang, Qinhao Tu, Junge Shen
Computer Vision and Image Understanding, volume 255, Article 104325. Published 2025-03-05. DOI: 10.1016/j.cviu.2025.104325
https://www.sciencedirect.com/science/article/pii/S1077314225000487
Citation count: 0
Abstract
With the growing use of aerial and space-based equipment, such as drones in search and rescue, there is an increasing demand for detecting small and even tiny human targets. However, most existing detectors rely on generating smaller and denser anchors for small target detection, which introduces a large number of redundant negative anchor samples. To alleviate this issue, we propose a novel density map-guided tiny person detector with dynamic anchors. Specifically, we design an Anchor Proposals Mask (APM) module that eliminates negative anchor samples and adaptively adjusts the anchor distribution under the guidance of density maps produced by a Density Map Generator (DMG). To improve the quality of the density map, we develop a Multi-Scale Feature Distillation (MSFD) module and incorporate the Focal Inverse Distance Transform (FIDT) map to conduct knowledge distillation for the DMG with the assistance of a crowd counting network. Extensive experiments on the TinyPerson and VisDrone datasets demonstrate that our method significantly enhances the performance of two-stage detectors in terms of average precision (AP) and average recall (AR) while effectively reducing the impact of negative anchor boxes.
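The idea of using a density map to decide where anchors are placed can be illustrated with a small sketch. The abstract does not describe the internals of the APM module, so everything here is an assumption made for illustration: the hard threshold, the function names keep_cells and anchors_at, the stride, and the anchor sizes are all hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def keep_cells(density: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) feature-map cells whose density reaches the threshold.

    Hypothetical stand-in for a density-guided anchor mask: a simple hard
    threshold, assumed here only for illustration.
    """
    return [
        (r, c)
        for r, row in enumerate(density)
        for c, value in enumerate(row)
        if value >= threshold
    ]


def anchors_at(cells: Sequence[Tuple[int, int]], stride: int,
               sizes: Sequence[float]) -> List[Box]:
    """Place square anchors of each size at the centre of every kept cell."""
    boxes: List[Box] = []
    for r, c in cells:
        cx = (c + 0.5) * stride  # cell centre in image coordinates
        cy = (r + 0.5) * stride
        for s in sizes:
            half = s / 2.0
            boxes.append((cx - half, cy - half, cx + half, cy + half))
    return boxes


# Toy 3x4 density map: only two cells are likely to contain people.
density = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.9, 0.0],
    [0.0, 0.0, 0.2, 0.0],
]
cells = keep_cells(density, threshold=0.5)
anchors = anchors_at(cells, stride=8, sizes=(8, 16))

# A dense scheme would place every anchor size at every cell.
dense_total = len(density) * len(density[0]) * 2
print(f"{len(anchors)} anchors kept out of {dense_total} dense anchors")
```

In this toy case, anchors are generated only at the two high-density cells, so most of the candidates that a dense scheme would place on empty background never enter the sampling stage.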
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems