Nan Zhang, Borui Chai, Jiamin Song, Tian Tian, Pengfei Zhu, Jiayi Ma, Jinwen Tian
Title: Omni-Scene Infrared Vehicle Detection: An Efficient Selective Aggregation approach and a unified benchmark
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, volume 223, pages 244-260
Published: 2025-03-19
DOI: 10.1016/j.isprsjprs.2025.03.002
URL: https://www.sciencedirect.com/science/article/pii/S0924271625000929
Abstract
Vehicle detection in infrared aerial imagery is essential for military and civilian applications due to its effectiveness in low-light and adverse scenarios. However, the low spectral and pixel resolution of long-wave infrared (LWIR) results in limited information compared to visible light, causing significant background interference. Moreover, varying thermal radiation from vehicle movement and environmental factors creates diverse vehicle patterns, complicating accurate detection and recognition. To address these challenges, we propose the Omni-Scene Infrared Vehicle Detection Network (OSIV-Net), a framework optimized for scene adaptability in infrared vehicle detection. The core architecture of OSIV-Net employs Efficient Selective Aggregation Blocks (ESABlocks), combining Anchor-Adaptive Convolution (A²Conv) in shallow layers and the Magic Cube Module (MCM) in deeper layers to accurately capture and selectively aggregate features. A²Conv captures the local intrinsic and variable patterns of vehicles by combining differential and dynamic convolutions, while MCM flexibly integrates global features from three dimensions of the feature map. In addition, we constructed the Omni-Scene Infrared Vehicle (OSIV) dataset, the most comprehensive infrared aerial vehicle dataset to date, with 39,583 images spanning nine distinct scenes and over 617,000 annotated vehicle instances across five categories, providing a robust benchmark for advancing infrared vehicle detection across varied environments. Experimental results on the DroneVehicle and OSIV datasets demonstrate that OSIV-Net achieves state-of-the-art (SOTA) performance and outperforms existing methods across various scenarios. Specifically, it attains 82.60% mAP@0.5 on the DroneVehicle dataset, surpassing the previous infrared modality SOTA method DTNet by +4.27% and the multi-modal SOTA method MGMF by +2.3%. On the OSIV dataset, it attains an average performance of 78.14% across all scenarios, outperforming DTNet by +6.13%.
The dataset and code can be downloaded from https://github.com/rslab1111/OSIV.
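As a quick sanity check on the reported numbers, the short sketch below derives the baseline scores implied by the stated margins and the average annotation density of the OSIV dataset. It assumes the "+" margins are absolute percentage-point differences (the abstract does not say whether they are absolute or relative), and it treats "over 617,000" as a lower bound; the variable and function names are illustrative only and do not come from the paper or its code release.

```python
# Figures quoted in the abstract (percent).
OSIV_NET_DRONEVEHICLE = 82.60   # OSIV-Net on DroneVehicle
OSIV_NET_OSIV_AVG = 78.14       # OSIV-Net average over all OSIV scenarios

# Reported margins over prior methods.
# ASSUMPTION: these are absolute percentage-point gains, not relative gains.
GAIN_OVER_DTNET_DRONEVEHICLE = 4.27
GAIN_OVER_MGMF_DRONEVEHICLE = 2.3
GAIN_OVER_DTNET_OSIV = 6.13

# Dataset size quoted in the abstract ("over 617,000" is a lower bound).
OSIV_IMAGES = 39_583
OSIV_INSTANCES_MIN = 617_000


def implied_baseline(score: float, gain: float) -> float:
    """Baseline score implied by a reported score and an absolute gain."""
    return round(score - gain, 2)


implied = {
    "DTNet (DroneVehicle)": implied_baseline(
        OSIV_NET_DRONEVEHICLE, GAIN_OVER_DTNET_DRONEVEHICLE),
    "MGMF (DroneVehicle)": implied_baseline(
        OSIV_NET_DRONEVEHICLE, GAIN_OVER_MGMF_DRONEVEHICLE),
    "DTNet (OSIV average)": implied_baseline(
        OSIV_NET_OSIV_AVG, GAIN_OVER_DTNET_OSIV),
}

# Lower bound on the mean number of annotated vehicles per image.
instances_per_image = OSIV_INSTANCES_MIN / OSIV_IMAGES

if __name__ == "__main__":
    for name, score in implied.items():
        print(f"{name}: {score:.2f}%")
    print(f"At least {instances_per_image:.1f} annotated vehicles per image")
```

Under that assumption, the implied baselines are derived values rather than figures reported in the abstract, so they should be checked against the paper's result tables before being cited.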
About the journal:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.