{"title":"Feature-Enhanced PointPillars for 3-D Millimeter-Wave Object Detection","authors":"Yanyi Chang;Shuai Wan;Yichen Gao;Zhaohui Bu;Ping Li;Li Ding","doi":"10.1109/TAES.2024.3491058","DOIUrl":null,"url":null,"abstract":"PointPillars, a voxel-based 3-D object detection model, would encounter the resolution loss after voxelization, leading to the capability reduction in capturing intricate object details. This limitation is particularly evident in processing 3-D millimeter-wave (MMW) images. Therefore, this article proposes a feature-enhanced PointPillars for 3-D MMW object detection. This enhancement first integrates a multiscale feature extraction (MFE) module into the pillar feature network. This module is adept at handling the substantial volume of point cloud data characteristic of MMW images and significantly improves feature encoding efficiency. Considering the local density variations and sparsity patterns observed in MMW images, the modified PointPillars further explores a pyramidally attended feature extraction (PAFE) module to improve the inference efficiency. By employing multibranch convolutional kernels with varying dilation rates in the backbone network, the proposed approach expands the receptive fields and augments the contextual interconnectedness of the detected objects. This effectively curtails the semantic and spatial detail loss commonly associated with downsampling. Empirical evaluation of our proposed method against the standard PointPillars benchmark highlights its superiority. In particular, our method presents performance enhancement of 0.4$\\%$ and 7.16$\\%$ in $\\mathrm{{AP\\_{R}}}{40_{0.5}}$ (AP) for the bird's eye view and 3-D bounding boxes, respectively. Furthermore, it achieves a great 50.4$\\%$ reduction in the number of parameters and delivers an impressive inference speed of 0.0132 s per frame. 
These advancements confirm that the augmented network achieves a balance between computational efficiency and 3-D object detection performance for 3-D MMW images, all the while ensuring a practical inference cost.","PeriodicalId":13157,"journal":{"name":"IEEE Transactions on Aerospace and Electronic Systems","volume":"61 2","pages":"3828-3839"},"PeriodicalIF":5.7000,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Aerospace and Electronic Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10742475/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, AEROSPACE","Score":null,"Total":0}
Citations: 0
Abstract
PointPillars, a voxel-based 3-D object detection model, suffers resolution loss after voxelization, which reduces its capability to capture intricate object details. This limitation is particularly evident when processing 3-D millimeter-wave (MMW) images. Therefore, this article proposes a feature-enhanced PointPillars for 3-D MMW object detection. The enhancement first integrates a multiscale feature extraction (MFE) module into the pillar feature network. This module handles the substantial volume of point cloud data characteristic of MMW images and significantly improves feature encoding efficiency. Considering the local density variations and sparsity patterns observed in MMW images, the modified PointPillars further introduces a pyramidally attended feature extraction (PAFE) module to improve inference efficiency. By employing multibranch convolutional kernels with varying dilation rates in the backbone network, the proposed approach expands the receptive fields and strengthens the contextual interconnectedness of the detected objects, effectively curtailing the semantic and spatial detail loss commonly associated with downsampling. Empirical evaluation against the standard PointPillars baseline highlights the superiority of the proposed method. In particular, it improves $\mathrm{AP_{R40}}$ at an IoU threshold of 0.5 by 0.4% and 7.16% for the bird's eye view and 3-D bounding boxes, respectively. Furthermore, it achieves a 50.4% reduction in the number of parameters and delivers an inference speed of 0.0132 s per frame. These advancements confirm that the augmented network balances computational efficiency and 3-D object detection performance for 3-D MMW images, all while ensuring a practical inference cost.
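The receptive-field claim behind the multibranch dilated kernels can be illustrated with a minimal sketch. A dilated convolution with kernel size k and dilation rate d spans d·(k−1)+1 input positions along each axis, so parallel branches with increasing dilation rates see progressively wider context at the same parameter cost. The dilation rates (1, 2, 3) below are illustrative assumptions; the abstract does not list the rates used in the paper.

```python
# Sketch: effective span of a dilated convolution kernel.
# Assumption: 3x3 kernels with branch dilation rates 1, 2, 3
# (hypothetical values chosen for illustration only).

def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Effective 1-D span of a dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1

# Each branch keeps the same 9 weights but covers a wider window.
for d in (1, 2, 3):
    span = effective_kernel_size(3, d)
    print(f"dilation={d}: 3x3 kernel spans {span}x{span} input positions")
```

Stacking such branches and fusing their outputs lets the backbone aggregate both fine local detail (small dilation) and broad context (large dilation) without extra downsampling, which is the mechanism the abstract credits for reducing semantic and spatial detail loss.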
About the journal:
IEEE Transactions on Aerospace and Electronic Systems focuses on the organization, design, development, integration, and operation of complex systems for space, air, ocean, or ground environment. These systems include, but are not limited to, navigation, avionics, spacecraft, aerospace power, radar, sonar, telemetry, defense, transportation, automated testing, and command and control.