A novel multi-model 3D object detection framework with adaptive voxel-image feature fusion

IF 1.3 · CAS Zone 4 (Computer Science) · JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · IET Computer Vision · Pub Date: 2024-01-17 · DOI: 10.1049/cvi2.12269

Zhao Liu, Zhongliang Fu, Gang Li, Shengyuan Zhang

IET Computer Vision, 18(5), 640-651. https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cvi2.12269

Citations: 0

Abstract

Exploiting the full potential of heterogeneous sensor data has long been a hurdle in 3D object detection. Although methods that take point clouds as input have achieved excellent results, effectively combining the complementary properties of multi-sensor data remains a major challenge. This work presents a new approach to multi-modal 3D object detection, called adaptive voxel-image feature fusion (AVIFF). AVIFF is an end-to-end, single-shot framework that dynamically and adaptively fuses point cloud and image features, enabling a more comprehensive and integrated analysis of camera and LiDAR data. The adaptive feature fusion module fuses spatialised image features with voxel-based point cloud features, while the Dense Fusion module uses a heterogeneous architecture to preserve the distinctive characteristics of 3D point cloud data. In addition, the framework introduces a novel generalised intersection over union (GIoU) loss function that makes training more sensitive to errors in object localisation and rotation in 3D space. Comprehensive experiments validate the effectiveness of the proposed modules and establish AVIFF as a novel framework for 3D object detection.
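The abstract lists a generalised intersection over union (GIoU) loss among AVIFF's contributions. For readers unfamiliar with GIoU, the following is a minimal Python sketch of the standard GIoU loss applied to axis-aligned 3D boxes. It is not the authors' loss. Their variant also accounts for object rotation, which this sketch ignores. The box layout `(x_min, y_min, z_min, x_max, y_max, z_max)` and the function names `giou_3d` and `giou_loss_3d` are hypothetical, chosen for illustration.

```python
from typing import Tuple

# Hypothetical box layout (axis-aligned, no rotation):
# (x_min, y_min, z_min, x_max, y_max, z_max)
Box3D = Tuple[float, float, float, float, float, float]


def _volume(box: Box3D) -> float:
    """Volume of an axis-aligned box; inverted (empty) extents count as 0."""
    x1, y1, z1, x2, y2, z2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1) * max(0.0, z2 - z1)


def giou_3d(a: Box3D, b: Box3D) -> float:
    """Standard generalised IoU for two axis-aligned 3D boxes (rotation ignored)."""
    # Intersection: overlap along each axis; _volume clamps disjoint axes to 0.
    inter = (
        max(a[0], b[0]), max(a[1], b[1]), max(a[2], b[2]),
        min(a[3], b[3]), min(a[4], b[4]), min(a[5], b[5]),
    )
    inter_vol = _volume(inter)
    union_vol = _volume(a) + _volume(b) - inter_vol
    iou = inter_vol / union_vol if union_vol > 0 else 0.0

    # Smallest axis-aligned box C enclosing both inputs.
    enclose = (
        min(a[0], b[0]), min(a[1], b[1]), min(a[2], b[2]),
        max(a[3], b[3]), max(a[4], b[4]), max(a[5], b[5]),
    )
    enclose_vol = _volume(enclose)
    if enclose_vol == 0:
        return iou
    # GIoU = IoU - |C minus (A union B)| / |C|
    return iou - (enclose_vol - union_vol) / enclose_vol


def giou_loss_3d(pred: Box3D, target: Box3D) -> float:
    """Loss in [0, 2): 0 for identical boxes, approaching 2 as boxes move far apart."""
    return 1.0 - giou_3d(pred, target)


if __name__ == "__main__":
    unit = (0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
    shifted = (2.0, 0.0, 0.0, 3.0, 1.0, 1.0)  # disjoint from `unit`
    print(giou_loss_3d(unit, unit), giou_loss_3d(unit, shifted))
```

For identical boxes the loss is 0. For disjoint boxes the IoU term is 0, but the enclosing-box penalty still grows with the distance between them. This keeps the loss informative when a prediction does not overlap its target, which is the usual motivation for GIoU over a plain IoU loss.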

Source journal: IET Computer Vision (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 3.30

Self-citation rate: 11.80%

Review time: 3.4 months

About the journal: IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The journal aims to publish the highest-quality research that is relevant and topical to the field, while also welcoming work that opens new horizons and sets the agenda for future research in computer vision.

IET Computer Vision welcomes submissions on the following topics: biologically and perceptually motivated approaches to low-level vision (feature detection, etc.); perceptual grouping and organisation; representation, analysis and matching of 2D and 3D shape; shape-from-X; object recognition; image understanding; learning with visual inputs; motion analysis and object tracking; multiview scene analysis; cognitive approaches in low-, mid- and high-level vision; control in visual systems; colour, reflectance and light; statistical and probabilistic models; face and gesture; surveillance; biometrics and security; robotics; vehicle guidance; automatic model acquisition; medical image analysis and understanding; aerial scene analysis and remote sensing; deep learning models in computer vision.

Both methodological and application-oriented papers are welcome. Submitted manuscripts are expected to include a detailed, analytical review of the literature, a state-of-the-art exposition of the proposed research and its methodology, a thorough experimental evaluation and, last but not least, a comparative evaluation against relevant state-of-the-art methods. Submissions that do not meet these minimum requirements may be returned to the authors without being sent for review.

Current calls for papers (special issues): Computer Vision for Smart Cameras and Camera Networks (https://digital-library.theiet.org/files/IET_CVI_SC.pdf); Computer Vision for the Creative Industries (https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf).