A novel multi-model 3D object detection framework with adaptive voxel-image feature fusion

IF 1.5 4区 计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE IET Computer Vision Pub Date : 2024-01-17 DOI:10.1049/cvi2.12269
Zhao Liu, Zhongliang Fu, Gang Li, Shengyuan Zhang
{"title":"A novel multi-model 3D object detection framework with adaptive voxel-image feature fusion","authors":"Zhao Liu,&nbsp;Zhongliang Fu,&nbsp;Gang Li,&nbsp;Shengyuan Zhang","doi":"10.1049/cvi2.12269","DOIUrl":null,"url":null,"abstract":"<p>The multifaceted nature of sensor data has long been a hurdle for those seeking to harness its full potential in the field of 3D object detection. Although the utilisation of point clouds as input has yielded exceptional results, the challenge of effectively combining the complementary properties of multi-sensor data looms large. This work presents a new approach to multi-model 3D object detection, called adaptive voxel-image feature fusion (AVIFF). Adaptive voxel-image feature fusion is an end-to-end single-shot framework that can dynamically and adaptively fuse point cloud and image features, resulting in a more comprehensive and integrated analysis of the camera sensor and the LiDar sensor data. With the aid of the adaptive feature fusion module, spatialised image features can be adroitly fused with voxel-based point cloud features, while the Dense Fusion module ensures the preservation of the distinctive characteristics of 3D point cloud data through the use of a heterogeneous architecture. Notably, the authors’ framework features a novel generalised intersection over union loss function that enhances the perceptibility of object localsation and rotation in 3D space. Comprehensive experimentation has validated the efficacy of the authors’ proposed modules, firmly establishing AVIFF as a novel framework in the field of 3D object detection.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 5","pages":"640-651"},"PeriodicalIF":1.5000,"publicationDate":"2024-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12269","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cvi2.12269","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

The multifaceted nature of sensor data has long been a hurdle for those seeking to harness its full potential in the field of 3D object detection. Although the utilisation of point clouds as input has yielded exceptional results, the challenge of effectively combining the complementary properties of multi-sensor data looms large. This work presents a new approach to multi-model 3D object detection, called adaptive voxel-image feature fusion (AVIFF). Adaptive voxel-image feature fusion is an end-to-end single-shot framework that can dynamically and adaptively fuse point cloud and image features, resulting in a more comprehensive and integrated analysis of the camera sensor and the LiDar sensor data. With the aid of the adaptive feature fusion module, spatialised image features can be adroitly fused with voxel-based point cloud features, while the Dense Fusion module ensures the preservation of the distinctive characteristics of 3D point cloud data through the use of a heterogeneous architecture. Notably, the authors’ framework features a novel generalised intersection over union loss function that enhances the perceptibility of object localsation and rotation in 3D space. Comprehensive experimentation has validated the efficacy of the authors’ proposed modules, firmly establishing AVIFF as a novel framework in the field of 3D object detection.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
自适应体素图像特征融合的新型多模型三维物体检测框架
长期以来,传感器数据的多面性一直是那些试图在三维物体检测领域充分发挥其潜力的人所面临的障碍。虽然利用点云作为输入已经取得了卓越的成果,但如何有效结合多传感器数据的互补特性仍是一个巨大的挑战。本研究提出了一种新的多模型三维物体检测方法,称为自适应体素图像特征融合(AVIFF)。自适应体素-图像特征融合是一种端到端单次拍摄框架,可动态、自适应地融合点云和图像特征,从而对相机传感器和 LiDar 传感器数据进行更全面、更综合的分析。借助自适应特征融合模块,空间化图像特征可以与基于体素的点云特征巧妙融合,而密集融合模块则通过使用异构架构确保保留三维点云数据的独特特征。值得注意的是,作者的框架采用了新颖的广义交集大于联合损失函数,增强了三维空间中物体定位和旋转的可感知性。全面的实验验证了作者提出的模块的有效性,使 AVIFF 成为三维物体检测领域的新型框架。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
IET Computer Vision
IET Computer Vision 工程技术-工程:电子与电气
CiteScore
3.30
自引率
11.80%
发文量
76
审稿时长
3.4 months
期刊介绍: IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The vision of the journal is to publish the highest quality research work that is relevant and topical to the field, but not forgetting those works that aim to introduce new horizons and set the agenda for future avenues of research in computer vision. IET Computer Vision welcomes submissions on the following topics: Biologically and perceptually motivated approaches to low level vision (feature detection, etc.); Perceptual grouping and organisation Representation, analysis and matching of 2D and 3D shape Shape-from-X Object recognition Image understanding Learning with visual inputs Motion analysis and object tracking Multiview scene analysis Cognitive approaches in low, mid and high level vision Control in visual systems Colour, reflectance and light Statistical and probabilistic models Face and gesture Surveillance Biometrics and security Robotics Vehicle guidance Automatic model aquisition Medical image analysis and understanding Aerial scene analysis and remote sensing Deep learning models in computer vision Both methodological and applications orientated papers are welcome. Manuscripts submitted are expected to include a detailed and analytical review of the literature and state-of-the-art exposition of the original proposed research and its methodology, its thorough experimental evaluation, and last but not least, comparative evaluation against relevant and state-of-the-art methods. Submissions not abiding by these minimum requirements may be returned to authors without being sent to review. Special Issues Current Call for Papers: Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf
期刊最新文献
SRL-ProtoNet: Self-supervised representation learning for few-shot remote sensing scene classification Balanced parametric body prior for implicit clothed human reconstruction from a monocular RGB Social-ATPGNN: Prediction of multi-modal pedestrian trajectory of non-homogeneous social interaction HIST: Hierarchical and sequential transformer for image captioning Multi-modal video search by examples—A video quality impact analysis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1