Bi-directional information interaction for multi-modal 3D object detection in real-world traffic scenes

Yadong Wang, Shuqin Zhang, Yongqiang Deng, Juanjuan Li, Yanlong Yang, Kunfeng Wang

Expert Systems with Applications, Volume 262, Article 125651. DOI: 10.1016/j.eswa.2024.125651. Published 2024-11-05.
Multimodal 3D object detection methods adapt poorly to real-world traffic scenes because, during actual data collection, point clouds are sparsely distributed and multimodal data are misaligned. Existing methods focus on high-quality open-source datasets, with performance relying on accurate structural representation of point clouds and a precise mapping between point clouds and images. To address these challenges, this paper proposes a multimodal feature-level fusion method based on bi-directional interaction between image and point cloud. To overcome the sparsity of asynchronous multimodal data, a point cloud densification scheme guided by both vision and point cloud density is proposed; it can generate object-level virtual point clouds even when the point cloud and image are misaligned. To handle the misalignment between point cloud and image, a bi-directional interaction module is proposed, combining image-guided interaction with key points of the point cloud and point cloud-guided interaction with image context information; it achieves effective feature fusion even under misalignment. Experiments on the VANJEE and KITTI datasets demonstrate the effectiveness of the proposed method, with average precision improvements of 6.20% and 1.54% over the baseline.
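The abstract does not give implementation details, but the idea of vision-guided densification can be sketched in a minimal, hypothetical form: given a 2D detection box and the few real LiDAR points that project into it, spawn virtual points on a pixel grid inside the box, borrowing depth from the nearest real point. The function name, box format, and sampling strategy below are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch of object-level virtual point generation guided by a
# 2D detection box. Names and the nearest-neighbor depth assignment are
# illustrative assumptions; the paper's scheme also uses density guidance.

def densify(box, real_points, step=4):
    """box: (x1, y1, x2, y2) pixel bounds of a 2D detection.
    real_points: list of (u, v, depth) LiDAR points projected into the box.
    Returns real points plus virtual points sampled on a pixel grid."""
    x1, y1, x2, y2 = box
    virtual = []
    for u in range(int(x1), int(x2), step):
        for v in range(int(y1), int(y2), step):
            # Borrow depth from the real point closest in the image plane.
            nearest = min(real_points,
                          key=lambda p: (p[0] - u) ** 2 + (p[1] - v) ** 2)
            virtual.append((float(u), float(v), nearest[2]))
    return real_points + virtual

# Two sparse real points inside a 40x40 box become a dense object cloud.
pts = [(10.0, 10.0, 20.0), (30.0, 30.0, 22.0)]
dense = densify((0, 0, 40, 40), pts, step=8)
```

Because the virtual points inherit depth from nearby real returns rather than from a precise point-to-pixel mapping, a sketch like this can still produce object-level points when the two modalities are slightly misaligned, which is the property the abstract emphasizes.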
Journal Introduction:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.