{"title":"基于RGB和LiDAR融合的自动驾驶3D语义分割","authors":"Jianguo Liu, Zhiling Jia, Gongbo Li, Fuwu Yan, Youhua Wu, Yunfei Sun","doi":"10.1088/1742-6596/2632/1/012034","DOIUrl":null,"url":null,"abstract":"Abstract Projection-based multimodal 3D semantic segmentation methods suffer from information loss during the point cloud projection process. This issue becomes more prominent for small objects. Moreover, the alignment of sparse target features with the corresponding object features in the camera image during the fusion process is inaccurate, leading to low segmentation accuracy for small objects. Therefore, we propose an attention-based multimodal feature alignment and fusion network module. This module aggregates features in spatial directions and generates attention matrices. Through this transformation, the module could capture remote dependencies of features in one spatial direction. This helps our network precisely locate objects and establish relationships between similar features. It enables the adaptive alignment of sparse target features with the corresponding object features in the camera image, resulting in a better fusion of the two modalities. We validate our method on the nuScenes-lidar seg dataset. Our CAFNet achieves an improvement in segmentation accuracy for small objects with fewer points compared to the baseline network, such as bicycles (6% improvement), pedestrians (2.1% improvement), and traffic cones (0.9% improvement).","PeriodicalId":44008,"journal":{"name":"Journal of Physics-Photonics","volume":"92 3","pages":"0"},"PeriodicalIF":4.6000,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RGB and LiDAR Fusion-based 3D Semantic Segmentation for Autonomous Driving\",\"authors\":\"Jianguo Liu, Zhiling Jia, Gongbo Li, Fuwu Yan, Youhua Wu, Yunfei Sun\",\"doi\":\"10.1088/1742-6596/2632/1/012034\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Projection-based multimodal 3D semantic segmentation methods suffer from information loss during the point cloud projection process. This issue becomes more prominent for small objects. Moreover, the alignment of sparse target features with the corresponding object features in the camera image during the fusion process is inaccurate, leading to low segmentation accuracy for small objects. Therefore, we propose an attention-based multimodal feature alignment and fusion network module. This module aggregates features in spatial directions and generates attention matrices. Through this transformation, the module could capture remote dependencies of features in one spatial direction. This helps our network precisely locate objects and establish relationships between similar features. It enables the adaptive alignment of sparse target features with the corresponding object features in the camera image, resulting in a better fusion of the two modalities. We validate our method on the nuScenes-lidar seg dataset. 
Our CAFNet achieves an improvement in segmentation accuracy for small objects with fewer points compared to the baseline network, such as bicycles (6% improvement), pedestrians (2.1% improvement), and traffic cones (0.9% improvement).\",\"PeriodicalId\":44008,\"journal\":{\"name\":\"Journal of Physics-Photonics\",\"volume\":\"92 3\",\"pages\":\"0\"},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2023-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Physics-Photonics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1088/1742-6596/2632/1/012034\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"OPTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Physics-Photonics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1088/1742-6596/2632/1/012034","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"OPTICS","Score":null,"Total":0}
RGB and LiDAR Fusion-based 3D Semantic Segmentation for Autonomous Driving
Abstract

Projection-based multimodal 3D semantic segmentation methods suffer from information loss during point cloud projection, and the problem is most pronounced for small objects. Moreover, during fusion the sparse target features are not accurately aligned with the corresponding object features in the camera image, which lowers segmentation accuracy for small objects. We therefore propose an attention-based multimodal feature alignment and fusion network module. The module aggregates features along the spatial directions and generates attention matrices; through this transformation it can capture long-range dependencies of features along one spatial direction. This helps the network precisely locate objects and establish relationships between similar features, enabling adaptive alignment of the sparse target features with the corresponding object features in the camera image and thus a better fusion of the two modalities. We validate our method on the nuScenes-lidarseg dataset. Compared with the baseline network, our CAFNet improves segmentation accuracy for small objects with few points, such as bicycles (+6%), pedestrians (+2.1%), and traffic cones (+0.9%).
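To make the direction-wise attention idea in the abstract concrete, below is a minimal PyTorch sketch of one way such an alignment-and-fusion block could look: features from the LiDAR and camera branches are concatenated, pooled along each spatial direction (height and width), and turned into per-direction attention matrices that re-weight the camera features before fusion. The class name `CAFBlock`, the `reduction` parameter, and the exact layer layout are assumptions for illustration, not the authors' published implementation.

```python
# Hypothetical sketch of an attention-based feature alignment and fusion block
# (assumed structure; not the authors' released code).
import torch
import torch.nn as nn


class CAFBlock(nn.Module):
    """Fuses projected LiDAR features with camera features.

    Features are pooled along each spatial direction (H and W), so the
    resulting attention can capture long-range dependencies along one axis
    while keeping positional information along the other.
    """

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(channels // reduction, 8)
        # Shared transform applied to the direction-wise pooled descriptors.
        self.conv1 = nn.Conv2d(2 * channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        # Separate heads produce the per-direction attention matrices.
        self.attn_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, lidar_feat: torch.Tensor, cam_feat: torch.Tensor) -> torch.Tensor:
        # lidar_feat, cam_feat: (B, C, H, W) feature maps in the camera view.
        x = torch.cat([lidar_feat, cam_feat], dim=1)               # (B, 2C, H, W)
        b, _, h, w = x.shape
        # Aggregate features along each spatial direction.
        pool_h = x.mean(dim=3, keepdim=True)                       # (B, 2C, H, 1)
        pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (B, 2C, W, 1)
        y = torch.cat([pool_h, pool_w], dim=2)                     # (B, 2C, H+W, 1)
        y = self.act(self.bn(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        # Re-weight the camera features and add them to the LiDAR branch.
        return lidar_feat + cam_feat * a_h * a_w


if __name__ == "__main__":
    block = CAFBlock(channels=64)
    lidar = torch.randn(2, 64, 32, 64)
    cam = torch.randn(2, 64, 32, 64)
    print(block(lidar, cam).shape)  # torch.Size([2, 64, 32, 64])
```

The design choice sketched here is that pooling along one axis at a time yields attention that spans the full extent of that axis, which is one plausible reading of "capturing long-range dependencies of features along one spatial direction" while the orthogonal axis preserves localization for small objects.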