SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using LiDAR Point Cloud and Semantic Segmentation

2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI). Pub Date: 2020-09-14. DOI: 10.1109/MFI49285.2020.9235240

Juncong Fei, Wenbo Chen, Philipp Heidenreich, Sascha Wirges, C. Stiller


Citations: 20

Abstract

3D pedestrian detection is a challenging task in automated driving because pedestrians are relatively small, frequently occluded and easily confused with narrow vertical objects. LiDAR and camera are two commonly used sensor modalities for this task, which should provide complementary information. Unexpectedly, LiDAR-only detection methods tend to outperform multisensor fusion methods in public benchmarks. Recently, PointPainting has been presented to eliminate this performance drop by effectively fusing the output of a semantic segmentation network instead of the raw image information. In this paper, we propose a generalization of PointPainting to be able to apply fusion at different levels. After the semantic augmentation of the point cloud, we encode raw point data in pillars to get geometric features and semantic point data in voxels to get semantic features and fuse them in an effective way. Experimental results on the KITTI test set show that SemanticVoxels achieves state-of-the-art performance in both 3D and bird’s eye view pedestrian detection benchmarks. In particular, our approach demonstrates its strength in detecting challenging pedestrian cases and outperforms current state-of-the-art approaches.
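The pipeline described in the abstract (semantic augmentation of the point cloud, a pillar encoding for geometry, a voxel encoding for semantics, then fusion) can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function names, the grid extents, the 3x4 projection matrix `proj`, and the use of plain per-cell averaging and channel concatenation in place of the paper's actual encoders and fusion step are all assumptions made for illustration.

```python
import numpy as np

def paint_points(points, seg_scores, proj):
    """Append per-pixel class scores to each LiDAR point (PointPainting-style).

    points:     (N, 4) array of x, y, z, intensity in the LiDAR frame
    seg_scores: (H, W, C) class scores from a semantic segmentation network
    proj:       (3, 4) matrix from homogeneous LiDAR points to pixels (assumed given)
    """
    h, w, c = seg_scores.shape
    xyz1 = np.hstack([points[:, :3], np.ones((len(points), 1))])
    uvw = xyz1 @ proj.T                      # homogeneous pixel coordinates
    depth = uvw[:, 2]
    in_front = depth > 1e-6
    safe_depth = np.where(in_front, depth, 1.0)  # avoid dividing by bad depths
    u = np.floor(uvw[:, 0] / safe_depth).astype(int)
    v = np.floor(uvw[:, 1] / safe_depth).astype(int)
    visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    painted = np.zeros((len(points), c))     # points outside the image keep zeros
    painted[visible] = seg_scores[v[visible], u[visible]]
    return np.hstack([points, painted]), visible

def mean_grid(coords, feats, origin, cell, shape):
    """Average `feats` over the grid cells that `coords` fall into."""
    idx = np.floor((coords - origin) / cell).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
    idx, feats = idx[inside], feats[inside]
    flat = np.ravel_multi_index(tuple(idx.T), shape)
    n_cells = int(np.prod(shape))
    sums = np.zeros((n_cells, feats.shape[1]))
    counts = np.zeros(n_cells)
    np.add.at(sums, flat, feats)             # unbuffered scatter-add per cell
    np.add.at(counts, flat, 1)
    mean = sums / np.maximum(counts, 1)[:, None]
    return mean.reshape(*shape, feats.shape[1])

def fuse_bev(painted, n_geo=4, origin=(0.0, -20.0, -3.0),
             extent=(40.0, 40.0, 4.0), cell=(0.5, 0.5, 1.0)):
    """Build a bird's eye view map from a geometric and a semantic branch."""
    origin, cell = np.asarray(origin), np.asarray(cell)
    nx, ny, nz = np.round(np.asarray(extent) / cell).astype(int)
    # Geometric branch: pillars span the full height, one cell per (x, y).
    geo = mean_grid(painted[:, :2], painted[:, :n_geo],
                    origin[:2], cell[:2], (nx, ny))
    # Semantic branch: voxels keep the height axis for the class scores.
    sem = mean_grid(painted[:, :3], painted[:, n_geo:],
                    origin, cell, (nx, ny, nz))
    sem_bev = sem.reshape(nx, ny, -1)        # fold height slices into channels
    # Stand-in fusion: concatenate both branches along the channel axis.
    return np.concatenate([geo, sem_bev], axis=-1)
```

Calling `paint_points` and then `fuse_bev` on the painted cloud yields one feature map with geometric and semantic channels side by side, which a detection head could consume. In a real system the averaging steps would presumably be replaced by learned encoders.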

Source journal

2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)

Self-citation rate

0.00%

Publication count