Fusion of Camera and Lidar Data for Large Scale Semantic Mapping

Thomas Westfechtel, K. Ohno, R. B. Neto, Shotaro Kojima, S. Tadokoro
{"title":"Fusion of Camera and Lidar Data for Large Scale Semantic Mapping","authors":"Thomas Westfechtel, K. Ohno, R. B. Neto, Shotaro Kojima, S. Tadokoro","doi":"10.1109/ITSC.2019.8917107","DOIUrl":null,"url":null,"abstract":"Current self-driving vehicles rely on detailed maps of the environment, that contains exhaustive semantic information. This work presents a strategy to utilize the recent advancements in semantic segmentation of images, fuse the information extracted from the camera stream with accurate depth measurements of a Lidar sensor in order to create large scale semantic labeled point clouds of the environment. We fuse the color and semantic data gathered from a round-view camera system with the depth data gathered from a Lidar sensor. In our framework, each Lidar scan point is projected onto the camera stream to extract the color and semantic information while at the same time a large scale 3D map of the environment is generated by a Lidar-based SLAM algorithm. While we employed a network that achieved state of the art semantic segmentation results on the Cityscape dataset [1] (IoU score of 82.1%), the sole use of the extracted semantic information only achieved an IoU score of 38.9% on 105 manually labeled 5x5m tiles from 5 different trial runs within the Sendai city in Japan (this decrease in accuracy will discussed in section III-B). To increase the performance, we reclassify the label of each point. For this two different approaches were investigated: a random forest and SparseConvNet [2] (a deep learning approach). We investigated for both methods how the inclusion of semantic labels from the camera stream affected the classification task of the 3D point cloud. To which end we show, that a significant performance increase can be achieved by doing so - 25.4 percent points for random forest (40.0% w/o labels to 65.4% with labels) and 16.6 in case of the SparseConvNet (33.4% w/o labels to 50.8% with labels). Finally, we present practical examples on how semantic enriched maps can be employed for further tasks. In particular, we show how different classes (i.e. cars and vegetation) can be removed from the point cloud in order to increase the visibility of other classes (i.e. road and buildings). And how the data could be used for extracting the trajectories of vehicles and pedestrians.","PeriodicalId":6717,"journal":{"name":"2019 IEEE Intelligent Transportation Systems Conference (ITSC)","volume":"11 1","pages":"257-264"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Intelligent Transportation Systems Conference (ITSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITSC.2019.8917107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Current self-driving vehicles rely on detailed maps of the environment that contain exhaustive semantic information. This work presents a strategy that utilizes recent advancements in image semantic segmentation and fuses the information extracted from the camera stream with accurate depth measurements from a Lidar sensor in order to create large-scale, semantically labeled point clouds of the environment. We fuse the color and semantic data gathered from a round-view camera system with the depth data gathered from a Lidar sensor. In our framework, each Lidar scan point is projected onto the camera stream to extract the color and semantic information, while at the same time a large-scale 3D map of the environment is generated by a Lidar-based SLAM algorithm. While we employed a network that achieved state-of-the-art semantic segmentation results on the Cityscapes dataset [1] (IoU score of 82.1%), using only the extracted semantic information achieved an IoU score of just 38.9% on 105 manually labeled 5x5 m tiles from 5 different trial runs within the city of Sendai, Japan (this decrease in accuracy is discussed in Section III-B). To increase the performance, we reclassify the label of each point. For this, two different approaches were investigated: a random forest and SparseConvNet [2] (a deep learning approach). For both methods we investigated how the inclusion of semantic labels from the camera stream affected the classification of the 3D point cloud. We show that a significant performance increase can be achieved by doing so: 25.4 percentage points for the random forest (40.0% without labels to 65.4% with labels) and 16.6 percentage points for SparseConvNet (33.4% without labels to 50.8% with labels). Finally, we present practical examples of how semantically enriched maps can be employed for further tasks. In particular, we show how certain classes (e.g. cars and vegetation) can be removed from the point cloud in order to increase the visibility of other classes (e.g. road and buildings), and how the data can be used for extracting the trajectories of vehicles and pedestrians.
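To make the fusion step concrete, below is a minimal sketch of how a Lidar scan can be projected into a camera image so that each 3D point picks up a color and a per-pixel semantic label. It assumes a pinhole camera model with a known intrinsic matrix K and an extrinsic transform T_cam_lidar; the function and array names are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def project_points_to_image(points_xyz, T_cam_lidar, K, image, label_map):
    """Attach (r, g, b) and a semantic label from the camera to each Lidar point.

    points_xyz  : (N, 3) points in the Lidar frame
    T_cam_lidar : (4, 4) extrinsic transform from the Lidar to the camera frame
    K           : (3, 3) camera intrinsic matrix
    image       : (H, W, 3) RGB image from the camera stream
    label_map   : (H, W) per-pixel class ids produced by the 2D segmentation network
    """
    n = points_xyz.shape[0]
    pts_h = np.hstack([points_xyz, np.ones((n, 1))])      # homogeneous coordinates, (N, 4)
    pts_cam = (T_cam_lidar @ pts_h.T)[:3]                  # points in the camera frame, (3, N)
    in_front = pts_cam[2] > 0.1                            # keep only points in front of the camera

    uvw = K @ pts_cam                                      # project with the pinhole model
    z = np.maximum(uvw[2], 1e-6)                           # avoid division by zero
    u = (uvw[0] / z).astype(int)
    v = (uvw[1] / z).astype(int)

    h, w = label_map.shape
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    colors = np.zeros((n, 3), dtype=np.uint8)
    labels = np.full(n, -1, dtype=np.int32)                # -1 marks points without camera coverage
    colors[valid] = image[v[valid], u[valid]]
    labels[valid] = label_map[v[valid], u[valid]]
    return colors, labels
```

In a surround-view setup this projection would be repeated per camera, and points falling outside every image keep the "no coverage" label until the subsequent 3D reclassification step.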
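The map post-processing mentioned at the end of the abstract (removing cars and vegetation to expose roads and buildings) reduces to a boolean mask over the per-point labels once the cloud is semantically enriched. The class ids below are hypothetical and depend on the label set actually used.

```python
import numpy as np

# Hypothetical class ids; the real values depend on the chosen label set.
CAR, VEGETATION = 13, 8

def remove_classes(points_xyz, labels, classes_to_drop=(CAR, VEGETATION)):
    """Return the sub-cloud with the given semantic classes removed."""
    keep = ~np.isin(labels, classes_to_drop)
    return points_xyz[keep], labels[keep]
```

Rendering the remaining points then exposes the road surface and building facades that the removed classes would otherwise occlude.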