基于单目伪激光雷达的自监督三维目标检测

2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI) Pub Date : 2022-09-20 DOI:10.1109/MFI55806.2022.9913846

Curie Kim, Ue-Hwan Kim, Jong-Hwan Kim

{"title":"基于单目伪激光雷达的自监督三维目标检测","authors":"Curie Kim, Ue-Hwan Kim, Jong-Hwan Kim","doi":"10.1109/MFI55806.2022.9913846","DOIUrl":null,"url":null,"abstract":"There have been attempts to detect 3D objects by fusion of stereo camera images and LiDAR sensor data or using LiDAR for pre-training and only monocular images for testing, but there have been less attempts to use only monocular image sequences due to low accuracy. In addition, when depth prediction using only monocular images, only scale-inconsistent depth can be predicted, which is the reason why researchers are reluctant to use monocular images alone.Therefore, we propose a method for predicting absolute depth and detecting 3D objects using only monocular image sequences by enabling end-to-end learning of detection networks and depth prediction networks. As a result, the proposed method surpasses other existing methods in performance on the KITTI 3D dataset. Even when monocular image and 3D LiDAR are used together during training in an attempt to improve performance, ours exhibit is the best performance compared to other methods using the same input. In addition, end-to-end learning not only improves depth prediction performance, but also enables absolute depth prediction, because our network utilizes the fact that the size of a 3D object such as a car is determined by the approximate size.","PeriodicalId":344737,"journal":{"name":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Self-supervised 3D Object Detection from Monocular Pseudo-LiDAR\",\"authors\":\"Curie Kim, Ue-Hwan Kim, Jong-Hwan Kim\",\"doi\":\"10.1109/MFI55806.2022.9913846\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"There have been attempts to detect 3D objects by fusion of stereo camera images and LiDAR sensor data or using LiDAR for pre-training and only monocular images for testing, but there have been less attempts to use only monocular image sequences due to low accuracy. In addition, when depth prediction using only monocular images, only scale-inconsistent depth can be predicted, which is the reason why researchers are reluctant to use monocular images alone.Therefore, we propose a method for predicting absolute depth and detecting 3D objects using only monocular image sequences by enabling end-to-end learning of detection networks and depth prediction networks. As a result, the proposed method surpasses other existing methods in performance on the KITTI 3D dataset. Even when monocular image and 3D LiDAR are used together during training in an attempt to improve performance, ours exhibit is the best performance compared to other methods using the same input. In addition, end-to-end learning not only improves depth prediction performance, but also enables absolute depth prediction, because our network utilizes the fact that the size of a 3D object such as a car is determined by the approximate size.\",\"PeriodicalId\":344737,\"journal\":{\"name\":\"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MFI55806.2022.9913846\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MFI55806.2022.9913846","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

已经有人尝试通过融合立体摄像机图像和激光雷达传感器数据来检测3D物体，或者使用激光雷达进行预训练，仅使用单眼图像进行测试，但由于精度低，仅使用单眼图像序列的尝试较少。此外，当仅使用单眼图像进行深度预测时，只能预测尺度不一致的深度，这也是研究者不愿意单独使用单眼图像的原因。因此，我们提出了一种仅使用单眼图像序列预测绝对深度和检测3D物体的方法，该方法支持检测网络和深度预测网络的端到端学习。结果表明，该方法在KITTI三维数据集上的性能优于其他现有方法。即使在训练期间将单眼图像和3D激光雷达一起使用以试图提高性能，与使用相同输入的其他方法相比，我们的展示也具有最佳性能。此外，端到端学习不仅可以提高深度预测性能，还可以实现绝对深度预测，因为我们的网络利用了汽车等3D物体的大小是由近似大小决定的这一事实。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Self-supervised 3D Object Detection from Monocular Pseudo-LiDAR

There have been attempts to detect 3D objects by fusion of stereo camera images and LiDAR sensor data or using LiDAR for pre-training and only monocular images for testing, but there have been less attempts to use only monocular image sequences due to low accuracy. In addition, when depth prediction using only monocular images, only scale-inconsistent depth can be predicted, which is the reason why researchers are reluctant to use monocular images alone.Therefore, we propose a method for predicting absolute depth and detecting 3D objects using only monocular image sequences by enabling end-to-end learning of detection networks and depth prediction networks. As a result, the proposed method surpasses other existing methods in performance on the KITTI 3D dataset. Even when monocular image and 3D LiDAR are used together during training in an attempt to improve performance, ours exhibit is the best performance compared to other methods using the same input. In addition, end-to-end learning not only improves depth prediction performance, but also enables absolute depth prediction, because our network utilizes the fact that the size of a 3D object such as a car is determined by the approximate size.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)

自引率

0.00%

发文量