Evaluation of synthetic data for deep learning stereo depth algorithms on embedded platforms

2017 4th International Conference on Systems and Informatics (ICSAI). Pub Date: 2017-11-01. DOI: 10.1109/ICSAI.2017.8248284

Kevin Lee, D. Moloney


Citations: 5

Abstract

Stereo vision is an active field of computer vision, and in recent years Convolutional Neural Networks (CNNs) have proven very competitive with the state of the art. However, the performance of these networks is limited by the quality of the data used to train them. Acquiring high-quality labelled images is a time-consuming and expensive process. By exploiting the power of modern GPUs, we present a synthetic dataset with fully rectified stereo image pairs and accompanying accurate ground truth information that can be used for training and testing stereo algorithms. We validate the quality of our dataset through quantitative experiments, which suggest that deep learning algorithms pre-trained on synthetic data can perform competitively against networks trained on real-life data. Testing on the KITTI dataset [1], we found that the accuracy difference between the networks trained on real data and those trained on synthetic data was within a margin of 1.8%. We also illustrate the functionality synthetic data can provide by evaluating key performance indicators for a selection of conventional and deep learning stereo algorithms available on embedded platforms and comparing them under common metrics. We also focused on power consumption and performance, and we were able to compute the matching cost with a CNN performing inference on an embedded device at 11.9 FPS at 1.2 Watts.
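The throughput and power figures in the abstract can be combined into an energy cost per frame, which is often the more useful number when comparing embedded platforms. The snippet below is a minimal sketch of that arithmetic only; the function names are hypothetical and not from the paper, and it assumes the 1.2 Watts figure is the average power drawn while sustaining 11.9 FPS.

```python
def energy_per_frame_joules(power_watts: float, frames_per_second: float) -> float:
    """Energy spent per processed frame, in joules (watts divided by frames/second)."""
    if power_watts < 0 or frames_per_second <= 0:
        raise ValueError("power must be non-negative and FPS must be positive")
    return power_watts / frames_per_second


def frames_per_joule(power_watts: float, frames_per_second: float) -> float:
    """Frames processed per joule of energy (the reciprocal efficiency measure)."""
    if power_watts <= 0 or frames_per_second < 0:
        raise ValueError("power must be positive and FPS must be non-negative")
    return frames_per_second / power_watts


if __name__ == "__main__":
    # Figures quoted in the abstract; treating 1.2 W as average power is an assumption.
    fps = 11.9
    watts = 1.2
    print(f"{energy_per_frame_joules(watts, fps) * 1000:.1f} mJ per frame")
    print(f"{frames_per_joule(watts, fps):.2f} frames per joule")
```

Under that assumption, the reported operating point works out to roughly 0.1 J (about 100 mJ) per frame, or just under 10 frames per joule.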
