{"title":"A Learning-Based Framework for Depth Perception using Dense Light Fields","authors":"A. P. Ferrugem, B. Zatt, L. Agostini","doi":"10.1145/3539637.3557062","DOIUrl":null,"url":null,"abstract":"The rapid development of optical sensors technology has accompanied a growing demand for visual measurement systems in emerging areas that need to interpret the real three-dimensional physical world, such as self-driving cars, mobile robotics, Advanced Driver Assistance Systems (ADAS), and medical diagnostic in 3D imaging. In these systems, for modeling the physical world, it is necessary to unify visual information with depth measurements. Light Field cameras have the potential to be used in such systems as a versatile hypersensor. Since Light Fields represent the scene’s visual information from multiple viewpoints, it is possible to calculate the depth information through trigonometric operations. This paper proposes a learning-based framework that allows unifying scene depth with visual information obtained from Light Fields. The structure of the proposed framework is composed of four main modules. The deep learning modules consist of (i) a depth map estimation using a siamese convolutional neural network and (ii) an instance segmentation employing region-based convolutional neural network. The others two modules apply linear transformations: (iii) a module which applies the matrix transformations with camera intrinsic parameters to generated a new depth map of absolute distances and (iv) a module to return the distance of the selected objects. For the depth map estimation module this framework proposal a siamese neural network called EPINET-FAST, which allows for generating depth maps in less than half the time of the original EPINET. A case study is presented using Dense Light Fields captured by a Lytro Illum camera (plenotic 1.0). The case study seeks to exemplify the processing time of each module, allowing researchers to isolate critical points and propose changes in the future, seeking a processing that can be applied in real time.","PeriodicalId":350776,"journal":{"name":"Proceedings of the Brazilian Symposium on Multimedia and the Web","volume":"167 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Brazilian Symposium on Multimedia and the Web","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3539637.3557062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The rapid development of optical sensor technology has been accompanied by a growing demand for visual measurement systems in emerging areas that must interpret the real three-dimensional physical world, such as self-driving cars, mobile robotics, Advanced Driver Assistance Systems (ADAS), and 3D medical imaging diagnostics. To model the physical world, these systems must unify visual information with depth measurements. Light Field cameras have the potential to serve in such systems as versatile hypersensors. Since a Light Field captures the scene's visual information from multiple viewpoints, depth information can be computed through trigonometric operations. This paper proposes a learning-based framework that unifies scene depth with the visual information obtained from Light Fields. The proposed framework is composed of four main modules. Two deep learning modules perform (i) depth map estimation using a Siamese convolutional neural network and (ii) instance segmentation using a region-based convolutional neural network. The other two modules apply linear transformations: (iii) a module that applies matrix transformations with the camera's intrinsic parameters to generate a new depth map of absolute distances, and (iv) a module that returns the distances of the selected objects. For the depth map estimation module, the framework proposes a Siamese neural network called EPINET-FAST, which generates depth maps in less than half the time of the original EPINET. A case study is presented using Dense Light Fields captured by a Lytro Illum camera (plenoptic 1.0). The case study characterizes the processing time of each module, allowing researchers to isolate critical points and propose future changes toward processing that can be applied in real time.
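Modules (iii) and (iv) of the abstract amount to triangulation with camera intrinsics followed by per-object aggregation. The sketch below illustrates one way those two steps could look, assuming a simple baseline-triangulation relation Z = f * B / d as the "trigonometric operation"; the function names, parameter values, and the median aggregation over the instance mask are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of modules (iii) and (iv): convert a relative
# disparity map to absolute (metric) depth with camera intrinsics, then
# read off the distance of a segmented object. Assumes Z = f * B / d;
# the framework's real matrix transformations may differ.

def disparity_to_depth(disparity_px: np.ndarray,
                       focal_length_px: float,
                       baseline_m: float,
                       eps: float = 1e-6) -> np.ndarray:
    """Module (iii) sketch: absolute depth in meters via Z = f * B / d.

    disparity_px    -- disparity map in pixels (e.g. from EPINET-FAST)
    focal_length_px -- focal length expressed in pixels
    baseline_m      -- baseline between sub-aperture views, in meters
    """
    # Clamp disparity away from zero to avoid division by zero.
    return (focal_length_px * baseline_m) / np.maximum(disparity_px, eps)

def object_distance(depth_m: np.ndarray, instance_mask: np.ndarray) -> float:
    """Module (iv) sketch: distance of one segmented object.

    instance_mask -- boolean mask from the instance segmentation module.
    The median is an assumed robust choice against depth outliers.
    """
    return float(np.median(depth_m[instance_mask]))

# Usage with synthetic data (resolution loosely matching a Lytro Illum
# sub-aperture view; all values are made up for illustration):
disparity = np.random.uniform(1.0, 10.0, size=(434, 625))
depth = disparity_to_depth(disparity, focal_length_px=1500.0, baseline_m=0.001)
mask = np.zeros(disparity.shape, dtype=bool)
mask[200:260, 300:360] = True  # stand-in for a detected object instance
print(f"object distance: {object_distance(depth, mask):.2f} m")
```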