FAST OCCLUSION FILLING METHOD FOR MULTIVIEW VIDEO GENERATION
Pub Date: 2018-06-01. DOI: 10.1109/3DTV.2018.8478562
A. Khatiullin, Mikhail Erofeev, D. Vatolin
Occlusion filling is a basic problem in multiview video generation from existing monocular video. The essential goal is to recover missing information about a scene's 3D structure and the corresponding texture. We propose a method for content-aware deformation of the source view that ensures no disoccluded regions are visible in the synthesized views while keeping visible distortions to a minimum. We formulate this problem in terms of global energy minimization. Furthermore, we introduce a similar-variable rejection algorithm that, along with other known optimization techniques, allows us to accelerate the energy-function minimization by nearly 30 times while maintaining the visual quality of the synthesized views.
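To make the energy formulation concrete, the Python sketch below sets up a toy 1-D content-aware warp: a data term pulls each pixel toward the shift needed to cover a disocclusion, while a saliency-weighted smoothness term concentrates the resulting stretch in low-importance regions. The term definitions, weights, and solver are illustrative assumptions, not the authors' formulation or their accelerated variable-rejection solver.

```python
# Toy 1-D sketch of content-aware warping posed as global energy minimization.
# Not the paper's method: energy terms, weights, and the solver are assumptions.
import numpy as np
from scipy.optimize import minimize

def warp_energy(u, saliency, target_shift, lam):
    stretch = np.diff(u)                             # local stretching of the scan line
    distortion = np.sum(saliency[:-1] * stretch**2)  # stretching costs more in salient areas
    data = np.sum((u - target_shift)**2)             # pixels should cover the disoccluded region
    return distortion + lam * data

n = 64
saliency = np.ones(n)
saliency[20:40] = 10.0        # a salient foreground object that should not be deformed
target_shift = np.zeros(n)
target_shift[45:] = 3.0       # background pixels must shift to hide the hole behind the object
res = minimize(warp_energy, x0=np.zeros(n), args=(saliency, target_shift, 0.1),
               method="L-BFGS-B")
shift_field = res.x           # smooth per-pixel shift; most of the stretch avoids the object
```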
{"title":"FAST OCCLUSION FILLING METHOD FOR MULTIVIEW VIDEO GENERATION","authors":"A. Khatiullin, Mikhail Erofeev, D. Vatolin","doi":"10.1109/3DTV.2018.8478562","DOIUrl":"https://doi.org/10.1109/3DTV.2018.8478562","url":null,"abstract":"Occlusion filling is a basic problem for multiview video generation from existing monocular video. The essential goal of this problem is to recover missing information about a scenes 3D structure and corresponding texture.We propose a method for content-aware deformation of the source view that ensures no disoccluded regions are visible in the synthesized views while also keeping visible distortions to a minimum. We formulate this problem in terms of global energy min-imization. Furthermore, we introduce a similar variable-rejection algorithm that, along with other known optimization techniques, allows us to accelerate the energy function minimization by nearly 30 times and still maintain the visual quality of the synthesized views.","PeriodicalId":267389,"journal":{"name":"2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116336615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DESIGN OF MICRO PHOTON SIEVE ARRAYS FOR HIGH RESOLUTION LIGHT-FIELD CAPTURE IN PLENOPTIC CAMERAS
Pub Date: 2018-06-01. DOI: 10.1109/3DTV.2018.8478587
Ali Özgür Yöntem, D. Chu
The design of micro photon sieve arrays (PSAs) is investigated for light-field capture with high spatial resolution in plenoptic cameras. A commercial very high-resolution full-frame camera with a manual lens is converted into a plenoptic camera for high-resolution depth image acquisition by using the designed PSA as an add-on diffractive optical element in place of an ordinary refractive microlens array or a diffractive micro Fresnel Zone Plate (FZP) array, which is used in integral imaging applications. The noise introduced by the diffractive nature of the optical element is reduced by standard image processing tools. The light-field data is also used for computational refocusing of the 3D scene with wave propagation tools.
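For reference, a photon sieve places pinholes on the transparent zones of a Fresnel zone plate, so its layout can be derived from the standard zone-radius formula. The sketch below computes zone radii for a single lenslet and a commonly cited pinhole diameter of about 1.53 times the local zone width (Kipp et al.); the wavelength, focal length, and zone count are placeholders, not the parameters used in this paper.

```python
# Sketch of photon-sieve geometry from Fresnel-zone radii; all values are placeholders.
import numpy as np

wavelength = 550e-9   # m (green light, assumed)
focal_len  = 5e-3     # m, per-lenslet focal length (assumed)
n_zones    = 100

n = np.arange(1, n_zones + 1)
# Fresnel zone-plate radii: r_n^2 = n*lambda*f + (n*lambda/2)^2
r = np.sqrt(n * wavelength * focal_len + (n * wavelength / 2.0) ** 2)

zone_width = np.diff(r)                 # width of each successive zone
pinhole_diam = 1.53 * zone_width[::2]   # pinholes on alternating (open) zones, d ~ 1.53*w
print(f"outer radius: {r[-1] * 1e6:.0f} um, smallest pinhole: {pinhole_diam[-1] * 1e6:.2f} um")
```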
{"title":"DESIGN OF MICRO PHOTON SIEVE ARRAYS FOR HIGH RESOLUTION LIGHT-FIELD CAPTURE IN PLENOPTIC CAMERAS","authors":"Ali Özgür Yöntem, D. Chu","doi":"10.1109/3DTV.2018.8478587","DOIUrl":"https://doi.org/10.1109/3DTV.2018.8478587","url":null,"abstract":"The design of micro photon sieve arrays (PSAs) is investigated for light-field capture with high spatial resolution in plenoptic cameras. A commercial very high-resolution full-frame camera with a manual lens is converted into a plenoptic camera for high-resolution depth image acquisition by using the designed PSA as an add-on diffractive optical element in place of an ordinary refractive microlens array or a diffractive micro Fresnel Zone Plate (FZP) array, which is used in integral imaging applications. The noise introduced by the diffractive nature of the optical element is reduced by standard image processing tools. The light-field data is also used for computational refocusing of the 3D scene with wave propagation tools.","PeriodicalId":267389,"journal":{"name":"2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130366456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ICP WITH DEPTH COMPENSATION FOR CALIBRATION OF MULTIPLE TOF SENSORS
Pub Date: 2018-06-01. DOI: 10.1109/3DTV.2018.8478527
Norishige Fukushima
We propose an iterative-closest-point (ICP) based calibration for multiple time-of-flight (ToF) depth sensors. Multiple-sensor calibration usually relies on 2D calibration patterns observed in IR images. The depth output, however, depends on calibration parameters set at the factory, so any re-calibration must account for deviations from that factory calibration. We therefore use direct correspondences among depth values and calibrate the extrinsic parameters with ICP. ICP is commonly used in simultaneous localization and mapping (SLAM) systems such as KinectFusion, but multiple-sensor calibration is harder than the SLAM case because the distance between cameras is too large for ICP to be applied directly. We therefore modify the ICP-based calibration for multiple sensors: the proposed method uses specific calibration objects to strengthen matching among the sensors. We also propose a compensation method for ToF depth-map distortions.
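For context, a bare point-to-point ICP loop with the closed-form SVD (Kabsch) rigid update looks like the sketch below; the multi-sensor modifications, the dedicated calibration objects, and the ToF depth compensation proposed in the paper are not reproduced here.

```python
# Generic point-to-point ICP between two point clouds (N,3) and (M,3) in numpy.
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, iters=30):
    """Returns a 4x4 rigid transform that aligns src onto dst."""
    T = np.eye(4)
    cur = src.copy()
    tree = cKDTree(dst)
    for _ in range(iters):
        _, idx = tree.query(cur)            # nearest-neighbour correspondences
        matched = dst[idx]
        mu_s, mu_d = cur.mean(0), matched.mean(0)
        H = (cur - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)         # closed-form rotation (Kabsch)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:            # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_d - R @ mu_s
        cur = cur @ R.T + t
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T
    return T
```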
{"title":"ICP WITH DEPTH COMPENSATION FOR CALIBRATION OF MULTIPLE TOF SENSORS","authors":"Norishige Fukushima","doi":"10.1109/3DTV.2018.8478527","DOIUrl":"https://doi.org/10.1109/3DTV.2018.8478527","url":null,"abstract":"We propose an iterative closest point (ICP) based calibration for time of flight (ToF) multiple depth sensors. For the multiple sensor calibrations, we usually use 2D patterns calibration with IR images. The depth sensor output depends on calibration parameters at a factory; thus, the re-calibration must include gaps from the calibration in the factory. Therefore, we use direct correspondences among depth values, and the calibrating extrinsic parameters by using ICP. Usually, simultaneous localization and mapping (SLAM) uses ICP, such as KinectFusion. The case of multiple sensor calibrations, however, is harder than the SLAM case. In this case, the distance between cameras is too far to apply ICP. Therefore, we modify the ICP based calibration for multiple sensors. The proposed method uses specific calibration objects to enforce the matching ability among sensors. Also, we proposed a compensation method for ToF depth map distortions.","PeriodicalId":267389,"journal":{"name":"2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115844791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
REAL-TIME MULTI-VIEW VOLUMETRIC RECONSTRUCTION OF DYNAMIC SCENES USING KINECT V2
Pub Date: 2018-06-01. DOI: 10.1109/3DTV.2018.8478536
Andrej Satnik, E. Izquierdo
A key challenge in displaying and processing real-time sensed 3D data is the efficiency of the generation and post-processing algorithms required to acquire high-quality 3D content. Our approach focuses on generating and processing volumetric data using an efficient, low-cost hardware setup. Volumetric data is acquired by connecting several Kinect v2 scanners to a single PC and subsequently calibrating them using a planar pattern. This process is by no means trivial and requires well-designed algorithms for fast processing and rendering of volumetric data, which we achieve by combining efficient filtering methods such as the weighted median (WM) filter, radius outlier removal (ROR), and Laplacian smoothing. We demonstrate the robustness and efficiency of our technique by capturing several scenes.
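As an example of one of the listed filters, radius outlier removal on an (N, 3) point cloud can be sketched as below; the radius and neighbour-count thresholds are illustrative assumptions, not the values used by the authors.

```python
# Radius outlier removal: drop points with too few neighbours inside a given radius.
import numpy as np
from scipy.spatial import cKDTree

def radius_outlier_removal(points, radius=0.02, min_neighbors=5):
    tree = cKDTree(points)
    neighbours = tree.query_ball_point(points, r=radius)
    counts = np.array([len(nb) - 1 for nb in neighbours])  # subtract the point itself
    return points[counts >= min_neighbors]
```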
{"title":"REAL-TIME MULTI-VIEW VOLUMETRIC RECONSTRUCTION OF DYNAMIC SCENES USING KINECT V2","authors":"Andrej Satnik, E. Izquierdo","doi":"10.1109/3DTV.2018.8478536","DOIUrl":"https://doi.org/10.1109/3DTV.2018.8478536","url":null,"abstract":"A key challenge when displaying and processing sensed real-time 3D data is efficiency of generating and post-processing algorithms in order to acquire high quality 3D content. In contrast, our approach focuses on volumetric generation and processing volumetric data using an efficient low-cost hardware setting. Acquisition of volumetric data is performed by connecting several Kinect v2 scanners to a single PC that are subsequently calibrated using planar pattern. This process is by no means trivial and requires well designed algorithms for fast processing and quick rendering of volumetric data. This can be achieved by fusing efficient filtering methods such as Weighted median filter (WM), Radius outlier removal (ROR) and Laplace-based smoothing algorithm. In this context, we demonstrate the robustness and efficiency of our technique by sensing several scenes.","PeriodicalId":267389,"journal":{"name":"2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130613019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DEPTH PERCEPTION PREDICTION OF 3D VIDEO FOR ENSURING ADVANCED MULTIMEDIA SERVICES
Pub Date: 2018-06-01. DOI: 10.1109/3DTV.2018.8478491
G. Nur, F. Battisti
The development of 3D video quality metrics for assessing perceived quality plays a key role in the advancement of three-dimensional TV services. This role can only be fulfilled when the features associated with the 3D nature of the video are reliably and efficiently characterized by these metrics. In this study, z-direction motion combined with significant depth levels in depth-map sequences is considered the main characterization of that 3D nature. 3D video quality metrics can be classified into three categories based on the need for the reference video during assessment at the user end: Full Reference (FR), Reduced Reference (RR), and No Reference (NR). We propose an NR quality metric, PNRM, suitable for on-the-fly 3D video services. To evaluate the reliability and effectiveness of the proposed metric, we conduct subjective experiments. The high correlation with the subjective results indicates that the proposed metric is able to mimic the Human Visual System (HVS).
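As an illustration of the z-direction motion cue only (the actual PNRM pooling and quality mapping are not reproduced here), a per-pixel temporal depth difference can be emphasized in the nearest depth levels, assuming the common convention where larger depth-map values are closer to the camera.

```python
# Hypothetical z-direction motion feature from two consecutive depth maps.
import numpy as np

def z_motion(depth_prev, depth_curr, near_percentile=90):
    dz = np.abs(depth_curr.astype(np.float32) - depth_prev.astype(np.float32))
    near = depth_curr >= np.percentile(depth_curr, near_percentile)  # most significant (nearest) levels
    return dz.mean(), dz[near].mean()   # global z-motion and z-motion of near objects
```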
{"title":"DEPTH PERCEPTION PREDICTION OF 3D VIDEO FOR ENSURING ADVANCED MULTIMEDIA SERVICES","authors":"G. Nur, F. Battisti","doi":"10.1109/3DTV.2018.8478491","DOIUrl":"https://doi.org/10.1109/3DTV.2018.8478491","url":null,"abstract":"A key role in the advancement of 3 Dimensional TV services is played by the development of 3D video quality metrics used for the assessment of the perceived quality. Moreover, this key role can only be supported when the features associated with the 3D video nature is reliably and efficiently characterized in these metrics. In this study, z-direction motion incorporated with significant depth levels in depth map sequences are considered as the main characterizations of the 3D nature. The 3D video quality metrics can be classified into three categories based on the need for the reference video during the assessment process at the user end: Full Reference (FR), Reduced Reference (RR) and No Reference (NR). In this study we propose a NR quality metric, PNRM, suitable for on-the-fly 3D video services. In order to evaluate the reliability and effectiveness of the proposed metric, subjective experiments are conducted in this paper. Observing the high correlation with the subjective experimental results, it can be clearly stated that the proposed metric is able to mimic the Human Visual System (HVS).","PeriodicalId":267389,"journal":{"name":"2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130671176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3DTV-CON 2018 Organizing Committee Page
Pub Date: 2018-06-01. DOI: 10.1109/3dtv.2018.8478442
{"title":"3DTV-CON 2018 Organizing Committee Page","authors":"","doi":"10.1109/3dtv.2018.8478442","DOIUrl":"https://doi.org/10.1109/3dtv.2018.8478442","url":null,"abstract":"","PeriodicalId":267389,"journal":{"name":"2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129172257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIGN LANGUAGE RECOGNITION BASED ON HAND AND BODY SKELETAL DATA
Pub Date: 2018-06-01. DOI: 10.1109/3DTV.2018.8478467
D. Konstantinidis, K. Dimitropoulos, P. Daras
Sign language recognition (SLR) is a challenging but highly important research field for computer vision systems that aim to facilitate communication with deaf and hearing-impaired people. In this work, we propose an accurate and robust deep-learning-based methodology for sign language recognition from video sequences. Our method relies on hand and body skeletal features extracted from RGB videos; it therefore obtains skeletal data that is highly discriminative for gesture recognition without any additional equipment, such as data gloves, that might restrict the signer's movements. Experiments on a large, publicly available sign language dataset show the superiority of our methodology over other state-of-the-art approaches that rely solely on RGB features.
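A minimal example of how per-frame skeletal features can drive recognition is a recurrent classifier over joint-coordinate sequences, sketched below in PyTorch; the joint count, layer sizes, and class count are assumptions and do not correspond to the authors' architecture.

```python
# Hypothetical skeleton-sequence classifier; shapes and sizes are illustrative only.
import torch
import torch.nn as nn

class SkeletonSLR(nn.Module):
    def __init__(self, n_joints=67, coords=3, n_classes=100, hidden=256):
        super().__init__()
        # each frame is a flattened vector of body + hand joint coordinates
        self.lstm = nn.LSTM(input_size=n_joints * coords, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, frames, n_joints * coords)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])      # classify the sign from the last hidden state

logits = SkeletonSLR()(torch.randn(4, 60, 67 * 3))   # 4 clips of 60 frames each
```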
{"title":"SIGN LANGUAGE RECOGNITION BASED ON HAND AND BODY SKELETAL DATA","authors":"D. Konstantinidis, K. Dimitropoulos, P. Daras","doi":"10.1109/3DTV.2018.8478467","DOIUrl":"https://doi.org/10.1109/3DTV.2018.8478467","url":null,"abstract":"Sign language recognition (SLR) is a challenging, but highly important research field for several computer vision systems that attempt to facilitate the communication among the deaf and hearing impaired people. In this work, we propose an accurate and robust deep learning-based methodology for sign language recognition from video sequences. Our novel method relies on hand and body skeletal features extracted from RGB videos and, therefore, it acquires highly discriminative for gesture recognition skeletal data without the need for any additional equipment, such as data gloves, that may restrict signer’s movements. Experimentation on a large publicly available sign language dataset reveals the superiority of our methodology with respect to other state of the art approaches relying solely on RGB features.","PeriodicalId":267389,"journal":{"name":"2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122593895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EXPERT EVALUATION OF A NOVEL LIGHT-FIELD VISUALIZATION FORMAT
Pub Date: 2018-06-01. DOI: 10.1109/3DTV.2018.8478436
A. Cserkaszky, P. A. Kara, A. Barsi, M. Martini
Light-field visualization is continuously emerging in industrial sectors, and its appearance on the consumer market is approaching. Yet this process is halted, or at least slowed down, by the lack of proper display-independent light-field formats. Such formats are necessary to enable efficient interchange between light-field content creation and visualization, and thus to support potential future use cases of this technology. In this paper, we present the results of a perceived-quality assessment study performed on our own novel light-field visualization format. The subjective tests, which compared conventional linear-camera-array visualization to our format, were completed by experts only; the quality assessment was therefore an expert evaluation. We aim to use the findings of this research to carry out a large-scale subjective test series with non-expert observers in the future.
{"title":"EXPERT EVALUATION OF A NOVEL LIGHT-FIELD VISUALIZATION FORMAT","authors":"A. Cserkaszky, P. A. Kara, A. Barsi, M. Martini","doi":"10.1109/3DTV.2018.8478436","DOIUrl":"https://doi.org/10.1109/3DTV.2018.8478436","url":null,"abstract":"Light-field visualization is continuously emerging in industrial sectors, and the appearance on the consumer market is approaching. Yet this process is halted, or at least slowed down, by the lack of proper display-independent light-field formats. Such formats are necessary to enable the efficient interchange between light-field content creation and visualization, and thus support potential future use case scenarios of this technology. In this paper, we introduce the results of a perceived quality assessment research, performed on our own novel light-field visualization format. The subjective tests, which compared conventional linear camera array visualization to our format, were completed by experts only, thus quality assessment was an expert evaluation. We aim to use the findings gathered in this research to carry out a large-scale subjective test series in the future, with non-expert observers.","PeriodicalId":267389,"journal":{"name":"2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124256897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3DTV-CON 2018 Index
Pub Date: 2018-06-01. DOI: 10.1109/3dtv.2018.8478548
{"title":"3DTV-CON 2018 Index","authors":"","doi":"10.1109/3dtv.2018.8478548","DOIUrl":"https://doi.org/10.1109/3dtv.2018.8478548","url":null,"abstract":"","PeriodicalId":267389,"journal":{"name":"2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)","volume":"638 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116084692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ACCURATE METHOD OF TEMPORAL-SHIFT ESTIMATION FOR 3D VIDEO
Pub Date: 2018-06-01. DOI: 10.1109/3DTV.2018.8478431
Aleksandr Ploshkin, D. Vatolin
Video synchronization is a fundamental computer-vision task that is necessary for a wide range of applications. A 3D video involves two streams that show the scene from different angles concurrently, but in many cases the streams are desynchronized. This paper investigates the problem of synchronizing the left and right stereoscopic views. We assume the temporal shift (time difference) and the geometric distortion between the two streams are constant throughout each scene. We propose a temporal-shift estimation method with subframe accuracy based on a block-matching algorithm.
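The general idea of sub-frame shift estimation can be illustrated with cross-correlation of a per-frame signal extracted from the two views, refined by parabolic interpolation around the correlation peak; this sketch stands in for, but does not reproduce, the paper's block-matching pipeline.

```python
# Sub-frame temporal-shift estimation from two 1-D per-frame signals (e.g., mean luminance).
import numpy as np

def subframe_shift(sig_left, sig_right):
    """Positive result means the left-view signal lags the right-view signal (in frames)."""
    a = (sig_left - sig_left.mean()) / sig_left.std()
    b = (sig_right - sig_right.mean()) / sig_right.std()
    corr = np.correlate(a, b, mode="full")
    k = int(corr.argmax())                     # integer lag with maximum correlation
    lag = k - (len(b) - 1)
    if 0 < k < len(corr) - 1:                  # parabolic refinement for sub-frame accuracy
        y0, y1, y2 = corr[k - 1], corr[k], corr[k + 1]
        lag += 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)
    return lag
```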
{"title":"ACCURATE METHOD OF TEMPORAL-SHIFT ESTIMATION FOR 3D VIDEO","authors":"Aleksandr Ploshkin, D. Vatolin","doi":"10.1109/3DTV.2018.8478431","DOIUrl":"https://doi.org/10.1109/3DTV.2018.8478431","url":null,"abstract":"Video synchronization is a fundamental computer-vision task that is necessary for a wide range of applications. A 3D video involves two streams, which show the scene from different angles concurrently, but many cases exhibit desynchronization between them. This paper investigates the problem of synchronizing the left and right stereoscopic views. We assume the temporal shift (time difference) and geometric distortion between the two streams are constant throughout each scene. We propose a temporal-shift estimation method with subframe accuracy based on a block-matching algorithm.","PeriodicalId":267389,"journal":{"name":"2018 - 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122707096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}