Pub Date: 2013-06-10 | DOI: 10.1109/IVMSPW.2013.6611926
Adaptive loop filtering based inter-view video coding in a hybrid video codec with MPEG-2 and HEVC for stereoscopic video coding
Sangsoo Ahn, Munchurl Kim
In this paper, a hybrid stereoscopic video codec is proposed based on MPEG-2 and an extended HEVC with an inter-view coding scheme for stereoscopic TV services over heterogeneous networks. The left-view sequences are encoded with an MPEG-2 video encoder for conventional 2D TV services via traditional terrestrial broadcasting networks. The right-view sequences are encoded by an extended HEVC with the proposed inter-view coding scheme, and the resulting bitstreams are transmitted over the Internet. A 3D TV terminal supporting the hybrid stereoscopic video streams thus receives the MPEG-2 data of the left-view sequences via the terrestrial broadcasting networks and the extended-HEVC bitstreams of the right-view sequences over the Internet. The proposed inter-view coding scheme in the extended HEVC uses the reconstructed MPEG-2 frames of the left-view sequences as reference frames for predictive coding of the current frames of the right-view sequences. To enhance the texture quality of the reference frames, an adaptive loop filter (ALF) is applied to the reconstructed MPEG-2 frames as well as within HEVC. The ALF ON/OFF signaling map and the ALF coefficients for the MPEG-2 reconstructed frames are transmitted together with the HEVC bitstreams over the Internet. Experimental results show that the proposed hybrid stereoscopic codec with ALF-based inter-view coding improves coding efficiency by an average BD-rate gain of 16.81%, compared to a hybrid stereoscopic codec that runs MPEG-2 and HEVC independently without inter-view coding.
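The BD-rate figure above is a Bjontegaard delta-rate measurement. As a reference for how such a number is typically computed (this is the standard method, not the authors' code; the rate/PSNR operating points below are hypothetical), here is a minimal sketch:

```python
# Minimal Bjontegaard delta-rate (BD-rate) sketch. For each codec, log10(rate)
# is fitted as a cubic polynomial of PSNR; the average gap between the two
# fits over the overlapping PSNR range gives the percentage bitrate difference.
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    p_ref = np.polyfit(psnr_ref, np.log10(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rates_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    avg_log_ratio = ((np.polyval(int_test, hi) - np.polyval(int_test, lo)) -
                     (np.polyval(int_ref, hi) - np.polyval(int_ref, lo))) / (hi - lo)
    return (10 ** avg_log_ratio - 1) * 100    # negative = average bitrate saving

# Hypothetical rate (kbps) / PSNR (dB) points for the anchor and the proposed codec.
print(bd_rate([800, 1200, 1800, 2700], [34.1, 35.8, 37.2, 38.5],
              [650, 1000, 1500, 2300], [34.2, 35.9, 37.3, 38.6]))
```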
{"title":"Adaptive loop filtering based interview video coding in an hybrid video codec with MPEG-2 and HEVC for stereosopic video coding","authors":"Sangsoo Ahn, Munchurl Kim","doi":"10.1109/IVMSPW.2013.6611926","DOIUrl":"https://doi.org/10.1109/IVMSPW.2013.6611926","url":null,"abstract":"In this paper, a hybrid stereoscopic video codec is proposed based on MPEG-2 and an extended HEVC with an interview coding scheme for stereoscopic TV services through heterogeneous networks. The left-view sequences are encoded in an MPEG-2 video encoder for conventional 2D TV services via the traditional terrestrial broadcasting networks. On the other hand, the right-view sequences are encoded by an extended HEVC with a proposed interview coding scheme and the resulting bitstreams are transmitted over Internet. So, a 3D TV terminal to support the hybrid stereoscopic video streams receives the MPEG-2 data for the left-view sequences via the terrestrial broadcasting networks, and receives the right-view sequence streams of an extended HEVC data over Internet. The proposed interview coding scheme in an extended HEVC utilizes as reference frames the reconstructed MPEG-2 frames of the left-view sequences to perform predictive coding for the current frames of the right-view sequences. To enhance the texture qualities of the reference frames, an ALF tool is applied for the reconstructed MPEG-2 frames and HEVC as well. The ALF ON/OFF signaling map and ALF coefficients for the MPEG-2 reconstructed frames are transmitted in conjunction with HEVC bitstreams via Internet. The experimental results show that the proposed hybrid stereoscopic codec with ALF-based interview coding improves the coding efficiency with average 16.81% BD-rate gain, compared to a hybrid stereoscopic codec of independent MPEG-2 and HEVC codec without interview coding.","PeriodicalId":170714,"journal":{"name":"IVMSP 2013","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130242718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-06-10 | DOI: 10.1109/IVMSPW.2013.6611900
Depth estimation from monocular color images using natural scene statistics models
Che-Chun Su, L. Cormack, A. Bovik
We consider the problem of estimating a dense depth map from a single monocular image. Inspired by psychophysical evidence of visual processing in the human visual system (HVS) and natural scene statistics (NSS) models of image and range, we propose a Bayesian framework to recover detailed 3D scene structure by exploiting the statistical relationships between local image features and depth variations inherent in natural images. Observing that similar depth structures may exist in different types of luminance/chrominance textured regions in natural scenes, we build a dictionary of canonical range patterns as the prior, and fit a multivariate Gaussian mixture (MGM) model associating local image features with the different range patterns as the likelihood. Compared with a state-of-the-art depth estimation method, we achieve similar performance in terms of pixel-wise range estimation error, but a superior capability of recovering relative distance relationships between different parts of the image.
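As a conceptual sketch of the prior/likelihood machinery described above (an assumed structure, not the authors' exact model: here one Gaussian mixture is fitted per canonical range pattern, and a MAP rule picks the pattern for a new local feature; all names are hypothetical):

```python
# Illustrative sketch: associate local image features with canonical range
# patterns via per-pattern Gaussian mixtures, then pick the MAP pattern.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_pattern_likelihoods(features, pattern_labels, n_components=3):
    """Fit one GMM over image features for each canonical range pattern."""
    models = {}
    for k in np.unique(pattern_labels):
        gmm = GaussianMixture(n_components=n_components, covariance_type="full")
        gmm.fit(features[pattern_labels == k])
        models[k] = gmm
    return models

def map_range_pattern(feature, models, log_prior):
    """Bayes rule: argmax_k  log p(feature | pattern k) + log p(pattern k)."""
    scores = {k: m.score_samples(feature[None, :])[0] + log_prior[k]
              for k, m in models.items()}
    return max(scores, key=scores.get)
```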
{"title":"Depth estimation from monocular color images using natural scene statistics models","authors":"Che-Chun Su, L. Cormack, A. Bovik","doi":"10.1109/IVMSPW.2013.6611900","DOIUrl":"https://doi.org/10.1109/IVMSPW.2013.6611900","url":null,"abstract":"We consider the problem of estimating a dense depth map from a single monocular image. Inspired by psychophysical evidence of visual processing in human vision systems (HVS) and natural scene statistics (NSS) models of image and range, we propose a Bayesian framework to recover detailed 3D scene structure by exploiting the statistical relationships between local image features and depth variations inherent in natural images. By observing that similar depth structures may exist in different types of luminance/chrominance textured regions in natural scenes, we build a dictionary of canonical range patterns as the prior, and fit a multivariate Gaussian mixture (MGM) model to associate local image features to different range patterns as the likelihood. Compared with the state-of-the-art depth estimation method, we achieve similar performance in terms of pixel-wise estimated range error, but superior capability of recovering relative distant relationships between different parts of the image.","PeriodicalId":170714,"journal":{"name":"IVMSP 2013","volume":"46 22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124668745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-06-10 | DOI: 10.1109/IVMSPW.2013.6611932
A Bayesian methodology for visual object tracking on stereo sequences
G. Chantas, N. Nikolaidis, I. Pitas
A general Bayesian post-processing methodology for improving the performance of object tracking in stereo video sequences is proposed in this paper. We use the results of any single-channel visual object tracker within a Bayesian framework in order to refine the tracking accuracy in both stereo video channels. In this framework, a variational Bayesian algorithm is employed, where prior knowledge about the object displacement (movement) is incorporated via a prior distribution. This displacement information is obtained in a preprocessing step, where object displacement is estimated via feature extraction and matching. In parallel, disparity information is extracted and utilized in the same framework. The improvements in tracking accuracy introduced by the proposed methodology are quantified through experimental analysis.
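The paper's variational Bayesian update is more involved; as a conceptual illustration of the prior/measurement fusion it performs, here is a conjugate-Gaussian stand-in in which the refined position is a precision-weighted blend of the tracker output and the displacement-prior prediction (all names and numbers are hypothetical):

```python
# Simplified stand-in for the variational refinement: under Gaussian
# assumptions, the posterior position blends the tracker's estimate with the
# prediction from the displacement prior, weighted by their precisions.
import numpy as np

def refine_position(tracker_pos, tracker_var, prev_pos, disp_mean, disp_var):
    predicted = prev_pos + disp_mean        # prediction from displacement prior
    prior_prec = 1.0 / disp_var             # precision of the prediction
    meas_prec = 1.0 / tracker_var           # precision of the tracker output
    post_var = 1.0 / (prior_prec + meas_prec)
    post_mean = post_var * (prior_prec * predicted + meas_prec * tracker_pos)
    return post_mean, post_var

# Hypothetical: tracker reports x=102 (var 4); prior says +5 px from x=95 (var 1).
print(refine_position(np.array([102.0]), 4.0, np.array([95.0]), np.array([5.0]), 1.0))
```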
{"title":"A Bayesian methodology for visual object tracking on stereo sequences","authors":"G. Chantas, N. Nikolaidis, I. Pitas","doi":"10.1109/IVMSPW.2013.6611932","DOIUrl":"https://doi.org/10.1109/IVMSPW.2013.6611932","url":null,"abstract":"A general Bayesian post-processing methodology for performance improvement of object tracking in stereo video sequences is proposed in this paper. We utilize the results of any single channel visual object tracker in a Bayesian framework, in order to refine the tracking accuracy in both stereo video channels. In this framework, a variational Bayesian algorithm is employed, where prior knowledge about the object displacement (movement) is incorporated via a prior distribution. This displacement information is obtained in a preprocessing step, where object displacement is estimated via feature extraction and matching. In parallel, disparity information is extracted and utilized in the same framework. The improvements introduced by the proposed methodology in terms of tracking accuracy are quantified through experimental analysis.","PeriodicalId":170714,"journal":{"name":"IVMSP 2013","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125075703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-06-10 | DOI: 10.1109/IVMSPW.2013.6611930
3D video quality metric for 3D video compression
Amin Banitalebi-Dehkordi, M. Pourazad, P. Nasiopoulos
As the evolution of multiview display technology brings glasses-free 3DTV closer to reality, MPEG and VCEG are preparing an extension to HEVC for encoding multiview video content. View synthesis in the current version of the 3D video codec is evaluated using PSNR as the quality measure. In this paper, we propose a full-reference, Human-Visual-System-based 3D video quality metric to be used in multiview encoding as an alternative to PSNR. The performance of our metric is tested in a two-view scenario. The quality of the compressed stereo pair, formed from a decoded view and a synthesized view, is evaluated at the encoder side. The performance is verified through a series of subjective tests and compared with that of the PSNR, SSIM, MS-SSIM, VIFp, and VQM metrics. Experimental results show that our 3D quality metric has the highest correlation with Mean Opinion Scores (MOS) among the tested metrics.
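The final comparison step, ranking metrics by their agreement with MOS, is commonly done with Pearson and Spearman correlation coefficients. A minimal sketch with hypothetical scores (the paper does not publish its exact evaluation code):

```python
# Rank a metric against subjective scores: Pearson (linear agreement) and
# Spearman (monotonic agreement). A logistic fitting step often precedes the
# Pearson computation in quality-assessment studies; it is omitted here.
from scipy.stats import pearsonr, spearmanr

metric_scores = [0.71, 0.64, 0.82, 0.55, 0.90]   # hypothetical metric output per sequence
mos = [3.4, 3.1, 4.0, 2.6, 4.5]                  # hypothetical mean opinion scores

plcc, _ = pearsonr(metric_scores, mos)
srocc, _ = spearmanr(metric_scores, mos)
print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}")
```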
{"title":"3D video quality metric for 3D video compression","authors":"Amin Banitalebi-Dehkordi, M. Pourazad, P. Nasiopoulos","doi":"10.1109/IVMSPW.2013.6611930","DOIUrl":"https://doi.org/10.1109/IVMSPW.2013.6611930","url":null,"abstract":"As the evolution of multiview display technology is bringing glasses-free 3DTV closer to reality, MPEG and VCEG are preparing an extension to HEVC to encode multiview video content. View synthesis in the current version of the 3D video codec is performed using PSNR as a quality metric measure. In this paper, we propose a full-reference Human-Visual-System based 3D video quality metric to be used in multiview encoding as an alternative to PSNR. Performance of our metric is tested in a 2-view case scenario. The quality of the compressed stereo pair, formed from a decoded view and a synthesized view, is evaluated at the encoder side. The performance is verified through a series of subjective tests and compared with that of PSNR, SSIM, MS-SSIM, VIFp, and VQM metrics. Experimental results showed that our 3D quality metric has the highest correlation with Mean Opinion Scores (MOS) compared to the other tested metrics.","PeriodicalId":170714,"journal":{"name":"IVMSP 2013","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126371753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-06-10 | DOI: 10.1109/IVMSPW.2013.6611901
Hybrid segmentation of depth images using a watershed and region merging based method for tree species recognition
A. Othmani, A. Piboule, L. Voon
Tree species recognition from Terrestrial Light Detection and Ranging (T-LiDAR) scanner data is essential for estimating forest inventory attributes in mixed plantings. In this paper, we propose a new method for individual tree species recognition based on the analysis of the 3D geometric texture of tree bark. Our method transforms the 3D point cloud of a 30 cm segment of the tree trunk into a depth image, to which a hybrid segmentation method combining watershed and region-merging techniques is applied in order to reveal bark shape characteristics. Finally, shape and intensity features are computed on the segmented depth image and used to classify five different tree species with a Random Forest (RF) classifier. Our method has been tested on two datasets acquired in two French forests with different terrain characteristics. The accuracy and precision rates obtained for both datasets are over 89%.
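As a rough illustration of such a pipeline (an assumed sketch, not the authors' implementation; the region-merging stage and the exact feature set are omitted, and all names are hypothetical):

```python
# Watershed segmentation of a bark depth image, crude per-image features,
# and a Random Forest classifier on top.
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed
from skimage.feature import peak_local_max
from sklearn.ensemble import RandomForestClassifier

def segment_bark(depth_img):
    gradient = sobel(depth_img)                      # flood the depth gradient
    peaks = peak_local_max(depth_img, min_distance=5)  # seeds at local depth maxima
    markers = np.zeros(depth_img.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    return watershed(gradient, markers)

def bark_features(depth_img, labels):
    areas = np.bincount(labels.ravel())[1:]          # segment sizes
    return [areas.mean(), areas.std(), depth_img.mean(), depth_img.std()]

# Classification on pooled per-image features (training data assumed available):
# clf = RandomForestClassifier(n_estimators=200).fit(train_features, train_labels)
```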
{"title":"Hybrid segmentation of depth images using a watershed and region merging based method for tree species recognition","authors":"A. Othmani, A. Piboule, L. Voon","doi":"10.1109/IVMSPW.2013.6611901","DOIUrl":"https://doi.org/10.1109/IVMSPW.2013.6611901","url":null,"abstract":"Tree species recognition from Terrestrial Light Detection and Ranging (T-LiDAR) scanner data is essential for estimating forest inventory attributes in a mixed planting. In this paper, we propose a new method for individual tree species recognition based on the analysis of the 3D geometric texture of tree barks. Our method transforms the 3D point cloud of a 30 cm segment of the tree trunk into a depth image on which a hybrid segmentation method using watershed and region merging techniques is applied in order to reveal bark shape characteristics. Finally, shape and intensity features are calculated on the segmented depth image and used to classify five different tree species using a Random Forest (RF) classifier. Our method has been tested using two datasets acquired in two different French forests with different terrain characteristics. The accuracy and precision rates obtained for both datasets are over 89%.","PeriodicalId":170714,"journal":{"name":"IVMSP 2013","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128001239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-06-10 | DOI: 10.1109/IVMSPW.2013.6611933
Structure optimization for multi-view acquisition and stereo display system
Hao Cheng, Zhixiang You, P. An, Zhaoyang Zhang
This paper introduces several models of the multi-view acquisition/stereo display system. With these models, we can easily analyze the factors affecting such a system, such as the stereo angle, the number of views, and the stereo image resolution. To exploit these factors in constructing a better multi-view acquisition/stereo display system, a strategy for optimizing them is needed. This paper proposes a structure optimization for the multi-view acquisition/stereo display system. With this structure optimization, we can adjust the factors conveniently and easily set up a real multi-view acquisition/stereo display system that achieves good results.
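One elementary relation that display-side models of this kind capture is how on-screen disparity maps to perceived depth. A hedged sketch (the formula is the standard similar-triangles relation; the function name and the numbers are hypothetical, and the paper's actual models may differ):

```python
# Perceived depth of a fused stereo point from its on-screen disparity,
# viewing distance, and eye separation (all in millimetres).
def perceived_depth(viewing_dist_mm, eye_sep_mm, screen_disparity_mm):
    # Similar triangles: positive disparity places the point behind the screen.
    return viewing_dist_mm * eye_sep_mm / (eye_sep_mm - screen_disparity_mm)

print(perceived_depth(2000, 65, 10))   # ~2364 mm: appears behind the screen plane
```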
{"title":"Structure optimization for multi-view acquisition and stereo display system","authors":"Hao Cheng, Zhixiang You, P. An, Zhaoyang Zhang","doi":"10.1109/IVMSPW.2013.6611933","DOIUrl":"https://doi.org/10.1109/IVMSPW.2013.6611933","url":null,"abstract":"This paper introduces several models of the multi-view acquisition/stereo display system. With the use of these models, we can easily analyze the factors impacting on multi-view acquisition/stereo display system, such as stereo angle, number of views, and stereo image resolution. In order to use these factors constructing better multi-view acquisition/stereo display system, the strategy to optimize them are needed. This paper proposes a structure optimization for multi-view acquisition/stereo display system. With the structure optimization, we can adjust the factors conveniently and easily set up a real multi-view acquisition/stereo display system to achieve good effect.","PeriodicalId":170714,"journal":{"name":"IVMSP 2013","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131213912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-06-10 | DOI: 10.1109/IVMSPW.2013.6611928
The MPEG-7 Audiovisual Description Profile (AVDP) and its application to multi-view video
Masanori Sano, W. Bailer, A. Messina, J. Evain, M. Matton
This paper describes a new MPEG-7 profile called AVDP (Audiovisual Description Profile). First, some problems with conventional MPEG-7 profiles are described, and the motivation behind the development of AVDP is explained based on requirements from broadcasters and other actors in the media industry. Second, the scope and functionalities of AVDP are described; the differences from the existing profiles and the basic AVDP structure and components are explained. Useful software tools for handling AVDP, including tools for validation and visualization, are discussed. Finally, the use of AVDP to represent multi-view and panoramic video content is described.
{"title":"The MPEG-7 Audiovisual Description Profile (AVDP) and its application to multi-view video","authors":"Masanori Sano, W. Bailer, A. Messina, J. Evain, M. Matton","doi":"10.1109/IVMSPW.2013.6611928","DOIUrl":"https://doi.org/10.1109/IVMSPW.2013.6611928","url":null,"abstract":"This paper describes a new MPEG-7 profile called AVDP (Audiovisual Description Profile). Firstly, some problems with conventional MPEG-7 profiles are described and the motivation behind the development of AVDP is explained based on requirements from broadcasters and other actors from the media industry. Secondly, the scope and functionalities of AVDP are described. Differences from the existing profiles and the basic AVDP structure and components are explained. Some useful software tools handling AVDP, including for validation and visualization are discussed. Finally the use of AVDP to represent multi-view and panoramic video content is described.","PeriodicalId":170714,"journal":{"name":"IVMSP 2013","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134121707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-06-10 | DOI: 10.1109/IVMSPW.2013.6611913
Precision enhancement of 3D surfaces from multiple quantized depth maps
Pengfei Wan, Gene Cheung, P. Chou, D. Florêncio, Cha Zhang, O. Au
Transmitting compressed texture and depth maps of multiple viewpoints from the sender enables image synthesis at the receiver from any intermediate virtual viewpoint via depth-image-based rendering (DIBR). We observe that quantized depth maps from different viewpoints of the same 3D scene constitute multiple descriptions (MD) of the same signal, so it is possible to reconstruct the 3D scene at higher precision at the receiver when multiple depth maps are considered jointly. In this paper, we cast the precision enhancement of 3D surfaces from multiple quantized depth maps as a combinatorial optimization problem. First, we derive a lemma that allows us to increase the precision of a subset of 3D points with certainty, simply by discovering special intersections of quantization bins (QB) from both views. Then, we identify the most probable voxel-containing QB intersections using a shortest-path formulation. Experimental results show that our method can significantly increase the precision of decoded depth maps compared with standard decoding schemes.
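The core of the lemma is interval arithmetic: once both depth maps are expressed in a common coordinate frame (assumed here; the paper's geometry handling is more involved), a point's true depth must lie in the intersection of its two quantization bins, and a strict intersection is narrower than at least one of the bins. A minimal sketch with hypothetical bin indices and step sizes:

```python
# Intersect the quantization bins (intervals) of the same 3D point seen in
# two depth maps; the intersection can only be as wide as the narrower bin.
def bin_interval(q_index, step):
    """Quantization bin [lo, hi) for a given bin index and step size."""
    return q_index * step, (q_index + 1) * step

def refined_interval(q1, step1, q2, step2):
    lo1, hi1 = bin_interval(q1, step1)
    lo2, hi2 = bin_interval(q2, step2)
    lo, hi = max(lo1, lo2), min(hi1, hi2)
    return (lo, hi) if lo < hi else None   # None: the two bins are inconsistent

# Hypothetical: view-1 bin [4.0, 4.5), view-2 bin [4.2, 4.9) -> refined [4.2, 4.5).
print(refined_interval(8, 0.5, 6, 0.7))
```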
{"title":"Precision enhancement of 3D surfaces from multiple quantized depth maps","authors":"Pengfei Wan, Gene Cheung, P. Chou, D. Florêncio, Cha Zhang, O. Au","doi":"10.1109/IVMSPW.2013.6611913","DOIUrl":"https://doi.org/10.1109/IVMSPW.2013.6611913","url":null,"abstract":"Transmitting from sender compressed texture and depth maps of multiple viewpoints enables image synthesis at receiver from any intermediate virtual viewpoint via depth-image-based rendering (DIBR). We observe that quantized depth maps from different viewpoints of the same 3D scene constitutes multiple descriptions (MD) of the same signal, thus it is possible to reconstruct the 3D scene in higher precision at receiver when multiple depth maps are considered jointly. In this paper, we cast the precision enhancement of 3D surfaces from multiple quantized depth maps as a combinatorial optimization problem. First, we derive a lemma that allows us to increase the precision of a subset of 3D points with certainty, simply by discovering special intersections of quantization bins (QB) from both views. Then, we identify the most probable voxel-containing QB intersections using a shortest-path formulation. Experimental results show that our method can significantly increase the precision of decoded depth maps compared with standard decoding schemes.","PeriodicalId":170714,"journal":{"name":"IVMSP 2013","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134613805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-06-10 | DOI: 10.1109/IVMSPW.2013.6611918
Effectiveness of 3VQM in capturing depth inconsistencies
Dogancan Temel, G. Al-Regib
The 3D video quality metric (3VQM) was proposed to evaluate the temporal and spatial variation of depth errors, i.e., errors in depth values that lead to inconsistencies between the left and right views, fast-changing disparities, and geometric distortions. Previously, we evaluated 3VQM against subjective scores. In this paper, we show the effectiveness of 3VQM in capturing errors and inconsistencies that exist in rendered depth-based 3D videos. We further investigate how well 3VQM measures excessive disparities, fast-changing disparities, geometric distortions, and temporal flickering and/or spatial noise in the form of depth-cue inconsistency. Results show that 3VQM best captures the depth inconsistencies that stem from errors in the reference views. However, the metric is not sensitive to mild depth-map errors, such as those resulting from blur. We also performed a subjective quality test and showed that 3VQM performs better than PSNR, weighted PSNR, and SSIM in terms of accuracy, coherency, and consistency.
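As a simplified illustration of the kind of spatio-temporal depth-error statistics such a metric aggregates (this is not the published 3VQM formulation; function and variable names are hypothetical):

```python
# Penalize the spatial spread and the frame-to-frame flicker of the error map
# between an ideal and a distorted depth sequence.
import numpy as np

def depth_inconsistency(depth_ideal, depth_distorted):
    """Both inputs are arrays shaped (frames, H, W); returns one score per video."""
    err = depth_ideal.astype(float) - depth_distorted.astype(float)
    spatial = err.std(axis=(1, 2)).mean()            # per-frame spatial variation
    temporal = np.abs(np.diff(err, axis=0)).mean()   # temporal flicker of the error
    return spatial + temporal
```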
{"title":"Effectiveness of 3VQM in capturing depth inconsistencies","authors":"Dogancan Temel, G. Al-Regib","doi":"10.1109/IVMSPW.2013.6611918","DOIUrl":"https://doi.org/10.1109/IVMSPW.2013.6611918","url":null,"abstract":"The 3D video quality metric (3VQM) was proposed to evaluate the temporal and spatial variation of the depth errors for the depth values that would lead to inconsistencies between left and right views, fast changing disparities, and geometric distortions. Previously, we evaluated 3VQM against subjective scores. In this paper, we show the effectiveness of 3VQM in capturing errors and inconsistencies that exist in the rendered depth-based 3D videos. We further investigate how 3VQM could measure excessive disparities, fast changing disparities, geometric distortions, temporal flickering and/or spatial noise in the form of depth cues inconsistency. Results show that 3VQM best captures the depth inconsistencies based on errors in the reference views. However, the metric is not sensitive to depth map mild errors such as those resulting from blur. We also performed a subjective quality test and showed that 3VQM performs better than PSNR, weighted PSNR and SSIM in terms of accuracy, coherency and consistency.","PeriodicalId":170714,"journal":{"name":"IVMSP 2013","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117280672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-06-10 | DOI: 10.1109/IVMSPW.2013.6611919
Camera trajectory recovery for image-based city street modeling
F. Huang, A. Tsai, Meng-Tsan Li, Jui-Yang Tsai
A semi-automatic image-based approach for city street modeling is proposed, which takes two types of images as input: an orthogonal aerial image of the area of interest and a set of street-view spherical panoramic images. This paper focuses on improving the accuracy of camera trajectory recovery, which is crucial for registering the two types of image sources. Scale-Invariant Feature Transform (SIFT) feature detection and matching are employed to identify corresponding image points between each pair of successive panoramic images. Due to the wide field of view of spherical panoramic images and the high image recording frequency, the number of resulting matches is generally very large. Instead of directly applying RANSAC, which is very time-consuming, we propose a method to preprocess those matches. We claim that the majority of incorrect or insignificant matches are successfully removed. Several real-world experiments demonstrate that our method achieves higher accuracy in estimating the camera extrinsic parameters and consequently leads to a more accurate camera trajectory recovery.
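A sketch of what such a matching-plus-prefiltering stage can look like with OpenCV (an assumed illustration: the ratio test and the median-displacement cull below are common heuristics, not necessarily the paper's exact preprocessing rule):

```python
# SIFT matching between successive frames with two pre-filters before RANSAC:
# Lowe's ratio test, then a coarse cull of matches whose displacement deviates
# strongly from the median displacement.
import cv2
import numpy as np

def filtered_matches(img1, img2, ratio=0.75):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    knn = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in knn if m.distance < ratio * n.distance]  # ratio test
    disp = np.array([np.subtract(kp2[m.trainIdx].pt, kp1[m.queryIdx].pt)
                     for m in good])
    dev = np.linalg.norm(disp - np.median(disp, axis=0), axis=1)
    keep = dev < 3 * np.median(dev)          # hypothetical consistency threshold
    return [m for m, k in zip(good, keep) if k]
```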
{"title":"Camera trajectory recovery for image-based city street modeling","authors":"F. Huang, A. Tsai, Meng-Tsan Li, Jui-Yang Tsai","doi":"10.1109/IVMSPW.2013.6611919","DOIUrl":"https://doi.org/10.1109/IVMSPW.2013.6611919","url":null,"abstract":"A semi-automatic image-based approach for city street modeling was proposed, which takes two types of images as input. One is an orthogonal aerial image of the area of interest and the other is a set of street-view spherical panoramic images. This paper focuses on the accuracy enhancement of camera trajectory recovery, which is crucial in registering two types of image sources. Scale-invariant Feature Transform feature detection and matching methods were employed to identify corresponding image points between each pair of successive panoramic images. Due to the wide field-of-view of spherical panoramic images and high image recording frequency, the number of resultant matches is generally very large. Instead of directly applying RANSAC which is very time consuming, we proposed a method to preprocess those matches. We claim that the majority of incorrect or insignificant matches will be successfully removed. Several real-world experiments were conducted to demonstrate that our method is able to achieve higher accuracy at estimating camera extrinsic parameters, and would consequently lead to a more accurate camera trajectory recovery result.","PeriodicalId":170714,"journal":{"name":"IVMSP 2013","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115231926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}