A novel motion compensated prediction framework using weighted AMVP prediction for HEVC
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706416
Li Yu, Guangtao Fu, Aidong Men, Binji Luo, Huiling Zhao
In this paper, we propose a novel motion compensated prediction (MCP) framework that combines motion vector restriction with weighted advanced motion vector prediction (AMVP) to achieve higher prediction accuracy in High Efficiency Video Coding (HEVC). In our framework, the motion vectors of the prediction units (PUs) surrounding the current PU are first checked against a motion field model, and the geometric relationship between the motion vectors of the current PU and its neighboring coded PUs is analyzed. The motion vector restriction criterion then determines whether the weighted AMVP prediction method is used, so that true motion vectors can be obtained. Experimental results show that the proposed framework achieves BD-PSNR gains ranging from 0.03 dB to 0.22 dB and BD-rate savings of up to 6.4%.
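For readers unfamiliar with the Bjøntegaard metrics quoted above, the sketch below computes BD-rate from rate/PSNR points of two codecs using the standard cubic fit of log-rate as a function of PSNR. It is a generic implementation, not the authors' evaluation code, and the sample numbers are illustrative.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjontegaard delta-rate: average bitrate change (%) of the test
    codec versus the reference at equal quality, via a cubic fit of
    log10(rate) as a function of PSNR."""
    p_ref = np.polyfit(psnr_ref, np.log10(rate_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rate_test), 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_diff - 1) * 100  # negative values = bitrate saving

# Example: four rate points (kbps) and PSNRs (dB) per codec (made up).
print(bd_rate([1000, 2000, 4000, 8000], [34.0, 36.5, 38.8, 40.6],
              [950, 1900, 3800, 7600], [34.2, 36.7, 39.0, 40.8]))
```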
{"title":"A novel motion compensated prediction framework using weighted AMVP prediction for HEVC","authors":"Li Yu, Guangtao Fu, Aidong Men, Binji Luo, Huiling Zhao","doi":"10.1109/VCIP.2013.6706416","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706416","url":null,"abstract":"In this paper, we propose a novel motion compensated prediction (MCP) framework which combines the properties of motion vector restriction and weighted advanced motion vector prediction (AMVP) to achieve higher prediction accuracy for High Efficiency Video Coding (HEVC). In our framework, motion vectors of the prediction units (PUs) surrounding the current PU are checked first by a motion field model, and the geometric relationship between motion vectors of the current PU and its neighboring coded PUs is analyzed. Then whether to use the weighted AMVP prediction method is determined by the motion vector restriction criterion. True motion vectors therefore can be obtained. Experimental results show that the proposed framework achieves BD-PSNR increments ranging from 0.03dB to 0.22dB and the BD-rate saving is up to 6.4%.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132575270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling the color image and video quality on liquid crystal displays with backlight dimming
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706383
J. Korhonen, Claire Mantel, Nino Burini, Søren Forchhammer
Objective image and video quality metrics focus mostly on the digital representation of the signal. However, display characteristics are also essential to the overall Quality of Experience (QoE). In this paper, we use a model of a backlight dimming system for Liquid Crystal Displays (LCDs) and show how the modeled image can be used as input to quality assessment algorithms. For quality assessment, we propose an image quality metric based on Peak Signal-to-Noise Ratio (PSNR) computed in the CIE L*a*b* color space. The metric takes luminance reduction, color distortion, and loss of uniformity in the resulting image into consideration. Subjective evaluations of images generated using different backlight dimming algorithms and clipping strategies show that the proposed metric estimates perceived image quality more accurately than conventional PSNR.
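As a rough illustration of PSNR computed in the CIE L*a*b* space (the paper's full metric additionally weighs luminance reduction, color distortion, and uniformity loss), the sketch below converts both images with scikit-image and measures the error on the Lab values. Using 100 (the L* range) as the peak value is an assumption of this sketch.

```python
import numpy as np
from skimage import color  # pip install scikit-image

def lab_psnr(reference_rgb, displayed_rgb, peak=100.0):
    """PSNR on CIE L*a*b* values instead of RGB, so the error reflects
    perceptual lightness/chroma differences. Inputs are float RGB
    images in [0, 1]; peak = 100 (the L* range) is an assumption."""
    lab_ref = color.rgb2lab(reference_rgb)
    lab_dis = color.rgb2lab(displayed_rgb)
    mse = np.mean((lab_ref - lab_dis) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

# Example: an ideal frame versus a crude stand-in for a backlight-dimmed
# rendering with reduced luminance.
ref = np.random.rand(64, 64, 3)
dimmed = np.clip(ref * 0.85, 0, 1)
print(lab_psnr(ref, dimmed))
```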
{"title":"Modeling the color image and video quality on liquid crystal displays with backlight dimming","authors":"J. Korhonen, Claire Mantel, Nino Burini, Søren Forchhammer","doi":"10.1109/VCIP.2013.6706383","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706383","url":null,"abstract":"Objective image and video quality metrics focus mostly on the digital representation of the signal. However, the display characteristics are also essential for the overall Quality of Experience (QoE). In this paper, we use a model of a backlight dimming system for Liquid Crystal Display (LCD) and show how the modeled image can be used as an input to quality assessment algorithms. For quality assessment, we propose an image quality metric, based on Peak Signal-to-Noise Ratio (PSNR) computation in the CIE L*a*b* color space. The metric takes luminance reduction, color distortion and loss of uniformity in the resulting image in consideration. Subjective evaluations of images generated using different backlight dimming algorithms and clipping strategies show that the proposed metric estimates the perceived image quality more accurately than conventional PSNR.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134164457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gaze pattern analysis for video contents with different frame rates
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706429
Manri Cheon, Jong-Seok Lee
This paper presents a study of the viewing behavior of human subjects for video contents with different frame rates. Frame rate variability arises when temporal video scalability is used for adaptive video transmission, and the resulting variation in gaze patterns eventually affects visual perception, which should be taken into account during perceptual optimization of such a system. We design an eye-tracking experiment using several high-definition contents covering a wide range of content characteristics. Comparing the gaze points under a normal frame rate condition and a low frame rate condition shows that, although the overall viewing pattern remains quite similar, statistically significant differences are observed in some time intervals. The differences are analyzed in terms of two factors: overall gaze paths and subject-wise variability.
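A minimal sketch of the kind of per-interval significance test such a comparison involves; the gaze coordinates, subject count, and the specific choice of an independent-samples t-test are assumptions for illustration, not the paper's protocol.

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject mean gaze x-coordinates (pixels) within one
# time interval, one sample per subject and condition; real data would
# come from the eye tracker.
rng = np.random.default_rng(1)
gaze_x_normal = rng.normal(960, 120, size=20)   # normal frame rate
gaze_x_low = rng.normal(1000, 160, size=20)     # low frame rate

# A t-test per time interval flags intervals where the two frame-rate
# conditions yield significantly different gaze positions.
t, p = stats.ttest_ind(gaze_x_normal, gaze_x_low)
print(f"t = {t:.2f}, p = {p:.3f}, significant: {p < 0.05}")
```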
{"title":"Gaze pattern analysis for video contents with different frame rates","authors":"Manri Cheon, Jong-Seok Lee","doi":"10.1109/VCIP.2013.6706429","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706429","url":null,"abstract":"This paper presents a study investigating the viewing behavior of human subjects for video contents having different frame rates. Frame rate variability arises when temporal video scalability is considered for adaptive video transmission, and the gaze pattern variation due to the frame rate variability would eventually affect the visual perception, which needs to be considered during perceptual optimization of such a system. We design an eye-tracking experiment using several high definition contents having a wide range of content characteristics. By comparing the gaze points for a normal frame rate condition and a low frame rate condition, it is shown that, although the overall viewing pattern remains quite similar, statistically significant difference is also observed for some time intervals. The difference is analyzed in terms of two factors, namely, overall gaze paths and subject-wise variability.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121200835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Endoscopy video summarization based on unsupervised learning and feature discrimination
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706410
M. Ismail, Ouiem Bchir, Ahmed Z. Emam
We propose a novel endoscopy video summarization approach based on unsupervised learning and feature discrimination. The proposed learning approach partitions the collection of video frames into homogeneous categories based on their visual and temporal descriptors. It also generates possibilistic memberships that represent the degree of typicality of each video frame within every category, reducing the influence of noise frames on the learning process. The algorithm iteratively learns the optimal relevance weight for each feature subset within each cluster. Moreover, it finds the optimal number of clusters in an unsupervised and efficient way by exploiting properties of the possibilistic membership function. The endoscopy video summary consists of the most typical frames across all clusters after discarding noise frames. We compare the performance of the proposed algorithm with state-of-the-art learning approaches and show that the possibilistic approach is more robust. The endoscopy video collection used for evaluation includes more than 90,000 frames.
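The possibilistic membership idea can be illustrated with the standard possibilistic c-means typicality formula; the sketch below follows generic PCM (Krishnapuram-Keller style), not necessarily the authors' exact variant, and the data and scale parameters are illustrative.

```python
import numpy as np

def possibilistic_memberships(X, centers, eta, m=2.0):
    """Typicality of each sample to each cluster, following the
    possibilistic c-means form u = 1 / (1 + (d^2 / eta)^(1/(m-1))).
    Noise frames, far from every center, receive low typicality in all
    clusters; this is a generic PCM sketch, not the paper's update."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    return 1.0 / (1.0 + (d2 / eta[None, :]) ** (1.0 / (m - 1.0)))

# Example: five frames with 2-D descriptors, two cluster centers.
X = np.array([[0., 0.], [0.1, 0.], [1., 1.], [1.1, 0.9], [5., 5.]])
centers = np.array([[0., 0.], [1., 1.]])
eta = np.array([0.5, 0.5])  # per-cluster scale (bandwidth)
u = possibilistic_memberships(X, centers, eta)
print(u.round(3))  # the outlier at (5, 5) is atypical for both clusters
```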
{"title":"Endoscopy video summarization based on unsupervised learning and feature discrimination","authors":"M. Ismail, Ouiem Bchir, Ahmed Z. Emam","doi":"10.1109/VCIP.2013.6706410","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706410","url":null,"abstract":"We propose a novel endoscopy video summarization approach based on unsupervised learning and feature discrimination. The proposed learning approach partitions the collection of video frames into homogeneous categories based on their visual and temporal descriptors. Also, it generates possibilistic memberships in order to represent the degree of typicality of each video frame within every category, and reduce the influence of noise frames on the learning process. The algorithm learns iteratively the optimal relevance weight for each feature subset within each cluster. Moreover, it finds the optimal number of clusters in an unsupervised and efficient way by exploiting some properties of the possibilistic membership function. The endoscopy video summary consists of the most typical frames in all clusters after discarding noise frames. We compare the performance of the proposed algorithm with state-of-the-art learning approaches. We show that the possibilistic approach is more robust. The endoscopy videos collection includes more than 90k video frames.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121282432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A control theory based rate adaption scheme for DASH over multiple servers
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706335
Chao Zhou, Xinggong Zhang, Zongming Guo
Recently, Dynamic Adaptive Streaming over HTTP (DASH) has been widely deployed on the Internet, yet research on DASH over Multiple Content Distribution Servers (MCDS) remains scarce. Compared with traditional single-server DASH, MCDS offer expanded bandwidth, link diversity, and reliability. It is, however, challenging to keep video bitrate switching smooth over multiple servers because their bandwidths differ. In this paper, we propose a block-based rate adaptation method that considers both the diverse server bandwidths and the buffered video time fed back from the client. Multiple fragments are grouped into a block, and the fragments are downloaded in parallel from multiple servers. We propose to adapt the video bitrate at the block level rather than the fragment level. By dynamically adjusting the block length and scheduling fragment requests across the servers, the video bitrates requested from the servers are synchronized, so that fragments are downloaded in order. We then propose a control-theoretic approach to select an appropriate bitrate for each block. By modeling and linearizing the rate adaptation system, we design a novel Proportional-Derivative (PD) controller that adapts the video bitrate with high responsiveness and stability. Theoretical analysis and extensive experiments on the Internet demonstrate the efficiency of our DASH design.
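A minimal sketch of a PD-style bitrate selector driven by the buffered video time: the proportional term tracks the buffer error and the derivative term tracks buffer drift. The gains, target buffer, clamping, and bitrate ladder below are invented for illustration; the paper's actual linearized controller model is not reproduced.

```python
# Hypothetical bitrate ladder (kbps); an assumption of this sketch.
BITRATES = [350, 600, 1000, 2000, 4000]

def pd_select_bitrate(buffer_s, prev_buffer_s, bandwidth_kbps,
                      target_s=20.0, kp=0.1, kd=0.4):
    """Scale the measured aggregate bandwidth by a PD term on the
    buffered video time, then pick the highest sustainable bitrate
    for the next block."""
    error = buffer_s - target_s             # proportional term
    derivative = buffer_s - prev_buffer_s   # buffer drift per block
    factor = 1.0 + kp * error + kd * derivative
    budget = bandwidth_kbps * max(0.2, min(factor, 2.0))  # clamp for stability
    feasible = [r for r in BITRATES if r <= budget]
    return feasible[-1] if feasible else BITRATES[0]

# Example: buffer slightly below target and draining -> conservative pick.
print(pd_select_bitrate(buffer_s=18.0, prev_buffer_s=19.0,
                        bandwidth_kbps=2500))
```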
{"title":"A control theory based rate adaption scheme for dash over multiple servers","authors":"Chao Zhou, Xinggong Zhang, Zongming Guo","doi":"10.1109/VCIP.2013.6706335","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706335","url":null,"abstract":"Recently, Dynamic Adaptive Streaming over HTTP (DASH) has been widely deployed in the Internet. However, the research about DASH over Multiple Content Distribution Servers (MCDS) is few. Compared with traditional single-server-DASH, MCDS are able to offer expanded bandwidth, link diversity, and reliability. It is, however, a challenging problem to smooth video bitrate switchings over multiple servers due to their diverse bandwidths. In this paper, we propose a block-based rate adaptation method considering both the diverse bandwidths and feedback buffered video time. Multiple fragments are grouped into a block, and the fragments are downloaded in parallel from multiple servers. We propose to adapt video bitrate at the block level rather than at the fragment level. By dynamically adjusting the block length and scheduling fragment requests to multiple servers, the requested video bitrates from the multiple servers are synchronized, making the fragments downloaded orderly. Then, we propose a control-theoretic approach to select an appropriate bitrate for each block. By modeling and linearizing the rate adaption system, we propose a novel Proportional-Derivative (PD) controller to adapt video bitrate with high responsiveness and stability. Theoretical analysis and extensive experiments on the Internet demonstrate the good efficiency of our DASH designs.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128719560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dense depth acquisition via one-shot stripe structured light
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706402
Qin Li, Fu Li, Guangming Shi, Fei Qi, Yuexin Shi, Shan Gao
Depth acquisition for moving objects is increasingly critical for applications such as human facial expression recognition. This paper presents a method for capturing the depth maps of moving objects that uses a one-shot black-and-white stripe pattern, which is simple and easy to generate. Because matching accuracy is crucial for a precise depth map, yet the matching of variable-width stripes is sparse and rough, phase differences extracted by a Gabor filter are used to achieve pixel-wise matching with sub-pixel accuracy. The derivation is presented in detail to prove that this phase-difference-based method is valid. In addition, the periodic ambiguity of the encoded stripes is eliminated by restricting the search to the epipolar segment covering a given depth range, determined at a camera-projector calibration stage, which also reduces computational complexity. Experimental results show that our method obtains dense and accurate depth maps of moving objects.
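To make the phase-difference idea concrete, the sketch below applies a 1-D complex Gabor filter to a stripe scanline and extracts the phase of its response; matching pixels across views by phase difference is what gives sub-pixel correspondence. The stripe period and filter bandwidth here are assumptions.

```python
import numpy as np

def gabor_phase(scanline, wavelength=16.0, sigma=8.0):
    """Phase response of a 1-D complex Gabor filter along one image row.
    The wavelength should roughly match the stripe period (the values
    here are assumptions, not the paper's settings)."""
    x = np.arange(-int(3 * sigma), int(3 * sigma) + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2)) * np.exp(2j * np.pi * x / wavelength)
    response = np.convolve(scanline.astype(float), kernel, mode='same')
    return np.angle(response)  # in (-pi, pi]; unwrap per stripe period

# Example: a synthetic black-and-white stripe scanline with period 16.
line = (np.sin(2 * np.pi * np.arange(256) / 16) > 0).astype(float)
print(gabor_phase(line)[:8].round(2))
```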
{"title":"Dense depth acquisition via one-shot stripe structured light","authors":"Qin Li, Fu Li, Guangming Shi, Fei Qi, Yuexin Shi, Shan Gao","doi":"10.1109/VCIP.2013.6706402","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706402","url":null,"abstract":"Depth acquisition for moving objects becomes increasingly critical for some applications such as human facial expression recognition. This paper presents a method for capturing the depth maps of moving objects that uses a one-shot black-and-white stripe pattern with the features of simplicity and easily generation. Considering the accuracy of a matching is crucial for a precise depth map but the matching of variant-width stripes is sparse and rough, the phase differences extracted by Gabor filter to achieve a pixel-wise matching with sub-pixel accuracy are used. The details of the derivation are presented to prove that this method based on the phase difference calculated by Gabor filter is valid. In addition, the periodic ambiguity of the encoded stripe is eliminated by the epipolar segment covering a given depth range at a camera-projector calibrating stage to decrease the calculation complexity. Experimental results show that our method can get a dense and accurate depth map of a moving object.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115409596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Universal and low-complexity quantizer design for compressive sensing image coding
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706403
Xiangwei Li, Xuguang Lan, Meng Yang, Jianru Xue, Nanning Zheng
Compressive sensing imaging (CSI) is a new framework for image coding that acquires and compresses a scene simultaneously. The CS encoder efficiently shifts the bulk of the system complexity to the decoder. Ideally, CSI provides lossless compression in image coding. In this paper, we consider lossy compression of the CS measurements in a CSI system. We design a universal quantizer that applies to the CS measurements of any input image. The proposed method first establishes a universal probability model for the CS measurements in advance, without any knowledge of the input image. A fast quantizer is then designed based on this model. Simulation results demonstrate that the proposed method achieves nearly optimal rate-distortion (R-D) performance while maintaining very low computational complexity at the CS encoder.
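A toy illustration of why an image-independent quantizer is plausible: measurements from a random projection are near-Gaussian regardless of the input (central limit theorem), so a quantizer can be fixed offline from that model. The uniform scalar quantizer below is a simplified stand-in for the paper's design, and the model parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1024, 256
Phi = rng.standard_normal((m, n)) / np.sqrt(n)  # random sensing matrix
x = rng.random(n)                               # any input signal in [0, 1]
y = Phi @ x                                     # CS measurements, ~Gaussian

# Universal model fixed offline: for pixels uniform in [0, 1],
# Var(y_i) ~ E[x^2] = 1/3, independent of the specific image.
sigma_model = np.sqrt(1.0 / 3.0)
step = sigma_model / 8                # step size chosen offline (assumed)
idx = np.round(y / step).astype(int)  # encoder sends integer indices only
y_hat = idx * step                    # decoder reconstruction
print("quantization SNR (dB):",
      10 * np.log10(np.sum(y**2) / np.sum((y - y_hat)**2)))
```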
{"title":"Universal and low-complexity quantizer design for compressive sensing image coding","authors":"Xiangwei Li, Xuguang Lan, Meng Yang, Jianru Xue, Nanning Zheng","doi":"10.1109/VCIP.2013.6706403","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706403","url":null,"abstract":"Compressive sensing imaging (CSI) is a new framework for image coding, which enables acquiring and compressing a scene simultaneously. The CS encoder shifts the bulk of the system complexity to the decoder efficiently. Ideally, implementation of CSI provides lossless compression in image coding. In this paper, we consider the lossy compression of the CS measurements in CSI system. We design a universal quantizer for the CS measurements of any input image. The proposed method firstly establishes a universal probability model for the CS measurements in advance, without knowing any information of the input image. Then a fast quantizer is designed based on this established model. Simulation result demonstrates that the proposed method has nearly optimal rate-distortion (R~D) performance, meanwhile, maintains a very low computational complexity at the CS encoder.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114778639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Expression-invariant and sparse representation for mesh-based compression for 3-D face models
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706442
Junhui Hou, Lap-Pui Chau, Ying He, N. Magnenat-Thalmann
Compression of mesh-based 3-D models is an important problem, as it enables efficient storage and transmission. In this paper, we present a highly effective compression scheme designed specifically for 3-D face models with varying expressions. First, the 3-D models are mapped into a 2-D parametric domain and put into correspondence through an expression-invariant parameterization, yielding a 2-D image representation known as geometry images (GIs); this reduces 3-D model compression to 2-D image compression. Then, sparse representation with dictionaries learned via K-SVD is applied to each patch of the sliced GI, so that only a few coefficients and their indices need to be encoded, resulting in a small data size. Experimental results demonstrate that the proposed scheme provides a significant improvement in compression performance over existing algorithms, especially at low bitrates.
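To sketch the sparse-coding stage, the example below learns a patch dictionary and codes each patch with a few OMP coefficients. Note that scikit-learn's online dictionary learning is used here in place of K-SVD (which sklearn does not provide), and the patch size, dictionary size, and sparsity level are assumptions.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Stand-in data: 500 flattened 8x8 patches from sliced geometry images.
patches = np.random.rand(500, 64)

dico = MiniBatchDictionaryLearning(n_components=128,
                                   transform_algorithm='omp',
                                   transform_n_nonzero_coefs=5,
                                   random_state=0)
dico.fit(patches)
codes = dico.transform(patches)  # sparse codes: few nonzeros per patch
nnz = (codes != 0).sum(axis=1).mean()
print(f"avg nonzero coefficients per patch: {nnz:.1f}")
# Only the nonzero coefficient values and their indices would be
# entropy coded, which is what keeps the data size low.
```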
{"title":"Expression-invariant and sparse representation for mesh-based compression for 3-D face models","authors":"Junhui Hou, Lap-Pui Chau, Ying He, N. Magnenat-Thalmann","doi":"10.1109/VCIP.2013.6706442","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706442","url":null,"abstract":"Compression of mesh-based 3-D models has been an important issue, which ensures efficient storage and transmission. In this paper, we present a very effective compression scheme specifically for expression variation 3-D face models. Firstly, 3-D models are mapped into 2-D parametric domain and corresponded by expression-invariant parameterizaton, leading to 2-D image format representation namely geometry images, which simplifies the 3-D model compression into 2-D image compression. Then, sparse representation with learned dictionaries via K-SVD is applied to each patch from sliced GI so that only few coefficients and their indices are needed to be encoded, leading to low datasize. Experimental results demonstrate that the proposed scheme provides significant improvement in terms of compression performance, especially at low bitrate, compared with existing algorithms.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"653 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121986493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint trilateral filtering for depth map super-resolution
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706444
Kai-Han Lo, Y. Wang, K. Hua
Depth map super-resolution is an emerging topic due to the increasing number of applications using RGB-D sensors. Together with the color image, the corresponding range data provides additional information and makes visual analysis tasks more tractable. However, since the depth maps captured by such sensors typically have limited resolution, it is preferable to enhance their resolution for improved recognition. In this paper, we present a novel joint trilateral filtering (JTF) algorithm for depth map super-resolution (SR). Inspired by bilateral filtering, our JTF utilizes and preserves edge information from the associated high-resolution (HR) image by taking the spatial and range information of local pixels into account. Our proposed method further integrates local gradient information of the depth map when synthesizing the HR output, which alleviates textural artifacts such as edge discontinuities. Quantitative and qualitative experimental results demonstrate the effectiveness and robustness of our approach over prior depth map upsampling works.
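One plausible reading of the trilateral weighting, sketched in plain NumPy: a spatial kernel, a range kernel on the HR guide image, and a kernel on local depth gradients multiply into a single weight. The gradient term and all parameter values are our interpretation for illustration, not the paper's definition.

```python
import numpy as np

def joint_trilateral_filter(depth_up, guide, sigma_s=3.0,
                            sigma_r=0.1, sigma_g=0.05, radius=4):
    """Refine an upsampled depth map using an HR guide image.
    Weights = spatial Gaussian * guide-range Gaussian * depth-gradient
    Gaussian (the last term is an assumption of this sketch)."""
    H, W = depth_up.shape
    gy, gx = np.gradient(depth_up)
    grad = np.hypot(gx, gy)
    d = np.pad(depth_up, radius, mode='edge')
    g = np.pad(guide, radius, mode='edge')
    gr = np.pad(grad, radius, mode='edge')
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    w_s = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))  # spatial kernel
    out = np.zeros_like(depth_up)
    k = 2 * radius + 1
    for i in range(H):
        for j in range(W):
            dwin = d[i:i + k, j:j + k]
            gwin = g[i:i + k, j:j + k]
            rwin = gr[i:i + k, j:j + k]
            w = (w_s
                 * np.exp(-(gwin - guide[i, j])**2 / (2 * sigma_r**2))
                 * np.exp(-(rwin - grad[i, j])**2 / (2 * sigma_g**2)))
            out[i, j] = (w * dwin).sum() / w.sum()
    return out

# Example: an upsampled low-res depth map refined with an HR guide.
depth = np.random.rand(32, 32)
guide = depth + 0.05 * np.random.rand(32, 32)
print(joint_trilateral_filter(depth, guide).shape)
```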
{"title":"Joint trilateral filtering for depth map super-resolution","authors":"Kai-Han Lo, Y. Wang, K. Hua","doi":"10.1109/VCIP.2013.6706444","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706444","url":null,"abstract":"Depth map super-resolution is an emerging topic due to the increasing needs and applications using RGB-D sensors. Together with the color image, the corresponding range data provides additional information and makes visual analysis tasks more tractable. However, since the depth maps captured by such sensors are typically with limited resolution, it is preferable to enhance its resolution for improved recognition. In this paper, we present a novel joint trilateral filtering (JTF) algorithm for solving depth map super-resolution (SR) problems. Inspired by bilateral filtering, our JTF utilizes and preserves edge information from the associated high-resolution (HR) image by taking spatial and range information of local pixels. Our proposed further integrates local gradient information of the depth map when synthesizing its HR output, which alleviates textural artifacts like edge discontinuities. Quantitative and qualitative experimental results demonstrate the effectiveness and robustness of our approach over prior depth map upsampling works.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121871878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recovering depth of background and foreground from a monocular video with camera motion
Pub Date: 2013-11-01 | DOI: 10.1109/VCIP.2013.6706409
Hu Tian, Bojin Zhuang, Yan Hua, Yanyun Zhao, A. Cai
In this paper, we propose a depth recovery approach for monocular videos with or without camera motion. By combining geometric information with moving object extraction, the depth of both the background and the foreground can be recovered. Furthermore, for cases involving complex camera motion such as fast motion, translation, and vertical movement, we propose a novel global motion estimation (GME) method with effective outlier rejection to extract moving objects; experiments demonstrate that the proposed GME method outperforms most state-of-the-art methods. The proposed depth recovery approach is tested on four video sequences with different camera movements. Experimental results show that our approach produces more accurate depth for both background and foreground than existing depth recovery methods.
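A generic version of feature-based global motion estimation with outlier rejection, using ORB matches and RANSAC via OpenCV. This is a stand-in for the paper's GME; its specific outlier-rejection rule is not reproduced, but the principle is the same: matches inconsistent with the dominant (camera) motion hint at moving foreground objects.

```python
import cv2
import numpy as np

def estimate_global_motion(prev_gray, curr_gray):
    """Fit the dominant camera motion between two grayscale frames with
    ORB feature matches and RANSAC. Returns the fitted homography and a
    boolean inlier mask; outlier matches are candidates for moving
    foreground objects."""
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(prev_gray, None)
    k2, d2 = orb.detectAndCompute(curr_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches])
    dst = np.float32([k2[m.trainIdx].pt for m in matches])
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H, inlier_mask.ravel().astype(bool)

# Usage: H, inliers = estimate_global_motion(frame_t, frame_t1) on two
# consecutive grayscale frames; ~inliers indexes the outlier matches.
```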
{"title":"Recovering depth of background and foreground from a monocular video with camera motion","authors":"Hu Tian, Bojin Zhuang, Yan Hua, Yanyun Zhao, A. Cai","doi":"10.1109/VCIP.2013.6706409","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706409","url":null,"abstract":"In this paper we propose a depth recovery approach for monocular videos with or without camera motion. By combining geometric information and moving object extraction, not only the depth of background but also the depth of foreground can be recovered. Furthermore, for cases involving complex camera motion such as fast moving, translating, vertical movement, we propose a novel global motion estimation (GME) method including effective outlier rejection to extract moving objects, and experiments demonstrate that the proposed GME method outperforms most of the state-of-the-art methods. The depth recovery approach we propose is tested on four video sequences with different camera movements. Experimental results show that our approach produces more accurate depth of both background and foreground than existing depth recovery methods.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121712613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}