Pub Date: 2013-11-01  DOI: 10.1109/VCIP.2013.6706435
S. Shimizu, Shiori Sugimoto, H. Kimata, Akira Kojima
View synthesis prediction has been studied as an efficient inter-view prediction scheme. Existing view synthesis prediction schemes fall into two types according to the pixel warping direction: backward warping based view synthesis prediction enables block-based processing, while forward warping based view synthesis prediction handles occlusions properly. This paper proposes a two-step warping based view synthesis prediction: a virtual depth map is first generated by forward warping, and prediction signals are then generated by block-based backward warping using the virtual depth map. A technique for backward-warping-aware depth inpainting is also proposed. Experiments show that the proposed VSP scheme achieves decoder runtime reductions of about 37% on average, with slight bitrate reductions, relative to the conventional forward warping based VSP. Compared to the conventional backward warping based VSP, the proposed method reduces the bitrate for the synthesized views by up to 2.9%, and by about 2.2% on average.
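As an illustration of the paper's two-step idea, the sketch below forward-warps a depth map into the virtual view and then backward-warps the texture through it. The depth-to-disparity conversion (`scale / depth`), the z-buffer rule, and per-pixel (rather than block-based) warping are all simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

def forward_warp_depth(ref_depth, scale=8.0):
    """Forward-warp the reference-view depth map into the virtual view.
    Disparity is modeled as scale / depth (a hypothetical conversion);
    the z-buffer keeps the closer (smaller-depth) sample."""
    h, w = ref_depth.shape
    virt = np.zeros_like(ref_depth)
    for y in range(h):
        for x in range(w):
            tx = x + int(round(scale / ref_depth[y, x]))
            if 0 <= tx < w and (virt[y, tx] == 0 or ref_depth[y, x] < virt[y, tx]):
                virt[y, tx] = ref_depth[y, x]
    return virt

def backward_warp(ref_tex, virt_depth, scale=8.0):
    """Per-pixel backward warping: each virtual-view pixel fetches its
    reference-view sample via the virtual depth map."""
    h, w = ref_tex.shape
    pred = np.zeros_like(ref_tex)
    for y in range(h):
        for x in range(w):
            if virt_depth[y, x] > 0:
                sx = x - int(round(scale / virt_depth[y, x]))
                if 0 <= sx < w:
                    pred[y, x] = ref_tex[y, sx]
    return pred
```

Positions the forward pass could not reach stay at depth 0 and are left unfilled here; the proposed backward-warping-aware depth inpainting would fill them.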
{"title":"Backward view synthesis prediction using virtual depth map for multiview video plus depth map coding","authors":"S. Shimizu, Shiori Sugimoto, H. Kimata, Akira Kojima","doi":"10.1109/VCIP.2013.6706435","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706435","url":null,"abstract":"View synthesis prediction has been studied as an efficient inter-view prediction scheme. Existing view synthesis prediction schemes fall into two types according to the pixel warping direction. While backward warping based view synthesis prediction enables block-based processing, forward warping based view synthesis prediction can handle occlusions properly. This paper proposes a two-step warping based view synthesis prediction; a virtual depth map is first generated by forward warping, and then prediction signals are generated by block-based backward warping using the virtual depth map. The technique of backward-warping-aware depth inpainting is also proposed. Experiments show that the proposed VSP scheme can achieve the decoder runtime reductions of about 37% on average with slight bitrate reductions relative to the conventional forward warping based VSP. Compared to the conventional backward warping based VSP, the proposed method reduces the bitrate for the synthesized views by up to 2.9% and about 2.2% on average.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":" 32","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120829674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-11-01  DOI: 10.1109/VCIP.2013.6706357
Tiecheng Song, Fanman Meng, Bing Luo, Chao Huang
In this paper, we present a robust texture representation that explores an ensemble of binary codes. The proposed method, called Locally Enhanced Binary Coding (LEBC), is training-free and needs no costly data-to-cluster assignments. Given an input image, a set of features describing different pixel-wise properties is first extracted so as to be robust to rotation and illumination changes. These features are then binarized and jointly encoded into specific pixel labels. Meanwhile, the Local Binary Pattern (LBP) operator is used to encode the neighboring relationship. Finally, based on the statistics of these pixel labels and LBP labels, a joint histogram is built and used for texture representation. Extensive experiments have been conducted on the Outex, CUReT and UIUC texture databases. Impressive classification results have been achieved compared with state-of-the-art LBP-based and even learning-based algorithms.
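A toy version of the joint-histogram construction, assuming a single binarized pixel property (intensity above the global mean) standing in for the paper's ensemble of pixel-wise features:

```python
import numpy as np

def lbp_label(img, y, x):
    """Standard 8-neighbour LBP code for pixel (y, x)."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    c = img[y, x]
    return sum(int(img[y + dy, x + dx] >= c) << i
               for i, (dy, dx) in enumerate(offs))

def joint_histogram(img):
    """Joint histogram over (binary pixel property, LBP label). The
    single property used here -- intensity above the global mean -- is a
    toy stand-in for the paper's ensemble of binarized features."""
    h, w = img.shape
    mean = img.mean()
    hist = np.zeros((2, 256))
    for y in range(1, h - 1):           # skip the 1-pixel border
        for x in range(1, w - 1):
            hist[int(img[y, x] >= mean), lbp_label(img, y, x)] += 1
    return hist / hist.sum()
```

With more binary properties the first axis grows to 2^n bins, which matches the "ensemble of binary codes" framing while remaining training-free.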
{"title":"Robust texture representation by using binary code ensemble","authors":"Tiecheng Song, Fanman Meng, Bing Luo, Chao Huang","doi":"10.1109/VCIP.2013.6706357","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706357","url":null,"abstract":"In this paper, we present a robust texture representation by exploring an ensemble of binary codes. The proposed method, called Locally Enhanced Binary Coding (LEBC), is training-free and needs no costly data-to-cluster assignments. Given an input image, a set of features that describe different pixel-wise properties, is first extracted so as to be robust to rotation and illumination changes. Then, these features are binarized and jointly encoded into specific pixel labels. Meanwhile, the Local Binary Pattern (LBP) operator is utilized to encode the neighboring relationship. Finally, based on the statistics of these pixel labels and LBP labels, a joint histogram is built and used for texture representation. Extensive experiments have been conducted on the Outex, CUReT and UIUC texture databases. Impressive classification results have been achieved compared with state-of-the-art LBP-based and even learning-based algorithms.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123232706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-11-01  DOI: 10.1109/VCIP.2013.6706449
E. François, Christophe Gisquet, Jonathan Taquet, G. Laroche, P. Onno
After issuing version 1 of the new video coding standard HEVC, the ISO-MPEG and ITU-T VCEG groups are specifying its scalable extension. The candidate schemes are based on a multi-layer, multi-loop coding framework that exploits inter-layer texture and motion prediction and full base-layer picture decoding. Several inter-layer prediction tools have been explored, implemented either through high-level syntax or through block-level changes to the core HEVC design. One of these tools, Generalized Residual Prediction (GRP), has been studied extensively over several meeting cycles. It is based on second-order residual prediction, exploiting the motion-compensated prediction residual of the base layer. This paper focuses on this new mode. The principle of GRP is described, with an analysis of several implementation variants complemented by a complexity analysis. The performance of these implementations shows that noticeable gains can be obtained without a significant complexity increase compared to a simple scalable design comprising only texture and motion inter-layer prediction.
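The second-order prediction at the heart of GRP can be sketched in one line: the enhancement-layer prediction is the motion-compensated EL reference plus a weighted base-layer motion-compensation residual. The 1-D signals, integer motion via `np.roll`, and omitted inter-layer upsampling are simplifications for illustration.

```python
import numpy as np

def grp_predict(el_ref, bl_cur, bl_ref, mv, w=0.5):
    """GRP sketch on 1-D signals: EL prediction = motion-compensated EL
    reference + w * (base-layer frame - motion-compensated BL reference).
    Integer motion and identical layer resolutions (no upsampling) are
    simplifications; w is the GRP weight."""
    mc = lambda sig, v: np.roll(sig, v)  # toy motion compensation
    return mc(el_ref, mv) + w * (bl_cur - mc(bl_ref, mv))
```

When the base layer is perfectly predicted by its own motion compensation, the residual term vanishes and GRP degenerates to plain motion-compensated prediction, which is why the mode helps most where motion prediction fails similarly in both layers.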
{"title":"Exploration of Generalized Residual Prediction in scalable HEVC","authors":"E. François, Christophe Gisquet, Jonathan Taquet, G. Laroche, P. Onno","doi":"10.1109/VCIP.2013.6706449","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706449","url":null,"abstract":"After having issued the version 1 of the new video coding standard HEVC, ISO-MPEG and ITU-T VCEG groups are specifying its scalable extension. The candidate schemes are based on a multi-layer multi-loop coding framework, exploiting inter-layer texture and motion prediction and full base layer picture decoding. Several inter-layer prediction tools have been explored, implemented either using high-level syntax or block-level core HEVC design changes. One of these tools, Generalized Residual Prediction (GRP), has been extensively studied during several meeting cycles. It is based on second order residual prediction, exploiting motion compensation prediction residual in the base layer. This paper is focused on this new mode. The principle of GRP is described with an analysis of several implementation variants completed by a complexity analysis. Performance of these different implementations is provided, showing that noticeable gains can be obtained without significant complexity increase compared to a simple scalable design comprising only texture and motion inter-layer prediction.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115540649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-11-01  DOI: 10.1109/VCIP.2013.6706348
Chang-Ting Tsai, H. Hang
Most existing 3D image quality metrics use 2D image quality assessment (IQA) models to predict 3D subjective quality. In a free-viewpoint television (FTV) system, however, depth map errors often produce object shifting or ghost artifacts in the synthesized pictures because of the Depth-Image-Based Rendering (DIBR) technique. These artifacts are very different from ordinary 2D distortions such as blur, Gaussian noise, and compression errors. We therefore propose a new 3D quality metric to evaluate the quality of stereo images that may contain artifacts introduced by the rendering process due to depth map errors. We first eliminate the consistent pixel shifts inside an object before the usual 2D metric is applied. Experimental results show that the proposed method improves the correlation of the objective quality score with the 3D subjective scores.
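A global-shift simplification of the shift-elimination step: search a small range of horizontal offsets and score the distorted image at its best alignment, so a consistent DIBR-induced object shift is not penalized. The paper compensates per object; here a single global shift and the assumed `max_shift` range keep the sketch short.

```python
import numpy as np

def shift_compensated_mse(ref, dist, max_shift=3):
    """Score `dist` against `ref` at its best horizontal alignment
    within +/- max_shift pixels, discounting a consistent object shift.
    np.roll wraps around at the borders, another simplification."""
    best = np.inf
    for s in range(-max_shift, max_shift + 1):
        err = np.mean((ref.astype(float) - np.roll(dist, s, axis=1)) ** 2)
        best = min(best, err)
    return best
```

A purely shifted image scores zero under this measure while plain MSE would penalize it heavily, which is the intuition behind removing the shift before applying a 2D metric.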
{"title":"Quality assessment of 3D synthesized views with depth map distortion","authors":"Chang-Ting Tsai, H. Hang","doi":"10.1109/VCIP.2013.6706348","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706348","url":null,"abstract":"Most existing 3D image quality metrics use 2D image quality assessment (IQA) models to predict the 3D subjective quality. But in a free viewpoint television (FTV) system, the depth map errors often produce object shifting or ghost artifacts on the synthesized pictures due to the use of Depth Image Based Rendering (DIBR) technique. These artifacts are very different from the ordinary 2D distortions such as blur, Gaussian noise, and compression errors. We thus propose a new 3D quality metric to evaluate the quality of stereo images that may contain artifacts introduced by the rendering process due to depth map errors. We first eliminate the consistent pixel shifts inside an object before the usual 2D metric is applied. The experimental results show that the proposed method enhances the correlation of the objective quality score to the 3D subjective scores.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115561341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-11-01  DOI: 10.1109/VCIP.2013.6706418
Min Zhang, Jin Xie, Xiang Zhou, H. Fujita
Multimedia, including audio, images and video, is a ubiquitous part of modern life, and evaluations, both objective and subjective, are of fundamental importance for numerous multimedia applications. In this paper, based on statistics of the local binary pattern (LBP), we propose a novel and efficient quality similarity index for no-reference (NR) image quality assessment (IQA). First, the image is decomposed into multi-scale sub-band images with Laplacian of Gaussian (LOG) filters. Then, for these sub-band images across different scales, LBP maps are encoded and LBP histograms are formed as quality-aware features. Finally, support vector regression (SVR) maps the extracted features to the image's subjective quality score for NR IQA. Experimental results on the LIVE IQA database show that the proposed method correlates strongly with subjective quality evaluations and is competitive with most state-of-the-art NR IQA methods.
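The feature pipeline (LoG sub-bands, then per-band LBP histograms) can be sketched as follows. The kernel size and sigmas are illustrative, and the final SVR regression onto subjective scores is omitted:

```python
import numpy as np
from scipy.ndimage import convolve

def log_kernel(sigma, size=7):
    """Discrete zero-mean Laplacian-of-Gaussian kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return k - k.mean()

def lbp_hist(img):
    """Normalized 256-bin LBP histogram of a 2-D array."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    hist = np.zeros(256)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y, x]
            code = sum(int(img[y + dy, x + dx] >= c) << i
                       for i, (dy, dx) in enumerate(offs))
            hist[code] += 1
    return hist / max(hist.sum(), 1)

def nr_iqa_features(img, sigmas=(0.8, 1.6, 3.2)):
    """Concatenate LBP histograms of the LoG sub-bands; the paper then
    regresses such features onto subjective scores with an SVR."""
    return np.concatenate([lbp_hist(convolve(img.astype(float),
                                             log_kernel(s)))
                           for s in sigmas])
```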
{"title":"No reference image quality assessment based on local binary pattern statistics","authors":"Min Zhang, Jin Xie, Xiang Zhou, H. Fujita","doi":"10.1109/VCIP.2013.6706418","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706418","url":null,"abstract":"Multimedia, including audio, image and video, etc, is a ubiquitous part of modern life. Evaluations, both objective and subjective, are of fundamental importance for numerous multimedia applications. In this paper, based on statistics of local binary pattern (LBP), we propose a novel and efficient quality similarity index for no reference (NR) image quality assessment (IQA). First, with the Laplacian of Gaussian (LOG) filters, the image is decomposed into multi-scale sub-band images. Then, for these sub-band images across different scales, LBP maps are encoded and the LBP histograms are formed as the quality assessment concerning feature. Finally, by support vector regression (SVR), the extracted features are mapped to the image's subjective quality score for NR IQA. The experimental results on LIVE IQA database show that the proposed method is strongly related to subjective quality evaluations and competitive to most of the state-of-the-art NR IQA methods.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126758396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-11-01  DOI: 10.1109/VCIP.2013.6706398
Licheng Yu, Yi Xu, Bo Zhang
Single image super-resolution (SR) is a severely unconstrained task. While self-example-based methods are able to reproduce sharp edges, they perform poorly on textures. To recover fine details, example-based SR methods employ higher-level image segmentation and a corresponding external texture database, but they involve too much human interaction. In this paper, we analyze the existing problems of the example-based technique using scale-space analysis. Accordingly, a robust pixel classification method is designed based on the phase congruency model in scale space, which can effectively divide images into edges, textures and flat regions. We then propose a super-resolution framework that adaptively emphasizes the importance of high-frequency residuals in structural examples and the scale-invariant fractal property in textural regions. Experimental results show that our SR approach presents both sharp edges and vivid textures with few artifacts.
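A crude stand-in for the pixel classification stage, using local gradient magnitude instead of the paper's scale-space phase congruency model (the thresholds are arbitrary):

```python
import numpy as np

def classify_pixels(img, edge_t=40.0, flat_t=5.0):
    """Toy edge/texture/flat pixel classification from local gradient
    magnitude -- a stand-in for phase congruency, which would be far
    more robust to illumination and contrast changes."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    labels = np.full(img.shape, 'texture', dtype=object)
    labels[mag >= edge_t] = 'edge'
    labels[mag < flat_t] = 'flat'
    return labels
```

The framework would then route 'edge' pixels to the self-example branch and 'texture' pixels to the fractal-property branch, which is why a reliable three-way classification matters.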
{"title":"Single image super-resolution via phase congruency analysis","authors":"Licheng Yu, Yi Xu, Bo Zhang","doi":"10.1109/VCIP.2013.6706398","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706398","url":null,"abstract":"Single image super-resolution (SR) is a severely unconstrained task. While the self-example-based methods are able to reproduce sharp edges, they perform poorly for textures. For recovering the fine details, higher-level image segmentation and corresponding external texture database are employed in the example-based SR methods, but they involve too much human interaction. In this paper, we discuss the existing problems of example-based technique using scale space analysis. Accordingly, a robust pixel classification method is designed based on the phase congruency model in scale space, which can effectively divide images into edges, textures and flat regions. Then a super-resolution framework is proposed, which can adaptively emphasize the importance of high-frequency residuals in structural examples and scale invariant fractal property in textural regions. Experimental results show that our SR approach is able to present both sharp edges and vivid textures with few artifacts.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121585424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-11-01  DOI: 10.1109/VCIP.2013.6706432
Yuanbo Chen, Xin Guo
Description methods based on interest points and the Bag-of-Words (BOW) model have achieved remarkable success in human action recognition. Despite their popularity, existing interest point detectors come with high computational complexity and lose their power when the camera is moving. Additionally, the vector quantization procedure in the BOW model ignores the relationships between bases and often incurs large reconstruction errors. In this paper, a spatio-temporal interest point detector based on flow vorticity is used, which not only suppresses most effects of camera motion but also provides prominent interest points around key positions of the moving foreground. Furthermore, by combining non-negativity constraints on the patterns with an average pooling function, a Non-negative Locality-constrained Linear Coding (NLLC) model is introduced into action recognition to provide better feature representation than the traditional BOW model. Experimental results on two widely used action datasets demonstrate the effectiveness of the proposed approach.
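A simplified NLLC encoder: restrict the code to the k dictionary bases nearest the descriptor (the locality constraint) and solve a non-negative least-squares fit over them, then average-pool per-descriptor codes. This sketches the idea, not the paper's exact optimization:

```python
import numpy as np
from scipy.optimize import nnls

def nllc_encode(x, B, k=3):
    """Sketch of non-negative locality-constrained coding: keep only the
    k bases of dictionary B (one basis per row) nearest to descriptor x
    and fit them with non-negative least squares."""
    d = np.linalg.norm(B - x, axis=1)    # distance to each basis
    idx = np.argsort(d)[:k]              # locality: k nearest bases
    c_local, _ = nnls(B[idx].T, x)       # non-negative fit
    code = np.zeros(len(B))
    code[idx] = c_local
    return code

def average_pool(codes):
    """Average pooling of per-descriptor codes into one video feature."""
    return np.mean(codes, axis=0)
```

Unlike hard vector quantization, several nearby bases share the reconstruction, which is what keeps the reconstruction error low; non-negativity keeps the pooled feature interpretable as (soft) basis frequencies.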
{"title":"Learning non-negative locality-constrained Linear Coding for human action recognition","authors":"Yuanbo Chen, Xin Guo","doi":"10.1109/VCIP.2013.6706432","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706432","url":null,"abstract":"Description methods based on interest points and Bag-of-Words (BOW) model have gained remarkable success in human action recognition. Despite their popularity, the existing interest point detectors always come with high computational complexity and lose their power when camera is moving. Additionally, vector quantization procedure in BOW model ignores the relationship between bases and is always with large reconstruction errors. In this paper, a spatio-temporal interest point detector based on flow vorticity is used, which can not only suppress most effects of camera motion but also provide prominent interest points around key positions of the moving foreground. Besides, by combining non-negativity constraints of patterns and average pooling function, a Non-negative Locality-constrained Linear Coding (NLLC) model is introduced into action recognition to provide better features representation than the traditional BOW model. Experimental results on two widely used action datasets demonstrate the effectiveness of the proposed approach.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114465662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-11-01  DOI: 10.1109/VCIP.2013.6706436
Martin Köppel, Mehdi Ben Makhlouf, Marcus Müller, P. Ndjiki-Nya
In this paper, a novel Depth Image-based Rendering (DIBR) method, which generates virtual views from a video sequence and its associated Depth Maps (DMs), is presented. The proposed approach is especially designed to close holes in extrapolation scenarios, where only one original camera is available or the virtual view is placed outside the range of a set of original cameras. In such scenarios, large image regions become uncovered in the virtual view and need to be filled in a visually pleasing way. In order to handle such disocclusions, a depth preprocessing method is proposed, which is applied prior to 3-D image warping. As a first step, adaptive cross-trilateral median filtering is used to align depth discontinuities in the DM with color discontinuities in the textured image and to further reduce estimation errors in the DM. Then, a temporally consistent, adaptive asymmetric smoothing filter is designed and applied to the DM. The filter is weighted adaptively so that only the DM regions that may reveal uncovered areas are filtered. Thus, strong distortions in other parts of the virtual textured image are prevented. By smoothing the depth map, objects are slightly distorted and disocclusions in the virtual view are completely or partially covered. The proposed method shows considerable objective and subjective gains compared to the state-of-the-art method.
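The preprocessing chain might be sketched as below: median filtering to clean depth estimation noise, then Gaussian smoothing applied only near large depth discontinuities, the regions that can reveal disocclusions. The threshold, sigma, and mask-widening constant are illustrative; the paper's filter is cross-trilateral and temporally consistent, which this sketch omits.

```python
import numpy as np
from scipy.ndimage import median_filter, gaussian_filter

def preprocess_depth(depth, edge_t=10.0, sigma=2.0):
    """Depth-map preprocessing sketch: a 3x3 median filter cleans
    estimation noise, then Gaussian smoothing is applied only near
    large depth discontinuities while the rest of the map is left
    untouched, so only potential disocclusion regions are distorted."""
    d = median_filter(depth.astype(float), size=3)
    gy, gx = np.gradient(d)
    edge_mask = (np.hypot(gx, gy) > edge_t).astype(float)
    near_edge = gaussian_filter(edge_mask, sigma) > 0.05   # widened mask
    return np.where(near_edge, gaussian_filter(d, sigma), d)
```

Smoothing the discontinuity spreads the foreground/background transition over several pixels, so the subsequent 3-D warp stretches the surface instead of opening a hole.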
{"title":"Temporally consistent adaptive depth map preprocessing for view synthesis","authors":"Martin Köppel, Mehdi Ben Makhlouf, Marcus Müller, P. Ndjiki-Nya","doi":"10.1109/VCIP.2013.6706436","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706436","url":null,"abstract":"In this paper, a novel Depth Image-based Rendering (DIBR) method, which generates virtual views from a video sequence and its associated Depth Maps (DMs), is presented. The proposed approach is especially designed to close holes in extrapolation scenarios, where only one original camera is available or the virtual view is placed outside the range of a set of original cameras. In such scenarios, large image regions become uncovered in the virtual view and need to be filled in a visually pleasing way. In order to handle such disocclusions, a depth preprocessing method is proposed, which is applied prior to 3-D image warping. As a first step, adaptive cross-trilateral median filtering is used to align depth discontinuities in the DM to color discontinuities in the textured image and to further reduce estimation errors in the DM. Then, a temporally consistent and adaptive asymmetric smoothing filter is designed and subsequently applied to the DM. The filter is adaptively weighted in such a way that only the DM regions that may reveal uncovered areas are filtered. Thus, strong distortions in other parts of the virtual textured image are prevented. By smoothing the depth map image, objects are slightly distorted and disocclusions in the virtual view are completely or partially covered. The proposed method shows considerable objective and subjective gains compared to the state-of-the-art one.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117346849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-11-01  DOI: 10.1109/VCIP.2013.6706400
Songbo Liu, Heng-Da Cheng, Yan Liu, Jianhua Huang, Yingtao Zhang, Xianglong Tang
To improve the diagnostic accuracy of breast ultrasound classification, a novel computer-aided diagnosis (CAD) system based on B-Mode and color Doppler flow imaging is proposed. Several new features are modeled and extracted from the static images and color Doppler image sequences to study blood flow characteristics. Moreover, we propose a novel classifier ensemble strategy to obtain the benefit of mutual compensation between classifiers with different characteristics. Experimental results demonstrate that the proposed CAD system can improve the true-positive rate and decrease the false-positive rate, which is useful for reducing unnecessary biopsies and the death rate.
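In its simplest form, a classifier ensemble with mutual compensation reduces to weighted voting; the class labels and weights below are hypothetical, not the paper's actual classifiers:

```python
def ensemble_vote(predictions, weights=None):
    """Weighted-vote ensemble sketch: each classifier casts a
    (possibly weighted) vote for a class label; the label with the
    largest total weight wins."""
    if weights is None:
        weights = [1.0] * len(predictions)
    score = {}
    for p, w in zip(predictions, weights):
        score[p] = score.get(p, 0.0) + w
    return max(score, key=score.get)
```

Weighting lets a classifier that is strong on, say, Doppler-derived features override weaker B-Mode votes, which is the mutual-compensation effect the strategy aims at.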
{"title":"An effective computer aided diagnosis system using B-Mode and color Doppler flow imaging for breast cancer","authors":"Songbo Liu, Heng-Da Cheng, Yan Liu, Jianhua Huang, Yingtao Zhang, Xianglong Tang","doi":"10.1109/VCIP.2013.6706400","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706400","url":null,"abstract":"To improve the diagnostic accuracy of breast ultrasound classification, a novel computer-aided diagnosis (CAD) system based on B-Mode and color Doppler flow imaging is proposed. Several new features are modeled and extracted from the static images and color Doppler image sequences to study blood flow characteristics. Moreover, we proposed a novel classifier ensemble strategy for obtaining the benefit of mutual compensation of classifiers with different characteristics. Experimental results demonstrate that the proposed CAD system can improve the true-positive and decrease the false positive detection rate, which is useful for reducing the unnecessary biopsy and death rate.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123997382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2013-11-01  DOI: 10.1109/VCIP.2013.6706368
Jeffrey J. Micallef, R. Farrugia, C. J. Debono
The Distributed Video Coding (DVC) paradigm can theoretically reach the same coding efficiency as predictive block-based video coding schemes such as H.264/AVC. However, current DVC architectures are still far from this ideal performance, mainly because of inaccuracies in the Side Information (SI) predicted at the decoder. This paper presents a coding scheme that avoids mismatches in the SI predictions caused by small variations in light intensity. By using the appropriate rounding operator for every coefficient, the proposed method significantly reduces the correlation noise between the Wyner-Ziv (WZ) frame and the corresponding SI, achieving higher coding efficiency. Experimental results demonstrate that the average Peak Signal-to-Noise Ratio (PSNR) is improved by up to 0.56 dB relative to the DISCOVER codec.
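To illustrate why the choice of rounding operator matters, the sketch below compares plain truncation with rounding to the nearest bin centre: two coefficients separated by a small illumination offset can fall into different bins under truncation but into the same bin under centred rounding. This illustrates the mismatch effect only; it is not the paper's per-coefficient operator.

```python
import numpy as np

def quantize(vals, step=8, center=True):
    """Quantize coefficients with bin size `step`. With center=True each
    value is rounded to the nearest bin centre (floor(v/step + 0.5));
    with center=False it is truncated (floor(v/step)), which is more
    fragile to small intensity offsets near bin boundaries."""
    vals = np.asarray(vals, dtype=float)
    if center:
        return np.floor(vals / step + 0.5).astype(int)
    return np.floor(vals / step).astype(int)
```

For example, 31.8 and 32.2 (a small illumination offset apart) truncate to bins 3 and 4, but both round to bin 4, so the WZ frame and its SI stay consistent.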
{"title":"Adaptive rounding operator for efficient Wyner-Ziv video coding","authors":"Jeffrey J. Micallef, R. Farrugia, C. J. Debono","doi":"10.1109/VCIP.2013.6706368","DOIUrl":"https://doi.org/10.1109/VCIP.2013.6706368","url":null,"abstract":"The Distributed Video Coding (DVC) paradigm can theoretically reach the same coding efficiencies of predictive block-based video coding schemes, like H.264/AVC. However, current DVC architectures are still far from this ideal performance. This is mainly attributed to inaccuracies in the Side Information (SI) predicted at the decoder. The work in this paper presents a coding scheme which tries to avoid mismatch in the SI predictions caused by small variations in light intensity. Using the appropriate rounding operator for every coefficient, the proposed method significantly reduces the correlation noise between the Wyner-Ziv (WZ) frame and the corresponding SI, achieving higher coding efficiencies. Experimental results demonstrate that the average Peak Signal-to-Noise Ratio (PSNR) is improved by up to 0.56dB relative to the DISCOVER codec.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123185113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}