Title: "Subjective evaluation of Hierarchical B-Frames using Video-MUSHRA"
Authors: H. M. Mohammed, Nikolaus Färber
Pub Date: 2010-12-01 | DOI: 10.1109/PCS.2010.5702532
Abstract: Hierarchical B-Frames (HBF) have emerged as an efficient video coding tool in recent years. As shown in the literature, this approach yields excellent PSNR gains of more than 1 dB. However, these PSNR gains have not been sufficiently assessed in a scientific manner by subjective tests. In this paper we therefore evaluate the HBF coding pattern subjectively using the MUSHRA test methodology. While MUSHRA is well established in audio coding research, its application to video is a novelty of this paper. We compare HBF with the simple IPP coding pattern at either the same PSNR or the same bit rate. Our results indicate that the HBF gains are clearly perceptible, so the PSNR gains also correlate with a subjective gain. Interestingly, even at the same PSNR, HBF is found to be subjectively superior to simple IPP coding.
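The MUSHRA methodology referenced above scores each stimulus on a 0-100 scale against a hidden reference, then post-screens participants before averaging. The sketch below is not from the paper; the 90-point screening threshold, the data layout, and the normal-approximation confidence interval are illustrative assumptions.

```python
import statistics

def screen_and_average(scores, ref_key="hidden_ref", ref_threshold=90):
    """Post-screen MUSHRA-style ratings and average the rest.

    scores: {participant: {condition: rating in 0..100}}
    Participants who rate the hidden reference below `ref_threshold`
    are excluded (a common, but here assumed, screening rule).
    Returns {condition: (mean, 95% CI half-width)}.
    """
    kept = {p: s for p, s in scores.items() if s[ref_key] >= ref_threshold}
    conditions = next(iter(kept.values())).keys()
    result = {}
    for c in conditions:
        vals = [s[c] for s in kept.values()]
        mean = statistics.fmean(vals)
        # 95% CI half-width via the normal approximation
        half = 1.96 * statistics.stdev(vals) / len(vals) ** 0.5 if len(vals) > 1 else 0.0
        result[c] = (mean, half)
    return result
```

With ratings for, say, an HBF and an IPP condition per participant, the per-condition means and intervals can then be plotted as the usual MUSHRA bars.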
Title: "Decoder-side hierarchical motion estimation for dense vector fields"
Authors: S. Klomp, Marco Munderloh, J. Ostermann
Pub Date: 2010-12-01 | DOI: 10.1109/PCS.2010.5702509
Abstract: Current video coding standards perform motion estimation at the encoder to predict frames prior to coding them. Since the decoder does not possess the source frames, the estimated motion vectors have to be transmitted as additional side information. Recent research revealed that the data rate can be reduced by performing an additional motion estimation at the decoder: as only already decoded data is used, no additional data has to be transmitted. This paper presents an improved hierarchical motion estimation algorithm for use in a decoder-side motion estimation system. A special motion vector latching makes the estimation more robust for very small block sizes and better adapted to object borders. With this technique, a dense motion vector field is estimated which reduces the rate by 6.9% on average compared to H.264/AVC at the same quality.
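The hierarchical search underlying such a scheme proceeds coarse-to-fine: estimate a vector at reduced resolution, scale it up, and refine at full resolution. The NumPy sketch below is a minimal illustration under that general idea, not the authors' algorithm; in particular it omits the motion vector latching and the dense per-sample field.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-size blocks."""
    return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()

def full_search(cur_blk, ref, y, x, rng):
    """Best integer motion vector for cur_blk around (y, x) in ref."""
    h, w = cur_blk.shape
    best, best_mv = None, (0, 0)
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + h > ref.shape[0] or xx + w > ref.shape[1]:
                continue
            cost = sad(cur_blk, ref[yy:yy + h, xx:xx + w])
            if best is None or cost < best:
                best, best_mv = cost, (dy, dx)
    return best_mv

def hierarchical_me(cur, ref, y, x, blk=8, rng=4):
    """Coarse-to-fine motion estimation for the block at (y, x).

    Searches at half resolution first, scales the vector by 2,
    then refines by +-1 sample at full resolution.
    """
    cur2, ref2 = cur[::2, ::2], ref[::2, ::2]
    cdy, cdx = full_search(cur2[y // 2:y // 2 + blk // 2, x // 2:x // 2 + blk // 2],
                           ref2, y // 2, x // 2, rng)
    dy0, dx0 = 2 * cdy, 2 * cdx
    rdy, rdx = full_search(cur[y:y + blk, x:x + blk], ref, y + dy0, x + dx0, 1)
    return dy0 + rdy, dx0 + rdx
```

Run at the decoder on two already reconstructed frames, such a search produces vectors without any side information, which is the rate saving the paper exploits.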
Title: "An improved low delay inter frame coding using template matching averaging"
Authors: Yoshinori Suzuki, C. Boon
Pub Date: 2010-12-01 | DOI: 10.1109/PCS.2010.5702511
Abstract: This paper presents an efficient forward inter prediction method for video coding, targeting low-delay applications. The method applies the idea of template matching averaging (TMA) to conventional motion compensated prediction (MCP). TMA forms the final predictor of a target block by averaging multiple candidates. While one of the candidates is specified by a motion vector, the remaining candidates are obtained from the minimal matching error of a group of reconstructed pixels surrounding the target block, i.e., the template, against the reference frames. In this manner, additional predictors can be obtained without using explicit motion vectors. In addition, averaging multiple predictors reduces the coding noise residing in each predictor and hence improves prediction efficiency. Simulation results show that the proposed scheme improves coding efficiency by up to 4.5% over conventional MCP, without incurring the coding delay of backward prediction.
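The core TMA idea — one candidate from the signalled motion vector, the rest found by matching the L-shaped template of reconstructed neighbors against the reference, then averaged — can be sketched as below. This is an illustrative simplification under assumed parameters (template thickness, search range, candidate count), not the paper's implementation.

```python
import numpy as np

def l_template(img, y, x, blk, t=2):
    """L-shaped template: t rows above and t columns left of the block."""
    top = img[y - t:y, x - t:x + blk].astype(np.float64).ravel()
    left = img[y:y + blk, x - t:x].astype(np.float64).ravel()
    return np.concatenate([top, left])

def tma_predict(cur_rec, ref, y, x, blk, mv, n_cand=3, rng=4):
    """Template matching averaging (TMA) sketch.

    One candidate is the reference block addressed by the signalled
    motion vector `mv`; the others are the reference blocks whose
    L-shaped templates best match the reconstructed template around
    the target block (so they need no extra motion vector). The final
    predictor is the average of all candidates.
    """
    target_t = l_template(cur_rec, y, x, blk)
    costs = []
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            yy, xx = y + dy, x + dx
            if yy < 2 or xx < 2 or yy + blk > ref.shape[0] or xx + blk > ref.shape[1]:
                continue
            c = np.abs(l_template(ref, yy, xx, blk) - target_t).sum()
            costs.append((c, yy, xx))
    costs.sort(key=lambda v: v[0])
    cands = [ref[y + mv[0]:y + mv[0] + blk, x + mv[1]:x + mv[1] + blk].astype(np.float64)]
    for _, yy, xx in costs[:n_cand - 1]:
        cands.append(ref[yy:yy + blk, xx:xx + blk].astype(np.float64))
    return sum(cands) / len(cands)
```

Because the template uses only already reconstructed pixels, the decoder can repeat the same search, which is why no extra vectors are signalled.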
Title: "Hyperspectral image compression suitable for spectral analysis application"
Authors: Kazuma Shinoda, Y. Kosugi, Y. Murakami, Masahiro Yamaguchi, N. Ohyama
Pub Date: 2010-12-01 | DOI: 10.1109/PCS.2010.5702582
Abstract: In hyperspectral image (HSI) compression, the PSNR of the reconstructed image is usually used to evaluate coding performance. For spectral analysis applications of HSI, however, it is also important to consider the error in the result of the spectral analysis. In vegetation analysis, for example, the distortion of the vegetation index should be considered in addition to the distortion of the spectral data. This paper presents an HSI compression method that considers the error of both the vegetation index and the spectral data. The proposed method separates the hyperspectral data into spectral data for the vegetation index and residual data, and encodes each individually using seamless coding. By holding the spectral channels required for the vegetation index at the head of the code-stream, precise vegetation analysis can be performed at a low bit rate. Additionally, by decoding the residual data, the spectral data can be reconstructed with low distortion.
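A standard vegetation index of the kind meant here is the NDVI, (NIR - Red)/(NIR + Red). The sketch below computes it from a hyperspectral cube and reorders the bands it needs to the front, mimicking the idea of holding them at the head of the code-stream; the band indices are hypothetical and the actual codec (seamless coding) is not shown.

```python
import numpy as np

# Hypothetical band positions of red and near-infrared in the cube.
RED_BAND, NIR_BAND = 30, 60

def ndvi(cube):
    """NDVI per pixel from a (bands, h, w) hyperspectral cube."""
    red = cube[RED_BAND].astype(np.float64)
    nir = cube[NIR_BAND].astype(np.float64)
    return (nir - red) / (nir + red + 1e-9)

def reorder_for_index(cube):
    """Move the channels needed for the vegetation index to the front,
    so a decoder can stop after them and still compute the index at
    low rate; the remaining bands act as the residual data."""
    head = [RED_BAND, NIR_BAND]
    rest = [b for b in range(cube.shape[0]) if b not in head]
    order = head + rest
    return cube[order], order
```

Decoding only the first two reordered bands then suffices for the index, while decoding the rest reconstructs the full spectrum.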
Title: "Real-time Free Viewpoint Television for embedded systems"
Authors: D. Aliprandi, E. Piccinelli
Pub Date: 2010-12-01 | DOI: 10.1109/PCS.2010.5702504
Abstract: In this paper we describe an image-based rendering pipeline for interactive real-time Free Viewpoint Television (FTV) on embedded systems. We describe the processing steps and the optimizations implemented to exploit the hardware acceleration of a commercial programmable Graphics Processing Unit (GPU). As a result, real-time view synthesis at 70 fps in XGA resolution has been achieved. Restrictions and modifications introduced to support the application on OpenGL ES 2.0-based GPUs for embedded systems are also discussed.
Title: "Video encoding with the original picture as the reference picture"
Authors: Taiga Muromoto, N. Sagara, K. Sugiyama
Pub Date: 2010-12-01 | DOI: 10.1109/PCS.2010.5702520
Abstract: Inter-picture prediction uses a local decoded picture as the reference in order to avoid a mismatch between encoding and decoding. However, this scheme does not necessarily result in optimal coding efficiency, since the decoded reference carries coding distortion. Therefore, we study the use of the original picture as the reference. In this case, although the mismatch degrades picture quality, the bit amount is reduced. If the better option is chosen for each macroblock, the overall performance may be improved. We therefore propose an adaptive method based on rate-distortion optimization: the original picture is used in a macroblock only if its cost is lower than that of the local decoded picture. Experimental results show a 0.1 to 1.0 dB PSNR gain for each sequence. The adaptive method works successfully, and coding performance is improved without side effects.
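The per-macroblock decision described above is a Lagrangian rate-distortion choice, J = D + lambda * R, between the two candidate references. A minimal sketch, with purely illustrative cost values:

```python
def rd_cost(distortion, rate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate

def choose_reference(d_dec, r_dec, d_orig, r_orig, lam):
    """Per-macroblock choice between the local decoded reference and
    the original-picture reference, by minimum RD cost. A sketch of
    the adaptive selection; ties go to the standard decoded reference.
    Returns (chosen_reference, its_cost)."""
    j_dec = rd_cost(d_dec, r_dec, lam)
    j_orig = rd_cost(d_orig, r_orig, lam)
    return ("original", j_orig) if j_orig < j_dec else ("decoded", j_dec)
```

The original-picture reference typically trades a little extra distortion (the encoder/decoder mismatch) for fewer residual bits, so which option wins depends on lambda, i.e. on the operating bit rate.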
Title: "3D television system based on Integral Photography"
Authors: T. Mishina
Pub Date: 2010-12-01 | DOI: 10.1109/PCS.2010.5702465
Abstract: Integral Photography (IP) is a photographic technique in which a lens array consisting of a large number of tiny lenses is used to capture and display three-dimensional (3D) images [1]. The displayed 3D images are optical real images, which give a natural 3D feeling in principle, without special viewing glasses. This is considered to be a suitable 3D display method for future 3D television systems. This paper describes an integral 3D television system in which Super Hi-Vision [2] is applied to IP.
Title: "A high efficiency coding framework for multiple image compression of circular camera array"
Authors: Dongming Xue, Akira Kubota, Y. Hatori
Pub Date: 2010-12-01 | DOI: 10.1109/PCS.2010.5702505
Abstract: Many existing multi-view video coding techniques remove inter-viewpoint redundancy by applying disparity compensation within conventional video coding frameworks, e.g., H.264/MPEG-4. However, conventional methods work ineffectively because they ignore the special features of inter-viewpoint disparity. This paper proposes a framework using a virtual plane (VP) [1] for multi-view image compression, with which the disparity compensation cost can be largely reduced. Based on this VP predictor, we design a poxel [2] (probabilistic voxelized volume) framework, which integrates the information of cameras at different viewpoints along the polar axis to obtain more effective compression performance. In addition, considering the replay convenience of the multi-view video at the receiving side, we reorganize the overhead information along the polar axis at the sending side in advance.
Title: "Complementary coding mode design based on R-D cost minimization for extending H.264 coding technology"
Authors: T. Yoshino, S. Naito, S. Sakazawa, S. Matsumoto
Pub Date: 2010-12-01 | DOI: 10.1109/PCS.2010.5702549
Abstract: To improve high-resolution video coding efficiency under low bit-rate conditions, an appropriate coding mode is required from an R-D optimization (RDO) perspective, although the coding modes defined in the H.264 standard are not always optimal in terms of RDO criteria. With this in mind, we previously proposed extended SKIP modes with close-to-optimal R-D characteristics. However, the additional modes did not always achieve the optimal R-D characteristics, especially for low bit-rate coding. In this paper, we propose an enhanced coding mode capable of providing a candidate corresponding to the minimum R-D cost by controlling the residual signal associated with the extended SKIP mode. Experimental results show maximum PSNR improvements of 0.42 dB over H.264 and 0.24 dB over our previous approach.
Title: "Diffusion filtering of depth maps in stereo video coding"
Authors: G. Tech, K. Müller, T. Wiegand
Pub Date: 2010-12-01 | DOI: 10.1109/PCS.2010.5702493
Abstract: A method for removing irrelevant information from depth maps in Video plus Depth coding is presented. The depth map is filtered over several iterations using a diffusion approach. In each iteration, smoothing is carried out in local sample neighborhoods while considering the distortion introduced into a rendered view; smoothing is only applied where the rendered view is not affected. Irrelevant edges and features in the depth map can therefore be damped while the quality of the rendered view is retained. The processed depth maps can be coded at a reduced rate compared to unaltered data. Coding experiments show gains of up to 0.5 dB for the rendered view at the same bit rate.
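Render-aware depth diffusion of this flavor can be sketched as iterative neighborhood averaging that is accepted only where the implied change in the rendered view stays negligible. The version below is a strong simplification under assumed parameters: it approximates the render check by the disparity shift a depth change would cause (depth difference times a `disparity_gain` factor), rather than actually synthesizing the view as the paper does.

```python
import numpy as np

def diffuse_depth(depth, disparity_gain, iters=10, max_shift=0.5):
    """Iterative depth-map smoothing that tries to preserve rendering.

    Each iteration proposes a 4-neighbour average for interior
    samples; the update is kept only where the implied disparity
    change (|new - old| * disparity_gain) stays below `max_shift`
    pixels, so depth edges that matter for view synthesis survive
    while irrelevant ripples are damped.
    """
    d = depth.astype(np.float64).copy()
    for _ in range(iters):
        avg = d.copy()
        avg[1:-1, 1:-1] = (d[:-2, 1:-1] + d[2:, 1:-1] +
                           d[1:-1, :-2] + d[1:-1, 2:]) / 4.0
        ok = np.abs(avg - d) * disparity_gain < max_shift
        d[ok] = avg[ok]
    return d
```

The smoothed map has fewer high-frequency details and therefore costs fewer bits to code, which is the source of the reported rate saving.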