Disocclusion hole-filling in DIBR-synthesized images using multi-scale template matching
Pub Date : 2014-12-07 | DOI: 10.1109/VCIP.2014.7051614
S. Reel, Kam Cheung Patrick Wong, Gene Cheung, L. Dooley
Transmitting texture and depth images of the captured camera view(s) of a 3D scene enables a receiver to synthesize novel virtual viewpoint images via Depth-Image-Based Rendering (DIBR). However, a DIBR-synthesized image often contains disocclusion holes: spatial regions in the virtual view image that were occluded by foreground objects in the captured camera view(s). In this paper, we propose to complete these disocclusion holes by exploiting the self-similarity characteristic of natural images via nonlocal template matching (TM). Specifically, we first define self-similarity as nonlocal recurrences of pixel patches within the same image across different scales; one characterization of self-similarity in a given image is the scale range in which these patch recurrences take place. Then, at the encoder we segment an image into multiple depth layers using the available per-pixel depth values and characterize the self-similarity of each layer with a scale range; the scale ranges for all layers are transmitted to the decoder as side information. At the decoder, disocclusion holes are completed via TM on a per-layer basis by searching for similar patches within the designated scale range. Experimental results show that our method improves the quality of rendered images over previous disocclusion hole-filling algorithms by up to 3.9 dB in PSNR.
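To make the per-layer search concrete, here is a minimal sketch of template matching over a restricted scale range, assuming a float grayscale image and a boolean validity mask; the patch size, stride, exhaustive search and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def best_match(img, mask, ty, tx, psize, scales, stride=4):
    """Search the image, over the designated scale range, for the candidate
    patch whose known pixels best match the template at (ty, tx)."""
    h, w = img.shape
    template = img[ty:ty + psize, tx:tx + psize]
    known = mask[ty:ty + psize, tx:tx + psize]       # True where pixels are valid
    best, best_cost = None, np.inf
    for s in scales:                                  # e.g. [1.0, 1.25, 1.5]
        size = int(round(psize * s))
        idx = np.linspace(0, size - 1, psize).astype(int)  # resample to template size
        for y in range(0, h - size, stride):
            for x in range(0, w - size, stride):
                cand = img[y:y + size, x:x + size][np.ix_(idx, idx)]
                d = (cand - template)[known]          # SSD over known pixels only
                cost = float(d @ d)
                if cost < best_cost:
                    best, best_cost = cand, cost
    return best

def fill_hole(img, mask, ty, tx, psize, scales):
    """Copy the best-matching patch into the disoccluded pixels of one patch."""
    patch = best_match(img, mask, ty, tx, psize, scales)
    hole = ~mask[ty:ty + psize, tx:tx + psize]
    img[ty:ty + psize, tx:tx + psize][hole] = patch[hole]
    mask[ty:ty + psize, tx:tx + psize] = True
```

In the paper the search would additionally be restricted to the depth layer being filled and driven by the transmitted per-layer scale ranges; here both are folded into the `scales` argument.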
{"title":"Disocclusion hole-filling in DIBR-synthesized images using multi-scale template matching","authors":"S. Reel, Kam Cheung Patrick Wong, Gene Cheung, L. Dooley","doi":"10.1109/VCIP.2014.7051614","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051614","url":null,"abstract":"Transmitting texture and depth images of captured camera view(s) of a 3D scene enables a receiver to synthesize novel virtual viewpoint images via Depth-Image-Based Rendering (DIBR). However, a DIBR-synthesized image often contains disocclusion holes, which are spatial regions in the virtual view image that were occluded by foreground objects in the captured camera view(s). In this paper, we propose to complete these disocclusion holes by exploiting the self-similarity characteristic of natural images via nonlocal template-matching (TM). Specifically, we first define self-similarity as nonlocal recurrences of pixel patches within the same image across different scales-one characterization of self-similarity in a given image is the scale range in which these patch recurrences take place. Then, at encoder we segment an image into multiple depth layers using available per-pixel depth values, and characterize self-similarity in each layer with a scale range; scale ranges for all layers are transmitted as side information to the decoder. At decoder, disocclusion holes are completed via TM on a per-layer basis by searching for similar patches within the designated scale range. Experimental results show that our method improves the quality of rendered images over previous disocclusion hole-filling algorithms by up to 3.9dB in PSNR.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116636554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust image registration using adaptive expectation maximisation based PCA
Pub Date : 2014-12-07 | DOI: 10.1109/VCIP.2014.7051515
P. Reel, L. Dooley, Kam Cheung Patrick Wong, A. Börner
Images having either the same or different modalities can be aligned using the systematic process of image registration. However, inherent image characteristics, including intensity non-uniformities in magnetic resonance images and large homogeneous non-vascular regions in retinal and other generic image types, pose a significant challenge to their registration. This paper presents an adaptive expectation maximisation for principal component analysis with mutual information (aEMPCA-MI) similarity measure for image registration. It introduces a novel iterative process to adaptively select the most significant principal components using the Kaiser rule, and applies 4-pixel connectivity for feature extraction together with Wichard's bin size selection in calculating the MI. Both quantitative and qualitative results on a diverse range of image datasets conclusively demonstrate the superior image registration performance of aEMPCA-MI compared with existing MI-based similarity measures.
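As a rough illustration of two ingredients named above, the sketch below selects principal components with the Kaiser rule (eigenvalues above the mean, one common form of the rule) and computes histogram-based MI. The adaptive EM iteration is omitted, and `bins`, which Wichard's rule would supply, is left as a plain parameter; all names are assumptions.

```python
import numpy as np

def kaiser_pca(X):
    """Project feature matrix X (samples x features) onto the principal
    components the Kaiser rule retains."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    keep = vals > vals.mean()            # Kaiser rule: eigenvalue above the mean
    return Xc @ vecs[:, keep]

def mutual_information(a, b, bins):
    """Histogram-based MI between two flattened feature vectors."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px[:, None] * py[None, :])[nz])))
```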
{"title":"Robust image registration using adaptive expectation maximisation based PCA","authors":"P. Reel, L. Dooley, Kam Cheung Patrick Wong, A. Börner","doi":"10.1109/VCIP.2014.7051515","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051515","url":null,"abstract":"Images having either the same or different modalities can be aligned using the systematic process of image registration. Inherent image characteristics including intensity non-uniformities in magnetic resonance images and large homogeneous non-vascular regions in retinal and other generic image types however, pose a significant challenge to their registration. This paper presents an adaptive expectation maximisation for principal component analysis with mutual information (aEMPCA-MI) similarity measure for image registration. It introduces a novel iterative process to adaptively select the most significant principal components using Kaiser rule and applies 4-pixel connectivity for feature extraction together with Wichard's bin size selection in calculating the MI. Both quantitative and qualitative results on a diverse range of image datasets, conclusively demonstrate the superior image registration performance of aEMPCA-MI compared with existing Mi-based similarity measures.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128406367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rate-distortion optimised transform competition for intra coding in HEVC
Pub Date : 2014-12-07 | DOI: 10.1109/VCIP.2014.7051507
A. Arrufat, P. Philippe, O. Déforges
State-of-the-art video coders are based on prediction and transform coding. The transform decorrelates the signal to achieve high compression levels. In this paper, we propose improving the performance of the latest video coding standard, HEVC, by adding a set of rate-distortion optimised transforms (RDOTs). The transform design is based upon a cost function that incorporates a bit-rate constraint. These new RDOTs compete against the classical HEVC transforms in the rate-distortion optimisation (RDO) loop in the same way as prediction modes and block sizes, providing additional coding possibilities. BD-rate reductions of around 2% are demonstrated when these transforms are made available in HEVC.
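The competition itself is plain RDO: every candidate transform is tried on the residual and the one minimising J = D + λR wins, exactly as prediction modes and block sizes compete. A schematic sketch, with all cost callables as placeholders rather than the paper's implementation:

```python
def choose_transform(residual, transforms, quantise, distortion, rate, lam):
    """Return the transform minimising J = D + lambda * R on this residual."""
    best_t, best_cost = None, float("inf")
    for t in transforms:          # classical HEVC DCT/DST plus the learned RDOTs
        coeffs = quantise(t(residual))
        cost = distortion(residual, t, coeffs) + lam * rate(coeffs)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```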
{"title":"Rate-distortion optimised transform competition for intra coding in HEVC","authors":"A. Arrufat, P. Philippe, O. Déforges","doi":"10.1109/VCIP.2014.7051507","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051507","url":null,"abstract":"State of the art video coders are based on prediction and transform coding. The transform decorrelates the signal to achieve high compression levels. In this paper we propose improving the performances of the latest video coding standard, HEVC, by adding a set of rate-distortion optimised transforms (RDOTs). The transform design is based upon a cost function that incorporates a bit rate constraint. These new RDOTs compete against classical HEVC transforms in the rate-distortion optimisation (RDO) loop in the same way as prediction modes and block sizes, providing additional coding possibilities. Reductions in BD-rate of around 2% are demonstrated when making these transforms available in HEVC.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125613878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A joint 3D image semantic segmentation and scalable coding scheme with ROI approach
Pub Date : 2014-12-07 | DOI: 10.1109/VCIP.2014.7051556
Khouloud Samrouth, O. Déforges, Yi Liu, W. Falou, Mohamad Khalil
Along with the digital evolution, image post-production and indexing have become among the most advanced and desired services in the lossless 3D image domain. The 3D context provides a significant gain in terms of semantics for scene representation. However, it also induces many drawbacks, including visible degradation of the compressed 3D image (especially along edges) and increased complexity of scene representation. In this paper, we propose a semantic region representation and a scalable coding scheme. First, the semantic region representation is based on a low-resolution version of the 3D image; it provides the possibility to segment the image according to a desirable balance between 2D texture and depth. Second, the scalable coding scheme consists of selecting a number of regions as a Region of Interest (RoI), based on the region representation, to be refined at a higher bitrate. Experiments show that the proposed scheme provides high coherence between texture, depth and regions, and offers an efficient solution to the problems of compression and scene representation in the 3D image domain.
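Schematically, the scalability amounts to giving every region a base layer and spending refinement bits only on regions flagged as RoI. The sketch below is purely illustrative; all callables are assumptions standing in for the actual codec stages.

```python
def encode_scalable(regions, is_roi, encode_base, encode_refine):
    """Base layer for every region; refinement layer only for RoI regions."""
    layers = {"base": [], "refine": []}
    for r in regions:
        layers["base"].append(encode_base(r))
        if is_roi(r):                     # selected from the region representation
            layers["refine"].append(encode_refine(r))
    return layers
```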
{"title":"A joint 3D image semantic segmentation and scalable coding scheme with ROI approach","authors":"Khouloud Samrouth, O. Déforges, Yi Liu, W. Falou, Mohamad Khalil","doi":"10.1109/VCIP.2014.7051556","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051556","url":null,"abstract":"Along with the digital evolution, image post-production and indexing have become one of the most advanced and desired services in the lossless 3D image domain. The 3D context provides a significant gain in terms of semantics for scene representation. However, it also induces many drawbacks including monitoring visual degradation of compressed 3D image (especially upon edges), and increased complexity for scene representation. In this paper, we propose a semantic region representation and a scalable coding scheme. First, the semantic region representation scheme is based on a low resolution version of the 3D image. It provides the possibility to segment the image according to a desirable balance between 2D and depth. Second, the scalable coding scheme consists in selecting a number of regions as a Region of Interest (RoI), based on the region representation, in order to be refined at a higher bitrate. Experiments show that the proposed scheme provides a high coherence between texture, depth and regions and ensures an efficient solution to the problems of compression and scene representation in the 3D image domain.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"11 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114002988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Non-separable mode dependent transforms for intra coding in HEVC
Pub Date : 2014-12-07 | DOI: 10.1109/VCIP.2014.7051504
A. Arrufat, P. Philippe, O. Déforges
Transform coding plays a crucial role in video coders. Recently, additional transforms based on the DST and the DCT were included in the latest video coding standard, HEVC; those transforms were introduced after a thorough analysis of the video signal properties. In this paper, we design additional transforms using an alternative learning approach and show the appropriateness of this design over classical KLT learning. The designed transforms are then applied to the latest HEVC scheme. Results show that coding performance is improved compared with the standard, and that it can be further improved significantly by using non-separable transforms. Bitrate reductions in the range of 2% over HEVC are achieved with the proposed transforms.
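The separable/non-separable distinction is easy to state in code: a separable transform applies an N×N matrix along rows and columns, while a non-separable one applies an N²×N² matrix to the vectorised block. A sketch using an orthonormal DCT-II pair against a random orthonormal matrix as a stand-in for a learned non-separable transform:

```python
import numpy as np

def separable(block, A, B):
    return A @ block @ B.T                   # row and column transforms

def non_separable(block, T):
    return (T @ block.reshape(-1)).reshape(block.shape)

N = 4
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
A = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
A[0] /= np.sqrt(2.0)                         # orthonormal DCT-II matrix
T, _ = np.linalg.qr(np.random.randn(N * N, N * N))  # stand-in learned transform
block = np.random.randn(N, N)
c_sep = separable(block, A, A)               # 16 coefficients, separable
c_ns = non_separable(block, T)               # 16 coefficients, non-separable
```

The non-separable matrix has N⁴ free entries instead of 2N², which is where the extra coding gain (and the extra complexity) comes from.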
{"title":"Non-separable mode dependent transforms for intra coding in HEVC","authors":"A. Arrufat, P. Philippe, O. Déforges","doi":"10.1109/VCIP.2014.7051504","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051504","url":null,"abstract":"Transform coding plays a crucial role in video coders. Recently, additional transforms based on the DST and the DCT have been included in the latest video coding standard, HEVC. Those transforms were introduced after a thoroughly analysis of the video signal properties. In this paper, we design additional transforms by using an alternative learning approach. The appropriateness of the design over the classical KLT learning is also shown. Subsequently, the additional designed transforms are applied to the latest HEVC scheme. Results show that coding performance is improved compared to the standard. Additional results show that the coding performance can be significantly further improved by using non-separable transforms. Bitrate reductions in the range of 2% over HEVC are achieved with those proposed transforms.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131922137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimized spatial and temporal resolution based on subjective quality estimation without encoding
Pub Date : 2014-12-01 | DOI: 10.1109/VCIP.2014.7051497
M. Takagi, H. Fujii, A. Shimizu
In this paper, we propose a method of estimating subjective video quality at various spatial and temporal resolutions without encoding. Under a given bitrate constraint, the combination of resolution and frame rate that provides the best subjective video quality depends on the video content. To maximize subjective video quality, several studies have proposed models that estimate subjective quality at various resolutions and frame rates. However, determining the optimal resolution and frame rate requires estimating subjective quality at every combination of resolution, frame rate and bitrate. This takes considerable time with previously reported methods because they require an encoding process, either to decode videos or to perform pre-analysis. To address this issue, we developed a method that estimates subjective video quality without any encoding process.
{"title":"Optimized spatial and temporal resolution based on subjective quality estimation without encoding","authors":"M. Takagi, H. Fujii, A. Shimizu","doi":"10.1109/VCIP.2014.7051497","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051497","url":null,"abstract":"In this paper, we propose a method of estimating subjective video quality with various spatial and temporal resolutions without encoding. Under a given bitrate constraint, the combination of resolution and frame rate that provides best subjective video quality depends on the video content. To maximize subjective video quality, several studies have proposed models that can estimate subjective quality with various resolutions and frame rates. However, to determine the optimal resolution and frame rate that maximize subjective video quality, it is necessary to estimate subjective video quality at each combination of resolution/frame rate/bitrate. This takes considerable time with previously reported methods because they require an encoding process for decoding videos or obtaining pre analysis. To address this issue, we developed a method that does not require an encoding process to estimate subjective video quality.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116958530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Complexity control of HEVC based on region-of-interest attention model
Pub Date : 2014-12-01 | DOI: 10.1109/VCIP.2014.7051545
Xin Deng, Mai Xu, Shengxi Li, Zulin Wang
In this paper, we present a novel complexity control method for HEVC that adjusts its encoding complexity. First, a region-of-interest (ROI) attention model is established, which assigns different weights to regions according to their importance. Then, a complexity control algorithm with a distortion-complexity optimization model is proposed to determine the maximum depth of each largest coding unit (LCU) according to its weight. This reduces the encoding complexity to a given target level at the cost of little distortion loss. Finally, the experimental results show that the encoding complexity can be lowered to a pre-defined target as low as 20%, with a bias of less than 7%. Meanwhile, our method is verified to preserve the quality of the ROI better than another state-of-the-art approach.
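One way to picture the control loop: given per-LCU weights from the attention model and a target complexity ratio, the encoder caps the quad-tree depth of low-weight LCUs first. The greedy assignment below is only a hedged stand-in for the paper's distortion-complexity optimization model; the depth sum as a complexity proxy is an assumption.

```python
import numpy as np

def assign_max_depths(weights, target_ratio, full_depth=4, min_depth=1):
    """Greedy depth budget: important LCUs keep the full quad-tree depth,
    the rest are capped so total depth (a complexity proxy) meets the target."""
    weights = np.asarray(weights, dtype=float)
    depths = np.full(len(weights), min_depth)
    budget = target_ratio * len(weights) * full_depth
    spent = depths.sum()
    for i in np.argsort(weights)[::-1]:      # most important LCUs first
        extra = full_depth - depths[i]
        if spent + extra <= budget:
            depths[i] = full_depth
            spent += extra
    return depths
```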
{"title":"Complexity control of HEVC based on region-of-interest attention model","authors":"Xin Deng, Mai Xu, Shengxi Li, Zulin Wang","doi":"10.1109/VCIP.2014.7051545","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051545","url":null,"abstract":"In this paper, we present a novel complexity control method of HEVC to adjust its encoding complexity. First, a region-of-interest (ROI) attention model is established, which defines different weights for various regions according to their importance. Then, the complexity control algorithm is proposed with a distortion-complexity optimization model, to determine the maximum depth of the largest coding units (LCUs) according to their weights. We can reduce the encoding complexity to a given target level at the cost of little distortion loss. Finally, the experimental results show that the encoding complexity can drop to a pre-defined target complexity as low as 20% with bias less than 7%. Meanwhile, our method is verified to preserve the quality of ROI better than another state-of-the-art approach.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"418 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127076880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accelerated hybrid image reconstruction for non-regular sampling color sensors
Pub Date : 2014-12-01 | DOI: 10.1109/VCIP.2014.7051543
M. Bätz, Andrea Eichenseer, Markus Jonscher, Jürgen Seiler, André Kaup
Increasing the spatial resolution is an ongoing research topic in image processing. A recently presented approach applies a non-regular sampling mask to a low-resolution sensor and subsequently reconstructs the masked area via an extrapolation algorithm to obtain a high-resolution image. This paper introduces an acceleration of this approach for use with full color sensors. Instead of applying the effective yet computationally expensive extrapolation algorithm to each of the three RGB channels, a color space conversion is performed and only the luminance channel is reconstructed with this algorithm. Since natural images contain much less information in the chrominance channels, a fast linear interpolation technique can be used there, accelerating the whole reconstruction procedure. Simulation results show that an average speed-up factor of 2.9 is achieved while the loss in visual quality remains imperceptible; comparisons of PSNR results confirm this.
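The acceleration is the channel split itself. A sketch, assuming a float RGB frame, a boolean mask of sampled pixels, and an `extrapolate` callable standing in for the actual reconstruction algorithm (BT.601 conversion; helper names are hypothetical):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    m = np.array([[0.299, 0.587, 0.114],          # BT.601 luma
                  [-0.168736, -0.331264, 0.5],    # Cb
                  [0.5, -0.418688, -0.081312]])   # Cr
    return rgb @ m.T

def neighbour_fill(chan, mask):
    """Cheap one-pass interpolation: average the valid 4-neighbours."""
    out = chan.copy()
    h, w = chan.shape
    for y, x in zip(*np.where(~mask)):
        nb = [chan[v, u] for v, u in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
              if 0 <= v < h and 0 <= u < w and mask[v, u]]
        out[y, x] = np.mean(nb) if nb else 0.0
    return out

def reconstruct(rgb, mask, extrapolate):
    ycc = rgb_to_ycbcr(rgb.astype(float))
    y = extrapolate(ycc[..., 0], mask)        # expensive algorithm, luma only
    cb = neighbour_fill(ycc[..., 1], mask)    # fast interpolation, chroma
    cr = neighbour_fill(ycc[..., 2], mask)
    return np.stack([y, cb, cr], axis=-1)
```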
{"title":"Accelerated hybrid image reconstruction for non-regular sampling color sensors","authors":"M. Bätz, Andrea Eichenseer, Markus Jonscher, Jürgen Seiler, André Kaup","doi":"10.1109/VCIP.2014.7051543","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051543","url":null,"abstract":"Increasing the spatial resolution is an ongoing research topic in image processing. A recently presented approach applies a non-regular sampling mask on a low resolution sensor and subsequently reconstructs the masked area via an extrapolation algorithm to obtain a high resolution image. This paper introduces an acceleration of this approach for use with full color sensors. Instead of employing the effective, yet computationally expensive extrapolation algorithm on each of the three RGB channels, a color space conversion is performed and only the luminance channel is then reconstructed using this algorithm. As natural images contain much less information in the chrominance channels, a fast linear interpolation technique can here be used to accelerate the whole reconstruction procedure. Simulation results show that an average speed up factor of 2.9 is thus achieved, while the loss in visual quality stays imperceptible. Comparisons of PSNR results confirm this.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126196543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hardware-oriented IME algorithm and its implementation for HEVC
Pub Date : 2014-12-01 | DOI: 10.1109/VCIP.2014.7051540
Xin Ye, Dandan Ding, Lu Yu
The flexible coding structure of High Efficiency Video Coding (HEVC) introduces many challenges to real-time implementation of integer-pel motion estimation (IME). In this paper, a hardware-oriented IME algorithm named parallel clustering tree search (PCTS) is proposed, in which the various prediction units (PUs) are processed simultaneously in a parallel scheme. The PCTS consists of four hierarchical search steps. After each search step, PUs with the same MV candidate are clustered into one group, and the next search step is shared by the PUs in that group. Owing to the top-down tree-structured search strategy of the PCTS, search processes are highly shared among different PUs and system throughput is thus significantly increased. As a result, a hardware implementation based on the proposed algorithm can support real-time video applications at QFHD (3840×2160) and 30 fps.
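The clustering that drives the sharing can be sketched in a few lines: after a search step, PUs whose best MV candidate coincides fall into one group, and the next refinement runs once per group rather than once per PU. Data structures and names below are assumptions, not the hardware design.

```python
from collections import defaultdict

def cluster_and_refine(pus, best_mv, refine_step):
    """Group PUs by their current best MV candidate and share the next
    search step within each group."""
    groups = defaultdict(list)
    for pu in pus:
        groups[best_mv[pu]].append(pu)    # same MV candidate -> same cluster
    results = {}
    for mv, members in groups.items():
        refined = refine_step(mv)         # one shared search around the group MV
        for pu in members:
            results[pu] = refined
    return results
```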
{"title":"A hardware-oriented IME algorithm and its implementation for HEVC","authors":"Xin Ye, Dandan Ding, Lu Yu","doi":"10.1109/VCIP.2014.7051540","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051540","url":null,"abstract":"The flexible coding structure in High Efficiency Video Coding (HEVC) introduces many challenges to real-time implementation of the integer-pel motion estimation (IME). In this paper, a hardware-oriented IME algorithm naming parallel clustering tree search (PCTS) is proposed, where various prediction units (PU) are processed simultaneously with a parallel scheme. The PCTS consists of four hierarchical search steps. After each search step, PUs with the same MV candidate are clustered to one group. And the next search step is shared by PUs in the same group. Owing to the top-down tree-structure search strategy of the PCTS, search processes are highly shared among different PUs and system throughput is thus significantly increased. As a result, the hardware implementation based on the proposed algorithm can support real-time video applications of QFHD (3840×2160) at 30fps.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125390856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast mode decision method for all intra spatial scalability in SHVC
Pub Date : 2014-12-01 | DOI: 10.1109/VCIP.2014.7051589
Xuguang Zuo, Lu Yu
Scalable High Efficiency Video Coding (SHVC) is now being developed by the Joint Collaborative Team on Video Coding (JCT-VC). In SHVC, the enhancement layer (EL) employs the same tree-structured coding unit (CU) and 35 intra prediction modes as the base layer (BL), which results in a heavy computation load. To speed up the mode decision process in the EL, this paper exploits the correlations of CU depth and intra prediction modes between the BL and the EL. Based on these correlations, an EL CU depth early-skip algorithm and a fast intra prediction mode decision algorithm are proposed for all intra spatial scalability. Experimental results show that 45.3% and 42.3% of the EL coding time can be saved for All Intra 1.5× and 2× spatial scalability, respectively, while the R-D performance degrades by less than 0.05% compared with SHVC Test Model (SHM) 5.0.
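The flavour of the two shortcuts can be sketched as follows: cap the EL CU depth search using the co-located BL depth, and prune the EL intra candidate list to the BL mode plus a few most probable modes. The +1 depth margin, thresholds and names are illustrative assumptions, not the paper's exact rules.

```python
def el_depth_candidates(bl_depth, max_depth=3):
    """Early skip: try EL depths no deeper than the co-located BL CU (+1)."""
    return range(min(bl_depth + 1, max_depth) + 1)

def el_intra_candidates(bl_mode, mpms, planar=0, dc=1):
    """Fast mode decision: BL mode, Planar, DC, and the neighbours' MPMs
    instead of the full set of 35 intra modes."""
    cands = {bl_mode, planar, dc}
    cands.update(mpms)
    return sorted(cands)
```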
{"title":"Fast mode decision method for all intra spatial scalability in SHVC","authors":"Xuguang Zuo, Lu Yu","doi":"10.1109/VCIP.2014.7051589","DOIUrl":"https://doi.org/10.1109/VCIP.2014.7051589","url":null,"abstract":"Scalable high efficiency video coding (SHVC) is now being developed by the Joint Collaborative Team on Video Coding (JCT-VC). In SHVC, the enhancement layer (EL) employs the same tree structured coding unit (CU) and 35 intra prediction modes as the base layer (BL), which results in heavy computation load. To speed up the mode decision process in the EL, the correlations of the CU depth and intra prediction modes between the BL and the EL are exploited in this paper. Based on the correlations an EL CU depth early skip algorithm and a fast intra prediction mode decision algorithm are proposed for all intra spatial scalability. Experimental results show that 45.3% and 42.3% coding time of the EL can be saved in AH Intra 1.5× spatial scalability and 2× spatial scalability respectively. In the meantime, the R-D performance degraded less than 0.05% compared with SHVC Test Model (SHM) 5.0.","PeriodicalId":166978,"journal":{"name":"2014 IEEE Visual Communications and Image Processing Conference","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116420267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}