Fully Connected Network for HEVC CU Split Decision equipped with Laplacian Transparent Composite Model
Hossam Amer, Abdullah M. Rashwan, E. Yang
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456290
High Efficiency Video Coding (HEVC) improves rate-distortion (RD) performance significantly, but is computationally expensive due to the large variety of coding unit (CU) sizes explored in its RD optimization. In this paper, we investigate the application of fully connected neural networks (NNs) to this time-sensitive task to reduce its time complexity while controlling the resulting bitrate loss. Specifically, four NNs are introduced, one for each depth of the coding tree unit. Each NN either splits the current CU or terminates the CU search algorithm. Because training NNs is time-consuming and requires large amounts of training data, we further propose a novel training strategy in which offline training and online adaptation work together to overcome this limitation. Our features are extracted from the original frames based on the Laplacian Transparent Composite Model (LPTCM). Experiments carried out in the all-intra configuration of HEVC reveal that our method is among the best NN-based methods, with an average time saving of 38% and an average controlled bitrate loss of 1.6% compared to the original HEVC.
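As a rough illustration of the decision structure described above, the sketch below wires four tiny fully connected nets, one per CTU depth, into a recursive split-or-terminate search. The feature extractor, layer sizes, weights, and threshold are hypothetical placeholders; in particular, the paper derives its features from the LPTCM, which is not reproduced here.

```python
import numpy as np

def mlp_forward(x, net):
    """Forward pass of a small fully connected net; returns a split probability."""
    h = x
    for W, b in net[:-1]:
        h = np.maximum(W @ h + b, 0.0)            # ReLU hidden layers
    W, b = net[-1]
    z = W @ h + b
    return float(1.0 / (1.0 + np.exp(-z[0])))     # sigmoid output

def extract_features(cu):
    """Hypothetical stand-in for the paper's LPTCM-derived features."""
    return np.array([cu.mean(), cu.std(),
                     np.abs(np.diff(cu, axis=0)).mean(),
                     np.abs(np.diff(cu, axis=1)).mean()])

def decide_cu_partition(ctu, x, y, size, depth, nets, threshold=0.5):
    """One binary net per depth either splits the CU or terminates the search."""
    if size <= 4:                                  # smallest block: stop
        return [(x, y, size)]
    cu = ctu[y:y + size, x:x + size]
    if mlp_forward(extract_features(cu), nets[depth]) < threshold:
        return [(x, y, size)]                      # terminate the search here
    half = size // 2                               # split into four sub-CUs
    return [leaf
            for dy in (0, half) for dx in (0, half)
            for leaf in decide_cu_partition(ctu, x + dx, y + dy, half,
                                            depth + 1, nets, threshold)]

rng = np.random.default_rng(0)

def random_net(n_in=4, n_hidden=8):                # untrained placeholder weights
    return [(rng.standard_normal((n_hidden, n_in)), np.zeros(n_hidden)),
            (rng.standard_normal((1, n_hidden)), np.zeros(1))]

nets = [random_net() for _ in range(4)]            # one net per CTU depth 0..3
ctu = rng.integers(0, 256, (64, 64)).astype(float)
print(decide_cu_partition(ctu, 0, 0, 64, 0, nets))
```

With random weights this only demonstrates the control flow; in the paper, the four networks are trained offline and then adapted online.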
{"title":"Fully Connected Network for HEVC CU Split Decision equipped with Laplacian Transparent Composite Model","authors":"Hossam Amer, Abdullah M. Rashwan, E. Yang","doi":"10.1109/PCS.2018.8456290","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456290","url":null,"abstract":"High Efficiency Video Coding (HEVC) improves rate distortion (RD) performance significantly, but at the same time is computationally expensive due to the adoption of a large variety of coding unit (CU) sizes in its RD optimization. In this paper, we investigate the application of fully connected neural networks (NNs) to this time-sensitive application to improve its time complexity, while controlling the resulting bitrate loss. Specifically, four NNs are introduced with one NN for each depth of the coding tree unit. These NNs either split the current CU or terminate the CU search algorithm. Because training of NNs is time-consuming and requires large training data, we further propose a novel training strategy in which offline training and online adaptation work together to overcome this limitation. Our features are extracted from original frames based on the Laplacian Transparent Composite Model (LPTCM). Experiments carried out on all-intra configuration for HEVC reveal that our method is among the best NN methods, with an average time saving of 38% and an average controlled bitrate loss of 1.6%, compared to original HEVC.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122951301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Content-Adaptive 360-Degree Video Coding Using Hybrid Cubemap Projection
Yuwen He, Xiaoyu Xiu, Philippe Hanhart, Yan Ye, Fanyi Duanmu, Yao Wang
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456280
In this paper, a novel hybrid cubemap projection (HCP) is proposed to improve 360-degree video coding efficiency. HCP allows adaptive sampling adjustments in the horizontal and vertical directions within each cube face. The HCP parameters of each face can be adjusted to the characteristics of the input 360-degree video content for better sampling efficiency, and they can be updated periodically to adapt to temporal content variation. An efficient HCP parameter estimation algorithm is proposed to reduce the computational complexity of parameter estimation. Experimental results demonstrate that the HCP format achieves average luma (Y) BD-rate reductions of 11.51%, 8.0%, and 0.54% in terms of end-to-end WS-PSNR compared to the equirectangular, cubemap, and adjusted cubemap projection formats, respectively.
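The idea of per-face adaptive sampling can be illustrated with a small coordinate-mapping sketch. The paper defines its own adjustment functions and an estimation algorithm for their parameters; the tangent-style mapping below is only an assumed stand-in, with a single parameter a that redistributes sampling density across the face (a = 0 recovers the plain cubemap).

```python
import numpy as np

def hcp_forward(u, a):
    """Map a plain-cubemap face coordinate u in [-1, 1] to an adjusted
    coordinate. Hypothetical tangent-style adjustment for illustration;
    the paper defines its own per-face, per-direction functions."""
    return u if a == 0.0 else np.arctan(a * u) / np.arctan(a)

def hcp_inverse(v, a):
    """Inverse mapping, used when rendering back to the sphere."""
    return v if a == 0.0 else np.tan(v * np.arctan(a)) / a

# Different parameters per face and per direction change where samples
# concentrate; the inverse must round-trip exactly for rendering.
u = np.linspace(-1.0, 1.0, 5)
for a in (0.0, 0.5, 2.0):
    v = hcp_forward(u, a)
    print(a, np.round(v, 3), np.allclose(hcp_inverse(v, a), u))
```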
{"title":"Content-Adaptive 360-Degree Video Coding Using Hybrid Cubemap Projection","authors":"Yuwen He, Xiaoyu Xiu, Philippe Hanhart, Yan Ye, Fanyi Duanmu, Yao Wang","doi":"10.1109/PCS.2018.8456280","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456280","url":null,"abstract":"In this paper, a novel hybrid cubemap projection (HCP) is proposed to improve the 360-degree video coding efficiency. HCP allows adaptive sampling adjustments in the horizontal and vertical directions within each cube face. HCP parameters of each cube face can be adjusted based on the input 360-degree video content characteristics for a better sampling efficiency. The HCP parameters can be updated periodically to adapt to temporal content variation. An efficient HCP parameter estimation algorithm is proposed to reduce the computational complexity of parameter estimation. Experimental results demonstrate that HCP format achieves on average luma (Y) BD-rate reduction of 11.51%, 8.0%, and 0.54% compared to equirectangular projection format, cubemap projection format, and adjusted cubemap projection format, respectively, in terms of end-to-end WS-PSNR.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"31 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122983907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Restricted Boltzmann Machine Image Compression
Markus Kuchhold, Maik Simon, T. Sikora
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456279
We propose a novel lossy block-based image compression approach. Our approach builds on non-linear autoencoders that can, when properly trained, exploit non-linear statistical dependencies in image blocks for redundancy reduction. In contrast, the DCT employed in JPEG is inherently restricted to linear dependencies within a second-order statistics framework. The coder is based on pre-trained, class-specific Restricted Boltzmann Machines (RBMs). These machines are statistical variants of neural network autoencoders that directly map pixel values in image blocks into coded bits. Decoders can be implemented with low computational complexity using a codebook design. Experimental results show that our RBM codec outperforms JPEG at high compression rates in terms of PSNR, SSIM, and subjective results.
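Since the mapping from pixels to bits is just the RBM's conditional means, a compact sketch is possible. The block size, code length, and weights below are illustrative placeholders (a real coder would use the pre-trained, class-specific weights); it shows how a flattened pixel block becomes a short bit vector and how one linear map plus a sigmoid reconstructs it, which is what makes a codebook decoder cheap.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_encode(block, W, b_hidden):
    """Deterministic encode: binarized hidden activation probabilities.
    `block` is a flattened pixel block scaled to [0, 1]."""
    return (sigmoid(W @ block + b_hidden) > 0.5).astype(np.uint8)

def rbm_decode(bits, W, b_visible):
    """One linear map plus a sigmoid; with a short code, all 2^n
    reconstructions can be precomputed into a codebook."""
    return sigmoid(W.T @ bits + b_visible)

rng = np.random.default_rng(0)
n_visible, n_hidden = 64, 12                 # 8x8 block -> 12 coded bits
W = 0.1 * rng.standard_normal((n_hidden, n_visible))   # placeholder weights
b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)

block = rng.random(n_visible)                # stand-in for an image block
bits = rbm_encode(block, W, b_h)
recon = rbm_decode(bits, W, b_v)
print(bits, float(np.mean((block - recon) ** 2)))
```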
{"title":"Restricted Boltzmann Machine Image Compression","authors":"Markus Kuchhold, Maik Simon, T. Sikora","doi":"10.1109/PCS.2018.8456279","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456279","url":null,"abstract":"We propose a novel lossy block-based image compression approach. Our approach builds on non-linear autoencoders that can, when properly trained, explore non-linear statistical dependencies in the image blocks for redundancy reduction. In contrast the DCT employed in JPEG is inherently restricted to exploration of linear dependencies using a second-order statistics framework. The coder is based on pre-trained class-specific Restricted Boltzmann Machines (RBM). These machines are statistical variants of neural network autoencoders that directly map pixel values in image blocks into coded bits. Decoders can be implemented with low computational complexity in a codebook design. Experimental results show that our RBM-codec outperforms JPEG at high compression rates, both in terms of PSNR, SSIM and subjective results.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115516293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Why JPEG is not JPEG — Testing a 25 years old Standard
T. Richter, R. Clark
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456260
While ISO WG1 recently celebrated the 25th anniversary of its most successful standard, it is more than surprising that, to date, no reference implementation of this standard exists. During an ongoing activity aimed at filling this gap, several observations have been made on how far the “living standard” deviates from the ISO documents. In particular, applying the official JPEG reference testing procedure, available as ITU Recommendation T.83 or ISO/IEC 10918-2, turned out to be more of a challenge than expected. This document sheds some light on the JPEG ISO standard and our findings from reference testing a legacy, 25-year-old standard.
{"title":"Why JPEG is not JPEG — Testing a 25 years old Standard","authors":"T. Richter, R. Clark","doi":"10.1109/PCS.2018.8456260","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456260","url":null,"abstract":"While ISO WG1 recently celebrated the 25th anniversary of its most successful standard, it seems to be more than surprising that up to now, no reference implementation of this standard exists. During an ongoing activity aiming at filling this gap, several observations have been made in how far the “living standard” deviates from the ISO documents. In particular, applying the official reference testing procedure of JPEG, available as ITU Recommendation T.83 or ISO/IEC 10918-2, turned out to be more a challenge than expected. This document sheds some light on the JPEG ISO standard, and our findings during reference testing a legacy, 25 year old standard.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132541649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hard Real-Time, Pixel-Parallel Rendering of Light Field Videos Using Steered Mixture-of-Experts
Ignace P. Saenen, Ruben Verhack, Vasileios Avramelos, G. Wallendael, P. Lambert
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456306
Steered Mixture-of-Experts (SMoE) is a novel framework for the approximation, coding, and description of image modalities such as light field images and video. The long-term goal is to arrive at a representation for Six Degrees-of-Freedom (6DoF) image data. Previous research has shown the feasibility of real-time pixel-parallel rendering of static light field images. Each pixel is independently reconstructed from the kernels that lie in its vicinity, so the number of kernels involved forms the bottleneck on the achievable framerate. The goal of this paper is twofold. First, we introduce pixel-level rendering of light field video, as previous work only rendered static content. Second, we investigate rendering using a predefined number of most significant kernels. In this way, we can meet hard real-time constraints by trading off reconstruction quality.
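A sketch of the per-pixel reconstruction with a fixed kernel budget follows. Each pixel is reconstructed by softmax-gated Gaussian kernels in its vicinity, and the budget K bounds the per-pixel cost. Constant experts are used for brevity (SMoE generally uses richer experts), and ranking kernels by their unnormalized gate response is an assumption about the significance criterion.

```python
import numpy as np

def smoe_reconstruct_pixel(x, mus, sig_invs, log_pis, expert_vals, K):
    """Reconstruct one pixel from its K most significant Gaussian kernels.
    Significance = unnormalized log gate response (an assumption here);
    experts are constants for brevity."""
    d = x[None, :] - mus                               # offsets to all kernels
    maha = np.einsum('ni,nij,nj->n', d, sig_invs, d)   # quadratic forms
    scores = log_pis - 0.5 * maha                      # log of pi_k * N(x)
    top = np.argpartition(-scores, K - 1)[:K]          # keep the K best
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                                       # softmax gating over K
    return float(w @ expert_vals[top])

rng = np.random.default_rng(1)
N = 50                                                 # kernels in the model
mus = rng.random((N, 2))                               # kernel centres
sig_invs = np.stack([np.eye(2) * 200.0] * N)           # isotropic precisions
log_pis = np.zeros(N)                                  # equal mixing weights
vals = rng.random(N)                                   # constant expert values
print(smoe_reconstruct_pixel(np.array([0.5, 0.5]),
                             mus, sig_invs, log_pis, vals, K=8))
```

Because each pixel's computation is independent of every other pixel, this loop body maps directly onto one GPU thread per pixel, which is the pixel-parallel property the paper exploits.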
{"title":"Hard Real-Time, Pixel-Parallel Rendering of Light Field Videos Using Steered Mixture-of-Experts","authors":"Ignace P. Saenen, Ruben Verhack, Vasileios Avramelos, G. Wallendael, P. Lambert","doi":"10.1109/PCS.2018.8456306","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456306","url":null,"abstract":"Steered Mixture-of-Experts (SMoE) is a novel framework for the approximation, coding, and description of image modalities such as light field images and video. The future goal is to arrive at a representation for Six Degrees-of-Freedom (6DoF) image data. Previous research has shown the feasibility of real-time pixel-parallel rendering of static light field images. Each pixel is independently reconstructed by kernels that lay in its vicinity. The number of kernels involved forms the bottleneck on the achievable framerate. The goal of this paper is twofold. Firstly, we introduce pixel-level rendering of light field video, as previous work only rendered static content. Secondly, we investigate rendering using a predefined number of most significant kernels. As such, we can deliver hard real-time constraints by trading off the reconstruction quality.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124054920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Two-layer Lossless HDR Coding considering Histogram Sparseness with Backward Compatibility to JPEG
Osamu Watanabe, H. Kobayashi, H. Kiya
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456254
An efficient two-layer coding method using histogram packing, with backward compatibility to the legacy JPEG standard, is proposed in this paper. JPEG XT, the international standard for compressing HDR images, adopts a two-layer coding scheme for backward compatibility to legacy JPEG. However, this two-layer structure does not give better lossless performance than existing single-layer coding methods for HDR images. Moreover, JPEG XT has problems with the determination of its lossless coding parameters: finding an appropriate combination of parameter values is necessary to achieve good lossless performance. We discuss the histogram sparseness of HDR images and point out that histogram packing, which takes this sparseness into account, improves lossless compression performance for HDR images; a novel two-layer coding method using histogram packing is then proposed. The experimental results demonstrate that the proposed method not only has better lossless compression performance than JPEG XT, but also requires no image-dependent parameter tuning to achieve good compression performance, while retaining backward compatibility with the well-known legacy JPEG standard.
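Histogram packing itself is simple to state: when only a few of the representable sample values actually occur, mapping them to consecutive integers shrinks the value range the lossless layer must code, and a lookup table inverts the mapping exactly. A minimal sketch follows (the integration into the JPEG XT-compatible two-layer structure is not reproduced):

```python
import numpy as np

def pack_histogram(img):
    """Map the sparse set of occurring values to 0..n-1; return the packed
    image and the lookup table needed for exact inversion."""
    values = np.unique(img)                  # sorted distinct sample values
    return np.searchsorted(values, img), values

def unpack_histogram(packed, values):
    return values[packed]                    # lossless inverse mapping

hdr = np.array([[1000, 1000, 8000],
                [8000,  400, 1000]])         # sparse high-bit-depth samples
packed, lut = pack_histogram(hdr)
assert np.array_equal(unpack_histogram(packed, lut), hdr)
print(packed, lut)                           # packed values are just 0..2
```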
{"title":"Two-layer Lossless HDR Coding considering Histogram Sparseness with Backward Compatibility to JPEG","authors":"Osamu Watanabe, H. Kobayashi, H. Kiya","doi":"10.1109/PCS.2018.8456254","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456254","url":null,"abstract":"An efficient two-layer coding method using the histogram packing technique with the backward compatibility to the legacy JPEG is proposed in this paper. The JPEG XT, which is the international standard to compress HDR images, adopts two-layer coding scheme for backward compatibility to the legacy JPEG. However, this two-layer coding structure does not give better lossless performance than the other existing single-layer coding methods for HDR images. Moreover, the JPEG XT has problems on determination of the lossless coding parameters; Finding appropriate combination of the parameter values is necessary to achieve good lossless performance. The histogram sparseness of HDR images is discussed and it is pointed out that the histogram packing technique considering the sparseness is able to improve the performance of lossless compression for HDR images and a novel two-layer coding with the histogram packing technique is proposed. The experimental results demonstrate that not only the proposed method has a better lossless compression performance than that of the JPEG XT, but also there is no need to determine image-dependent parameter values for good compression performance in spite of having the backward compatibility to the well known legacy JPEG standard.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124065259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Signal Adaptive Diffusion Filter For Video Coding
Jennifer Rasch, Jonathan Pfaff, Michael Schäfer, H. Schwarz, Martin Winken, Mischa Siekmann, D. Marpe, T. Wiegand
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456239
In this paper, we combine state-of-the-art video compression with Partial Differential Equation (PDE) based image processing methods. We introduce a new signal-adaptive method that filters the predictions of a hybrid video codec using a system of PDEs describing a diffusion process. The method can be applied to intra as well as inter predictions. The filter is embedded into the HEVC framework. The efficiency of the HEVC video codec is improved by up to −2.76% for All Intra and −3.56% for Random Access, measured in Bjøntegaard delta (BD) rate. Coding gains of up to −8.76% can be observed for individual test sequences.
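Discretized explicitly, a diffusion process updates each prediction sample from its four neighbours, one small step per iteration. The sketch below runs explicit Perona-Malik-style edge-stopping diffusion on a toy prediction block; the paper's actual system of PDEs, its signal-adaptive control, and its embedding in HEVC are not reproduced, so the weighting function and constants here are assumptions.

```python
import numpy as np

def diffuse_prediction(pred, iters=5, tau=0.2, kappa=10.0):
    """Explicit Perona-Malik-style diffusion of a prediction block.
    tau is the time step, kappa the edge-sensitivity constant; both are
    illustrative, not the paper's signal-adaptive settings."""
    u = pred.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)      # edge-stopping weight
    for _ in range(iters):
        p = np.pad(u, 1, mode='edge')            # replicate block borders
        dn = p[:-2, 1:-1] - u                    # differences to the four
        ds = p[2:, 1:-1] - u                     # neighbours
        de = p[1:-1, 2:] - u
        dw = p[1:-1, :-2] - u
        u += tau * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u

rng = np.random.default_rng(0)
block = np.hstack([np.full((8, 4), 50.0), np.full((8, 4), 200.0)])
block += 5.0 * rng.standard_normal(block.shape)  # noisy two-level prediction
print(np.round(diffuse_prediction(block), 1))    # noise smoothed, edge kept
```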
{"title":"A Signal Adaptive Diffusion Filter For Video Coding","authors":"Jennifer Rasch, Jonathan Pfaff, Michael Schäfer, H. Schwarz, Martin Winken, Mischa Siekmann, D. Marpe, T. Wiegand","doi":"10.1109/PCS.2018.8456239","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456239","url":null,"abstract":"In this paper we combine state of the art video compression and Partial Differential Equation (PDE) based image processing methods. We introduce a new signal adaptive method to filter the predictions of a hybrid video codec using a system of PDEs describing a diffusion process. The method can be applied to intra as well as inter predictions. The filter is embedded into the framework of HEVC. The efficiency of the HEVC video codec is improved by up to −2.76% for All Intra and −3.56% for Random Access measured in Bjøntegaard delta (BD) rate. Coding gains of up to −8.76% can be observed for individual test sequences.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123705502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object-Based Motion Estimation Using the EPD Similarity Measure
Md. Asikuzzaman, M. Pickering
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456287
Effective motion-compensated prediction plays a significant role in efficient video compression. Image registration can be used to estimate the motion of the scene in a frame by finding the geometric transformation that automatically aligns the reference and target images. In the video coding literature, image registration has been applied to find the global motion in a video frame. However, if the motion of individual objects in a frame is inconsistent across time, the global motion may provide a very inefficient representation of the true motion present in the scene. In this paper, we propose a motion estimation algorithm for video coding using a new similarity measure called the edge position difference (EPD). This technique estimates the motion of individual objects by matching the edges of objects rather than estimating the motion from the pixel values of the frame. Experimental results demonstrate that the proposed edge-based similarity measure achieves superior motion-compensated prediction for objects in a scene compared to an approach that considers only the pixel values of the frame.
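The abstract does not give the exact EPD formula, so the sketch below uses an assumed stand-in: binary edge maps obtained by gradient thresholding, and a candidate displacement scored by how many edge pixels disagree between the target block and the displaced reference block. The detector, threshold, and cost are all illustrative choices, not the paper's definition.

```python
import numpy as np

def edge_map(block, thresh=30.0):
    """Binary edge map from gradient magnitude (illustrative detector)."""
    gy, gx = np.gradient(block.astype(float))
    return np.hypot(gx, gy) > thresh

def motion_search(target, ref, x, y, bs=16, sr=4):
    """Full search over a +/- sr window, minimizing the number of edge
    pixels that disagree between the target block and the displaced
    reference block (a stand-in for the paper's EPD measure)."""
    te = edge_map(target[y:y + bs, x:x + bs])
    best, best_cost = (0, 0), np.inf
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + bs > ref.shape[0] or rx + bs > ref.shape[1]:
                continue
            cost = np.count_nonzero(te ^ edge_map(ref[ry:ry + bs, rx:rx + bs]))
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost

rng = np.random.default_rng(0)
ref = rng.random((64, 64)) * 255.0
target = np.roll(ref, shift=(2, -1), axis=(0, 1))   # content shifted in time
print(motion_search(target, ref, 16, 16))           # finds (dx, dy) = (1, -2)
```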
{"title":"Object-Based Motion Estimation Using the EPD Similarity Measure","authors":"Md. Asikuzzaman, M. Pickering","doi":"10.1109/PCS.2018.8456287","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456287","url":null,"abstract":"Effective motion compensated prediction plays a significant role in efficient video compression. Image registration can be used to estimate the motion of the scene in a frame by finding the geometric transformation which automatically aligns reference and target images. In the video coding literature, image registration has been applied to find the global motion in a video frame. However, if the motion of individual objects in a frame is inconsistent across time, the global motion may provide a very inefficient representation of the true motion present in the scene. In this paper we propose a motion estimation algorithm for video coding using a new similarity measure called the edge position difference (EPD). This technique estimates the motion of the individual objects based on matching the edges of objects rather than estimating the motion using the pixel values in the frame. Experimental results demonstrate that the proposed edge-based similarity measure approach achieves superior motion compensated prediction for objects in a scene when compared to the approach which only considers the pixel values of the frame.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115628242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Progressive Modeling of Steered Mixture-of-Experts for Light Field Video Approximation
Ruben Verhack, G. Wallendael, Martijn Courteaux, P. Lambert, T. Sikora
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456242
Steered Mixture-of-Experts (SMoE) is a novel framework for the approximation, coding, and description of image modalities. The long-term goal is to arrive at a representation for Six Degrees-of-Freedom (6DoF) image data. The goal of this paper is to introduce SMoE for 4D light field videos by including the temporal dimension. However, these videos contain vast numbers of samples due to the large number of views per frame. Previous work on static light field images mitigated this problem by hard subdivision of the modeling problem, but such a subdivision introduces visually disturbing block artifacts on moving objects in dynamic image data. We propose a novel modeling method that avoids block artifacts while minimizing the computational complexity, and which allows for a varying spread of kernels in the spatio-temporal domain. Experiments validate that we can progressively model light field videos with increasing objective quality, up to 0.97 SSIM.
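The progressive control loop can be sketched on a toy 1-D signal: keep inserting a kernel where the current model's squared error is largest until a quality target is met. The normalized Gaussian gating with constant experts, the fixed bandwidth, and the MSE target are simplifications of the paper's method, which models spatio-temporal light field data and reports quality in SSIM.

```python
import numpy as np

def reconstruct(x, mus, vals, bw=0.05):
    """Normalized Gaussian gating with constant experts (toy 1-D SMoE)."""
    g = np.exp(-0.5 * ((x[:, None] - mus[None, :]) / bw) ** 2)
    g /= g.sum(axis=1, keepdims=True)
    return g @ vals

def progressive_fit(x, y, target_mse=1.0, max_kernels=50, bw=0.05):
    """Insert a kernel where the squared error is largest until the quality
    target is met (MSE stands in for the paper's SSIM-based target)."""
    mus = np.array([x[len(x) // 2]])
    vals = np.array([y[len(x) // 2]])
    while len(mus) < max_kernels:
        err = (y - reconstruct(x, mus, vals, bw)) ** 2
        if err.mean() <= target_mse:
            break
        i = int(np.argmax(err))                  # refine where the model fails
        mus = np.append(mus, x[i])
        vals = np.append(vals, y[i])
    return mus, vals

x = np.linspace(0.0, 1.0, 200)
y = 100.0 * np.sin(6.0 * x) + 20.0 * (x > 0.5)   # smooth signal with one edge
mus, vals = progressive_fit(x, y)
print(len(mus), float(np.mean((y - reconstruct(x, mus, vals)) ** 2)))
```

Because kernels are inserted where they are needed rather than assigned to fixed blocks, refinement follows the signal rather than a grid, which is how the approach avoids the block artifacts of hard subdivision.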
{"title":"Progressive Modeling of Steered Mixture-of-Experts for Light Field Video Approximation","authors":"Ruben Verhack, G. Wallendael, Martijn Courteaux, P. Lambert, T. Sikora","doi":"10.1109/PCS.2018.8456242","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456242","url":null,"abstract":"Steered Mixture-of-Experts (SMoE) is a novel framework for the approximation, coding, and description of image modalities. The future goal is to arrive at a representation for Six Degrees-of-Freedom (6DoF) image data. The goal of this paper is to introduce SMoE for 4D light field videos by including the temporal dimension. However, these videos contain vast amounts of samples due to the large number of views per frame. Previous work on static light field images mitigated the problem by hard subdividing the modeling problem. However, such a hard subdivision introduces visually disturbing block artifacts on moving objects in dynamic image data. We propose a novel modeling method that does not result in block artifacts while minimizing the computational complexity and which allows for a varying spread of kernels in the spatio-temporal domain. Experiments validate that we can progressively model light field videos with increasing objective quality up to 0.97 SSIM.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128761518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}