2D Video Coding of Volumetric Video Data
Pub Date: 2018-06-01 · DOI: 10.1109/PCS.2018.8456265
S. Schwarz, M. Hannuksela, Vida Fakour Sevom, Nahid Sheikhi-Pour
Due to the increased popularity of augmented and virtual reality experiences, the interest in representing the real world in an immersive fashion has never been higher. Distributing such representations enables users all over the world to freely navigate never-before-seen media experiences. Unfortunately, these representations require amounts of data that are not feasible to transmit over today's networks, so efficient compression technologies are in high demand. This paper proposes an approach to compressing 3D video data using 2D video coding technology. The proposed solution was developed to address the needs of 'tele-immersive' applications, such as virtual (VR), augmented (AR), or mixed (MR) reality with Six Degrees of Freedom (6DoF) capabilities. Volumetric video data is projected onto 2D image planes and compressed using standard 2D video coding solutions. A key benefit of this approach is its compatibility with readily available 2D video coding infrastructure. Furthermore, objective and subjective evaluations show significant improvements in coding efficiency over the reference technology.
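As an illustration of the projection step, the sketch below orthographically projects a colored point cloud onto a single depth/texture image pair, which could then be fed to any 2D encoder. This is a minimal sketch, assuming a single projection plane and nearest-point occlusion handling; the function names are ours, and the actual scheme described in the paper uses multiple 2D image planes rather than this one-plane simplification.

```python
# Minimal sketch: orthographic projection of a point cloud onto one 2D plane
# so a standard 2D codec can compress the result. Illustrative only.
import numpy as np

def project_to_plane(points, colors, resolution=1024):
    """Project 3D points onto the XY plane, keeping the nearest-in-Z point
    per pixel (single-layer projection; real schemes use several planes)."""
    depth = np.full((resolution, resolution), np.inf, dtype=np.float32)
    texture = np.zeros((resolution, resolution, 3), dtype=np.uint8)
    # Normalize coordinates onto the image grid.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    scale = (resolution - 1) / (maxs - mins + 1e-9)
    grid = ((points - mins) * scale).astype(int)
    for (u, v, z), c in zip(grid, colors):
        if z < depth[v, u]:          # keep the point closest to the plane
            depth[v, u] = z
            texture[v, u] = c
    return depth, texture            # both maps can now go to a 2D encoder

pts = np.random.rand(5000, 3)
cols = (np.random.rand(5000, 3) * 255).astype(np.uint8)
depth_map, texture_map = project_to_plane(pts, cols)
```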
{"title":"2D Video Coding of Volumetric Video Data","authors":"S. Schwarz, M. Hannuksela, Vida Fakour Sevom, Nahid Sheikhi-Pour","doi":"10.1109/PCS.2018.8456265","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456265","url":null,"abstract":"Due to the increased popularity of augmented and virtual reality experiences, the interest in representing the real world in an immersive fashion has never been higher. Distributing such representations enables users all over the world to freely navigate in never seen before media experiences. Unfortunately, such representations require a large amount of data, not feasible for transmission on today’s networks. Thus, efficient compression technologies are in high demand. This paper proposes an approach to compress 3D video data utilizing 2D video coding technology. The proposed solution was developed to address the needs of ‘tele-immersive’ applications, such as virtual (VR), augmented (AR) or mixed (MR) reality with Six Degrees of Freedom (6DoF) capabilities. Volumetric video data is projected on 2D image planes and compressed using standard 2D video coding solutions. A key benefit of this approach is its compatibility with readily available 2D video coding infrastructure. Furthermore, objective and subjective evaluation shows significant improvement in coding efficiency over reference technology.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126520870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Overview of Core Coding Tools in the AV1 Video Codec
Pub Date: 2018-06-01 · DOI: 10.1109/PCS.2018.8456249
Yue Chen, D. Mukherjee, Jingning Han, Adrian Grange, Yaowu Xu, Zoe Liu, Sarah Parker, Cheng Chen, Hui Su, Urvang Joshi, Ching-Han Chiang, Yunqing Wang, Paul Wilkins, Jim Bankoski, Luc N. Trudeau, N. Egge, J. Valin, T. Davies, Steinar Midtskogen, A. Norkin, Peter De Rivaz
AV1 is an emerging open-source and royalty-free video compression format that was jointly developed and finalized in early 2018 by the Alliance for Open Media (AOMedia) industry consortium. The main goal of AV1 development is to achieve substantial compression gains over state-of-the-art codecs while maintaining practical decoding complexity and hardware feasibility. This paper provides a brief technical overview of the key coding techniques in AV1, along with a preliminary compression performance comparison against VP9 and HEVC.
{"title":"An Overview of Core Coding Tools in the AV1 Video Codec","authors":"Yue Chen, D. Mukherjee, Jingning Han, Adrian Grange, Yaowu Xu, Zoe Liu, Sarah Parker, Cheng Chen, Hui Su, Urvang Joshi, Ching-Han Chiang, Yunqing Wang, Paul Wilkins, Jim Bankoski, Luc N. Trudeau, N. Egge, J. Valin, T. Davies, Steinar Midtskogen, A. Norkin, Peter De Rivaz","doi":"10.1109/PCS.2018.8456249","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456249","url":null,"abstract":"AV1 is an emerging open-source and royalty-free video compression format, which is jointly developed and finalized in early 2018 by the Alliance for Open Media (AOMedia) industry consortium. The main goal of AV1 development is to achieve substantial compression gain over state-of-the-art codecs while maintaining practical decoding complexity and hardware feasibility. This paper provides a brief technical overview of key coding techniques in AV1 along with preliminary compression performance comparison against VP9 and HEVC.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125791572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning Based HEVC In-Loop Filtering for Decoder Quality Enhancement
Pub Date: 2018-06-01 · DOI: 10.1109/PCS.2018.8456278
Shiba Kuanar, C. Conly, K. Rao
High Efficiency Video Coding (HEVC), currently the latest video coding standard, achieves up to 50% bit rate reduction compared to the previous H.264/AVC standard. In block-based video coding, such lossy compression techniques produce various artifacts, such as blurring, distortion, ringing, and contouring effects on output frames, especially at low bit rates. To reduce these compression artifacts, HEVC adopted two in-loop filtering techniques on the decoder side, namely the de-blocking filter (DBF) and sample adaptive offset (SAO). While the DBF applies to samples located at block boundaries, the nonlinear SAO operation applies adaptively to samples satisfying gradient-based conditions through a lookup table. The SAO filter corrects quantization errors by sending edge offset values to the decoder; this consumes extra signaling bits and adds overhead to the network. In this paper, we propose a Convolutional Neural Network (CNN) based architecture for the SAO in-loop filtering operation that requires no modification to the encoding process. Our experimental results show that the proposed model outperforms previous state-of-the-art models in terms of BD-PSNR (0.408 dB) and BD-BR (3.44%), measured on widely available standard video sequences.
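A minimal sketch of the idea, assuming a PyTorch setup: a small residual CNN maps the decoded, DBF-filtered luma frame to an artifact-reduced frame, so no SAO edge-offset signaling is needed. The layer count and channel widths here are illustrative placeholders, not the architecture from the paper.

```python
# Sketch: a residual CNN standing in for SAO at the decoder. Illustrative
# layer sizes; trained against the uncompressed original with an MSE loss.
import torch
import torch.nn as nn

class SAOReplacementCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # residual learning: predict the correction

model = SAOReplacementCNN()
decoded_luma = torch.rand(1, 1, 64, 64)   # a decoded block, normalized to [0,1]
restored = model(decoded_luma)            # artifact-reduced output frame
```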
{"title":"Deep Learning Based HEVC In-Loop Filtering for Decoder Quality Enhancement","authors":"Shiba Kuanar, C. Conly, K. Rao","doi":"10.1109/PCS.2018.8456278","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456278","url":null,"abstract":"High Efficiency Video Coding (HEVC), which is the latest video coding standard currently, achieves up to 50% bit rate reduction compared to previous H.264/AVC standard. While performing the block based video coding, these lossy compression techniques produce various artifacts like blurring, distortion, ringing, and contouring effects on output frames, especially at low bit rates. To reduce those compression artifacts HEVC adopted two post processing filtering technique namely de-blocking filter (DBF) and sample adaptive offset (SAO) on the decoder side. While DBF applies to samples located at block boundaries, SAO nonlinear operation applies adaptively to samples satisfying the gradient based conditions through a lookup table. Again SAO filter corrects the quantization errors by sending edge offset values to decoders. This operation consumes extra signaling bit and becomes an overhead to network. In this paper, we proposed a Convolutional Neural Network (CNN) based architecture for SAO in-loop filtering operation without modifying anything on encoding process. Our experimental results show that our proposed model outperformed previous state-of-the-art models in terms of BD-PSNR (0.408 dB) and BD-BR (3.44%), measured on a widely available standard video sequences.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124867899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Hybrid Weighted Compound Motion Compensated Prediction for Video Compression
Pub Date: 2018-06-01 · DOI: 10.1109/PCS.2018.8456241
Cheng Chen, Jingning Han, Yaowu Xu
Compound motion compensated prediction, which combines reconstructed reference blocks to exploit temporal correlation, is a major component of the hierarchical coding scheme. A uniform combination that applies equal weights to the reference blocks, regardless of their distances to the current frame, is widely employed in mainstream codecs. A linear distance-weighted combination, while reflecting the temporal correlation, is likely to ignore the quantization noise factor and hence degrade the prediction quality. This work builds on the premise that the compound prediction mode effectively embeds two functionalities: exploiting temporal correlation in the video signal and canceling the quantization noise from the reference blocks. A modified distance weighting scheme is introduced to optimize the trade-off between these two factors: it quantizes the weights to guarantee a minimum contribution from both reference blocks for noise cancellation. We further introduce a hybrid scheme that allows the codec to switch between the proposed distance-weighted compound mode and the averaging mode, providing more flexibility in the trade-off between temporal correlation and noise cancellation. The scheme is implemented in the AV1 codec as part of the syntax definition. It is experimentally demonstrated to provide on average 1.5% compression gains across a wide range of test sets.
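The sketch below illustrates the quantized distance-weighting idea on a pair of reference blocks: the ideal distance-proportional weight is snapped to a small allowed set so that the farther reference always keeps a minimum contribution for noise cancellation. The allowed weight set and rounding here are our illustrative assumptions; the actual weight values and syntax are defined by the AV1 specification.

```python
# Sketch: distance-weighted compound prediction with quantized weights.
import numpy as np

def compound_predict(ref0, ref1, d0, d1, allowed_w=(36, 44, 52, 60), total=64):
    """Blend two reference blocks; the weight follows temporal distance but is
    quantized so both references keep a minimum share (noise cancellation)."""
    ideal = total * d1 / (d0 + d1)                 # farther ref gets less weight
    w0 = min(allowed_w, key=lambda w: abs(w - ideal))
    w1 = total - w0
    blended = w0 * ref0.astype(np.int32) + w1 * ref1.astype(np.int32)
    return (blended + total // 2) // total         # rounded integer average

ref0 = np.random.randint(0, 256, (8, 8))
ref1 = np.random.randint(0, 256, (8, 8))
pred = compound_predict(ref0, ref1, d0=1, d1=3)    # ref0 is temporally closer
```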
{"title":"A Hybrid Weighted Compound Motion Compensated Prediction for Video Compression","authors":"Cheng Chen, Jingning Han, Yaowu Xu","doi":"10.1109/PCS.2018.8456241","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456241","url":null,"abstract":"Compound motion compensated prediction that combines reconstructed reference blocks to exploit the temporal correlation is a major component in the hierarchical coding scheme. A uniform combination that applies equal weights to reference blocks regardless of distances towards the current frame is widely employed in mainstream codecs. Linear distance weighted combination, while reflecting the temporal correlation, is likely to ignore the quantization noise factor and hence degrade the prediction quality. This work builds on the premise that the compound prediction mode effectively embeds two functionalities - exploiting temporal correlation in the video signal and canceling the quantization noise from reference blocks. A modified distance weighting scheme is introduced to optimize the trade-off between these two factors. It quantizes the weights to limit the minimum contribution from both reference blocks for noise cancellation. We further introduces a hybrid scheme allowing the codec to switch between the proposed distance weighted compound mode and the averaging mode to provide more flexibility for the trade-off between temporal correlation and noise cancellation. The scheme is implemented in the AV1 codec as part of the syntax definition. It is experimentally demonstrated to provide on average 1.5% compression gains across a wide range of test sets.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128490448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning as applied intrinsically to individual dimensions of HDR Display Quality
Pub Date: 2018-06-01 · DOI: 10.1109/PCS.2018.8456284
A. Choudhury, S. Daly
This study builds on previous work exploring machine learning and perceptual transforms for predicting overall display quality as a function of image quality dimensions that correspond to physical display parameters. Previously, we found that the use of perceptually transformed parameters or machine learning exceeded the performance of predictors using only physical parameters and linear regression. Further, the combination of perceptually transformed parameters with machine learning provided robustness to parameters outside of the data set, both for interpolation and extrapolation. Here we apply machine learning at a more intrinsic level. We first evaluate how well machine learning can develop predictors of the individual dimensions of overall quality, and then how well those individual predictors can be consolidated to predict the overall display quality. Having predictions of individual quality dimensions that are closely related to specific hardware design choices enables more nimble cost trade-off design options.
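A minimal sketch of the two-stage structure described above, assuming scikit-learn and synthetic data: stage one learns one regressor per individual quality dimension from physical display parameters, and stage two consolidates the predicted dimension scores into an overall quality score. The feature set, random-forest choice, and data are placeholders, not the study's actual models.

```python
# Sketch: per-dimension quality predictors consolidated into overall quality.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 4))            # e.g. luminance, contrast, gamut, ... (placeholder)
dim_scores = rng.random((200, 3))   # subjective ratings per quality dimension
overall = dim_scores.mean(axis=1)   # stand-in for rated overall quality

# Stage 1: one predictor per individual quality dimension.
dim_models = [
    RandomForestRegressor(n_estimators=50, random_state=0).fit(X, dim_scores[:, k])
    for k in range(dim_scores.shape[1])
]

# Stage 2: consolidate the predicted dimension scores into overall quality.
stage1_out = np.column_stack([m.predict(X) for m in dim_models])
consolidator = RandomForestRegressor(n_estimators=50, random_state=0)
consolidator.fit(stage1_out, overall)
```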
{"title":"Machine learning as applied intrinsically to individual dimensions of HDR Display Quality","authors":"A. Choudhury, S. Daly","doi":"10.1109/PCS.2018.8456284","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456284","url":null,"abstract":"This study builds on previous work exploring machine learning and perceptual transforms in predicting overall display quality as a function of image quality dimensions that correspond to physical display parameters. Previously, we found that the use of perceptually transformed parameters or machine learning exceeded the performance of predictors using just physical parameters and linear regression. Further, the combination of perceptually transformed parameters with machine learning allowed for robustness to parameters outside of the data set, both for cases of interpolation and extrapolation. Here we apply machine learning at a more intrinsic level. We first evaluate how well the machine learning can develop predictors of the individual dimensions of the overall quality, and then how well those individual predictors can be consolidated across themselves to predict the overall display quality. Having predictions of individual dimensions of quality that are closely related to specific hardware design choices enables more nimble cost trade-off design options.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129406526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wavelet Decomposition Pre-processing for Spatial Scalability Video Compression Scheme
Pub Date: 2018-06-01 · DOI: 10.1109/PCS.2018.8456307
Glenn Herrou, W. Hamidouche, L. Morin
Scalable video coding makes it possible to compress video at different formats within a single layered bitstream. SHVC, the scalable extension of the High Efficiency Video Coding (HEVC) standard, enables 2x spatial scalability, among other additional features. The closed-loop architecture of the SHVC codec is based on the use of multiple instances of the HEVC codec to encode the video layers, which considerably increases the encoding complexity. With the arrival of new immersive video formats, such as 4K, 8K, High Frame Rate (HFR), and 360° video, the quantity of data to compress is exploding, making high-complexity coding algorithms unsuitable. In this paper, we propose a low-complexity scalable coding scheme based on a single HEVC codec instance and a wavelet-based decomposition as pre-processing. The pre-encoding image decomposition relies on well-known, simple Discrete Wavelet Transform (DWT) kernels, such as Haar or Le Gall 5/3. Compared to SHVC, the proposed architecture achieves similar rate-distortion performance with a 50% reduction in coding complexity.
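As an illustration of the pre-processing, the sketch below applies one level of the Le Gall 5/3 integer lifting transform along one axis, splitting a signal into low- and high-band layers; applying it along rows and then columns yields the 2D decomposition whose low-band would serve as the base layer. The even-length restriction and simple symmetric boundary handling are simplifications of ours.

```python
# Sketch: one 1D lifting step of the Le Gall 5/3 integer wavelet.
import numpy as np

def legall53_1d(x):
    """Split an even-length 1D signal into low/high bands (lifting scheme)."""
    x = x.astype(np.int32)
    even, odd = x[0::2], x[1::2]
    even_next = np.append(even[1:], even[-1])        # symmetric extension
    high = odd - ((even + even_next) >> 1)           # predict step
    high_prev = np.insert(high[:-1], 0, high[0])     # symmetric extension
    low = even + ((high_prev + high + 2) >> 2)       # update step
    return low, high

row = np.random.randint(0, 256, 8)
low, high = legall53_1d(row)   # apply to rows, then columns, for a 2D frame
```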
{"title":"Wavelet Decomposition Pre-processing for Spatial Scalability Video Compression Scheme","authors":"Glenn Herrou, W. Hamidouche, L. Morin","doi":"10.1109/PCS.2018.8456307","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456307","url":null,"abstract":"Scalable video coding enables to compress the video at different formats within a single layered bitstream. SHVC, the scalable extension of the High Efficiency Video Coding (HEVC) standard, enables x2 spatial scalability, among other additional features. The closed-loop architecture of the SHVC codec is based on the use of multiple instances of the HEVC codec to encode the video layers, which considerably increases the encoding complexity. With the arrival of new immersive video formats, like 4K, 8K, High Frame Rate (HFR) and 360° videos, the quantity of data to compress is exploding, making the use of high-complexity coding algorithms unsuitable. In this paper, we propose a lowcomplexity scalable coding scheme based on the use of a single HEVC codec instance and a wavelet-based decomposition as pre-processing. The pre-encoding image decomposition relies on well-known simple Discrete Wavelet Transform (DWT) kernels, such as Haar or Le Gall 5/3. Compared to SHVC, the proposed architecture achieves a similar rate distortion performance with a coding complexity reduction of 50%.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114063397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benchmarking of Objective Quality Metrics for Colorless Point Clouds
Pub Date: 2018-06-01 · DOI: 10.1109/PCS.2018.8456252
E. Alexiou, T. Ebrahimi
Recent advances in depth sensing and display technologies, along with the significant growth of interest in augmented and virtual reality applications, lay the foundation for the rapid evolution of applications that provide immersive experiences. In such applications, advanced content representations are required in order to increase the user's engagement with the displayed imagery. Point clouds have emerged as a promising solution to this aim, due to their efficiency in capturing, storing, delivering, and rendering 3D immersive content. As in any type of imaging, evaluating the visual quality of point clouds is essential. In this paper, benchmarking results for state-of-the-art objective metrics on geometry-only point clouds are reported and analyzed under two different types of geometry degradation, namely Gaussian noise and octree-based compression. Human ratings obtained from two subjective experiments are used as the ground truth. Our results show that most objective quality metrics perform well in the presence of noise, whereas one particular method has high predictive power and outperforms the others after octree-based encoding.
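For context, the sketch below computes a symmetric point-to-point geometry error and the corresponding geometry PSNR, representative of the class of metrics benchmarked here. The brute-force nearest-neighbor search and the unit-cube peak value are simplifying assumptions of ours; practical evaluation tools use k-d trees and standardized peak definitions.

```python
# Sketch: symmetric point-to-point geometry distortion and geometry PSNR.
import numpy as np

def p2p_mse(a, b):
    """Mean nearest-neighbor squared distance from each point of a to b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)   # all pairwise
    return d2.min(axis=1).mean()

def geometry_psnr(a, b, peak):
    mse = max(p2p_mse(a, b), p2p_mse(b, a))        # symmetric: worse direction
    return 10 * np.log10(peak ** 2 / mse)

ref = np.random.rand(500, 3)                                  # unit-cube cloud
noisy = ref + np.random.normal(scale=0.01, size=ref.shape)    # Gaussian noise
print(geometry_psnr(ref, noisy, peak=np.sqrt(3)))             # cube diagonal as peak
```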
{"title":"Benchmarking of Objective Quality Metrics for Colorless Point Clouds","authors":"E. Alexiou, T. Ebrahimi","doi":"10.1109/PCS.2018.8456252","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456252","url":null,"abstract":"Recent advances in depth sensing and display technologies, along with the significant growth of interest for augmented and virtual reality applications, lay the foundation for the rapid evolution of applications that provide immersive experiences. In such applications, advanced content representations are required in order to increase the engagement of the user with the displayed imageries. Point clouds have emerged as a promising solution to this aim, due to their efficiency in capturing, storing, delivering and rendering of 3D immersive contents. As in any type of imaging, the evaluation of point clouds in terms of visual quality is essential. In this paper, benchmarking results of the state-of-the-art objective metrics in geometry-only point clouds are reported and analyzed under two different types of geometry degradations, namely Gaussian noise and octree- based compression. Human ratings obtained from two subjective experiments are used as the ground truth. Our results show that most objective quality metrics perform well in the presence of noise, whereas one particular method has high predictive power and outperforms the others after octree-based encoding.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"41 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120918293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis and Prediction of JND-Based Video Quality Model
Pub Date: 2018-06-01 · DOI: 10.1109/PCS.2018.8456243
Haiqiang Wang, Xinfeng Zhang, Chao Yang, C.-C. Jay Kuo
The just-noticeable-difference (JND) visual perception property has received much attention in characterizing the human subjective viewing experience of compressed video. In this work, we quantify the JND-based video quality assessment model using the satisfied user ratio (SUR) curve, and show that the SUR model can be greatly simplified, since the JND points of multiple subjects for the same content in the VideoSet are well modeled by the normal distribution. We then design an SUR prediction method using video quality degradation features and masking features, and use them to predict the first, second, and third JND points and their corresponding SUR curves. Finally, we verify the performance of the proposed SUR prediction method with different configurations on the VideoSet. The experimental results demonstrate that the proposed method achieves good performance at various resolutions, with the mean absolute error (MAE) of the SUR smaller than 0.05 on average.
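A minimal sketch of the normal-distribution simplification: if the per-subject JND points are Gaussian, the SUR curve collapses to the complementary normal CDF, so the whole curve is summarized by a mean and a standard deviation. The sample JND values below are illustrative, not taken from the VideoSet.

```python
# Sketch: an SUR curve derived from normally distributed JND points.
import numpy as np
from scipy.stats import norm

jnd_points = np.array([30, 32, 34, 35, 36, 38, 38, 40])   # per-subject JND QPs
mu, sigma = jnd_points.mean(), jnd_points.std(ddof=1)

def sur(qp):
    """Fraction of users who do not yet perceive a difference at this QP."""
    return 1.0 - norm.cdf(qp, loc=mu, scale=sigma)

print(sur(34))   # SUR at QP 34, roughly 0.66 for the sample values above
```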
{"title":"Analysis and Prediction of JND-Based Video Quality Model","authors":"Haiqiang Wang, Xinfeng Zhang, Chao Yang, C.-C. Jay Kuo","doi":"10.1109/PCS.2018.8456243","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456243","url":null,"abstract":"The just-noticeable-difference (JND) visual perception property has received much attention in characterizing human subjective viewing experience of compressed video. In this work, we quantity the JND-based video quality assessment model using the satisfied user ratio (SUR) curve, and show that the SUR model can be greatly simplified since the JND points of multiple subjects for the same content in the VideoSet can be well modeled by the normal distribution. Then, we design an SUR prediction method with video quality degradation features and masking features and use them to predict the first, second and the third JND points and their corresponding SUR curves. Finally, we verify the performance of the proposed SUR prediction method with different configurations on the VideoSet. The experimental results demonstrate that the proposed SUR prediction method achieves good performance in various resolutions with the mean absolute error (MAE) of the SUR smaller than 0.05 on average.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127684293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Perceptual Quality Driven Adaptive Video Coding Using JND Estimation
Pub Date: 2018-06-01 · DOI: 10.1109/PCS.2018.8456297
Masaru Takeuchi, Shintaro Saika, Yusuke Sakamoto, Tatsuya Nagashima, Zhengxue Cheng, Kenji Kanai, J. Katto, Kaijin Wei, Ju Zengwei, Xu Wei
We introduce a perceptual-quality-driven video encoding solution for optimized adaptive streaming. By using multiple bitrate/resolution encodings, as in MPEG-DASH, video streaming services can deliver the best video stream to a client given the client's available bandwidth and viewing device capability. However, conventional fixed encoding recipes (i.e., resolution-bitrate pairs) suffer from problems such as improper resolution selection and stream redundancy. To avoid these problems, we propose a novel video coding method that generates multiple representations separated by a constant Just-Noticeable Difference (JND) interval. For this purpose, we developed a JND scale estimator using Support Vector Regression (SVR) and designed a pre-encoder that adaptively outputs an encoding recipe with a constant JND interval for the input video.
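A minimal sketch of the estimator component, assuming scikit-learn: an SVR model maps per-clip content features to a JND scale value, which a pre-encoder could then use to space the representations one JND apart. The features, targets, and hyperparameters below are synthetic placeholders, not the paper's trained model.

```python
# Sketch: an SVR-based JND scale estimator for adaptive-streaming recipes.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
features = rng.random((100, 5))   # e.g. spatial/temporal complexity measures
jnd_scale = rng.random(100)       # ground-truth JND values from subjective tests

model = SVR(kernel="rbf", C=1.0, epsilon=0.05).fit(features, jnd_scale)
predicted_jnd = model.predict(features[:1])   # JND estimate for one clip
```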
{"title":"Perceptual Quality Driven Adaptive Video Coding Using JND Estimation","authors":"Masaru Takeuchi, Shintaro Saika, Yusuke Sakamoto, Tatsuya Nagashima, Zhengxue Cheng, Kenji Kanai, J. Katto, Kaijin Wei, Ju Zengwei, Xu Wei","doi":"10.1109/PCS.2018.8456297","DOIUrl":"https://doi.org/10.1109/PCS.2018.8456297","url":null,"abstract":"We introduce a perceptual video quality driven video encoding solution for optimized adaptive streaming. By using multiple bitrate/resolution encoding like MPEG-DASH, video streaming services can deliver the best video stream to a client, under the conditions of the client's available bandwidth and viewing device capability. However, conventional fixed encoding recipes (i.e., resolution-bitrate pairs) suffer from many problems, such as improper resolution selection and stream redundancy. To avoid these problems, we propose a novel video coding method, which generates multiple representations with constant JustNoticeable Difference (JND) interval. For this purpose, we developed a JND scale estimator using Support Vector Regression (SVR), and designed a pre-encoder which outputs an encoding recipe with constant JND interval in an adaptive manner to input video.","PeriodicalId":433667,"journal":{"name":"2018 Picture Coding Symposium (PCS)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125872696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}