Quantifying the Influence of Devices on Quality of Experience for Video Streaming
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456304
Jing Li, Lukáš Krasula, P. Callet, Zhi Li, Yoann Baveye
Internet streaming is changing the way people watch videos. Traditional quality assessment for cable/satellite broadcasting systems focused mainly on perceptual quality. Nowadays, this concept has been extended to Quality of Experience (QoE), which also considers contextual factors such as the environment, the display device, etc. In this study, we focus on the influence of devices on QoE. A subjective experiment was conducted using our proposed AccAnn methodology, in which observers evaluated the QoE of video sequences in terms of Acceptance and Annoyance. Two devices were used in this study: a TV and a tablet. The experimental results showed that the device is a significant influence factor on QoE. In addition, we found that this influence varies with the QoE of the video sequences. To quantify this influence, the Eliminated-By-Aspects model was used. The results could be used for training a device-neutral objective QoE metric. For video streaming providers, the quantified device influence could be used to optimize the selection of streaming content: on the one hand, it could satisfy observers' QoE expectations for the device in use; on the other hand, it could help save bitrate.
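The abstract reports that the device is a significant influence factor but does not spell out the statistical analysis (the quantification itself uses the Eliminated-By-Aspects model). As a hedged illustration of how Acceptance/Annoyance responses from two devices might be screened for a device effect, here is a minimal sketch; the data layout, sample sizes, and choice of tests are assumptions for illustration, not the authors' procedure.

```python
# Hedged sketch: testing whether the display device influences Acceptance/Annoyance
# ratings. Data, test choices, and thresholds are illustrative assumptions; they are
# not taken from the paper (which quantifies the effect with an EBA-type model).
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

rng = np.random.default_rng(0)

# Synthetic responses for one processed video sequence (placeholder data).
# Acceptance: 1 = acceptable, 0 = not acceptable; Annoyance: 1 (imperceptible) .. 5 (very annoying).
acc_tv     = rng.binomial(1, 0.80, size=30)   # 30 observers on the TV
acc_tablet = rng.binomial(1, 0.92, size=30)   # 30 observers on the tablet
ann_tv     = rng.integers(1, 6, size=30)
ann_tablet = rng.integers(1, 5, size=30)

# Device effect on Acceptance: chi-square test on the 2x2 contingency table.
table = np.array([[acc_tv.sum(),     len(acc_tv) - acc_tv.sum()],
                  [acc_tablet.sum(), len(acc_tablet) - acc_tablet.sum()]])
chi2, p_acc, _, _ = chi2_contingency(table)

# Device effect on Annoyance: Mann-Whitney U test on the ordinal ratings.
_, p_ann = mannwhitneyu(ann_tv, ann_tablet)

print(f"Acceptance: p = {p_acc:.3f}; Annoyance: p = {p_ann:.3f}")
print("Device is a significant factor" if min(p_acc, p_ann) < 0.05 else "No significant device effect")
```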
Next Generation Video Coding for Spherical Content
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456281
Adeel Abbas, David Newman, Srilakshmi Akula, Akhil Konda
Recently, the Joint Video Exploration Team (JVET) issued a Call for Proposals (CFP) for video compression technology that is expected to be the successor to HEVC. In this paper, we present some of the technology from our joint response in the 360° video category of the CFP. The goal was to keep the design as simple as possible, with picture-level preprocessing and without 360°-specific coding tools. The response is based on a relatively new projection called Rotated Sphere Projection (RSP). RSP splits and surrounds the sphere using two faces that are cropped from the Equirectangular Projection (ERP), in the same way as two flat pieces of rubber are stitched to form a tennis ball. This approach allows RSP to get closer to the sphere than the Cube Map, achieving more continuity while preserving a 3:2 aspect ratio. Our results show an average luma BD-rate coding gain of 10.5% compared to ERP using HEVC.
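The 10.5% figure is a Bjøntegaard delta-rate (BD-rate), i.e. the average bitrate difference at equal quality between two rate-distortion curves. The sketch below shows how such a number is typically computed from four rate/PSNR anchor points per codec; the rate and PSNR values are placeholders, not the RSP vs. ERP measurements from the paper.

```python
# Minimal Bjøntegaard delta-rate (BD-rate) sketch: average bitrate difference (%)
# between two rate-distortion curves at equal quality. Rate/PSNR values below are
# placeholders, not the paper's RSP vs. ERP data.
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Negative result = bitrate savings of the test codec over the reference."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    # Fit log-rate as a cubic polynomial of PSNR for each curve.
    p_ref  = np.polyfit(psnr_ref,  lr_ref,  3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref  = np.polyval(np.polyint(p_ref),  hi) - np.polyval(np.polyint(p_ref),  lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Placeholder anchor points (QP-like sweep): bitrate in kbps, end-to-end PSNR in dB.
erp_rate, erp_psnr = [2000, 4000, 8000, 16000], [34.0, 36.5, 39.0, 41.5]
rsp_rate, rsp_psnr = [1800, 3600, 7100, 14200], [34.1, 36.7, 39.2, 41.7]
print(f"BD-rate (RSP vs. ERP): {bd_rate(erp_rate, erp_psnr, rsp_rate, rsp_psnr):.1f}%")
```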
A Study on the Required Video Bit-rate for 8K 120-Hz HEVC/H.265 Temporal Scalable Coding
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456288
Yasuko Sugito, Shinya Iwasaki, Kazuhiro Chida, Kazuhisa Iguchi, Kikufumi Kanda, Xuying Lei, H. Miyoshi, Kimihiko Kazui
This paper studies the video bit-rate required for 8K 119.88 Hz (120 Hz) High Efficiency Video Coding (HEVC)/H.265 temporal scalable coding, which can partially decode 59.94 Hz (60 Hz) video frames from compressed 120 Hz bit-streams. We compress 8K 120 Hz test sequences using software that emulates the HEVC/H.265 encoder we are developing and conduct two types of subjective evaluation experiments to investigate the appropriate bit-rates for both 8K 120 Hz and 60 Hz videos for broadcasting purposes. From the results of the experiments, we conclude that the required video bit-rate for 8K 120 Hz temporal scalable coding is between 85 and 110 Mbps, which is equivalent to the practical bit-rate for 8K 60 Hz videos, and that the appropriate bit-rate allocation for the 8K 60 Hz video within 8K 120 Hz temporal scalable coding at 85 Mbps is approximately 80 Mbps.
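Temporal scalability means the 60 Hz video is recoverable by keeping only the lower temporal sub-layer of the 120 Hz bitstream. Below is a conceptual sketch of that sub-bitstream extraction; the frame records and per-picture bit budgets are simplified assumptions (chosen only so the totals mirror the paper's ~80-of-85 Mbps allocation), not the structure of the encoder described here.

```python
# Conceptual sketch of temporal-scalable sub-bitstream extraction: keeping only
# access units with temporal_id 0 turns a 119.88 Hz stream into a 59.94 Hz one.
# Frame records and bit budgets are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class AccessUnit:
    poc: int          # picture order count
    temporal_id: int  # 0 = base (60 Hz) sub-layer, 1 = enhancement to 120 Hz
    bits: int         # coded size of the picture

def encode_second(num_frames=120, base_bits=1_330_000, enh_bits=85_000):
    """One second of 120 Hz video: even pictures form the 60 Hz base sub-layer."""
    return [AccessUnit(poc=i, temporal_id=i % 2,
                       bits=base_bits if i % 2 == 0 else enh_bits)
            for i in range(num_frames)]

def extract(stream, max_tid):
    """Sub-bitstream extraction: drop all pictures above the target temporal_id."""
    return [au for au in stream if au.temporal_id <= max_tid]

stream_120 = encode_second()
stream_60 = extract(stream_120, max_tid=0)
total = sum(au.bits for au in stream_120) / 1e6
base  = sum(au.bits for au in stream_60) / 1e6
print(f"120 Hz stream: {total:.1f} Mbps, 60 Hz sub-stream: {base:.1f} Mbps "
      f"({100 * base / total:.0f}% of the budget)")
```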
Compound Split Tree for Video Coding
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456309
Weijia Zhu, A. Segall
During the exploration of video coding technology for potential next-generation standards, the Joint Video Exploration Team (JVET) has been studying the quad-tree plus binary-tree (QTBT) partition structure within its Joint Exploration Model (JEM). The QTBT partition structure provides more flexibility than the quad-tree-only partition structure in HEVC. Here, we further consider the QTBT structure and extend it to allow quad-tree partitioning to be performed both before and after a binary-tree partition. We refer to this structure as a compound split tree (CST). To show the efficacy of the approach, we implemented the method in JEM7. The method achieves average BD-rate savings of 1.25%, 2.11% and 1.87% for the Y, U and V components, respectively, under the random-access configuration.
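To make the structural difference concrete, here is a toy sketch of a recursive partition in which, unlike QTBT, a quad split stays available after a binary split. The minimum block size, the square-block restriction on quad splits, and the split enumeration are assumptions for illustration, not details from the paper or JEM7.

```python
# Toy sketch of block partitioning. In QTBT, once a binary split is used, quad splits
# are no longer allowed in that subtree; the compound split tree (CST) idea is to keep
# quad splits available both before and after binary splits. Constraints below are
# illustrative assumptions.
MIN_SIZE = 8

def child_blocks(x, y, w, h, split):
    if split == "QT":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if split == "BT_H":   # horizontal binary split
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if split == "BT_V":   # vertical binary split
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    return []             # "NONE": leaf coding unit

def allowed_splits(w, h, bt_used, cst=True):
    splits = ["NONE"]
    if w > MIN_SIZE and h > MIN_SIZE and w == h and (cst or not bt_used):
        splits.append("QT")           # QTBT forbids QT after BT; CST allows it
    if h > MIN_SIZE:
        splits.append("BT_H")
    if w > MIN_SIZE:
        splits.append("BT_V")
    return splits

# Example: a 64x64 CTU is binary-split twice down to a 32x32 block; only CST still
# permits a quad split of that block.
top, _ = child_blocks(0, 0, 64, 64, "BT_H")   # 64x32
blk, _ = child_blocks(*top, "BT_V")           # 32x32
print("QTBT allowed splits at 32x32 after BT:", allowed_splits(blk[2], blk[3], bt_used=True, cst=False))
print("CST  allowed splits at 32x32 after BT:", allowed_splits(blk[2], blk[3], bt_used=True, cst=True))
```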
SRQM: A Video Quality Metric for Spatial Resolution Adaptation
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456246
Alex Mackin, Mariana Afonso, Fan Zhang, D. Bull
This paper presents a full-reference objective video quality metric (SRQM), which characterises the relationship between variations in spatial resolution and visual quality in the context of adaptive video formats. SRQM uses wavelet decomposition, subband combination with perceptually inspired weights, and spatial pooling to estimate the relative quality between the frames of a high-resolution reference video and one that has been spatially adapted through a combination of down- and upsampling. The BVI-SR video database is used to benchmark SRQM against five commonly used quality metrics. The database contains 24 diverse video sequences that span a range of spatial resolutions up to UHD-1 (3840 × 2160). An in-depth analysis demonstrates that SRQM is statistically superior to the other quality metrics for all tested adaptation filters, while maintaining relatively low computational complexity.
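SRQM's actual subband weights and pooling strategy are not given in the abstract. The sketch below only illustrates the general pipeline named there (wavelet decomposition, weighted subband differencing, spatial pooling) using PyWavelets; the wavelet choice, weights, frame size, and resampling filter are placeholder assumptions, not SRQM's values.

```python
# Illustrative sketch of an SRQM-style pipeline: wavelet-decompose the reference frame
# and the resampled (down- then up-sampled) frame, compare detail subbands with
# perceptually inspired weights, and spatially pool. All parameters are assumptions.
import numpy as np
import pywt

def srqm_like_score(ref, dist, levels=4, weights=(0.1, 0.2, 0.3, 0.4)):
    ref_c  = pywt.wavedec2(ref.astype(np.float64),  "db2", level=levels)
    dist_c = pywt.wavedec2(dist.astype(np.float64), "db2", level=levels)
    score = 0.0
    # coeffs[1:] are the detail subbands, from the coarsest level to the finest.
    for w, ref_det, dist_det in zip(weights, ref_c[1:], dist_c[1:]):
        for r, d in zip(ref_det, dist_det):       # (horizontal, vertical, diagonal)
            score += w * np.mean(np.abs(r - d))   # spatial pooling: mean absolute error
    return score                                  # lower = closer to the reference

rng = np.random.default_rng(1)
reference = rng.random((1080, 1920))              # stand-in for a luma frame
# Emulate spatial resolution adaptation: 2x nearest-neighbour down- and up-sampling.
adapted = np.repeat(np.repeat(reference[::2, ::2], 2, axis=0), 2, axis=1)
print(f"SRQM-like distortion score: {srqm_like_score(reference, adapted):.4f}")
```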
High Dynamic Range Image Compression Based on Visual Saliency
Pub Date: 2018-06-01 | DOI: 10.1017/ATSIP.2020.15
Shenda Li, Jin Wang, Qing Zhu
High dynamic range (HDR) images have a larger luminance range than conventional low dynamic range (LDR) images, which is more consistent with the human visual system (HVS). Recently, the JPEG committee released a new HDR image compression standard, JPEG XT, which decomposes the input HDR image into a base layer and an extension layer. However, this method does not make full use of the HVS, wasting bits on regions that are imperceptible to human eyes. In this paper, a visual-saliency-based HDR image compression scheme is proposed. The saliency map of the tone-mapped HDR image is first extracted and then used to guide extension-layer encoding, so that the compression quality adapts to the saliency of the coded region of the image. Extensive experimental results show that our method outperforms JPEG XT profiles A, B and C while offering JPEG compatibility at the same time. Moreover, our method can provide progressive coding of the extension layer.
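The abstract does not specify the saliency model or the rate mapping; the sketch below only illustrates the idea of letting per-block saliency steer the extension-layer quantization step. The saliency source, block size, and the linear step mapping are assumptions, not the scheme used in the paper.

```python
# Illustrative sketch of saliency-guided quantization for an extension layer: blocks
# with high saliency get a finer quantization step, low-saliency blocks a coarser one.
# The saliency map is a placeholder and the linear mapping is an assumption.
import numpy as np

BLOCK = 16
Q_MIN, Q_MAX = 4.0, 32.0   # assumed range of extension-layer quantization steps

def block_quant_steps(saliency):
    """Map a [0, 1] saliency map to one quantization step per BLOCKxBLOCK block."""
    steps = np.empty((saliency.shape[0] // BLOCK, saliency.shape[1] // BLOCK))
    for by in range(steps.shape[0]):
        for bx in range(steps.shape[1]):
            s = saliency[by * BLOCK:(by + 1) * BLOCK, bx * BLOCK:(bx + 1) * BLOCK].mean()
            steps[by, bx] = Q_MAX - s * (Q_MAX - Q_MIN)   # salient -> smaller (finer) step
    return steps

def quantize_residual(residual, steps):
    """Quantize the extension-layer residual block-wise with the chosen steps."""
    out = np.empty_like(residual)
    for by in range(steps.shape[0]):
        for bx in range(steps.shape[1]):
            sl = np.s_[by * BLOCK:(by + 1) * BLOCK, bx * BLOCK:(bx + 1) * BLOCK]
            out[sl] = np.round(residual[sl] / steps[by, bx]) * steps[by, bx]
    return out

rng = np.random.default_rng(2)
saliency = rng.random((256, 256))           # placeholder for a tone-mapped-image saliency map
residual = rng.normal(0, 64, (256, 256))    # placeholder HDR extension-layer residual
rec = quantize_residual(residual, block_quant_steps(saliency))
print(f"mean |quantization error|: {np.mean(np.abs(residual - rec)):.2f}")
```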
A Simple Prediction Fusion Improves Data-driven Full-Reference Video Quality Assessment Models
Pub Date: 2018-06-01 | DOI: 10.1109/PCS.2018.8456293
C. Bampis, A. Bovik, Zhi Li
When developing data-driven video quality assessment algorithms, the size of the available ground-truth subjective data may hamper the generalization capabilities of the trained models. Nevertheless, if the application context is known a priori, leveraging data-driven approaches for video quality prediction can deliver promising results. Towards achieving high-performing video quality prediction for compression and scaling artifacts, Netflix developed the Video Multi-method Assessment Fusion (VMAF) framework, a full-reference prediction system that uses a regression scheme to integrate multiple perception-motivated features to predict video quality. However, the current version of VMAF does not fully capture temporal video features relevant to temporal video distortions. To address this limitation, we developed Ensemble VMAF (E-VMAF): a video quality predictor that combines two models: VMAF and predictions based on entropic differencing features calculated on video frames and frame differences. We demonstrate the improved performance of E-VMAF on various subjective video databases. The proposed model will become available as part of the open-source package at https://github.com/Netflix/vmaf.
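The exact fusion used in E-VMAF is not spelled out in the abstract. As a hedged sketch of the general idea, here is a simple learned fusion of a VMAF-style score with a temporal feature computed on frame differences; the feature definition, the regressor, and all data are illustrative assumptions, not the E-VMAF recipe.

```python
# Hedged sketch of prediction fusion: combine an existing quality score (e.g. VMAF)
# with a simple temporal feature computed on frame differences, via a small regression
# fit on subjective scores. Everything below is an illustrative assumption.
import numpy as np
from sklearn.linear_model import Ridge

def temporal_feature(frames):
    """Crude stand-in for an entropic-differencing temporal feature:
    mean log-variance of successive frame differences."""
    diffs = np.diff(frames, axis=0)
    return float(np.mean(np.log1p(diffs.var(axis=(1, 2)))))

rng = np.random.default_rng(3)

# Placeholder training set: per-video VMAF score, temporal feature, and MOS label.
n_videos = 40
vmaf_scores = rng.uniform(20, 100, n_videos)
temp_feats  = np.array([temporal_feature(rng.normal(0, s / 20, (16, 64, 64)))
                        for s in 100 - vmaf_scores])
mos = 1 + 4 * (vmaf_scores / 100) - 0.1 * temp_feats + rng.normal(0, 0.1, n_videos)

X = np.column_stack([vmaf_scores, temp_feats])
fusion = Ridge(alpha=1.0).fit(X, mos)   # learned fusion of the two predictors

# Predict fused quality for a new clip (placeholder inputs).
x_new = np.array([[72.0, temporal_feature(rng.normal(0, 1.5, (16, 64, 64)))]])
print(f"fused quality prediction: {fusion.predict(x_new)[0]:.2f}")
```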