Pub Date: 2024-02-22 | DOI: 10.1109/TBC.2024.3358756
Li Yang;Jianzhang Liu;Shufeng Li;Deyou Zhang;Zhiping Xia
Video streaming services have gradually become the dominant use case on the Internet, and the focus has shifted from a service-centric to a user-centric approach. This paper proposes a continuous Quality of Experience (QoE) evaluation method based on Quality of Service (QoS) parameters, which combines the advantages of subjective and objective research methods. The proposed method can accurately calculate the achievable QoE from user QoS. Additionally, we develop a QoE-Ensemble MLP prediction model, employing ensemble learning and multilayer perceptron (MLP) techniques, to overcome the limitations of the QoE evaluation method and accurately predict user QoE from QoS parameters. Furthermore, we propose a low-complexity network bandwidth allocation algorithm based on the QoE prediction model to help service providers minimize wasted network bandwidth while meeting users' QoE requirements. Finally, experiments show that our QoE evaluation model and network bandwidth allocation algorithm outperform existing approaches, and the result analysis reveals the relationship between QoS parameters and QoE.
"Enhancing User Experience in Ultra HD Cloud Performing Arts Live Streaming: A QoS-to-QoE Mapping Approach," IEEE Transactions on Broadcasting, vol. 70, no. 2, pp. 413-428.
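The QoE-Ensemble MLP combines multilayer-perceptron regressors through ensemble learning. The following is a hedged sketch of the ensemble-averaging idea only; the QoS feature set, network sizes, and (untrained) weights below are invented for illustration and are not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP: ReLU hidden layer, linear scalar output."""
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

def make_mlp(n_in, n_hidden, rng):
    """Random (untrained) member weights, just to show the ensemble shape."""
    return (rng.normal(size=(n_in, n_hidden)) * 0.1, np.zeros(n_hidden),
            rng.normal(size=(n_hidden, 1)) * 0.1, np.zeros(1))

# Hypothetical QoS feature vector: throughput (Mbps), RTT (ms),
# packet loss (%), rebuffering time (s).
qos = np.array([[25.0, 40.0, 0.5, 1.2]])

# Ensemble of K MLP members; the final QoE score averages their outputs.
K = 5
members = [make_mlp(4, 8, rng) for _ in range(K)]
preds = np.array([mlp_forward(qos, *m) for m in members])
qoe_score = preds.mean(axis=0).item()
print(qoe_score)
```

In practice each member would be trained on (QoS, QoE) pairs; averaging independently trained members reduces the variance of a single MLP's prediction.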
Panoramic video undergoes projection onto a two-dimensional plane for compression and subsequent back-projection onto a sphere for display. This process introduces inconsistency between compression distortion and perceived spherical distortion, which causes a serious loss in coding efficiency. Meanwhile, the existing independent rate-distortion optimization (RDO) model for spherical distortion solely accounts for the current coding frame and neglects its influence on subsequent frames, which leads to sub-optimal coding performance. To this end, we propose a spherical distortion temporal propagation and spatial mapping model for efficient panoramic video coding. First, a zero-delay spherical distortion backward propagation chain is established in the temporal domain, and distortion impact factors are computed. Then, an accurate spatial mapping relationship between spherical distortion and coding distortion is constructed, along with the calculation of spatial mapping weights. Finally, these components are integrated into spherical RDO. The experimental results demonstrate the effectiveness of the proposed algorithm. Compared to the versatile video coding test model (VTM-14.0) with a 360Lib extension under low-delay P frame and B frame configurations, the proposed algorithm achieves bitrate savings of 9.4% (up to 19.4%) and 8.5% (up to 19.0%), respectively, using WS-PSNR as the distortion evaluation index. Additionally, the coding time is reduced by 14.53% and 15.65%, respectively.
Xu Yang; Minfeng Huang; Hongwei Guo; Shengxi Li; Lei Luo; Ce Zhu, "Spherical Distortion Temporal Propagation and Spatial Mapping Model for Efficient Panoramic Video Coding," IEEE Transactions on Broadcasting, vol. 70, no. 2, pp. 654-666, published 2024-02-19. DOI: 10.1109/TBC.2024.3358749.
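The reported gains are measured with WS-PSNR, which down-weights per-pixel squared error by the latitude-dependent area stretch of the equirectangular projection (ERP). A minimal NumPy sketch of the standard per-row cosine weighting (toy frame sizes; the paper's reference implementation is 360Lib):

```python
import numpy as np

def wspsnr_erp(ref, dist, max_val=255.0):
    """WS-PSNR for an equirectangular (ERP) frame: squared error is weighted
    by the cosine of each row's latitude, so over-sampled polar rows count less."""
    h, w = ref.shape
    weights = np.cos((np.arange(h) + 0.5 - h / 2) * np.pi / h)  # per-row weight
    wmap = np.repeat(weights[:, None], w, axis=1)
    wmse = np.sum(wmap * (ref - dist) ** 2) / np.sum(wmap)
    return 10 * np.log10(max_val ** 2 / wmse)

ref = np.tile(np.arange(8.0), (4, 1))    # toy 4x8 luma plane
dist = ref + 1.0                         # uniform unit error -> weighted MSE = 1
print(round(wspsnr_erp(ref, dist), 2))   # 48.13
```

With a spatially uniform error the weighting cancels out; the weights matter when distortion concentrates near the poles, where ERP over-represents the sphere.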
As the latest video coding standard, Versatile Video Coding (VVC) is highly efficient at the cost of very high coding complexity, which seriously hinders its practical application. It is therefore crucial to improve its coding speed. In this paper, we propose a deep learning based fast split mode (SM) and directional mode (DM) decision algorithm for VVC intra prediction. Specifically, given the observation that the SM distributions of coding units (CUs) of different sizes are significantly distinct, we first design separate neural networks and train SM models for CUs of each size to obtain the probabilities of SMs and skip the unlikely ones. Second, based on the similar observation that the DM distributions of CUs of different sizes are distinct, we design neural networks to train DM models for CUs of each size separately to obtain the probabilities of DMs, and then adaptively select candidate DMs based on the probabilities of their associated SMs. Third, after an SM is checked, we take its probability, residual coefficients, rate-distortion (RD) cost, etc., as features and design a lightweight neural network (LNN) model to terminate SM selection early. Experimental results demonstrate that the proposed algorithm reduces the encoding time of VVC by 70.73% with a 2.44% increase in Bjøntegaard delta bit-rate (BDBR) on average.
Yuanyuan Huang; Junyi Yu; Dayong Wang; Xin Lu; Frederic Dufaux; Hui Guo; Ce Zhu, "Learning-Based Fast Splitting and Directional Mode Decision for VVC Intra Prediction," IEEE Transactions on Broadcasting, vol. 70, no. 2, pp. 681-692, published 2024-02-19. DOI: 10.1109/TBC.2024.3360729.
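A common way to turn predicted split-mode probabilities into skipped candidates is to keep only the most probable modes up to a cumulative-probability budget. The sketch below illustrates that selection step with invented probabilities; it is not the paper's exact decision rule:

```python
# Hypothetical probabilities over VVC split modes for one CU, as a trained
# classifier might output (mode names follow VVC's QT/BT/TT partitioning).
split_probs = {
    "NO_SPLIT": 0.46, "QT": 0.31, "BT_H": 0.11,
    "BT_V": 0.07, "TT_H": 0.03, "TT_V": 0.02,
}

def select_candidate_modes(probs, coverage=0.9):
    """Keep the most probable modes until their cumulative probability
    reaches `coverage`; the remaining modes are skipped in the RD search."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for mode, p in ranked:
        kept.append(mode)
        total += p
        if total >= coverage:
            break
    return kept

print(select_candidate_modes(split_probs))  # ['NO_SPLIT', 'QT', 'BT_H', 'BT_V']
```

Here two of the six modes are pruned before any RD cost is computed, which is where the encoding-time saving comes from; the coverage threshold trades speed against the BDBR loss.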
Timing mismatch among the signals of different receiving antennas degrades maximum ratio combining (MRC) performance in broadcasting systems and thus limits the output signal-to-noise ratio (SNR) after MRC. To circumvent this limitation, this paper designs a time-robust MRC (TR-MRC) receiver for broadcasting reception enhancement using a tapped-delay-line (TDL) structure, where the weight matrix is derived in iterated form and the average output SNR is calculated in closed form. Simulation results confirm that the proposed TR-MRC receiver improves the average output SNR and reduces the bit error rate (BER) under imperfect time alignment, especially for a large number of receiving antennas and a high input SNR. However, blindly increasing the TDL order cannot significantly improve TR-MRC performance with unknown timing mismatch, revealing a trade-off between the output SNR and the implementation complexity introduced by a higher TDL order.
Wenbo Guo; Hongzhi Zhao; Jiaxin Du; Shihai Shao; Youxi Tang, "Time-Robust MRC Design for Broadcasting Reception Enhancement," IEEE Transactions on Broadcasting, vol. 70, no. 2, pp. 747-752, published 2024-02-07. DOI: 10.1109/TBC.2024.3353569.
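For context, under perfect time alignment classical MRC weights each branch by the conjugate of its channel gain, and the branch SNRs add; the timing mismatch studied in the paper breaks exactly this additivity, which is what the TDL structure compensates for. A sketch of the ideal baseline (the channel gains and powers are illustrative, not from the paper):

```python
import numpy as np

# Illustrative per-branch complex channel gains and powers.
h = np.array([0.9 + 0.3j, 0.5 - 0.7j, 1.1 + 0.1j])
signal_power = 1.0
noise_power = 0.1

# Classical MRC: weight each branch by the conjugate of its channel gain.
w = np.conj(h)
combined_gain = (w * h).sum().real          # = sum of |h_k|^2
mrc_snr = signal_power * combined_gain / noise_power

# With perfect alignment, the combined SNR equals the sum of branch SNRs.
branch_snr = signal_power * np.abs(h) ** 2 / noise_power
print(mrc_snr, branch_snr.sum())
```

Any branch delay turns `w * h` into a partial overlap of misaligned waveforms, so the coherent sum falls below the branch-SNR total; this is the loss the TR-MRC weight matrix is designed to recover.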
Recent methods that project images into deep feature spaces to evaluate quality degradation have produced inefficient results due to biased mappings; i.e., these projections are not aligned with the perceptions of humans. In this paper, we develop a hyperdebiasing framework to address such bias in full-reference image quality assessment. First, we perform orthogonal Tucker decomposition on top of the feature tensors extracted by a feature extraction network to project features into a robust content-agnostic space and effectively eliminate the bias caused by subtle image perturbations. Second, we propose a hypernetwork in which content-aware parameters are produced for reprojecting features in a deep subspace for quality prediction. By leveraging the content diversity of large-scale blind-reference datasets, the perception rule between image content and image quality is established. Third, a quality prediction network is proposed that combines debiased content-aware and content-agnostic features to predict the final image quality score. To demonstrate the efficacy of our proposed method, we conducted numerous experiments on comprehensive databases. The experimental results validate that our method achieves state-of-the-art performance in predicting image quality.
Mingliang Zhou; Heqiang Wang; Xuekai Wei; Yong Feng; Jun Luo; Huayan Pu; Jinglei Zhao; Liming Wang; Zhigang Chu; Xin Wang; Bin Fang; Zhaowei Shang, "HDIQA: A Hyper Debiasing Framework for Full Reference Image Quality Assessment," IEEE Transactions on Broadcasting, vol. 70, no. 2, pp. 545-554, published 2024-01-31. DOI: 10.1109/TBC.2024.3353573.
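The first stage performs an orthogonal Tucker decomposition of feature tensors. One standard way to compute such a decomposition is the truncated higher-order SVD (HOSVD), sketched below with NumPy on a toy tensor; the shapes and ranks are illustrative, not the paper's configuration:

```python
import numpy as np

def unfold(t, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t, ranks):
    """Truncated HOSVD: orthonormal factor matrices from the left singular
    vectors of each unfolding, core tensor by projecting onto all factors."""
    factors = [np.linalg.svd(unfold(t, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = t
    for m, u in enumerate(factors):
        # Contract mode m of the core with u^T, then restore the axis order.
        core = np.moveaxis(np.tensordot(u.T, np.moveaxis(core, m, 0), axes=1), 0, m)
    return core, factors

# Toy "feature tensor" (channels x height x width), a stand-in for a CNN output.
rng = np.random.default_rng(0)
t = rng.normal(size=(8, 6, 6))
core, factors = hosvd(t, ranks=(4, 3, 3))
print(core.shape)  # (4, 3, 3)
```

The orthonormal factors give the kind of rotation-stable subspace the paper relies on: small perturbations of the input tensor perturb the compact core rather than redefining the axes themselves.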
Elevating traditional 2-dimensional (2D) plane display to 4-dimensional (4D) light field display can significantly enhance users’ immersion and realism, because light field image (LFI) provides various visual cues in terms of multi-view disparity, motion disparity, and selective focus. Therefore, it is crucial to establish a light field image quality assessment (LF-IQA) model that aligns with human visual perception characteristics. However, it has always been a challenge to evaluate the perceptual quality of multiple light field visual cues simultaneously and consistently. To this end, this paper proposes a Transformer-based explicit learning of light field geometry for the no-reference light field image quality assessment. Specifically, to explicitly learn the light field epipolar geometry, we stack up light field sub-aperture images (SAIs) to form four SAI stacks according to four specific light field angular directions, and use a sub-grouping strategy to hierarchically learn the local and global light field geometric features. Then, a Transformer encoder with a spatial-shift tokenization strategy is applied to learn structure-aware light field geometric distortion representation, which is used to regress the final quality score. Evaluation experiments are carried out on three commonly used light field image quality assessment datasets: Win5-LID, NBU-LF1.0, and MPI-LFA. Experimental results demonstrate that our model outperforms state-of-the-art methods and exhibits a high correlation with human perception. The source code is publicly available at https://github.com/windyz77/GeoNRLFIQA
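The four directional SAI stacks can be pictured as slices of the 4D light field along the horizontal, vertical, and two diagonal angular directions. A toy NumPy sketch of that stacking (array sizes and the exact slicing convention are assumptions for illustration; the released code at the repository above is authoritative):

```python
import numpy as np

# Toy light field: a (U, V) angular grid of sub-aperture images of size (H, W).
U = V = 5
H = W = 8
lf = np.arange(U * V * H * W, dtype=float).reshape(U, V, H, W)

def sai_stacks(lf):
    """Form four SAI stacks along specific angular directions:
    a horizontal row, a vertical column, and the two diagonals."""
    U, V = lf.shape[:2]
    c = U // 2
    horizontal = lf[c, :]                                        # (V, H, W)
    vertical = lf[:, c]                                          # (U, H, W)
    main_diag = np.stack([lf[i, i] for i in range(min(U, V))])
    anti_diag = np.stack([lf[i, V - 1 - i] for i in range(min(U, V))])
    return horizontal, vertical, main_diag, anti_diag

stacks = sai_stacks(lf)
print([s.shape for s in stacks])  # [(5, 8, 8), (5, 8, 8), (5, 8, 8), (5, 8, 8)]
```

Each stack exposes disparity along one angular direction, so a network processing all four sees the epipolar geometry of the light field explicitly rather than inferring it from a flat grid of views.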