Convex Hull Prediction for Adaptive Video Streaming by Recurrent Learning

Somdyuti Paul, Andrey Norkin, Alan C. Bovik
{"title":"Convex Hull Prediction for Adaptive Video Streaming by Recurrent Learning","authors":"Somdyuti Paul;Andrey Norkin;Alan C. Bovik","doi":"10.1109/TIP.2024.3455989","DOIUrl":null,"url":null,"abstract":"Adaptive video streaming relies on the construction of efficient bitrate ladders to deliver the best possible visual quality to viewers under bandwidth constraints. The traditional method of content dependent bitrate ladder selection requires a video shot to be pre-encoded with multiple encoding parameters to find the optimal operating points given by the convex hull of the resulting rate-quality curves. However, this pre-encoding step is equivalent to an exhaustive search process over the space of possible encoding parameters, which causes significant overhead in terms of both computation and time expenditure. To reduce this overhead, we propose a deep learning based method of content aware convex hull prediction. We employ a recurrent convolutional network (RCN) to implicitly analyze the spatiotemporal complexity of video shots in order to predict their convex hulls. A two-step transfer learning scheme is adopted to train our proposed RCN-Hull model, which ensures sufficient content diversity to analyze scene complexity, while also making it possible to capture the scene statistics of pristine source videos. Our experimental results reveal that our proposed model yields better approximations of the optimal convex hulls, and offers competitive time savings as compared to existing approaches. On average, the pre-encoding time was reduced by 53.8% by our method, while the average Bjøntegaard delta bitrate (BD-rate) of the predicted convex hulls against ground truth was 0.26%, and the mean absolute deviation of the BD-rate distribution was 0.57%.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"5114-5128"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10679528/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Adaptive video streaming relies on the construction of efficient bitrate ladders to deliver the best possible visual quality to viewers under bandwidth constraints. The traditional method of content-dependent bitrate ladder selection requires a video shot to be pre-encoded with multiple encoding parameters to find the optimal operating points given by the convex hull of the resulting rate-quality curves. However, this pre-encoding step is equivalent to an exhaustive search process over the space of possible encoding parameters, which causes significant overhead in terms of both computation and time expenditure. To reduce this overhead, we propose a deep learning-based method of content-aware convex hull prediction. We employ a recurrent convolutional network (RCN) to implicitly analyze the spatiotemporal complexity of video shots in order to predict their convex hulls. A two-step transfer learning scheme is adopted to train our proposed RCN-Hull model, which ensures sufficient content diversity to analyze scene complexity, while also making it possible to capture the scene statistics of pristine source videos. Our experimental results reveal that our proposed model yields better approximations of the optimal convex hulls, and offers competitive time savings compared to existing approaches. On average, the pre-encoding time was reduced by 53.8% by our method, while the average Bjøntegaard delta bitrate (BD-rate) of the predicted convex hulls against the ground truth was 0.26%, and the mean absolute deviation of the BD-rate distribution was 0.57%.
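
To make the convex-hull idea concrete, the Python sketch below (with illustrative bitrate/quality values and encode labels that are not taken from the paper) shows how the ground-truth bitrate ladder is conventionally obtained: every candidate pre-encode of a shot is treated as a (bitrate, quality) point, and only the points lying on the upper convex hull of that scatter are kept as operating points. This exhaustive pre-encoding step is the one whose output the RCN-Hull model is trained to predict directly from the source video.

# Minimal sketch (not the paper's implementation): derive a bitrate-ladder
# convex hull from exhaustively pre-encoded rate-quality points.
from typing import List, Tuple

# (bitrate in kbps, quality score such as VMAF, encode label) -- illustrative only
RQPoint = Tuple[float, float, str]

def upper_convex_hull(points: List[RQPoint]) -> List[RQPoint]:
    """Keep only the rate-quality points on the upper convex hull.

    Monotone-chain scan over points sorted by bitrate: a point is dropped
    whenever it falls on or below the chord joining its neighbours, i.e.
    some combination of a cheaper and a costlier encode dominates it.
    """
    pts = sorted(points, key=lambda p: (p[0], p[1]))
    hull: List[RQPoint] = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1, _), (x2, y2, _) = hull[-2], hull[-1]
            # Cross product >= 0 means hull[-1] is not above the chord
            # from hull[-2] to p, so it is not a hull (ladder) point.
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

if __name__ == "__main__":
    # Hypothetical pre-encodes of one shot at several resolution/QP pairs.
    encodes = [
        (300, 62.0, "540p_qp38"),
        (500, 71.0, "540p_qp34"),
        (800, 78.0, "720p_qp34"),
        (900, 74.0, "1080p_qp38"),   # dominated: falls below the hull
        (1200, 84.0, "720p_qp30"),
        (1800, 88.5, "1080p_qp30"),
        (2600, 91.0, "1080p_qp27"),
    ]
    for bitrate, quality, label in upper_convex_hull(encodes):
        print(f"{label}: {bitrate:.0f} kbps, quality {quality:.1f}")

The BD-rate figures quoted in the abstract then measure how much extra bitrate the predicted hull needs, on average, to reach the same quality as this exhaustively computed reference hull.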