Deep Curriculum Reinforcement Learning for Adaptive 360° Video Streaming With Two-Stage Training

IEEE Transactions on Broadcasting. Published 2023-12-15. DOI: 10.1109/TBC.2023.3334137. Impact factor 3.2; Zone 1 (Computer Science); JCR Q2 (Engineering, Electrical & Electronic).
Yuhong Xie, Yuan Zhang, Tao Lin
IEEE Transactions on Broadcasting, vol. 70, no. 2, pp. 441-452.
Citations: 0

Abstract

Deep reinforcement learning (DRL) has demonstrated remarkable potential within the domain of video adaptive bitrate (ABR) optimization. However, training a well-performing DRL agent in the two-tier 360° video streaming system is non-trivial. The conventional DRL training approach fails to enable the model to start learning from simpler environments and then progressively explore more challenging ones, leading to suboptimal asymptotic performance and poor long-tail performance. In this paper, we propose a novel approach called DCRL360, which seamlessly integrates automatic curriculum learning (ACL) with DRL techniques to enable adaptive decision-making for 360° video bitrate selection and chunk scheduling. To tackle the training issue, we introduce a structured two-stage training framework. The first stage focuses on the selection of tasks conducive to learning, guided by a newly introduced training metric called Pscore, to enhance asymptotic performance. The newly introduced metric takes into consideration multiple facets, including performance improvement potential, the risk of being forgotten, and the uncertainty of a decision, to encourage the agent to train in rewarding environments. The second stage utilizes existing rule-based techniques to identify challenging tasks for fine-tuning the model, thereby alleviating the long-tail effect. Our experimental results demonstrate that DCRL360 outperforms state-of-the-art algorithms under various network conditions - including 5G/LTE/Broadband - with a remarkable improvement of 6.51-20.86% in quality of experience (QoE), as well as a reduction in bandwidth wastage by 10.60-31.50%.
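The abstract names the ingredients of Pscore (improvement potential, forgetting risk, decision uncertainty) and the two training stages, but it does not give a formula. The Python sketch below is a hypothetical illustration of how such a two-stage curriculum selector could be wired, not DCRL360's actual method. Every specific here is an assumption: the names `TaskStats`, `pscore`, `sample_stage1`, `select_stage2`, the weighted-sum form, the weights, the 5-episode window, using policy entropy as the uncertainty proxy, and reading "rule-based techniques" as comparing the agent against a rule-based ABR baseline's return on each task.

```python
# Hypothetical two-stage curriculum task selector, loosely inspired by the
# DCRL360 abstract. The Pscore formula, weights, window size and stage-2 rule
# are illustrative assumptions, not the paper's actual method.
import math
import random
from dataclasses import dataclass, field


@dataclass
class TaskStats:
    """Running statistics for one training task (e.g. one bandwidth trace)."""
    name: str
    returns: list = field(default_factory=list)  # episode returns, oldest first
    best_return: float = -math.inf               # best episode return so far
    entropy: float = 0.0                         # latest mean policy entropy

    def record(self, episode_return: float, mean_entropy: float) -> None:
        self.returns.append(episode_return)
        self.best_return = max(self.best_return, episode_return)
        self.entropy = mean_entropy


def pscore(t: TaskStats, w_improve=1.0, w_forget=1.0, w_uncert=0.5, window=5):
    """Hypothetical Pscore: weighted sum of three signals named in the abstract."""
    if len(t.returns) < 2:
        return math.inf  # not enough history: treat as maximally worth trying
    recent = t.returns[-window:]
    # Improvement potential: absolute per-episode change over the recent window.
    improve = abs(recent[-1] - recent[0]) / (len(recent) - 1)
    # Forgetting risk: drop of the latest return below the best return seen.
    forget = t.best_return - t.returns[-1]
    # Decision uncertainty: policy entropy used as a proxy (assumption).
    return w_improve * improve + w_forget * forget + w_uncert * t.entropy


def sample_stage1(tasks, temperature=1.0, rng=random):
    """Stage 1 (sketch): try under-explored tasks first, else softmax over Pscore."""
    fresh = [t for t in tasks if len(t.returns) < 2]
    if fresh:
        return rng.choice(fresh)
    scores = [pscore(t) for t in tasks]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    return rng.choices(tasks, weights=weights, k=1)[0]


def select_stage2(tasks, baseline_returns, k=2):
    """Stage 2 (sketch): tasks where the agent trails a rule-based baseline most."""
    gaps = [(baseline_returns[t.name] - t.returns[-1], t) for t in tasks if t.returns]
    gaps.sort(key=lambda g: g[0], reverse=True)
    return [t for gap, t in gaps[:k] if gap > 0]
```

In a training loop one would call `sample_stage1` once per episode and `record` the resulting return and mean entropy; after stage 1, `select_stage2` would be run with a hypothetical `baseline_returns` mapping (returns of some rule-based ABR algorithm evaluated on the same traces) to pick fine-tuning tasks. Sampling through a softmax rather than always taking the top score keeps some probability on every task, which is one common way to limit forgetting in curriculum schemes.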
Source journal: IEEE Transactions on Broadcasting (Engineering & Technology: Telecommunications)
CiteScore: 9.40
Self-citation rate: 31.10%
Articles published: 79
Review time: 6-12 weeks
About the journal: The Society’s Field of Interest is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which is used to provide guidance to the Publications Committee in the selection of content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”