Deep Curriculum Reinforcement Learning for Adaptive 360° Video Streaming With Two-Stage Training
Yuhong Xie; Yuan Zhang; Tao Lin
IEEE Transactions on Broadcasting, vol. 70, no. 2, pp. 441-452. Published 2023-12-15. DOI: 10.1109/TBC.2023.3334137
https://ieeexplore.ieee.org/document/10361536/
Deep reinforcement learning (DRL) has demonstrated remarkable potential within the domain of video adaptive bitrate (ABR) optimization. However, training a well-performing DRL agent in the two-tier 360° video streaming system is non-trivial. The conventional DRL training approach fails to enable the model to start learning from simpler environments and then progressively explore more challenging ones, leading to suboptimal asymptotic performance and poor long-tail performance. In this paper, we propose a novel approach called DCRL360, which seamlessly integrates automatic curriculum learning (ACL) with DRL techniques to enable adaptive decision-making for 360° video bitrate selection and chunk scheduling. To tackle the training issue, we introduce a structured two-stage training framework. The first stage focuses on the selection of tasks conducive to learning, guided by a newly introduced training metric called Pscore, to enhance asymptotic performance. The newly introduced metric takes into consideration multiple facets, including performance improvement potential, the risk of being forgotten, and the uncertainty of a decision, to encourage the agent to train in rewarding environments. The second stage utilizes existing rule-based techniques to identify challenging tasks for fine-tuning the model, thereby alleviating the long-tail effect. Our experimental results demonstrate that DCRL360 outperforms state-of-the-art algorithms under various network conditions - including 5G/LTE/Broadband - with a remarkable improvement of 6.51-20.86% in quality of experience (QoE), as well as a reduction in bandwidth wastage by 10.60-31.50%.
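The abstract names the three ingredients of Pscore (performance improvement potential, risk of being forgotten, decision uncertainty) but does not give its formula. The sketch below is therefore only an illustration of the general idea of score-guided task selection followed by hard-task fine-tuning, not the authors' method. The weighted sum, the softmax sampling, the "lowest baseline reward" rule for picking hard tasks, and every name in it (`TaskStats`, `priority_score`, `sample_task`, `select_hard_tasks`) are assumptions introduced here.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Per-task bookkeeping; every field is a hypothetical stand-in."""
    name: str
    improvement_potential: float  # e.g. gap between a reference reward and the agent's reward
    staleness: float              # e.g. normalized time since the task was last trained on
    uncertainty: float            # e.g. mean entropy of the policy's action distribution


def priority_score(t, w_gain=1.0, w_forget=0.5, w_uncertain=0.5):
    """Illustrative weighted sum; NOT the Pscore formula from the paper."""
    return (w_gain * t.improvement_potential
            + w_forget * t.staleness
            + w_uncertain * t.uncertainty)


def sample_task(tasks, rng, temperature=1.0):
    """Stage 1 (assumed form): sample a task with probability softmax(score / temperature)."""
    scores = [priority_score(t) / temperature for t in tasks]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(tasks, weights=weights, k=1)[0]


def select_hard_tasks(baseline_reward, fraction=0.5):
    """Stage 2 (assumed form): keep the tasks where a rule-based baseline scores lowest."""
    ranked = sorted(baseline_reward, key=baseline_reward.get)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical tasks and rewards, for illustration only.
tasks = [
    TaskStats("stable_broadband", 0.1, 0.2, 0.1),
    TaskStats("fluctuating_5g", 0.8, 0.5, 0.6),
]
picked = sample_task(tasks, random.Random(0))
hard = select_hard_tasks(
    {"trace_a": 3.0, "trace_b": -1.0, "trace_c": 1.5, "trace_d": 0.5}
)
```

Sampling rather than always taking the top-scoring task keeps some exposure to low-priority environments; whether DCRL360 does this is not stated in the abstract.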
About the journal:
The Society’s Field of Interest is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which is used to provide guidance to the Publications Committee in the selection of content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”