Proceedings of the 31st ACM Workshop on Network and Operating Systems Support for Digital Audio and Video (https://doi.org/10.1145/3458306)

Vibra
Gangqiang Zhou, Run Wu, Miao Hu, Yipeng Zhou, Tom Z. J. Fu, Di Wu
Variable Bitrate (VBR) video encoding can provide a much higher quality-to-bits ratio than the widely adopted Constant Bitrate (CBR) encoding, and has thus received significant attention from content providers in recent years. However, it is challenging to design efficient adaptive bitrate (ABR) algorithms for VBR-encoded videos due to sharply fluctuating chunk sizes and the resulting bitrate burstiness. In this paper, we propose a neural adaptive streaming framework called Vibra for VBR-encoded videos, which accommodates the high fluctuation of video chunk sizes and significantly improves the quality-of-experience (QoE) of end users. Our framework takes the characteristics of VBR-encoded videos into account and adopts deep reinforcement learning to train a model for bitrate adaptation. We also conduct extensive trace-driven experiments, and the results show that Vibra outperforms state-of-the-art ABR algorithms with an improvement of 8.17%-29.21% in terms of average QoE.
{"title":"Vibra","authors":"Gangqiang Zhou, Run Wu, Miao Hu, Yipeng Zhou, Tom Z. J. Fu, Di Wu","doi":"10.1145/3458306.3460993","DOIUrl":"https://doi.org/10.1145/3458306.3460993","url":null,"abstract":"Variable Bitrate (VBR) video encoding can provide much high quality-to-bits ratio compared to the widely adopted Constant Bitrate (CBR) encoding, and thus receives significant attentions by content providers in recent years. However, it is challenging to design efficient adaptive bitrate algorithms for VBR-encoded videos due to the sharply fluctuating chunk size and the resulting bitrate burstiness. In this paper, we propose a neural adaptive streaming framework called Vibra for VBR-encoded videos, which can well accommodate the high fluctuation of video chunk sizes and improve the quality-of-experience (QoE) of end users significantly. Our framework takes the characteristics of VBR-encoded videos into account, and adopts the technique of deep reinforcement learning to train a model for bitrate adaptation. We also conduct extensive trace-driven experiments, and the results show that Vibra outperforms the state-of-the-art ABR algorithms with an improvement of 8.17% -- 29.21% in terms of the average QoE.","PeriodicalId":429348,"journal":{"name":"Proceedings of the 31st ACM Workshop on Network and Operating Systems Support for Digital Audio and Video","volume":"25 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114389823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

CrowdSR
Zhenxiao Luo, Zelong Wang, Jinyu Chen, Miao Hu, Yipeng Zhou, Tom Z. J. Fu, Di Wu
The prevalence of personal devices has motivated the rapid development of crowdsourced livecast in recent years. However, upstream bandwidth varies widely among amateur broadcasters. Moreover, the highest video quality that can be streamed is limited by the hardware configuration of broadcaster devices (e.g., 540p for low-end mobile devices). These factors pose significant challenges to the ingestion of high-resolution live video streams and result in poor quality-of-experience (QoE) for viewers. In this paper, we propose a novel live video ingest approach called CrowdSR for crowdsourced livecast. CrowdSR transforms a low-resolution video stream uploaded by a weak device into a high-resolution video stream via super-resolution, and then delivers the stream to viewers. CrowdSR exploits crowdsourced high-resolution video patches from similar broadcasters to speed up model training. Unlike previous work, our approach does not require any modification on the client side, and is thus more practical and easier to implement. Finally, we implement and evaluate CrowdSR through a series of real-world experiments. The results show that CrowdSR significantly outperforms the baseline approaches by 0.42-1.09 dB in terms of PSNR and 0.006-0.014 in terms of SSIM.
{"title":"CrowdSR","authors":"Zhenxiao Luo, Zelong Wang, Jinyu Chen, Miao Hu, Yipeng Zhou, Tom Z. J. Fu, Di Wu","doi":"10.1145/3458306.3462170","DOIUrl":"https://doi.org/10.1145/3458306.3462170","url":null,"abstract":"The prevalence of personal devices motivates the rapid development of crowdsourced livecast in recent years. However, there exists huge diversity of upstream bandwidth among amateur broadcasters. Moreover, the highest video quality that can be streamed is limited by the hardware configuration of broadcaster devices (e.g., 540p for low-end mobile devices). The above factors pose significant challenges to the ingestion of high-resolution live video streams, and result in poor quality-of-experience (QoE) for viewers. In this paper, we propose a novel live video ingest approach called CrowdSR for crowdsourced livecast. CrowdSR can transform a low-resolution video stream uploaded by weak devices into a high-resolution video stream via super-resolution, and then deliver the stream to viewers. CrowdSR can exploit crowdsourced high-resolution video patches from similar broadcasters to speedup model training. Different from previous work, our approach does not require any modification at the client side, and thus is more practical and easy to implement. Finally, we implement and evaluate CrowdSR by conducting a series of real-world experiments. The results show that CrowdSR significantly outperforms the baseline approaches by 0.42-1.09 dB in terms of PSNR and 0.006-0.014 in terms of SSIM.","PeriodicalId":429348,"journal":{"name":"Proceedings of the 31st ACM Workshop on Network and Operating Systems Support for Digital Audio and Video","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121749688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

PAAS
Chenglei Wu, Zhi Wang, Lifeng Sun
Conventional tile-based 360° video streaming methods, including deep reinforcement learning (DRL) based ones, ignore the interactive nature of 360° video streaming and download tiles in fixed sequential orders, thus failing to respond to changes in the user's head motion. We show that these existing solutions suffer from a drop in either prefetch accuracy or playback stability. Furthermore, these methods are constrained to serve only one fixed streaming preference, causing extra training overhead and a lack of generalization to unseen preferences. In this paper, we propose a dual-queue streaming framework, serving accuracy and stability goals respectively, that enables the DRL agent to determine and change the tile download order without incurring overhead. We also design a preference-aware DRL algorithm that incentivizes the agent to learn preference-dependent ABR decisions efficiently. Compared with state-of-the-art DRL baselines, our method not only significantly improves streaming quality, e.g., increasing the average streaming quality by 13.6% on a public dataset, but also demonstrates better performance and generalization under dynamic preferences, e.g., an average quality improvement of 19.9% on unseen preferences.
{"title":"PAAS","authors":"Chenglei Wu, Zhi Wang, Lifeng Sun","doi":"10.1145/3458306.3460995","DOIUrl":"https://doi.org/10.1145/3458306.3460995","url":null,"abstract":"Conventional tile-based 360° video streaming methods, including deep reinforcement learning (DRL) based, ignore the interactive nature of 360° video streaming and download tiles following fixed sequential orders, thus failing to respond to the user's head motion changes. We show that these existing solutions suffer from either the prefetch accuracy or the playback stability drop. Furthermore, these methods are constrained to serve only one fixed streaming preference, causing extra training overhead and the lack of generalization on unseen preferences. In this paper, we propose a dual-queue streaming framework, with accuracy and stability purposes respectively, to enable the DRL agent to determine and change the tile download order without incurring overhead. We also design a preference-aware DRL algorithm to incentivize the agent to learn preference-dependent ABR decisions efficiently. Compared with state-of-the-art DRL baselines, our method not only significantly improves the streaming quality, e.g., increasing the average streaming quality by 13.6% on a public dataset, but also demonstrates better performance and generalization under dynamic preferences, e.g., an average quality improvement of 19.9% on unseen preferences.","PeriodicalId":429348,"journal":{"name":"Proceedings of the 31st ACM Workshop on Network and Operating Systems Support for Digital Audio and Video","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117206315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

360NorVic
C. Kattadige, Aravindh Raman, Kanchana Thilakarathna, Andra Lutu, Diego Perino
Streaming 360° video demands high bandwidth and low latency, and poses significant challenges to Internet Service Providers (ISPs) and Mobile Network Operators (MNOs). Identifying 360° video traffic can therefore help fixed and mobile carriers optimize their networks and provide better Quality of Experience (QoE) to users. However, end-to-end encryption of network traffic makes it difficult to distinguish 360° videos from regular videos. As a solution, this paper presents 360NorVic, a near-realtime and offline Machine Learning (ML) classification engine that distinguishes 360° videos from regular videos when streamed from mobile devices. We collect packet- and flow-level data for over 800 video traces from YouTube and Facebook, covering 200 unique videos under varying streaming conditions. Our results show that for near-realtime and offline classification at the packet level, average accuracy exceeds 95%, while at the flow level, 360NorVic achieves more than 92% average accuracy. Finally, we pilot our solution in the commercial network of a large MNO, demonstrating the feasibility and effectiveness of 360NorVic in production settings.
{"title":"360NorVic","authors":"C. Kattadige, Aravindh Raman, Kanchana Thilakarathna, Andra Lutu, Diego Perino","doi":"10.1145/3458306.3460998","DOIUrl":"https://doi.org/10.1145/3458306.3460998","url":null,"abstract":"Streaming 360° video demands high bandwidth and low latency, and poses significant challenges to Internet Service Providers (ISPs) and Mobile Network Operators (MNOs). The identification of 360° video traffic can therefore benefits fixed and mobile carriers to optimize their network and provide better Quality of Experience (QoE) to the user. However, end-to-end encryption of network traffic has obstructed identifying those 360° videos from regular videos. As a solution this paper presents 360NorVic, a near-realtime and offline Machine Learning (ML) classification engine to distinguish 360° videos from regular videos when streamed from mobile devices. We collect packet and flow level data for over 800 video traces from YouTube & Facebook accounting for 200 unique videos under varying streaming conditions. Our results show that for near-realtime and offline classification at packet level, average accuracy exceeds 95%, and that for flow level, 360NorVic achieves more than 92% average accuracy. Finally, we pilot our solution in the commercial network of a large MNO showing the feasibility and effectiveness of 360NorVic in production settings.","PeriodicalId":429348,"journal":{"name":"Proceedings of the 31st ACM Workshop on Network and Operating Systems Support for Digital Audio and Video","volume":"207 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114408660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Viewport-aware dynamic 360° video segment categorization
A. Dharmasiri, C. Kattadige, V. Zhang, Kanchana Thilakarathna
Unlike conventional videos, 360° videos give users the freedom to turn their heads, watch, and interact with the content, owing to their immersive spherical environment. Although these movements are arbitrary, similarities can be observed between the viewport patterns of different users and different videos. Identifying such patterns can assist both content and network providers in enhancing the 360° video streaming process, eventually increasing end-user Quality of Experience (QoE). However, how viewport patterns display similarities across different video content, and what their potential applications are, has not yet been studied. In this paper, we present a comprehensive analysis of a dataset of 88 360° videos and propose a novel video categorization algorithm based on viewport similarities. First, we propose a viewport clustering algorithm that outperforms existing algorithms at clustering viewports with similar position and speed. Next, we develop a dynamic video segment categorization algorithm that yields notably more similar viewport distributions within clusters than existing static video categorizations.
{"title":"Viewport-aware dynamic 360° video segment categorization","authors":"A. Dharmasiri, C. Kattadige, V. Zhang, Kanchana Thilakarathna","doi":"10.1145/3458306.3461000","DOIUrl":"https://doi.org/10.1145/3458306.3461000","url":null,"abstract":"Unlike conventional videos, 360° videos give freedom to users to turn their heads, watch and interact with the content owing to its immersive spherical environment. Although these movements are arbitrary, similarities can be observed between viewport patterns of different users and different videos. Identifying such patterns can assist both content and network providers to enhance the 360° video streaming process, eventually increasing the end-user Quality of Experience (QoE). But a study on how viewport patterns display similarities across different video content, and their potential applications has not yet been done. In this paper, we present a comprehensive analysis of a dataset of 88 360° videos and propose a novel video categorization algorithm that is based on similarities of viewports. First, we propose a novel viewport clustering algorithm that outperforms the existing algorithms in terms of clustering viewports with similar positioning and speed. Next, we develop a novel and unique dynamic video segment categorization algorithm that shows notable improvement in similarity for viewport distributions within the clusters when compared to that of existing static video categorizations.","PeriodicalId":429348,"journal":{"name":"Proceedings of the 31st ACM Workshop on Network and Operating Systems Support for Digital Audio and Video","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127105940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 31st ACM Workshop on Network and Operating Systems Support for Digital Audio and Video","authors":"","doi":"10.1145/3458306","DOIUrl":"https://doi.org/10.1145/3458306","url":null,"abstract":"","PeriodicalId":429348,"journal":{"name":"Proceedings of the 31st ACM Workshop on Network and Operating Systems Support for Digital Audio and Video","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133487782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}