P. Freitas, Sana Alamgeer, W. Y. L. Akamine, Mylène C. Q. Farias
Due to the rapid development of multimedia technologies, image quality assessment (IQA) has become an important topic over the last decades. As a consequence, a great research effort has been made to develop computational models that estimate image quality. Among the possible IQA approaches, blind IQA (BIQA) is of fundamental interest because it can be used in most multimedia applications. BIQA techniques measure the perceptual quality of an image without using the reference (or pristine) image. This paper proposes a new BIQA method that combines texture features and saliency maps of an image. Texture features are extracted from the images using the local binary pattern (LBP) operator at multiple scales. To extract the salient areas of an image, i.e. the areas that are the main attractors of the viewers' attention, we use computational visual attention models that output saliency maps. These saliency maps serve as weighting functions for the LBP maps at multiple scales. We propose an operator that combines multiscale LBP maps and saliency maps, which we call the multiscale salient local binary pattern (MSLBP) operator. To determine which saliency model is best suited to the proposed operator, we investigate the performance of several saliency models. Experimental results demonstrate that the proposed method is able to estimate the quality of images impaired with a wide variety of distortions. The proposed metric achieves better prediction accuracy than state-of-the-art IQA methods.
{"title":"Blind image quality assessment based on multiscale salient local binary patterns","authors":"P. Freitas, Sana Alamgeer, W. Y. L. Akamine, Mylène C. Q. Farias","doi":"10.1145/3204949.3204960","DOIUrl":"https://doi.org/10.1145/3204949.3204960","url":null,"abstract":"Due to the rapid development of multimedia technologies, over the last decades image quality assessment (IQA) has become an important topic. As a consequence, a great research effort has been made to develop computational models that estimate image quality. Among the possible IQA approaches, blind IQA (BIQA) is of fundamental interest as it can be used in most multimedia applications. BIQA techniques measure the perceptual quality of an image without using the reference (or pristine) image. This paper proposes a new BIQA method that uses a combination of texture features and saliency maps of an image. Texture features are extracted from the images using the local binary pattern (LBP) operator at multiple scales. To extract the salient of an image, i.e. the areas of the image that are the main attractors of the viewers' attention, we use computational visual attention models that output saliency maps. These saliency maps can be used as weighting functions for the LBP maps at multiple scales. We propose an operator that produces a combination of multiscale LBP maps and saliency maps, which is called the multiscale salient local binary pattern (MSLBP) operator. To define which is the best model to be used in the proposed operator, we investigate the performance of several saliency models. Experimental results demonstrate that the proposed method is able to estimate the quality of impaired images with a wide variety of distortions. The proposed metric has a better prediction accuracy than state-of-the-art IQA methods.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115412906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The growing popularity of mobile and wearable devices with built-in cameras, together with social media sites, is now threatening people's visual privacy. Motivated by recent user studies showing that people's visual privacy concerns are closely related to context, we propose Cardea, a context-aware visual privacy protection mechanism that protects people's visual privacy in photos according to their privacy preferences. We define four context elements in a photo: location, scene, others' presence, and hand gestures. Users can specify their context-dependent privacy preferences based on these four elements. Cardea offers fine-grained visual privacy protection to those who request it, using their identifiable information. We present how Cardea can be integrated into: a) privacy-protecting camera apps, where captured photos are processed before being saved locally; and b) online social media and networking sites, where uploaded photos are first examined to protect individuals' visual privacy before they become visible to others. Our evaluation results on an implemented prototype demonstrate that Cardea is effective, with 86% overall accuracy, and is welcomed by users, showing a promising future for context-aware visual privacy protection in photo taking and sharing.
{"title":"Cardea","authors":"Jiayu Shu, Rui Zheng, P. Hui","doi":"10.1145/3204949.3204973","DOIUrl":"https://doi.org/10.1145/3204949.3204973","url":null,"abstract":"The growing popularity of mobile and wearable devices with built-in cameras and social media sites are now threatening people's visual privacy. Motivated by recent user studies that people's visual privacy concerns are closely related to context, we propose Cardea, a context-aware visual privacy protection mechanism that protects people's visual privacy in photos according to their privacy preferences. We define four context elements in a photo, including location, scene, others' presences, and hand gestures. Users can specify their context-dependent privacy preferences based on the above four elements. Cardea will offer fine-grained visual privacy protection service to those who request protection using their identifiable information. We present how Cardea can be integrated into: a) privacy-protecting camera apps, where captured photos will be processed before being saved locally; and b) online social media and networking sites, where uploaded photos will first be examined to protect individuals' visual privacy, before they become visible to others. Our evaluation results on an implemented prototype demonstrate that Cardea is effective with 86% overall accuracy and is welcomed by users, showing promising future of context-aware visual privacy protection for photo taking and sharing.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":" 39","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120829909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Until recently, no emotional speech dataset was available in Canadian French. This was a limiting factor for research activities not only in Canada, but also elsewhere. This paper introduces the newly released Canadian French Emotional (CaFE) speech dataset and gives details about its design and content. The dataset contains six different sentences, pronounced by six male and six female actors, in six basic emotions plus one neutral emotion. The six basic emotions are acted at two different intensities. The audio is digitally recorded at high resolution (192 kHz sampling rate, 24 bits per sample). This new dataset is freely available under a Creative Commons license (CC BY-NC-SA 4.0).
{"title":"A canadian french emotional speech dataset","authors":"P. Gournay, Olivier Lahaie, R. Lefebvre","doi":"10.1145/3204949.3208121","DOIUrl":"https://doi.org/10.1145/3204949.3208121","url":null,"abstract":"Until recently, there was no emotional speech dataset available in Canadian French. This was a limiting factor for research activities not only in Canada, but also elsewhere. This paper introduces the newly released Canadian French Emotional (CaFE) speech dataset and gives details about its design and content. This dataset contains six different sentences, pronounced by six male and six female actors, in six basic emotions plus one neutral emotion. The six basic emotions are acted in two different intensities. The audio is digitally recorded at high-resolution (192 kHz sampling rate, 24 bits per sample). This new dataset is freely available under a Creative Commons license (CC BY-NC-SA 4.0).","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128521343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miguel Fabián Romero Rondón, L. Sassatelli, F. Precioso, R. Aparicio-Pardo
While Virtual Reality (VR) represents a revolution in user experience, current VR systems are flawed in several respects. The difficulty of focusing naturally in current headsets causes visual discomfort and cognitive overload, while high-end headsets require tethered, powerful hardware for scene synthesis. One of the major solutions envisioned to address these problems is foveated rendering. We consider the problem of streaming stored 360° videos to a VR headset equipped with eye-tracking and foveated rendering capabilities. Our end research goal is high-performing foveated streaming systems that allow the playback buffer to build up and absorb network variations, which none of the current proposals permits. We present our foveated streaming prototype based on the FOVE, one of the first commercially available headsets with an integrated eye-tracker. We build on the FOVE's Unity API to design a gaze-adaptive streaming system using one low-resolution and one high-resolution segment, from which the foveal region is cropped with per-frame filters. The low- and high-resolution frames are then merged at the client to approximate the natural focusing process.
{"title":"Foveated streaming of virtual reality videos","authors":"Miguel Fabián Romero Rondón, L. Sassatelli, F. Precioso, R. Aparicio-Pardo","doi":"10.1145/3204949.3208114","DOIUrl":"https://doi.org/10.1145/3204949.3208114","url":null,"abstract":"While Virtual Reality (VR) represents a revolution in the user experience, current VR systems are flawed on different aspects. The difficulty to focus naturally in current headsets incurs visual discomfort and cognitive overload, while high-end headsets require tethered powerful hardware for scene synthesis. One of the major solutions envisioned to address these problems is foveated rendering. We consider the problem of streaming stored 360° videos to a VR headset equipped with eye-tracking and foveated rendering capabilities. Our end research goal is to make high-performing foveated streaming systems allowing the playback buffer to build up to absorb the network variations, which is permitted in none of the current proposals. We present our foveated streaming prototype based on the FOVE, one of the first commercially available headsets with an integrated eye-tracker. We build on the FOVE's Unity API to design a gaze-adaptive streaming system using one low- and one high-resolution segment from which the foveal region is cropped with per-frame filters. The low- and high-resolution frames are then merged at the client to approach the natural focusing process.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"247 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121948433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cise Midoglu, Mohamed Moulay, V. Mancuso, Özgü Alay, Andra Lutu, C. Griwodz
Video streaming is a very popular service among end-users of Mobile Broadband (MBB) networks. DASH and WebRTC are two key technologies for the delivery of mobile video. In this work, we empirically assess the performance of video streaming with DASH and WebRTC in operational MBB networks, using a large number of programmable network probes spread over several countries in the context of the MONROE project. We collect a large dataset from more than 300 video streaming experiments. Our dataset consists of network traces, performance indicators captured during the streaming sessions, and experiment metadata. The dataset captures the wide variability in video streaming performance and reveals that mobile broadband still does not offer consistent quality guarantees across different countries and networks, especially for users on the move. We open-source our complete software toolset and provide the video dataset as open data.
{"title":"Open video datasets over operational mobile networks with MONROE","authors":"Cise Midoglu, Mohamed Moulay, V. Mancuso, Özgü Alay, Andra Lutu, C. Griwodz","doi":"10.1145/3204949.3208138","DOIUrl":"https://doi.org/10.1145/3204949.3208138","url":null,"abstract":"Video streaming is a very popular service among the end-users of Mobile Broadband (MBB) networks. DASH and WebRTC are two key technologies in the delivery of mobile video. In this work, we empirically assess the performance of video streaming with DASH and WebRTC in operational MBB networks, by using a large number of programmable network probes spread over several countries in the context of the MONROE project. We collect a large dataset from more than 300 video streaming experiments. Our dataset consists of network traces, performance indicators captured during the streaming sessions, and experiment metadata. The dataset captures the wide variability in video streaming performance, and unveils how mobile broadband is still not offering consistent quality guarantees across different countries and networks, especially for users on the move. We open source our complete software toolset and provide the video dataset as open data.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130807617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sajjad Taheri, A. Veidenbaum, A. Nicolau, Ningxin Hu, M. Haghighat
The Web is the world's most ubiquitous compute platform and the foundation of the digital economy. Ever since its birth in the early 1990s, web capabilities have been increasing in both quantity and quality. However, in spite of all this progress, computer vision is not yet mainstream on the web. The reasons are historical and include the lack of sufficient JavaScript performance, the lack of camera support in the standard web APIs, and the lack of comprehensive computer-vision libraries. These problems are about to be solved, opening the way to an immersive and perceptual web with transformational effects in online shopping, education, and entertainment, among others. This work aims to enable computer vision on the web by bringing hundreds of OpenCV functions to the open web platform. OpenCV is the most popular computer-vision library, with a comprehensive set of vision functions and a large developer community. OpenCV is implemented in C++ and, until now, was not available in web browsers without the help of unpopular native plugins. This work leverages OpenCV's efficiency, completeness, API maturity, and its community's collective knowledge. It is provided in a format that is easy for JavaScript engines to optimize and has an API that is easy for web programmers to adopt when developing applications. In addition, OpenCV's parallel implementations that target SIMD units and multiprocessors can be ported to equivalent web primitives, providing better performance for real-time and interactive use cases.
{"title":"OpenCV.js: computer vision processing for the open web platform","authors":"Sajjad Taheri, A. Veidenbaum, A. Nicolau, Ningxin Hu, M. Haghighat","doi":"10.1145/3204949.3208126","DOIUrl":"https://doi.org/10.1145/3204949.3208126","url":null,"abstract":"The Web is the world's most ubiquitous compute platform and the foundation of digital economy. Ever since its birth in early 1990's, web capabilities have been increasing in both quantity and quality. However, in spite of all such progress, computer vision is not mainstream on the web yet. The reasons are historical and include lack of sufficient performance of JavaScript, lack of camera support in the standard web APIs, and lack of comprehensive computer-vision libraries. These problems are about to get solved, resulting in the potential of an immersive and perceptual web with transformational effects including in online shopping, education, and entertainment among others. This work aims to enable web with computer vision by bringing hundreds of OpenCV functions to the open web platform. OpenCV is the most popular computer-vision library with a comprehensive set of vision functions and a large developer community. OpenCV is implemented in C++ and up until now, it was not available in the web browsers without the help of unpopular native plugins. This work leverage OpenCV efficiency, completeness, API maturity, and its communitys collective knowledge. It is provided in a format that is easy for JavaScript engines to highly optimize and has an API that is easy for the web programmers to adopt and develop applications. In addition, OpenCV parallel implementations that target SIMD units and multiprocessors can be ported to equivalent web primitives, providing better performance for real-time and interactive use cases.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114641422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xavier Corbillon, F. D. Simone, G. Simon, P. Frossard
Full immersion inside a Virtual Reality (VR) scene requires six Degrees of Freedom (6DoF) applications in which the user can perform translational and rotational movements within the virtual space. How to implement 6DoF applications, however, is still an open question. In this paper we study a multi-viewpoint (MVP) 360-degree video streaming system, where a scene is simultaneously captured by multiple omnidirectional video cameras. The user can only switch position among predefined viewpoints (VPs). We focus on the new challenges introduced by adaptive MVP 360-degree video streaming. We introduce several options for video encoding with existing technologies, such as High Efficiency Video Coding (HEVC), and for the implementation of VP switching. We model three video-segment download strategies for an adaptive streaming client as Mixed Integer Linear Programming (MILP) problems: an omniscient download scheduler; one where the client proactively downloads all VPs to guarantee fast VP switching; and one where the client reacts to the user's navigation pattern. We recorded an MVP 360-degree video with three VPs, implemented a mobile MVP 360-degree video player, and recorded the viewing patterns of multiple users navigating the content. We solved the adaptive streaming optimization problems on this video using the collected navigation traces. The results emphasize the gains obtained by using tiles in terms of objective quality of the delivered content. They also emphasize the importance of further study on VP-switching prediction to reduce bandwidth consumption, and of measuring the impact of VP-switching delay on the subjective Quality of Experience (QoE).
{"title":"Dynamic adaptive streaming for multi-viewpoint omnidirectional videos","authors":"Xavier Corbillon, F. D. Simone, G. Simon, P. Frossard","doi":"10.1145/3204949.3204968","DOIUrl":"https://doi.org/10.1145/3204949.3204968","url":null,"abstract":"Full immersion inside a Virtual Reality (VR) scene requires six Degrees of Freedom (6DoF) applications where the user is allowed to perform translational and rotational movements within the virtual space. The implementation of 6DoF applications is however still an open question. In this paper we study a multi-viewpoint (MVP) 360-degree video streaming system, where a scene is simultaneously captured by multiple omnidirectional video cameras. The user can only switch positions to predefined viewpoints (VPs). We focus on the new challenges that are introduced by adaptive MVP 360-degree video streaming. We introduce several options for video encoding with existing technologies, such as High Efficiency Video Coding (HEVC) and for the implementation of VP switching. We model three video-segment download strategies for an adaptive streaming client into Mixed Integer Linear Programming (MILP) problems: an omniscient download scheduler; one where the client proactively downloads all VPs to guarantee fast VP switch; one where the client reacts to the user's navigation pattern. We recorded a one MVP 360-degree video with three VPs, implemented a mobile MVP 360-degree video player, and recorded the viewing patterns of multiple users navigating the content. We solved the adaptive streaming optimization problems on this video considering the collected navigation traces. The results emphasize the gains obtained by using tiles in terms of objective quality of the delivered content. They also emphasize the importance of performing further study on VP switching prediction to reduce the bandwidth consumption and to measure the impact of VP switching delay on the subjective Quality of Experience (QoE).","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116868945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liyang Sun, Fanyi Duanmu, Yong Liu, Yao Wang, Y. Ye, Hang Shi, David H. Dai
360° video streaming is a key component of emerging Virtual Reality (VR) and Augmented Reality (AR) applications. In 360° video streaming, a user may freely navigate through the captured 360° video scene by changing her desired Field-of-View. High-throughput and low-delay data transfers enabled by 5G wireless networks can potentially facilitate an untethered 360° video streaming experience. Meanwhile, the high volatility of 5G wireless links presents unprecedented challenges for smooth 360° video streaming. In this paper, novel multi-path multi-tier 360° video streaming solutions are developed to simultaneously address the dynamics in both network bandwidth and user viewing direction. We systematically investigate various design trade-offs in streaming quality and robustness. Through simulations driven by real 5G network bandwidth traces and user viewing direction traces, we demonstrate that the proposed 360° video streaming solutions can achieve a high level of Quality of Experience (QoE) in the challenging 5G wireless network environment.
{"title":"Multi-path multi-tier 360-degree video streaming in 5G networks","authors":"Liyang Sun, Fanyi Duanmu, Yong Liu, Yao Wang, Y. Ye, Hang Shi, David H. Dai","doi":"10.1145/3204949.3204978","DOIUrl":"https://doi.org/10.1145/3204949.3204978","url":null,"abstract":"360° video streaming is a key component of the emerging Virtual Reality (VR) and Augmented Reality (AR) applications. In 360° video streaming, a user may freely navigate through the captured 360° video scene by changing her desired Field-of-View. High-throughput and low-delay data transfers enabled by 5G wireless networks can potentially facilitate untethered 360° video streaming experience. Meanwhile, the high volatility of 5G wireless links present unprecedented challenges for smooth 360° video streaming. In this paper, novel multi-path multi-tier 360° video streaming solutions are developed to simultaneously address the dynamics in both network bandwidth and user viewing direction. We systematically investigate various design trade-offs on streaming quality and robustness. Through simulations driven by real 5G network bandwidth traces and user viewing direction traces, we demonstrate that the proposed 360° video streaming solutions can achieve a high-level of Quality-of-Experience (QoE) in the challenging 5G wireless network environment.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128203045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giuseppe Ribezzo, Giuseppe Samela, Vittorio Palmisano, L. D. Cicco, S. Mascolo
Virtual Reality/Augmented Reality applications require streaming 360° videos to implement new services in a diverse set of fields such as entertainment, art, e-health, e-learning, and smart factories. Providing a high Quality of Experience when streaming 360° videos is particularly challenging due to the very high required network bandwidth. In this paper, we showcase a proof-of-concept implementation of a complete DASH-compliant delivery system for 360° videos that: 1) allows reducing the required bitrate, 2) is independent of the employed encoder, 3) leverages technologies that are already available in the vast majority of mobile platforms and devices. The demo platform allows the user to directly experiment with various parameters, such as the duration of segments, the compression scheme, and the adaptive streaming algorithm parameters.
{"title":"A DASH video streaming system for immersive contents","authors":"Giuseppe Ribezzo, Giuseppe Samela, Vittorio Palmisano, L. D. Cicco, S. Mascolo","doi":"10.1145/3204949.3208107","DOIUrl":"https://doi.org/10.1145/3204949.3208107","url":null,"abstract":"Virtual Reality/Augmented Reality applications require streaming 360° videos to implement new services in a diverse set of fields such as entertainment, art, e-health, e-learning, and smart factories. Providing a high Quality of Experience when streaming 360° videos is particularly challenging due to the very high required network bandwidth. In this paper, we showcase a proof-of-concept implementation of a complete DASH-compliant delivery system for 360° videos that: 1) allows reducing the required bitrate, 2) is independent of the employed encoder, 3) leverages technologies that are already available in the vast majority of mobile platforms and devices. The demo platform allows the user to directly experiment with various parameters, such as the duration of segments, the compression scheme, and the adaptive streaming algorithm parameters.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129299518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jian He, M. Qureshi, L. Qiu, Jin Li, Feng Li, Lei Han
Video rate adaptation has a large impact on quality of experience (QoE). However, existing video rate adaptation is rather limited because of the small number of rate choices, which results in (i) under-selection, (ii) rate fluctuation, and (iii) frequent rebuffering. Moreover, selecting a single video rate for a 360° video can be even more limiting, since not all portions of a video frame are equally important. To address these limitations, we identify new dimensions along which to adapt user QoE: dropping video frames, slowing down the video play rate, and adapting different portions of 360° videos. These new dimensions, together with rate adaptation, give us more fine-grained adaptation and significantly improve user QoE. We further develop a simple yet effective learning strategy to automatically adapt the buffer reservation and avoid performance degradation beyond the optimization horizon. We implement our approach, Favor, in VLC, a well-known open-source media player, and demonstrate that Favor on average outperforms Model Predictive Control (MPC), rate-based, and buffer-based adaptation for regular videos by 24%, 36%, and 41%, respectively, and by 2X for 360° videos.
{"title":"Favor: fine-grained video rate adaptation","authors":"Jian He, M. Qureshi, L. Qiu, Jin Li, Feng Li, Lei Han","doi":"10.1145/3204949.3204957","DOIUrl":"https://doi.org/10.1145/3204949.3204957","url":null,"abstract":"Video rate adaptation has large impact on quality of experience (QoE). However, existing video rate adaptation is rather limited due to a small number of rate choices, which results in (i) under-selection, (ii) rate fluctuation, and (iii) frequent rebuffering. Moreover, selecting a single video rate for a 360° video can be even more limiting, since not all portions of a video frame are equally important. To address these limitations, we identify new dimensions to adapt user QoE - dropping video frames, slowing down video play rate, and adapting different portions in 360° videos. These new dimensions along with rate adaptation give us a more fine-grained adaptation and significantly improve user QoE. We further develop a simple yet effective learning strategy to automatically adapt the buffer reservation to avoid performance degradation beyond optimization horizon. We implement our approach Favor in VLC, a well known open source media player, and demonstrate that Favor on average out-performs Model Predictive Control (MPC), rate-based, and buffer-based adaptation for regular videos by 24%, 36%, and 41%, respectively, and 2X for 360° videos.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130238955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}