Mimir
S. Hicks, S. Eskeland, M. Lux, T. de Lange, K. Randel, Mattis Jeppsson, Konstantin Pogorelov, P. Halvorsen, M. Riegler
Automatic detection of diseases is a growing field of interest, and machine learning in the form of deep neural networks is frequently explored as a potential tool for medical video analysis. To both improve the "black box" understanding and assist in the administrative duty of writing examination reports, we release an automated multimedia reporting software that dissects the neural network to expose its intermediate analysis steps, i.e., we add a new level of understanding and explainability by looking into the deep learning algorithm's decision process. The presented open-source software can be used for easy retrieval and reuse of data for automatic report generation, comparison, teaching and research. As a use case, we use live colonoscopy, the gold-standard examination of the large bowel, commonly performed for clinical and screening purposes. The added information has potentially large value, and reuse of the data for automatic reporting may save doctors large amounts of time.
{"title":"Mimir","authors":"S. Hicks, S. Eskeland, M. Lux, T. de Lange, K. Randel, Mattis Jeppsson, Konstantin Pogorelov, P. Halvorsen, M. Riegler","doi":"10.1145/3204949.3208129","DOIUrl":"https://doi.org/10.1145/3204949.3208129","url":null,"abstract":"Automatic detection of diseases is a growing field of interest, and machine learning in form of deep learning neural networks are frequently explored as a potential tool for the medical video analysis. To both improve the \"black box\"-understanding and assist in the administrative duties of writing an examination report, we release an automated multimedia reporting software dissecting the neural network to learn the intermediate analysis steps, i.e., we are adding a new level of understanding and explainability by looking into the deep learning algorithms decision processes. The presented open-source software can be used for easy retrieval and reuse of data for automatic report generation, comparisons, teaching and research. As an example, we use live colonoscopy as a use case which is the gold standard examination of the large bowel, commonly performed for clinical and screening purposes. The added information has potentially a large value, and reuse of the data for the automatic reporting may potentially save the doctors large amounts of time.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123560611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
dashc: a highly scalable client emulator for DASH video
A. Reviakin, A. Zahran, C. Sreenan
In this paper we introduce a client emulator for experimenting with DASH video. dashc is a standalone, compact, easy-to-build and easy-to-use command-line software tool. The design and implementation of dashc were motivated by the pressing need to conduct network experiments with large numbers of video clients. The highly scalable dashc has low CPU and memory usage. dashc collects the necessary statistics about video delivery performance in a convenient format, facilitating thorough post hoc analysis. The code of dashc is modular, and new video adaptation algorithms can easily be added. We compare dashc to a state-of-the-art client and demonstrate its efficacy for large-scale experiments using the Mininet virtual network.
{"title":"dashc: a highly scalable client emulator for DASH video","authors":"A. Reviakin, A. Zahran, C. Sreenan","doi":"10.1145/3204949.3208135","DOIUrl":"https://doi.org/10.1145/3204949.3208135","url":null,"abstract":"In this paper we introduce a client emulator for experimenting with DASH video. dashc is a standalone, compact, easy-to-build and easy-to-use command line software tool. The design and implementation of dashc were motivated by the pressing need to conduct network experiments with large numbers of video clients. The highly scalable dashc has low CPU and memory usage. dashc collects necessary statistics about video delivery performance in a convenient format, facilitating thorough post hoc analysis. The code of dashc is modular and new video adaptation algorithm can easily be added. We compare dashc to a state-of-the art client and demonstrate its efficacy for large-scale experiments using the Mininet virtual network.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123566695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Virtual reality conferencing: multi-user immersive VR experiences on the web
S. Gunkel, H. Stokking, Martin Prins, N. V. D. Stap, F. T. Haar, O. Niamut
Virtual Reality (VR) and 360-degree video are set to become part of the future social environment, enriching and enhancing the way we share experiences and collaborate remotely. While Social VR applications are gaining momentum, most Social VR services focus on animated avatars. In this demo, we present our efforts towards Social VR services based on photo-realistic video recordings. In this demo paper, we focus on two parts: the communication between multiple people (max. 3) and the integration of new media formats to represent users as 3D point clouds. We enhance a green-screen (chroma-key) style cut-out of the person with depth data, allowing point-cloud-based rendering in the client. Further, the paper presents a user study with 54 people evaluating a three-person communication use case, and a technical analysis of the move towards 3D representations of users. The demo consists of two shared virtual environments in which to communicate and interact with others: i) a 360-degree virtual space with users represented as 2D video streams (with the background removed) and ii) a 3D space with users represented as point clouds (based on color and depth video data).
{"title":"Virtual reality conferencing: multi-user immersive VR experiences on the web","authors":"S. Gunkel, H. Stokking, Martin Prins, N. V. D. Stap, F. T. Haar, O. Niamut","doi":"10.1145/3204949.3208115","DOIUrl":"https://doi.org/10.1145/3204949.3208115","url":null,"abstract":"Virtual Reality (VR) and 360-degree video are set to become part of the future social environment, enriching and enhancing the way we share experiences and collaborate remotely. While Social VR applications are getting more momentum, most services regarding Social VR focus on animated avatars. In this demo, we present our efforts towards Social VR services based on photo-realistic video recordings. In this demo paper, we focus on two parts, the communication between multiple people (max 3) and the integration of new media formats to represent users as 3D point clouds. We enhance a green screen (chroma key) like cut-out of the person with depth data, allowing point cloud based rendering in the client. Further, the paper presents a user study with 54 people evaluating a three-people communication use case and a technical analysis to move towards 3D representations of users. This demo consists of two shared virtual environments to communicate and interact with others, i.e. i) a 360-degree virtual space with users being represented as 2D video streams (with the background removed) and ii) a 3D space with users being represented as point clouds (based on color and depth video data).","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123641529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ImmersiaTV
David Gómez, Juan A. Núñez, Mario Montagud, S. Fernández
ImmersiaTV is an H2020 European project that targets the creation of novel forms of TV content production, delivery and consumption to enable customizable and immersive multi-screen TV experiences. The goal is not only to provide efficient support for multi-screen scenarios, but also to achieve a seamless integration of traditional TV content formats and consumption devices with the emerging omnidirectional ones, thus opening the door to fascinating new scenarios. This paper first provides an overview of the end-to-end platform being developed in the project. Then, the created contents and the considered pilot scenarios are briefly described. Finally, the paper provides details about the consumption part of the ImmersiaTV platform to be showcased. In particular, it enables customizable, interactive and synchronized consumption of traditional and omnidirectional contents from an opera performance in multi-screen scenarios composed of main TVs, tablets and Head-Mounted Displays (HMDs).
{"title":"ImmersiaTV","authors":"David Gómez, Juan A. Núñez, Mario Montagud, S. Fernández","doi":"10.1145/3204949.3209620","DOIUrl":"https://doi.org/10.1145/3204949.3209620","url":null,"abstract":"ImmersiaTV is a H2020 European project that targets the creation of novel forms of TV content production, delivery and consumption to enable customizable and immersive multi-screen TV experiences. The goal is not only to provide an efficient support for multi-screen scenarios, but also to achieve a seamless integration between the traditional TV content formats and consumption devices with the emerging omnidirectional ones, thus opening the door to new fascinating scenarios. This paper initially provides an overview of the end-to-end platform that is being developed in the project. Then, the created contents and considered pilot scenarios are briefly described. Finally, the paper provides details about the consumption part of the ImmersiaTV platform to be showcased. In particular, it enables a customizable, interactive and synchronized consumption of traditional and omnidirectional contents from an opera performance, in multiscreen scenarios, composed of main TVs, tablets and Head Mounted Displays (HMDs).","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128598229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low-latency delivery of news-based video content
Jeroen van der Hooft, Dries Pauwels, C. D. Boom, Stefano Petrangeli, T. Wauters, F. Turck
Nowadays, news-based websites and portals provide significant amounts of multimedia content to accompany news stories and articles. Within this context, HTTP Adaptive Streaming is generally used to deliver video over the best-effort Internet, allowing smooth video playback and a good Quality of Experience (QoE). To stimulate user engagement with the provided content, such as browsing and switching between videos, reducing the video's startup time has become more and more important: while the current median load time is in the order of seconds, research has shown that user waiting times must remain below two seconds to achieve an acceptable QoE. We developed a framework for low-latency delivery of news-related video content, integrating four optimizations at server-side, client-side, and the application layer. Using these optimizations, the video's startup time can be reduced significantly, allowing user interaction and fast switching between available content. In this paper, we describe a proof of concept of this framework, using a large dataset from a major Belgian news provider. A dashboard allows the user to interact with the available video content and assess the gains of the proposed optimizations. In particular, we demonstrate how the proposed optimizations consistently reduce the video's startup time to values well below two seconds across different mobile network scenarios, allowing the news provider to improve the user's QoE.
{"title":"Low-latency delivery of news-based video content","authors":"Jeroen van der Hooft, Dries Pauwels, C. D. Boom, Stefano Petrangeli, T. Wauters, F. Turck","doi":"10.1145/3204949.3208110","DOIUrl":"https://doi.org/10.1145/3204949.3208110","url":null,"abstract":"Nowadays, news-based websites and portals provide significant amounts of multimedia content to accompany news stories and articles. Within this context, HTTP Adaptive Streaming is generally used to deliver video over the best-effort Internet, allowing smooth video playback and a good Quality of Experience (QoE). To stimulate user engagement with the provided content, such as browsing and switching between videos, reducing the video's startup time has become more and more important: while the current median load time is in the order of seconds, research has shown that user waiting times must remain below two seconds to achieve an acceptable QoE. We developed a framework for low-latent delivery of news-related video content, integrating four optimizations either at server-side, client-side, or at the application layer. Using these optimizations, the video's startup time can be reduced significantly, allowing user interaction and fast switching between available content. In this paper, we describe a proof of concept of this framework, using a large dataset of a major Belgian news provider. A dashboard is provided, which allows the user to interact with available video content and assess the gains of the proposed optimizations. Particularly, we demonstrate how the proposed optimizations consistently reduce the video's startup time in different mobile network scenarios. These reductions allow the news provider to improve the user's QoE, reducing the startup time to values well below two seconds in different mobile network scenarios.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128214956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cataract-101: video dataset of 101 cataract surgeries
Klaus Schöffmann, M. Taschwer, S. Sarny, Bernd Münzer, Manfred Jürgen Primus, Doris Putzgruber
Cataract surgery is one of the most frequently performed microscopic surgeries in the field of ophthalmology. The goal of this kind of surgery is to replace the human eye lens with an artificial one, an intervention that is often required due to aging. The entire surgery is performed under a microscope, but co-mounted cameras allow the procedure to be recorded and archived. Currently, the recorded videos are used in a postoperative manner for documentation and training. An additional benefit of recording cataract videos is that they enable video analytics (i.e., manual and/or automatic video content analysis) to investigate medically relevant research questions (e.g., the cause of complications). This, however, necessitates a medical multimedia information system trained and evaluated on existing data, which is currently not publicly available. In this work we provide a public video dataset of 101 cataract surgeries that were performed by four different surgeons over a period of 9 months. These surgeons are grouped into moderately experienced and highly experienced surgeons (assistant vs. senior physicians), providing the basis for experience-based video analytics. All videos have been annotated with quasi-standardized operation phases by a senior ophthalmic surgeon.
{"title":"Cataract-101: video dataset of 101 cataract surgeries","authors":"Klaus Schöffmann, M. Taschwer, S. Sarny, Bernd Münzer, Manfred Jürgen Primus, Doris Putzgruber","doi":"10.1145/3204949.3208137","DOIUrl":"https://doi.org/10.1145/3204949.3208137","url":null,"abstract":"Cataract surgery is one of the most frequently performed microscopic surgeries in the field of ophthalmology. The goal behind this kind of surgery is to replace the human eye lense with an artificial one, an intervention that is often required due to aging. The entire surgery is performed under microscopy, but co-mounted cameras allow to record and archive the procedure. Currently, the recorded videos are used in a postoperative manner for documentation and training. An additional benefit of recording cataract videos is that they enable video analytics (i.e., manual and/or automatic video content analysis) to investigate medically relevant research questions (e.g., the cause of complications). This, however, necessitates a medical multimedia information system trained and evaluated on existing data, which is currently not publicly available. In this work we provide a public video dataset of 101 cataract surgeries that were performed by four different surgeons over a period of 9 months. These surgeons are grouped into moderately experienced and highly experienced surgeons (assistant vs. senior physicians), providing the basis for experience-based video analytics. All videos have been annotated with quasi-standardized operation phases by a senior ophthalmic surgeon.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125718139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Implementing 360 video tiled streaming system
Jangwoo Son, Dongmin Jang, Eun‐Seok Ryu
Current VR systems offer limited computing power and bandwidth compared to what high-quality VR requires. To overcome these limits, this study proposes a new viewport-dependent streaming method that transmits 360-degree videos using High Efficiency Video Coding (HEVC) and the scalability extension of HEVC (SHVC). The proposed SHVC and HEVC encoders generate bitstreams whose tiles can be transmitted independently, so the bitstream generated by the proposed encoder can be extracted in units of tiles. In accordance with what is discussed in the standard, the proposed extractor extracts the bitstream of the tiles corresponding to the viewport. The SHVC video bitstream extracted by the proposed method consists of (i) an SHVC base layer (BL) that represents the entire 360-degree area and (ii) an SHVC enhancement layer (EL) for selective streaming of viewport (region of interest (ROI)) tiles. When the proposed HEVC encoder is used, low- and high-resolution sequences are separately encoded as the BL and EL of SHVC. By streaming the BL (low resolution) and selected EL (high resolution) tiles for the ROI instead of streaming the whole high-quality 360-degree video, the proposed method reduces the network bandwidth as well as the computational complexity on the decoder side. Experimental results show more than 47% bandwidth reduction.
{"title":"Implementing 360 video tiled streaming system","authors":"Jangwoo Son, Dongmin Jang, Eun‐Seok Ryu","doi":"10.1145/3204949.3208119","DOIUrl":"https://doi.org/10.1145/3204949.3208119","url":null,"abstract":"The computing power and bandwidth of the current VR are limited when compared to the high-quality VR. To overcome these limits, this study proposes a new viewport dependent streaming method that transmits 360-degree videos using the high efficiency video coding (HEVC) and the scalability extension of HEVC (SHVC). The proposed SHVC and HEVC encoders generate the bitstream that can transmit tiles independently. Therefore, the bitstream generated by the proposed encoder can be extracted in units of tiles. In accordance with what is discussed in the standard, the proposed extractor extracts the bitstream of the tiles corresponding to the viewport. SHVC video bitstream extracted by the proposed methods consist of (i) an SHVC base layer (BL) which represents the entire 360-degree area and (ii) an SHVC enhancement layer (EL) for selective streaming with viewport (region of interest (ROI)) tiles. When the proposed HEVC encoder is used, low and high resolution sequences are separately encoded as the BL and EL of SHVC. By streaming the BL(low resolution) and selective EL(high resolution) tiles with ROI instead of streaming whole high quality 360-degree video, the proposed method can reduce the network bandwidth as well as the computational complexity on the decoder side. Experimental results show more than 47% bandwidth reduction.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129306770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving quality and scalability of webRTC video collaboration applications
Stefano Petrangeli, Dries Pauwels, Jeroen van der Hooft, T. Wauters, F. Turck, Jürgen Slowack
Remote collaboration is common nowadays in conferencing, tele-health and remote teaching applications. To support these interactive use cases, Real-Time Communication (RTC) solutions, such as the open-source WebRTC framework, are generally used. WebRTC is peer-to-peer by design, which entails that each sending peer must encode a separate, independent stream for each receiving peer in the remote session. This approach is therefore expensive in terms of the number of encoders and does not scale well to large numbers of users. To overcome this issue, a WebRTC-compliant framework is proposed in this paper, where only a limited number of encoders are used at the sender side. Consequently, each encoder can transmit to a multitude of receivers at the same time. The conference controller, a centralized Selective Forwarding Unit (SFU), dynamically forwards the most suitable stream to each receiver, based on its bandwidth conditions. Moreover, the controller dynamically recomputes the sender's encoding bitrates to follow the long-term bandwidth variations of the receivers and increase the delivered video quality. The benefits of this framework are showcased in a demo implemented using the Jitsi Videobridge software, a WebRTC SFU, for the controller and the Chrome browser for the peers. In particular, we demonstrate how our framework can improve the received video quality by up to 15% compared to an approach where the encoding bitrates are static and do not change over time.
{"title":"Improving quality and scalability of webRTC video collaboration applications","authors":"Stefano Petrangeli, Dries Pauwels, Jeroen van der Hooft, T. Wauters, F. Turck, Jürgen Slowack","doi":"10.1145/3204949.3208109","DOIUrl":"https://doi.org/10.1145/3204949.3208109","url":null,"abstract":"Remote collaboration is common nowadays in conferencing, tele-health and remote teaching applications. To support these interactive use cases, Real-Time Communication (RTC) solutions, as the open-source WebRTC framework, are generally used. WebRTC is peer-to-peer by design, which entails that each sending peer needs to encode a separate, independent stream for each receiving peer in the remote session. This approach is therefore expensive in terms of number of encoders and not able to scale well for a large number of users. To overcome this issue, a WebRTC-compliant framework is proposed in this paper, where only a limited number of encoders are used at sender-side. Consequently, each encoder can transmit to a multitude of receivers at the same time. The conference controller, a centralized Selective Forwarding Unit (SFU), dynamically forwards the most suitable stream to each of the receivers, based on their bandwidth conditions. Moreover, the controller dynamically recomputes the encoding bitrates of the sender, to follow the long-term bandwidth variations of the receivers and increase the delivered video quality. The benefits of this framework are showcased using a demo implemented using the Jitsi-Videobridge software, a WebRTC SFU, for the controller and the Chrome browser for the peers. Particularly, we demonstrate how our framework can improve the received video quality up to 15% compared to an approach where the encoding bitrates are static and do not change over time.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"7 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123202465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
4G/LTE channel quality reference signal trace data set
Britta Meixner, Jan Willem Kleinrouweler, Pablo César
Mobile networks, especially LTE networks, are used more and more for high-bandwidth services like multimedia or video streams. The quality of the data connection plays a major role in the perceived quality of a service: videos may be presented in low quality or suffer many stalling events when the connection is too slow to buffer the next frames for playback. So far, no publicly available data set exists that contains a larger number of LTE network traces and can be used for deeper analysis. In this data set, we provide 546 traces of 5 minutes each, sampled every 100 ms, of which 377 traces are pure LTE data. We furthermore provide an Android app to gather further traces, as well as R scripts to clean, sort, and analyze the data.
{"title":"4G/LTE channel quality reference signal trace data set","authors":"Britta Meixner, Jan Willem Kleinrouweler, Pablo César","doi":"10.1145/3204949.3208132","DOIUrl":"https://doi.org/10.1145/3204949.3208132","url":null,"abstract":"Mobile networks, especially LTE networks, are used more and more for high-bandwidth services like multimedia or video streams. The quality of the data connection plays a major role in the perceived quality of a service. Videos may be presented in a low quality or experience a lot of stalling events, when the connection is too slow to buffer the next frames for playback. So far, no publicly available data set exists that has a larger number of LTE network traces and can be used for deeper analysis. In this data set, we provide 546 traces of 5 minutes each with a sample rate of 100 ms. Thereof 377 traces are pure LTE data. We furthermore provide an Android app to gather further traces as well as R scripts to clean, sort, and analyze the data.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121297739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From theory to practice: improving bitrate adaptation in the DASH reference player
Kevin Spiteri, R. Sitaraman, D. Sparacio
Modern video streaming uses adaptive bitrate (ABR) algorithms that run inside video players and continually adjust the quality (i.e., bitrate) of the video segments that are downloaded and rendered to the user. To maximize the user's quality of experience, ABR algorithms must stream at a high bitrate with low rebuffering and low bitrate oscillation. Further, a good ABR algorithm is responsive to user and network events and can be used in demanding scenarios such as low-latency live streaming. Recent research papers provide an abundance of ABR algorithms, but fall short on many of the above real-world requirements. We develop Sabre, an open-source, publicly available simulation tool that enables fast and accurate simulation of adaptive streaming environments. We used Sabre to design and evaluate BOLA-E and DYNAMIC, two novel ABR algorithms. We also developed a FAST SWITCHING algorithm that can replace segments that have already been downloaded with higher-bitrate (thus higher-quality) segments. The new algorithms provide higher QoE to the user in terms of higher bitrate, fewer rebuffers, and fewer bitrate oscillations. In addition, these algorithms react faster to user events such as startup and seek, and respond more quickly to network events such as improvements in throughput. Further, they perform very well for live streams that require low latency, a challenging scenario for ABR algorithms. Overall, our algorithms offer superior video QoE and responsiveness for real-life adaptive video streaming compared to the state of the art. Importantly, all three algorithms presented in this paper are now part of the official DASH reference player dash.js and are being used by video providers in production environments. While our evaluation and implementation focus on the DASH environment, our algorithms are equally applicable to other adaptive streaming formats such as Apple HLS.
{"title":"From theory to practice: improving bitrate adaptation in the DASH reference player","authors":"Kevin Spiteri, R. Sitaraman, D. Sparacio","doi":"10.1145/3204949.3204953","DOIUrl":"https://doi.org/10.1145/3204949.3204953","url":null,"abstract":"Modern video streaming uses adaptive bitrate (ABR) algorithms than run inside video players and continually adjust the quality (i.e., bitrate) of the video segments that are downloaded and rendered to the user. To maximize the quality-of-experience of the user, ABR algorithms must stream at a high bitrate with low rebuffering and low bitrate oscillations. Further, a good ABR algorithm is responsive to user and network events and can be used in demanding scenarios such as low-latency live streaming. Recent research papers provide an abundance of ABR algorithms, but fall short on many of the above real-world requirements. We develop Sabre, an open-source publicly-available simulation tool that enables fast and accurate simulation of adaptive streaming environments. We used Sabre to design and evaluate BOLA-E and DYNAMIC, two novel ABR algorithms. We also developed a FAST SWITCHING algorithm that can replace segments that have already been downloaded with higher-bitrate (thus higher-quality) segments. The new algorithms provide higher QoE to the user in terms of higher bitrate, fewer rebuffers, and lesser bitrate oscillations. In addition, these algorithms react faster to user events such as startup and seek, and respond more quickly to network events such as improvements in throughput. Further, they perform very well for live streams that require low latency, a challenging scenario for ABR algorithms. Overall, our algorithms offer superior video QoE and responsiveness for real-life adaptive video streaming, in comparison to the state-of-the-art. Importantly all three algorithms presented in this paper are now part of the official DASH reference player dash.js and are being used by video providers in production environments. While our evaluation and implementation are focused on the DASH environment, our algorithms are equally applicable to other adaptive streaming formats such as Apple HLS.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"310 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122094574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}