DASHing towards Hollywood
Saba Ahsan, Stephen McQuistin, C. Perkins, J. Ott
Proceedings of the 9th ACM Multimedia Systems Conference. Published 2018-06-12. DOI: 10.1145/3204949.3204959

Adaptive streaming over HTTP has become the de facto standard for video streaming over the Internet, partly due to its ease of deployment in a heavily ossified Internet. Though performant in most on-demand scenarios, it is bound by the semantics of TCP, with reliability prioritised over timeliness, even for live video where the reverse may be desired. In this paper, we present an implementation of MPEG-DASH over TCP Hollywood, a widely deployable TCP variant for latency-sensitive applications. Out-of-order delivery in TCP Hollywood allows the client to measure, adapt, and request the next video chunk even when the current one is only partially downloaded. Furthermore, the ability to skip frames, enabled by multi-streaming and out-of-order delivery, adds resilience against stalling for any delayed messages. We observed that in high-latency and high-loss networks, TCP Hollywood significantly lowers the possibility of stall events and also supports better-quality downloads in comparison to standard TCP, with minimal changes to current adaptation algorithms.
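The early-adaptation idea described above can be sketched in a few lines: with out-of-order delivery, the client can form a throughput estimate from a partially received chunk and pick the next representation immediately, instead of waiting for in-order completion. The function names and the bitrate ladder below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: choose the next DASH representation from a
# throughput estimate taken while the current chunk is still arriving.
# The ladder and the safety margin are illustrative, not from the paper.

BITRATE_LADDER_KBPS = [250, 500, 1000, 2500, 5000]

def estimate_throughput_kbps(bytes_received: int, elapsed_s: float) -> float:
    """Instantaneous throughput from a partially downloaded chunk."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_received * 8 / 1000 / elapsed_s

def choose_next_bitrate(throughput_kbps: float, safety: float = 0.8) -> int:
    """Highest representation that fits within a safety margin of capacity."""
    budget = throughput_kbps * safety
    feasible = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    return feasible[-1] if feasible else BITRATE_LADDER_KBPS[0]

# Over standard TCP the client must wait for in-order delivery before it
# can run this logic; out-of-order delivery lets it run as soon as
# enough of the current chunk has arrived to form an estimate.
```

With 500 KB received in one second (4 Mb/s), the safety-scaled budget is 3.2 Mb/s, so the sketch would select the 2500 kb/s representation.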
HTTP adaptive streaming QoE estimation with ITU-T Rec. P.1203: open databases and software
W. Robitza, Steve Goering, A. Raake, David Lindero, Gunnar Heikkilä, Jorgen Gustafsson, P. List, B. Feiten, Ulf Wüstenhagen, Marie-Neige Garcia, Kazuhisa Yamagishi, S. Broom
Published 2018-06-12. DOI: 10.1145/3204949.3208124

This paper describes an open dataset and software for ITU-T Rec. P.1203. As the first standardized Quality of Experience model for audiovisual HTTP Adaptive Streaming (HAS), it has been extensively trained and validated on over a thousand audiovisual sequences containing HAS-typical effects (such as stalling, coding artifacts, and quality switches). Our dataset comprises four of the 30 official subjective databases at a bitstream feature level. The paper also includes subjective results and the model performance. Our software for the standard has been made publicly available, and it is used for all the analyses presented. Among other previously unpublished details, we show the significant performance improvements of bitstream-based models over metadata-based ones for video quality analysis, and the robustness of combining classical models with machine-learning-based approaches for estimating user QoE.
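The "classical model combined with machine learning" idea the abstract alludes to can be illustrated as a parametric quality score plus a learned residual correction. Both component models below are toy stand-ins under assumed constants, not the P.1203 model itself.

```python
# Illustrative hybrid QoE estimator: a hand-crafted parametric score is
# corrected by a (stand-in for a) learned residual. All constants are
# assumptions for illustration, not ITU-T P.1203 coefficients.

def classical_score(bitrate_kbps: float, resolution_p: int) -> float:
    """Toy parametric audiovisual quality on a 1-5 MOS scale."""
    base = 1 + 4 * min(bitrate_kbps / 5000, 1.0)
    penalty = 0.5 if resolution_p < 720 else 0.0
    return max(1.0, base - penalty)

def ml_residual(features: dict) -> float:
    """Stand-in for a trained correction model (e.g. a random forest)."""
    return -0.3 * features.get("stall_count", 0)

def hybrid_mos(bitrate_kbps: float, resolution_p: int, features: dict) -> float:
    """Blend the parametric score with the learned correction, clamped to MOS range."""
    mos = classical_score(bitrate_kbps, resolution_p) + ml_residual(features)
    return min(5.0, max(1.0, mos))
```

The appeal of this structure, as the paper's robustness findings suggest, is that the classical part extrapolates sensibly outside the training data while the learned part absorbs effects the parametric form misses.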
MUSLIN demo: high QoE fair multi-source live streaming
Simon Da Silva, Joachim Bruneau-Queyreix, Mathias Lacaud, D. Négru, Laurent Réveillère
Published 2018-06-12. DOI: 10.1145/3204949.3208108

Delivering video content with a high and fairly shared quality of experience is a challenging task, given forecasts of drastic video traffic growth. Currently, content delivery networks provide numerous servers hosting replicas of the video content, and consuming clients are redirected to the closest server. The video content is then streamed using adaptive streaming solutions. However, some servers become overloaded, and clients may experience a poor or unfairly distributed quality of experience. In this demonstration, we showcase Muslin, a streaming solution that supports a high and fairly shared quality of experience for end users of live streaming. Muslin builds on MS-Stream, a content delivery solution in which a client can use several servers simultaneously. Muslin dynamically provisions servers, replicates content onto them, and advertises servers to clients based on real-time delivery conditions. Our demonstration shows that our approach outperforms traditional content delivery schemes, increasing fairness and quality of experience on the user side without requiring a larger underlying content delivery platform.
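The server-advertisement step described above can be sketched as ranking candidate servers by real-time delivery conditions and handing a client the best few, from which it may fetch simultaneously. The scoring weights and data layout are assumptions for illustration, not Muslin's actual policy.

```python
# Illustrative Muslin-style server advertisement: rank content servers
# by live delivery conditions (load, RTT) and advertise the top k to a
# joining client. Weights and field names are assumptions.

def rank_servers(servers):
    """servers: list of dicts with 'name', 'load' in [0, 1], and 'rtt_ms'."""
    def score(s):
        # lower is better: weight load heavily, cap RTT contribution at 200 ms
        return 0.7 * s["load"] + 0.3 * min(s["rtt_ms"] / 200.0, 1.0)
    return sorted(servers, key=score)

def advertise(servers, k=2):
    """Return the names of the k best servers for a joining client."""
    return [s["name"] for s in rank_servers(servers)[:k]]
```

Re-running this ranking as conditions change is what lets the platform steer clients away from overloading servers instead of always redirecting to the geographically closest one.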
Modeling sensory effects as first-class entities in multimedia applications
M. Josué, R. Abreu, Fábio Barreto, D. Mattos, G. Amorim, J. Santos, D. Muchaluat-Saade
Published 2018-06-12. DOI: 10.1145/3204949.3204967

Multimedia applications are usually composed of audiovisual content. Traditional multimedia conceptual models, and consequently declarative multimedia authoring languages, do not support the definition of multiple sensory effects. Multiple sensorial media (mulsemedia) applications use sensory effects that can stimulate touch, smell, and taste, in addition to hearing and sight. As a result, mulsemedia applications have usually been developed with general-purpose programming languages. To fill this gap, this paper proposes an approach for modeling sensory effects as first-class entities, enabling multimedia applications to synchronize sensorial media with interactive audiovisual content in a high-level specification. Complete descriptions of mulsemedia applications thus become possible with multimedia models and languages. To validate our ideas, an interactive mulsemedia application example is presented and specified with NCL (Nested Context Language) and Lua. Lua components translate high-level sensory effect attributes into MPEG-V SEM (Sensory Effect Metadata) files. A sensory effect simulator was developed to receive SEM files and simulate mulsemedia application rendering.
PEAT, how much am I burning?
S. Nambi, R. V. Prasad, A. R. Lua, Luis Gonzalez
Published 2018-06-12. DOI: 10.1145/3204949.3204951

Depletion of fossil fuels and the ever-increasing need for energy in residential and commercial buildings have triggered in-depth research on energy saving and energy monitoring mechanisms. Currently, users are only aware of their overall energy consumption and its cost in a shared space. Lacking information on individual energy consumption, users are unable to fine-tune their energy usage. Further, even splitting of energy costs in shared spaces does not help create awareness. With the advent of the Internet of Things (IoT) and wearable devices, the total energy consumption of a household can be apportioned to individual occupants to create awareness and consequently promote sustainable energy usage. However, providing personalized energy consumption information in real time is challenging because it requires collecting fine-grained information at various levels. In particular, identifying the user(s) operating an appliance in a shared space is a hard problem, since there are no comprehensive means of collecting accurate personalized energy consumption information. In this paper we present the Personalized Energy Apportioning Toolkit (PEAT), which accurately apportions total energy consumption to individual occupants in shared spaces. Beyond performing energy disaggregation, PEAT combines data from IoT devices such as occupants' smartphones and smartwatches to obtain fine-grained information, such as their location and activities. PEAT estimates the energy footprint of individuals by modeling the association between appliances and occupants in the household. We propose several accuracy metrics to study the performance of our toolkit. PEAT was exhaustively evaluated and validated in two multi-occupant households. It achieves 90% energy apportioning accuracy using only the occupants' location information, and around 95% when both location and activity information are available.
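The location-only apportioning mode can be illustrated with a minimal sketch: each appliance's disaggregated consumption is split evenly among the occupants located in its room while it runs. The data layout and the fallback for unattended appliances are assumptions, not PEAT's actual interface.

```python
# Illustrative location-based energy apportioning in the spirit of PEAT.
# appliance_events pairs a room with the energy (Wh) an appliance there
# consumed; occupant_rooms maps each occupant to their current room.
# Both structures are assumed for this sketch.

from collections import defaultdict

def apportion_by_location(appliance_events, occupant_rooms):
    """Split each appliance's energy evenly among occupants in its room."""
    footprint = defaultdict(float)
    for room, energy_wh in appliance_events:
        present = [o for o, r in occupant_rooms.items() if r == room]
        if not present:
            # nobody nearby: treat as a baseline load shared by everyone
            present = list(occupant_rooms)
        share = energy_wh / len(present)
        for occupant in present:
            footprint[occupant] += share
    return dict(footprint)
```

Adding activity information, as the paper does, would refine `present` from "in the room" to "plausibly operating the appliance", which is where the reported jump from 90% to 95% accuracy comes from.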
Visual object tracking in a parking garage using compressed domain analysis
Daniel Becker, Matthias Schmidt, Fernando Bombardelli da Silva, Serhan Gül, C. Hellge, Oliver Sawade, I. Radusch
Published 2018-06-12. DOI: 10.1145/3204949.3208117

Modern driver assistance systems enable a variety of use cases that rely on accurate localization of all traffic participants. Since satellite-based localization is unavailable indoors, infrastructure cameras are a promising alternative in spaces such as parking garages. This paper presents a parking management system that extends the previous work on the eValet system with low-complexity tracking on compressed video bitstreams (compressed-domain tracking). The advantages of this approach include improved robustness to partial occlusions and resource-efficient processing of compressed video bitstreams. We have separated the tasks into different modules that are integrated into a comprehensive architecture. The demonstrator setup includes a 2D visualizer illustrating the operation of the algorithms on a single camera stream and a 3D visualizer displaying the abstract object detections in a global reference frame.
The prefetch aggressiveness tradeoff in 360° video streaming
Mathias Almquist, Viktor Almquist, Vengatanathan Krishnamoorthi, Niklas Carlsson, D. Eager
Published 2018-06-12. DOI: 10.1145/3204949.3204970

With 360° video, only a limited fraction of the full view is displayed at each point in time. This has prompted the design of streaming delivery techniques that allow alternative playback qualities to be delivered for each candidate viewing direction. However, while prefetching based on the user's expected viewing direction is best done close to playback deadlines, large buffers are needed to protect against shortfalls in future available bandwidth. This results in conflicting goals and an important prefetch aggressiveness tradeoff problem regarding how far ahead in time from the current play-point prefetching should be done. This paper presents the first characterization of this tradeoff. The main contributions include an empirical characterization of head movement behavior based on data from viewing sessions of four different categories of 360° video, an optimization-based comparison of the prefetch aggressiveness tradeoffs seen for these video categories, and a data-driven discussion of further optimizations, including a novel system design that allows both tradeoff objectives to be targeted simultaneously. By qualitatively and quantitatively analyzing these tradeoffs, we provide insights into how best to design tomorrow's delivery systems for 360° videos, allowing content providers to reduce bandwidth costs and improve users' playback experiences.
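The conflicting goals above can be made concrete with a toy cost model: stall risk falls as the prefetch lead (and hence buffer) grows, while viewport misprediction rises with prediction lead time, so the total cost has an interior minimum. The two exponential cost terms and their constants are illustrative assumptions, not the paper's optimization model.

```python
# Toy model of the prefetch aggressiveness tradeoff for 360° video.
# Both cost curves and all constants are assumptions for illustration.

import math

def stall_risk(lead_s: float, k: float = 0.5) -> float:
    """Bandwidth-shortfall risk shrinks as the prefetch lead (buffer) grows."""
    return math.exp(-k * lead_s)

def misprediction_rate(lead_s: float, c: float = 0.3) -> float:
    """Chance the user has turned away grows with prediction lead time."""
    return 1 - math.exp(-c * lead_s)

def combined_cost(lead_s: float, w_stall: float = 1.0, w_waste: float = 1.0) -> float:
    return w_stall * stall_risk(lead_s) + w_waste * misprediction_rate(lead_s)

def best_lead(candidate_leads):
    """Lead time minimizing the combined cost over a candidate grid."""
    return min(candidate_leads, key=combined_cost)
```

Under these constants the optimum sits a couple of seconds ahead of the play-point: aggressive enough to absorb bandwidth dips, conservative enough that the viewport prediction still holds.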
Latency and throughput characterization of convolutional neural networks for mobile computer vision
Jussi Hanhirova, Teemu Kämäräinen, S. Seppälä, M. Siekkinen, V. Hirvisalo, Antti Ylä-Jääski
Published 2018-03-26. DOI: 10.1145/3204949.3204975

We study the performance characteristics of convolutional neural networks (CNNs) for mobile computer vision systems. CNNs have proven to be a powerful and efficient approach to implementing such systems. However, system performance depends largely on the utilization of hardware accelerators, which can speed up the underlying mathematical operations tremendously through massive parallelism. Our contribution is a performance characterization of multiple CNN-based models for object recognition and detection across several hardware platforms and software frameworks, using both local (on-device) and remote (network-side server) computation. The measurements are conducted using real workloads and real processing platforms. On the platform side, we concentrate especially on TensorFlow and TensorRT. Our measurements include embedded processors found on mobile devices and high-performance processors that can be used on the network side of mobile systems. We show that significant latency-throughput trade-offs exist, but the behavior is very complex. We demonstrate and discuss several factors that affect performance and yield this complex behavior.
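One root cause of the latency-throughput trade-off the authors characterize is batching: each inference batch pays a fixed launch/transfer overhead plus a per-item compute cost, so larger batches raise throughput while also raising per-request latency. The constants in this sketch are made up for illustration; real accelerator behavior is, as the paper shows, far more complex.

```python
# Minimal illustration of why batching creates a latency-throughput
# trade-off on accelerators. All constants are illustrative assumptions.

def batch_time_ms(batch: int, overhead_ms: float = 5.0, per_item_ms: float = 1.0) -> float:
    """Time to process one batch: fixed overhead plus per-item cost."""
    return overhead_ms + per_item_ms * batch

def latency_ms(batch: int) -> float:
    """A request waits for its whole batch to finish."""
    return batch_time_ms(batch)

def throughput_ips(batch: int) -> float:
    """Inferences per second at a given batch size."""
    return batch * 1000.0 / batch_time_ms(batch)
```

In this model, batch size 1 gives the lowest latency but amortizes the overhead over a single item, while batch 32 nearly saturates throughput at the cost of a much longer wait per request.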
Multi-codec DASH dataset
Anatoliy Zabrovskiy, Christian Feldmann, C. Timmerer
Published 2018-03-19. DOI: 10.1145/3204949.3208140

The number of bandwidth-hungry applications and services is constantly growing, and HTTP adaptive streaming of audio-visual content accounts for the majority of today's Internet traffic. Although Internet bandwidth also increases steadily, audio-visual compression remains indispensable, and services are now confronted with multiple competing video codecs. This paper proposes a multi-codec DASH dataset comprising AVC, HEVC, VP9, and AV1 in order to enable interoperability testing and streaming experiments for the efficient usage of these codecs under various conditions. We adopt state-of-the-art encoding and packaging options and also provide basic quality metrics along with the DASH segments. Additionally, we briefly introduce a multi-codec DASH scheme and possible usage scenarios. Finally, we provide a preliminary evaluation of encoding efficiency in the context of HTTP adaptive streaming services and applications.
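A multi-codec MPD is only useful if clients can negotiate which representation set to play. A plausible client-side sketch, assuming RFC 6381-style codec identifiers and a fixed efficiency preference order, looks like this; it is not the paper's scheme, just an illustration of the selection problem the dataset enables experiments on.

```python
# Hypothetical client-side codec negotiation over a multi-codec MPD:
# try codecs in (assumed) efficiency order, falling back toward AVC.
# Identifiers follow RFC 6381 sample-entry prefixes.

CODEC_PREFERENCE = ["av01", "hev1", "vp09", "avc1"]  # most to least efficient

def pick_codec(available_in_mpd, client_decoders):
    """Choose the first mutually supported codec in preference order."""
    for codec in CODEC_PREFERENCE:
        if codec in available_in_mpd and codec in client_decoders:
            return codec
    return None  # no playable representation
```

A client that can decode only AVC and VP9 would thus skip the AV1 and HEVC adaptation sets and pick VP9, trading some compression efficiency for decodability.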
Classifying flows and buffer state for YouTube's HTTP adaptive streaming service in mobile networks
D. Tsilimantos, Theodoros Karagkioules, S. Valentin
Published 2018-03-01. DOI: 10.1145/3204949.3204955

Accurate cross-layer information is very useful for optimizing mobile networks for specific applications. However, providing application-layer information to lower protocol layers has become very difficult due to the wide adoption of end-to-end encryption and the absence of cross-layer signaling standards. As an alternative, this paper presents a traffic profiling solution that passively estimates parameters of HTTP Adaptive Streaming (HAS) applications at the lower layers. By observing IP packet arrivals, our machine learning system identifies video flows and detects the state of an HAS client's playback buffer in real time. Our experiments with YouTube's mobile client show that Random Forests achieve very high accuracy even under strong variation of link quality. Since this performance is achieved at the IP level with a small, generic feature set, our approach requires no Deep Packet Inspection (DPI), comes at low complexity, and does not interfere with end-to-end encryption. Traffic profiling is thus a powerful new tool for monitoring and managing even encrypted HAS traffic in mobile networks.
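The intuition behind IP-level buffer-state detection can be shown with a toy stand-in for the paper's Random Forest: HAS clients in steady state fetch segments in on-off bursts, so long idle gaps in a high-volume downlink flow signal a full playback buffer. The features and thresholds below are hand-picked assumptions; the paper learns the decision boundaries from labeled YouTube traffic instead.

```python
# Toy threshold classifier standing in for the paper's Random Forest:
# label a flow and its client buffer state from IP-level features only.
# All thresholds are illustrative assumptions.

def flow_features(pkt_sizes, pkt_times):
    """Summarize a flow from per-packet sizes (bytes) and arrival times (s)."""
    volume = sum(pkt_sizes)
    duration = pkt_times[-1] - pkt_times[0] if len(pkt_times) > 1 else 0.0
    gaps = [b - a for a, b in zip(pkt_times, pkt_times[1:])]
    return {"volume": volume, "duration": duration, "max_gap": max(gaps, default=0.0)}

def classify(features):
    if features["volume"] < 1_000_000:      # small flows: not video
        return "non-video"
    # long idle gaps between bursts indicate an on-off download pattern,
    # i.e. the playback buffer is full enough to pause fetching
    if features["max_gap"] > 1.0:
        return "video/steady-state"
    return "video/filling"
```

Everything used here is visible despite end-to-end encryption, which is the point: packet sizes and arrival times leak enough structure to track application state without DPI.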