Retaining eye contact between remote users is a critical issue in video conferencing systems because of the parallax caused by the physical distance between the screen and the camera. To achieve this objective, we present a real-time gaze redirection system called Flx-gaze that post-processes each video frame before sending it to the remote end. Specifically, we relocate and relight the pixels representing the eyes by using a convolutional neural network (CNN). To prevent visual artifacts during manipulation, we minimize not only the L2 loss function but also four novel loss functions when training the network: two retain the rigidity of the eyeballs and eyelids, and the other two prevent color discontinuity on the eye peripheries. By leveraging both CPU and GPU resources, our implementation achieves real-time performance (i.e., 31 frames per second). Experimental results show that the gazes redirected by our system are of high quality under this strict time constraint. We also conducted an objective evaluation of our system by measuring the peak signal-to-noise ratio (PSNR) between the real and the synthesized images.
{"title":"Realizing the real-time gaze redirection system with convolutional neural network","authors":"Chih-Fan Hsu, Yu-Cheng Chen, Yu-Shuen Wang, C. Lei, Kuan-Ta Chen","doi":"10.1145/3204949.3209618","DOIUrl":"https://doi.org/10.1145/3204949.3209618","url":null,"abstract":"Retaining eye contact of remote users is a critical issue in video conferencing systems because of parallax caused by the physical distance between a screen and a camera. To achieve this objective, we present a real-time gaze redirection system called Flx-gaze to post-process each video frame before sending it to the remote end. Specifically, we relocate and relight the pixels representing eyes by using a convolutional neural network (CNN). To prevent visual artifacts during manipulation, we minimize not only the L2 loss function but also four novel loss functions when training the network. Two of them retain the rigidity of eyeballs and eyelids; and the other two prevent color discontinuity on the eye peripheries. By leveraging the CPU and the GPU resources, our implementation achieves real-time performance (i.e., 31 frames per second). Experimental results show that the gazes redirected by our system are of high quality under this restrict time constraint. We also conducted an objective evaluation of our system by measuring the peak signal-to-noise ratio (PSNR) between the real and the synthesized images.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125716421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work presents a mobile data offloading system for video streaming services over software-defined networking (SDN)-enabled wireless networks. The goal of the proposed system is to alleviate cellular network congestion by offloading part of the video traffic to a WiFi network while improving video quality for all users by efficiently and fairly sharing the limited long-term evolution (LTE) resources. In the proposed system, the SDN architecture is applied to the wireless network environment to react quickly to time-varying network conditions and finely control the amount of traffic transmitted through the LTE and WiFi networks. Under the SDN-enabled wireless environment, we frame the mobile data offloading problem for video streaming services as an asymmetric Nash bargaining game to address conflict among competing mobile users. Furthermore, we propose a resource allocation algorithm that pursues an effective trade-off between global system utility and quality-of-service fairness among users. The system is fully implemented using the ONOS SDN controller and Raspberry Pi 3-based mobile devices, and its performance is evaluated over real wireless networks.
{"title":"Mobile data offloading system for video streaming services over SDN-enabled wireless networks","authors":"Donghyeok Ho, Gi Seok Park, Hwangjun Song","doi":"10.1145/3204949.3204977","DOIUrl":"https://doi.org/10.1145/3204949.3204977","url":null,"abstract":"This work presents a mobile data offloading system for video streaming services over software-defined networking (SDN)-enabled wireless networks. The goal of the proposed system is to alleviate cellular network congestion by offloading parts of video traffic to a WiFi network while improving video quality of all users by efficiently and fairly sharing the limited long term evolution (LTE) resources. In the proposed system, SDN architecture is applied to the wireless network environment to quickly react to time-varying network conditions and finely control the amount of traffic transmitted through LTE and WiFi networks. Under the SDN-enabled wireless environment, we frame the mobile data offloading problem for video streaming services as an asymmetric Nash bargaining game to address conflict among competitive mobile users. Furthermore, we propose a resource allocation algorithm that pursues an effective trade-off between global system utility and quality-of-service fairness among users. The system is fully implemented using ONOS SDN controller and Raspberry PI-3-based mobile devices, and performance is evaluated over real wireless networks.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133124608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
There is currently a plethora of affordable dedicated cameras and smartphones on the market that can record video together with timed geospatial data (device location and orientation). This timed metadata can be used to identify relevant (in time and space) recordings. However, there has not been a platform that exploits this information to use the relevant recordings in an interactive consumption scenario. In this paper we present SWAPUGC, a browser-based platform for building applications that use the accompanying geospatial data to dynamically select the streams for watching an event (or any spatiotemporal reference point). View selection can be performed either manually or automatically, by a predefined algorithm that switches to the most suitable stream according to the recording characteristics. SWAPUGC is a research tool for testing such adaptation algorithms and is provided as an open-source project, accompanied by an example demo application and references to a compatible dataset and recorder. We explain and then demonstrate the capabilities of the platform through an example implementation and examine future prospects and extensions.
{"title":"SWAPUGC","authors":"Emmanouil Potetsianakis, J. L. Feuvre","doi":"10.1145/3204949.3208142","DOIUrl":"https://doi.org/10.1145/3204949.3208142","url":null,"abstract":"Currently on the market there is a plethora of affordable dedicated cameras or smartphones, able to record video and timed geospa-tial data (device location and orientation). This timed metadata can be used to identify relevant (in time and space) recordings. However, there has not been a platform that allows to exploit this information in order to utilize the relevant recordings in an interactive consumption scenario. In this paper we present SWAPUGC, a browser-based platform for building applications that use the accompanying geospatial data to dynamically select the streams for watching an event (or any spatiotemporal reference point). The view selection can be performed either manually, or automatically by a predefined algorithm that switches to the most suitable stream according to the recording characteristics. SWAPUGC is a research tool to test such adaptation algorithms and it is provided as an open-source project, accompanied by an example demo application and references to a compatible dataset and recorder. In this paper, we explain and then demonstrate the capabilities of the platform by an example implementation and examine future prospects and extensions.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129070723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In computer vision, scale-invariant feature transform (SIFT) remains one of the most commonly used algorithms for feature extraction, but its high computational cost makes it hard to deploy in real-time applications. In this paper, we introduce a novel technique to restructure the inter-octave and intra-octave dependencies of SIFT's keypoint detection and description processes, allowing it to be stopped early and produce approximate results in proportion to the time for which it was allowed to run. If our algorithm is run to completion (about 0.7% longer than traditional SIFT), its results and SIFT's converge. Unlike previous approaches to real-time SIFT, we require no special hardware and make no compromises in keypoint quality, making our technique ideal for real-time and near-real-time applications on resource-constrained systems. We use standard data sets and metrics to analyze the performance of our algorithm and the quality of the generated keypoints.
{"title":"ISIFT","authors":"Benjamin J Hamlin, Ryan Feng, Wu-chi Feng","doi":"10.1145/3204949.3210549","DOIUrl":"https://doi.org/10.1145/3204949.3210549","url":null,"abstract":"In computer vision, scale-invariant feature transform (SIFT) remains one of the most commonly used algorithms for feature extraction, but its high computational cost makes it hard to deploy in real-time applications. In this paper, we introduce a novel technique to restructure the inter-octave and intra-octave dependencies of SIFT's keypoint detection and description processes, allowing it to be stopped early and produce approximate results in proportion to the time for which it was allowed to run. If our algorithm is run to completion (about 0.7% longer than traditional SIFT), its results and SIFT's converge. Unlike previous approaches to real-time SIFT, we require no special hardware and make no compromises in keypoint quality, making our technique ideal for real-time and near-real-time applications on resource-constrained systems. We use standard data sets and metrics to analyze the performance of our algorithm and the quality of the generated keypoints.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114252839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RGB-D cameras, such as the Microsoft Kinect, provide us with the 3D information, color and depth, associated with the scene. Interactive 3D Tele-Immersion (i3DTI) systems use such RGB-D cameras to capture the person present in the scene in order to collaborate with other remote users and interact with the virtual objects present in the environment. Using a single camera, it becomes difficult to estimate an accurate skeletal pose and a complete 3D model of the person, especially when the person is not completely in the camera's view. With multiple cameras, even with partial views, it is possible to obtain a more accurate estimate of the skeleton of the person, leading to a better and complete 3D model. In this paper, we present a real-time skeletal pose identification approach that leverages the inaccurate skeletons of the individual Kinects and provides a combined, optimized skeleton. We estimate the Probability of an Accurate Joint (PAJ) for each joint from all of the Kinect skeletons. We determine the correct direction of the person and assign the correct joint sides for each skeleton. We then use a greedy consensus approach to combine the highly probable and accurate joints to estimate the combined skeleton. Using the individual skeletons, we segment the point clouds from all the cameras. We use the already computed PAJ values to obtain the Probability of an Accurate Bone (PAB). The individual point clouds are then combined one segment after another using the calculated PAB values. The generated combined point cloud is a complete and accurate 3D representation of the person present in the scene. We validate our estimated skeleton against two well-known methods by computing the error distance between the best-view Kinect skeleton and the estimated skeleton. An exhaustive analysis is performed using around 500,000 skeletal frames in total, captured from 7 users with 7 cameras. Visual analysis is performed by checking whether the estimated skeleton is completely contained within the human model. We also develop a 3D Holo-Bubble game to showcase the real-time performance of the combined skeleton and point cloud. Our results show that our method performs better than state-of-the-art approaches that use multiple Kinects in terms of objective error, visual quality, and real-time user performance.
{"title":"Combining skeletal poses for 3D human model generation using multiple kinects","authors":"Kevin Desai, B. Prabhakaran, S. Raghuraman","doi":"10.1145/3204949.3204958","DOIUrl":"https://doi.org/10.1145/3204949.3204958","url":null,"abstract":"RGB-D cameras, such as the Microsoft Kinect, provide us with the 3D information, color and depth, associated with the scene. Interactive 3D Tele-Immersion (i3DTI) systems use such RGB-D cameras to capture the person present in the scene in order to collaborate with other remote users and interact with the virtual objects present in the environment. Using a single camera, it becomes difficult to estimate an accurate skeletal pose and complete 3D model of the person, especially when the person is not in the complete view of the camera. With multiple cameras, even with partial views, it is possible to get a more accurate estimate of the skeleton of the person leading to a better and complete 3D model. In this paper, we present a real-time skeletal pose identification approach that leverages on the inaccurate skeletons of the individual Kinects, and provides a combined optimized skeleton. We estimate the Probability of an Accurate Joint (PAJ) for each joint from all of the Kinect skeletons. We determine the correct direction of the person and assign the correct joint sides for each skeleton. We then use a greedy consensus approach to combine the highly probable and accurate joints to estimate the combined skeleton. Using the individual skeletons, we segment the point clouds from all the cameras. We use the already computed PAJ values to obtain the Probability of an Accurate Bone (PAB). The individual point clouds are then combined one segment after another using the calculated PAB values. The generated combined point cloud is a complete and accurate 3D representation of the person present in the scene. We validate our estimated skeleton against two well-known methods by computing the error distance between the best view Kinect skeleton and the estimated skeleton. An exhaustive analysis is performed by using around 500000 skeletal frames in total, captured using 7 users and 7 cameras. Visual analysis is performed by checking whether the estimated skeleton is completely present within the human model. We also develop a 3D Holo-Bubble game to showcase the real-time performance of the combined skeleton and point cloud. Our results show that our method performs better than the state-of-the-art approaches that use multiple Kinects, in terms of objective error, visual quality and real-time user performance.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121116191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a new dataset that facilitates the comparison of approaches aiming at the diversification of image search results. The dataset was explicitly designed for general-purpose, multi-topic queries and provides multiple ground-truth annotations to allow for the exploration of the subjectivity aspect in the general task of diversification. The dataset provides images and their metadata retrieved from Flickr for around 200 complex queries. Additionally, to encourage experimentation (and cooperation) across different communities, such as information and multimedia retrieval, a broad range of pre-computed descriptors is provided. The proposed dataset was successfully validated during the MediaEval 2017 Retrieving Diverse Social Images task using 29 submitted runs.
{"title":"Subdiv17","authors":"Maia Rohm, B. Ionescu, A. Gînsca, Rodrygo L. T. Santos, H. Müller","doi":"10.1145/3204949.3208122","DOIUrl":"https://doi.org/10.1145/3204949.3208122","url":null,"abstract":"In this paper, we present a new dataset that facilitates the comparison of approaches aiming at the diversification of image search results. The dataset was explicitly designed for general-purpose, multi-topic queries and provides multiple ground truth annotations to allow for the exploration of the subjectivity aspect in the general task of diversification. The dataset provides images and their metadata retrieved from Flickr for around 200 complex queries. Additionally, to encourage experimentations (and cooperations) from different communities such as information and multimedia retrieval, a broad range of pre-computed descriptors is provided. The proposed dataset was successfully validated during the MediaEval 2017 Retrieving Diverse Social Images task using 29 submitted runs.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114933872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a crowdsourced dataset of a large-scale event with more than 1000 measuring participants. The detailed dataset consists of various location data and network measurements of all national carriers, collected during a four-day event. The concentrated samples over this short time period enable detailed analyses, e.g., by correlating movement patterns with experienced network conditions.
{"title":"SGF","authors":"J. Heuschkel, Alexander Frömmgen","doi":"10.1145/3204949.3208120","DOIUrl":"https://doi.org/10.1145/3204949.3208120","url":null,"abstract":"This paper presents a crowdsourced dataset of a large-scale event with more than 1000 measuring participants. The detailed dataset consists of various location data and network measurements of all national carrier collected during a four-day event. The concentrated samples for this short time period enable detailed analysis, e.g., by correlating movement patterns and experienced network conditions.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129312676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human perception is inherently multisensory, involving sight, hearing, smell, touch, and taste. Mulsemedia systems combine traditional media (text, image, video, and audio) with non-traditional ones that stimulate senses beyond sight and hearing. Whilst work has been done on some user-centred aspects that the distribution of mulsemedia data raises, such as synchronisation and jitter, this paper tackles complementary issues that temporality constraints pose on the distribution of mulsemedia effects. It aims at improving the response time interval in networked event-based mulsemedia systems based upon prior findings in this context. Thus, we reshaped the communication strategy of an open distributed mulsemedia platform called PlaySEM to work more efficiently with other event-based applications, such as games, VR/AR software, and interactive applications, which aim to stimulate other senses to increase user immersion. Moreover, we added lightweight communication protocols to its interface to analyse whether they reduce network overhead. To carry out the experiment, we developed mock applications for different protocols to simulate an interactive application working with PlaySEM, measuring the delay between them. The results showed that, by pre-processing sensory effects metadata before real-time communication and selecting the appropriate protocol, the response time interval in networked event-based mulsemedia systems can decrease remarkably.
{"title":"Improving response time interval in networked event-based mulsemedia systems","authors":"Estêvão Bissoli Saleme, Celso A. S. Santos, G. Ghinea","doi":"10.1145/3204949.3204965","DOIUrl":"https://doi.org/10.1145/3204949.3204965","url":null,"abstract":"Human perception is inherently multisensory involving sight, hearing, smell, touch, and taste. Mulsemedia systems include the combination of traditional media (text, image, video, and audio) with non-traditional ones that stimulate other senses beyond sight and hearing. Whilst work has been done on some user-centred aspects that the distribution of mulsemedia data raises, such as synchronisation, and jitter, this paper tackles complementary issues that temporality constraints pose on the distribution of mulsemedia effects. It aims at improving response time interval in networked event-based mulsemedia systems based upon prior findings in this context. Thus, we reshaped the communication strategy of an open distributed mulsemedia platform called PlaySEM to work more efficiently with other event-based applications, such as games, VR/AR software, and interactive applications, wishing to stimulate other senses to increase the immersion of users. Moreover, we added lightweight communication protocols in its interface to analyse whether they reduce network overhead. To carry out the experiment, we developed mock applications for different protocols to simulate an interactive application working with the PlaySEM, measuring the delay between them. The results showed that by pre-processing sensory effects metadata before real-time communication, and selecting the appropriate protocol, response time interval in networked event-based mulsemedia systems can decrease remarkably.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127152916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The service broker provides service providers with virtualized services that can be initialized rapidly and scaled up or down on demand. This demonstration paper describes how a service provider can set up a new video distribution service for end users with minimal effort. Our proposal makes use of Docker lightweight virtualization technologies that pack services into containers. This makes it possible to implement video coding and content delivery networks that are scalable and consume resources only when needed. The demonstration showcases a scenario where a video service provider sets up a new live video distribution service for end users. After the setup, a live 720p30 camera feed is encoded in real time, streamed in HEVC MPEG-DASH format over a CDN, and accessed with an HbbTV-compatible set-top box. This end-to-end system illustrates that virtualization causes no significant resource or performance overhead and is a perfect match for online video services.
{"title":"Fast and easy live video service setup using lightweight virtualization","authors":"A. Heikkinen, P. Pääkkönen, Marko Viitanen, Jarno Vanne, Tommi Riikonen, K. Bakanoglu","doi":"10.1145/3204949.3208112","DOIUrl":"https://doi.org/10.1145/3204949.3208112","url":null,"abstract":"The service broker provides service providers with virtualized services that can be initialized rapidly and scaled up or down on demand. This demonstration paper describes how a service provider can set up a new video distribution service to end users with a diminutive effort. Our proposal makes use of Docker lightweight virtualization technologies that pack services in containers. This makes it possible to implement video coding and content delivery networks that are scalable and consume resources only when needed. The demonstration showcases a scenario where a video service provider sets up a new live video distribution service to end users. After the setup, live 720p30 video camera feed is encoded in real-time, streamed in HEVC MPEG-DASH format over CDN network, and accessed with a HbbTV compatible set-top-box. This end-to-end system illustrates that virtualization causes no significant resource or performance overhead but is a perfect match for online video services.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127698952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimedia services like Skype, WhatsApp, and Google Hangouts have strict Service Level Agreements (SLAs). These services attempt to address the root causes of SLA violations through techniques such as detecting anomalies in the inputs of the services. The key problem with current anomaly detection and handling techniques is that they cannot adapt to service changes in real time. In current techniques, historic data from prior runs of the service are used to identify anomalies in service inputs, such as the number of concurrent users, and in system states, such as CPU utilization. These techniques do not evaluate the current impact of anomalies on the service. Thus, they may raise alerts and take corrective measures even if the detected anomalies do not cause SLA violations. Alerts are expensive to handle from system and engineering support perspectives, and should be raised only when necessary. We propose a dynamic approach for handling service input and system state anomalies in multimedia services in real time, by evaluating the impact of anomalies, independently and associatively, on the service outputs. Our proposed approach alerts and takes corrective measures, such as capacity allocations, if the detected anomalies result in SLA violations. We implement our approach in a large-scale operational multimedia service and show that it increases anomaly detection accuracy by 31%, reduces anomaly alerting false positives by 71% and false negatives by 69%, and enhances media sharing quality by 14%.
{"title":"Dynamic input anomaly detection in interactive multimedia services","authors":"M. Shatnawi, M. Hefeeda","doi":"10.1145/3204949.3204954","DOIUrl":"https://doi.org/10.1145/3204949.3204954","url":null,"abstract":"Multimedia services like Skype, WhatsApp, and Google Hangouts have strict Service Level Agreements (SLAs). These services attempt to address the root causes of SLA violations through techniques such as detecting anomalies in the inputs of the services. The key problem with current anomaly detection and handling techniques is that they can't adapt to service changes in real-time. In current techniques, historic data from prior runs of the service are used to identify anomalies in the service inputs like number of concurrent users, and system states like CPU utilization. These techniques do not evaluate the current impact of anomalies on the service. Thus, they may raise alerts and take corrective measures even if the detected anomalies do not cause SLA violations. Alerts are expensive to handle from a system and engineering support perspectives, and should be raised only if necessary. We propose a dynamic approach for handling service input and system state anomalies in multimedia services in real-time, by evaluating the impact of anomalies, independently and associatively, on the service outputs. Our proposed approach alerts and takes corrective measures like capacity allocations if the detected anomalies result in SLA violations. We implement our approach in a large-scale operational multimedia service, and show that it increases anomaly detection accuracy by 31%, reduces anomaly alerting false positives by 71%, false negatives by 69%, and enhances media sharing quality by 14%.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130442557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}