Konstantin Pogorelov, Zeno Albisser, O. Ostroukhova, M. Lux, Dag Johansen, P. Halvorsen, M. Riegler
This paper presents OpenSea, an open-source tool for image and video frame classification. The classification takes a search-based approach and relies on global and local image features. The tool has been shown to work with images as well as videos and can classify video frames in real time, so the output can be used while a video is being recorded, played, or streamed. OpenSea has been shown to perform comparably to state-of-the-art methods such as deep learning while running much faster, and can therefore be seen as an easy-to-obtain and hard-to-beat baseline. We present a detailed description of the software, its installation, and its use. As a use case, we demonstrate the classification of polyps in colonoscopy videos based on a publicly available dataset. We conduct leave-one-out cross-validation to show the potential of the software in terms of classification time and accuracy.
{"title":"Opensea","authors":"Konstantin Pogorelov, Zeno Albisser, O. Ostroukhova, M. Lux, Dag Johansen, P. Halvorsen, M. Riegler","doi":"10.1145/3204949.3208128","DOIUrl":"https://doi.org/10.1145/3204949.3208128","url":null,"abstract":"This paper presents an open-source classification tool for image and video frame classification. The classification takes a search-based approach and relies on global and local image features. It has been shown to work with images as well as videos, and is able to perform the classification of video frames in real-time so that the output can be used while the video is recorded, playing, or streamed. OpenSea has been proven to perform comparable to state-of-the-art methods such as deep learning, at the same time performing much faster in terms of processing speed, and can be therefore seen as an easy to get and hard to beat baseline. We present a detailed description of the software, its installation and use. As a use case, we demonstrate the classification of polyps in colonoscopy videos based on a publicly available dataset. We conduct leave-one-out-cross-validation to show the potential of the software in terms of classification time and accuracy.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127819253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. I. T. D. C. Filho, M. C. Luizelli, M. T. Vega, Jeroen van der Hooft, Stefano Petrangeli, T. Wauters, F. Turck, L. Gaspary
The demand for Virtual Reality (VR) video streaming to mobile devices is booming as VR becomes accessible to the general public. However, the variability of mobile network conditions affects the perception of this type of high-bandwidth-demanding service in unexpected ways. This creates a need for novel performance assessment models suited to the new VR applications. In this paper, we present PERCEIVE, a two-stage method for predicting the perceived quality of adaptive VR videos streamed through mobile networks. Using machine learning techniques, our approach first predicts adaptive VR video playout performance from network Quality of Service (QoS) indicators. In a second stage, it employs the predicted VR video playout performance metrics to model and estimate end-user perceived quality. PERCEIVE has been evaluated in a real-world environment in which VR videos are streamed under LTE/4G network conditions. Its accuracy has been assessed by means of the residual error between predicted and measured values. Our approach predicts the different performance metrics of the VR playout with an average prediction error lower than 3.7% and estimates the perceived quality with a prediction error lower than 4% for over 90% of all tested cases. Moreover, it allows us to pinpoint the QoS conditions that affect adaptive VR streaming services the most.
{"title":"Predicting the performance of virtual reality video streaming in mobile networks","authors":"R. I. T. D. C. Filho, M. C. Luizelli, M. T. Vega, Jeroen van der Hooft, Stefano Petrangeli, T. Wauters, F. Turck, L. Gaspary","doi":"10.1145/3204949.3204966","DOIUrl":"https://doi.org/10.1145/3204949.3204966","url":null,"abstract":"The demand of Virtual Reality (VR) video streaming to mobile devices is booming, as VR becomes accessible to the general public. However, the variability of conditions of mobile networks affects the perception of this type of high-bandwidth-demanding services in unexpected ways. In this situation, there is a need for novel performance assessment models fit to the new VR applications. In this paper, we present PERCEIVE, a two-stage method for predicting the perceived quality of adaptive VR videos when streamed through mobile networks. By means of machine learning techniques, our approach is able to first predict adaptive VR video playout performance, using network Quality of Service (QoS) indicators as predictors. In a second stage, it employs the predicted VR video playout performance metrics to model and estimate end-user perceived quality. The evaluation of PERCEIVE has been performed considering a real-world environment, in which VR videos are streamed while subjected to LTE/4G network condition. The accuracy of PERCEIVE has been assessed by means of the residual error between predicted and measured values. Our approach predicts the different performance metrics of the VR playout with an average prediction error lower than 3.7% and estimates the perceived quality with a prediction error lower than 4% for over 90% of all the tested cases. Moreover, it allows us to pinpoint the QoS conditions that affect adaptive VR streaming services the most.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117250060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The keypoint detector and descriptor Scale Invariant Feature Transform (SIFT) [8] is famous for its ability to extract and describe keypoints in 2D images of natural scenes. It is used in applications ranging from object recognition to 3D reconstruction. However, SIFT is considered compute-heavy. This has led to the development of many keypoint extraction and description methods that sacrifice the wide applicability of SIFT for higher speed. We present our CUDA implementation, named PopSift, which does not sacrifice any detail of the SIFT algorithm, achieves keypoint extraction and description that is as accurate as the best existing implementations, and runs at least 100x faster on a high-end consumer GPU than existing CPU implementations on a desktop CPU. Without any algorithmic trade-offs or short-cuts that sacrifice quality for speed, we extract at >25 fps from 1080p images with upscaling to 3840x2160 pixels on a high-end consumer GPU.
{"title":"Popsift","authors":"C. Griwodz, L. Calvet, P. Halvorsen","doi":"10.1145/3204949.3208136","DOIUrl":"https://doi.org/10.1145/3204949.3208136","url":null,"abstract":"The keypoint detector and descriptor Scalable Invariant Feature Transform (SIFT) [8] is famous for its ability to extract and describe keypoints in 2D images of natural scenes. It is used in ranging from object recognition to 3D reconstruction. However, SIFT is considered compute-heavy. This has led to the development of many keypoint extraction and description methods that sacrifice the wide applicability of SIFT for higher speed. We present our CUDA implementation named PopSift that does not sacrifice any detail of the SIFT algorithm, achieves a keypoint extraction and description performance that is as accurate as the best existing implementations, and runs at least 100x faster on a high-end consumer GPU than existing CPU implementations on a desktop CPU. Without any algorithmic trade-offs and short-cuts that sacrifice quality for speed, we extract at >25 fps from 1080p images with upscaling to 3840x2160 pixels on a high-end consumer GPU.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121991156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In multiplayer shooter games, lag compensation is used to mitigate the effects of network latency, or lag. Traditional lag compensation (TLC), however, introduces an inconsistency known as "shot behind covers" (SBC), especially to less lagged players. A few recent games ameliorate this problem by compensating only players with lag below a certain limit. This forces sufficiently lagged players to aim ahead of their targets, which is difficult and unrealistic. In this paper, we present a novel advanced lag compensation (ALC) algorithm. Based on TLC, the new algorithm retains the benefits of lag compensation without compromising less lagged players or compensating only certain players. To evaluate ALC, we invited players to play an FPS game we built from scratch and to answer questions after each match. Compared with TLC, ALC reduces the number of SBC incidents by 94.1% and yields a significant drop both in the number of SBCs reported by players during matches (p < .05) and in the perceived SBC frequency collected at the end of each match (p < .05). ALC and TLC also share similar hit registration accuracy (p = .158 and p = .18) and responsiveness (p = .317).
{"title":"Enhancing the experience of multiplayer shooter games via advanced lag compensation","authors":"Steven W. K. Lee, R. Chang","doi":"10.1145/3204949.3204971","DOIUrl":"https://doi.org/10.1145/3204949.3204971","url":null,"abstract":"In multiplayer shooter games, lag compensation is used to mitigate the effects of network latency, or lag. Traditional lag compensation (TLC), however, introduces an inconsistency known as \"shot behind covers\" (SBC), especially to less lagged players. A few recent games ameliorate this problem by compensating only players with lag below a certain limit. This forces sufficiently lagged players to aim ahead of their targets, which is difficult and unrealistic. In this paper, we present a novel advanced lag compensation (ALC) algorithm. Based on TLC, this new algorithm retains the benefits of lag compensation but without compromising less lagged players or compensating only certain players. To evaluate ALC, we have invited players to play an FPS game we build from scratch and answer questions after each match. Comparing with TLC, ALC reduces the number of SBC by 94.1%, and a significant drop in the number of SBC reported by players during matches (p < .05) and the perceived SBC frequency collected at the end of each match (p < .05). ALC and TLC also share a similar hit registration accuracy (p = .158 and p = .18) and responsiveness (p = .317).","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122774927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chen Song, Jiacheng Chen, R. Shea, Andy Sun, Arrvindh Shriraman, Jiangchuan Liu
The past decade has witnessed significant breakthroughs in the world of computer vision. Recent deep learning-based computer vision algorithms exhibit strong performance on recognition, detection, and segmentation. While the development of vision algorithms elicits promising applications, it also presents an immense computational challenge to the underlying hardware due to its complex nature, especially when attempting to process the data at line rate. To this end, we develop a highly scalable computer vision processing framework, which leverages technologies such as Spark Streaming and OpenCV to achieve line-rate video data processing. To ensure the greatest flexibility, our framework is agnostic to the computer vision model and can utilize environments with heterogeneous processing devices. To evaluate the framework, we deploy it in a production cloud computing environment and perform a thorough analysis of the system's performance. We use existing real-world live video streams from Simon Fraser University to measure the number of cars entering the university campus. Further, the data collected from our experiments is being used for real-time predictions of traffic conditions on campus.
{"title":"Scalable distributed visual computing for line-rate video streams","authors":"Chen Song, Jiacheng Chen, R. Shea, Andy Sun, Arrvindh Shriraman, Jiangchuan Liu","doi":"10.1145/3204949.3204974","DOIUrl":"https://doi.org/10.1145/3204949.3204974","url":null,"abstract":"The past decade has witnessed significant breakthroughs in the world of computer vision. Recent deep learning-based computer vision algorithms exhibit strong performance on recognition, detection, and segmentation. While the development of vision algorithms elicits promising applications, it also presents immense computational challenge to the underlying hardware due to its complex nature, especially when attempting to process the data at line-rate. To this end we develop a highly scalable computer vision processing framework, which leverages advanced technologies such as Spark Streaming and OpenCV to achieve line-rate video data processing. To ensure the greatest flexibility, our framework is agnostic in terms of computer vision model, and can utilize environments with heterogeneous processing devices. To evaluate this framework, we deploy it in a production cloud computing environment, and perform a thorough analysis on the system's performance. We utilize existing real-world live video streams from Simon Fraser University to measure the number of cars entering our university campus. Further, the data collected from our experiments is being used for real-time predictions of traffic conditions on campus.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131592736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuta Kudo, Hugo Zwaan, Toru Takahashi, Rui Ishiyama, P. Jonker
This paper presents a new identification system for tiny parts that have no space for applying conventional ID marking or tagging. The system marks the parts with a single dot using ink containing shiny particles. The particles in a single dot naturally form a unique pattern. The parts are then identified by matching microscopic images of this pattern with a database containing images of these dots. In this paper, we develop an automated system to conduct dotting and image capturing for mass-produced parts. Experimental results show that our "Tip-on-a-chip" system can uniquely identify more than ten thousand chip capacitors.
{"title":"Tip-on-a-chip: automatic dotting with glitter ink pen for individual identification of tiny parts","authors":"Yuta Kudo, Hugo Zwaan, Toru Takahashi, Rui Ishiyama, P. Jonker","doi":"10.1145/3204949.3208116","DOIUrl":"https://doi.org/10.1145/3204949.3208116","url":null,"abstract":"This paper presents a new identification system for tiny parts that have no space for applying conventional ID marking or tagging. The system marks the parts with a single dot using ink containing shiny particles. The particles in a single dot naturally form a unique pattern. The parts are then identified by matching microscopic images of this pattern with a database containing images of these dots. In this paper, we develop an automated system to conduct dotting and image capturing for mass-produced parts. Experimental results show that our \"Tip-on-a-chip\" system can uniquely identify more than ten thousand chip capacitors.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"37 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120925538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Fremerey, Ashutosh Singla, Kay Meseberg, Alexander Raake
In this paper, we present a viewing test with 48 subjects watching 20 different entertaining omnidirectional videos on an HTC Vive Head-Mounted Display (HMD) in a task-free scenario. While the subjects were watching the contents, we recorded their head movements. The obtained dataset is publicly available, along with the links and timestamps of the source contents used. Within this study, subjects were also asked to fill in the Simulator Sickness Questionnaire (SSQ) after every viewing session. The paper first presents the SSQ results and then presents and discusses several methods for evaluating head rotation data. The collected dataset is published together with the scripts for evaluating the head rotation data. The paper reports the general angular ranges of the subjects' exploration behavior as well as an analysis of the areas where most of the viewing time was spent. The collected information can also be presented as head-saliency maps. For videos, head-saliency data can be used for training saliency models, as input for evaluating decisions during content creation, or as part of streaming solutions for region-of-interest-specific coding, as in the latest tile-based streaming solutions discussed in standardization bodies such as MPEG.
{"title":"AVtrack360","authors":"S. Fremerey, Ashutosh Singla, Kay Meseberg, Alexander Raake","doi":"10.1145/3204949.3208134","DOIUrl":"https://doi.org/10.1145/3204949.3208134","url":null,"abstract":"In this paper, we present a viewing test with 48 subjects watching 20 different entertaining omnidirectional videos on an HTC Vive Head Mounted Display (HMD) in a task-free scenario. While the subjects were watching the contents, we recorded their head movements. The obtained dataset is publicly available in addition to the links and timestamps of the source contents used. Within this study, subjects were also asked to fill in the Simulator Sickness Questionnaire (SSQ) after every viewing session. Within this paper, at first SSQ results are presented. Several methods for evaluating head rotation data are presented and discussed. In the course of the study, the collected dataset is published along with the scripts for evaluating the head rotation data. The paper presents the general angular ranges of the subjects' exploration behavior as well as an analysis of the areas where most of the time was spent. The collected information can be presented as head-saliency maps, too. In case of videos, head-saliency data can be used for training saliency models, as information for evaluating decisions during content creation, or as part of streaming solutions for region-of-interest-specific coding as with the latest tile-based streaming solutions, as discussed also in standardization bodies such as MPEG.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"33 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116719166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Erwan J. David, Jesús Gutiérrez, A. Coutrot, Matthieu Perreira Da Silva, P. Callet
Research on visual attention in 360° content is crucial to understand how people perceive and interact with this immersive type of content, to develop efficient techniques for processing, encoding, delivering and rendering it, and to offer a high quality of experience to end users. The availability of public datasets is essential to support and facilitate the research activities of the community. Recently, some studies have analyzed the exploration behaviors of people watching 360° videos, and a few datasets have been published. However, the majority of these works consider only head movements as a proxy for gaze data, despite the importance of eye movements in the exploration of omnidirectional content. Thus, this paper presents a novel dataset of 360° videos with associated eye and head movement data, which is a follow-up to our previous dataset for still images [14]. Head and eye tracking data was obtained from 57 participants during a free-viewing experiment with 19 videos. In addition, guidelines on how to obtain saliency maps and scanpaths from the raw data are provided. Some statistics related to exploration behaviors are also presented, such as the impact of the longitudinal starting position when watching omnidirectional videos, which was investigated in this test. This dataset and its associated code are made publicly available to support research on visual attention for 360° content.
{"title":"A dataset of head and eye movements for 360° videos","authors":"Erwan J. David, Jesús Gutiérrez, A. Coutrot, Matthieu Perreira Da Silva, P. Callet","doi":"10.1145/3204949.3208139","DOIUrl":"https://doi.org/10.1145/3204949.3208139","url":null,"abstract":"Research on visual attention in 360° content is crucial to understand how people perceive and interact with this immersive type of content and to develop efficient techniques for processing, encoding, delivering and rendering. And also to offer a high quality of experience to end users. The availability of public datasets is essential to support and facilitate research activities of the community. Recently, some studies have been presented analyzing exploration behaviors of people watching 360° videos, and a few datasets have been published. However, the majority of these works only consider head movements as proxy for gaze data, despite the importance of eye movements in the exploration of omnidirectional content. Thus, this paper presents a novel dataset of 360° videos with associated eye and head movement data, which is a follow-up to our previous dataset for still images [14]. Head and eye tracking data was obtained from 57 participants during a free-viewing experiment with 19 videos. In addition, guidelines on how to obtain saliency maps and scanpaths from raw data are provided. Also, some statistics related to exploration behaviors are presented, such as the impact of the longitudinal starting position when watching omnidirectional videos was investigated in this test. This dataset and its associated code are made publicly available to support research on visual attention for 360° content.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134448066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Bentaleb, A. Begen, S. Harous, Roger Zimmermann
In streaming media, it is imperative to deliver a good viewer experience to preserve customer loyalty. Prior research has shown that this is rather difficult when shared Internet resources struggle to meet the demand from streaming clients that are largely designed to behave in their own self-interest. To date, several schemes for adaptive streaming have been proposed to address this challenge with varying success. In this paper, we take a different route and develop a game-theoretic approach. We present a practical implementation integrated in the dash.js reference player and provide substantial comparisons against state-of-the-art methods using trace-driven and real-world experiments. Our approach outperforms its competitors in average viewer experience by 38.5% and in video stability by 62%.
{"title":"Want to play DASH?: a game theoretic approach for adaptive streaming over HTTP","authors":"A. Bentaleb, A. Begen, S. Harous, Roger Zimmermann","doi":"10.1145/3204949.3204961","DOIUrl":"https://doi.org/10.1145/3204949.3204961","url":null,"abstract":"In streaming media, it is imperative to deliver a good viewer experience to preserve customer loyalty. Prior research has shown that this is rather difficult when shared Internet resources struggle to meet the demand from streaming clients that are largely designed to behave in their own self-interest. To date, several schemes for adaptive streaming have been proposed to address this challenge with varying success. In this paper, we take a different approach and develop a game theoretic approach. We present a practical implementation integrated in the dash.js reference player and provide substantial comparisons against the state-of-the-art methods using trace-driven and real-world experiments. Our approach outperforms its competitors in the average viewer experience by 38.5% and in video stability by 62%.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"189 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120888339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In order to track users who illegally re-stream live video streams, one solution is to embed identifying watermark sequences in the video segments to distinguish the users. However, since every watermarked variant of each segment must be prepared, existing solutions incur an extra bandwidth cost for delivery (at least doubling the required bandwidth). In this paper, we study how to reduce the internal delivery (traffic) cost of a Content Delivery Network (CDN). We propose a mechanism that reduces the number of watermarked segments that need to be encoded and delivered. We calculate the best- and worst-case traffic for two different cases: multicast and unicast. The results show that even in the worst cases, the traffic with our approach is much lower than without the reduction, while the watermarked sequences still remain unique for each user. Experiments based on a real database are carried out and show that our mechanism significantly reduces traffic with respect to current CDN practice.
{"title":"Watermarked video delivery: traffic reduction and CDN management","authors":"Kun He, P. Maillé, G. Simon","doi":"10.1145/3204949.3204964","DOIUrl":"https://doi.org/10.1145/3204949.3204964","url":null,"abstract":"In order to track the users who illegally re-stream live video streams, one solution is to embed identified watermark sequences in the video segments to distinguish the users. However, since all types of watermarked segments should be prepared, the existing solutions require an extra cost of bandwidth for delivery (at least multiplying by two the required bandwidth). In this paper, we study how to reduce the inner delivery (traffic) cost of a Content Delivery Network (CDN). We propose a mechanism that reduces the number of watermarked segments that need to be encoded and delivered. We calculate the best- and worst-case traffics for two different cases: multicast and unicast. The results illustrate that even in the worst cases, the traffic with our approach is much lower than without reducing. Moreover, the watermarked sequences can still maintain uniqueness for each user. Experiments based on a real database are carried out, and illustrate that our mechanism significantly reduces traffic with respect to the current CDN practice.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130853618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}