
Proceedings of the 9th ACM Multimedia Systems Conference: Latest Publications

Realizing the real-time gaze redirection system with convolutional neural network
Pub Date : 2018-06-12 DOI: 10.1145/3204949.3209618
Chih-Fan Hsu, Yu-Cheng Chen, Yu-Shuen Wang, C. Lei, Kuan-Ta Chen
Retaining eye contact of remote users is a critical issue in video conferencing systems because of the parallax caused by the physical distance between the screen and the camera. To achieve this objective, we present a real-time gaze redirection system called Flx-gaze that post-processes each video frame before sending it to the remote end. Specifically, we relocate and relight the pixels representing the eyes by using a convolutional neural network (CNN). To prevent visual artifacts during manipulation, we minimize not only the L2 loss function but also four novel loss functions when training the network. Two of them retain the rigidity of the eyeballs and eyelids; the other two prevent color discontinuity on the eye peripheries. By leveraging both CPU and GPU resources, our implementation achieves real-time performance (i.e., 31 frames per second). Experimental results show that the gazes redirected by our system are of high quality under this strict time constraint. We also conducted an objective evaluation of our system by measuring the peak signal-to-noise ratio (PSNR) between the real and the synthesized images.
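The objective evaluation above rests on PSNR between real and synthesized frames. A minimal sketch of that metric for 8-bit images (NumPy assumed; an illustration only, not the authors' evaluation code):

```python
import numpy as np

def psnr(real: np.ndarray, synthesized: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two equally sized 8-bit images."""
    mse = np.mean((real.astype(np.float64) - synthesized.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# a single-pixel error on a flat 64x64 frame yields a high PSNR (roughly 64 dB)
frame = np.full((64, 64), 128, dtype=np.uint8)
noisy = frame.copy()
noisy[0, 0] = 138
print(psnr(frame, noisy))
```

Higher values mean the synthesized eye region is closer to the ground-truth frame; identical images give infinite PSNR.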
Citations: 0
Mobile data offloading system for video streaming services over SDN-enabled wireless networks
Pub Date : 2018-06-12 DOI: 10.1145/3204949.3204977
Donghyeok Ho, Gi Seok Park, Hwangjun Song
This work presents a mobile data offloading system for video streaming services over software-defined networking (SDN)-enabled wireless networks. The goal of the proposed system is to alleviate cellular network congestion by offloading parts of video traffic to a WiFi network while improving video quality of all users by efficiently and fairly sharing the limited long term evolution (LTE) resources. In the proposed system, SDN architecture is applied to the wireless network environment to quickly react to time-varying network conditions and finely control the amount of traffic transmitted through LTE and WiFi networks. Under the SDN-enabled wireless environment, we frame the mobile data offloading problem for video streaming services as an asymmetric Nash bargaining game to address conflict among competitive mobile users. Furthermore, we propose a resource allocation algorithm that pursues an effective trade-off between global system utility and quality-of-service fairness among users. The system is fully implemented using ONOS SDN controller and Raspberry PI-3-based mobile devices, and performance is evaluated over real wireless networks.
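The asymmetric Nash bargaining solution referred to above has a simple closed form when utilities are linear: each user is guaranteed its disagreement point (minimum rate), and the remaining capacity is split in proportion to bargaining power. A hedged sketch of that allocation rule (the function name and inputs are illustrative, not the paper's actual algorithm):

```python
def nash_bargaining_allocation(capacity, min_rates, weights):
    """Asymmetric Nash bargaining with linear utilities: each user first
    receives its disagreement point (minimum rate), then the remaining
    surplus is split in proportion to bargaining power."""
    surplus = capacity - sum(min_rates)
    if surplus < 0:
        raise ValueError("capacity cannot cover the minimum rates")
    total_w = sum(weights)
    return [d + w / total_w * surplus for d, w in zip(min_rates, weights)]

# 10 Mbps shared by three users; the third has twice the bargaining power
print(nash_bargaining_allocation(10.0, [1.0, 1.0, 2.0], [1, 1, 2]))  # → [2.5, 2.5, 5.0]
```

The weights are where asymmetry enters; choosing them per user is how a system can trade global utility against quality-of-service fairness.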
Citations: 2
SWAPUGC
Pub Date : 2018-06-12 DOI: 10.1145/3204949.3208142
Emmanouil Potetsianakis, J. L. Feuvre
The market currently offers a plethora of affordable dedicated cameras and smartphones able to record video and timed geospatial data (device location and orientation). This timed metadata can be used to identify relevant (in time and space) recordings. However, no platform has existed that exploits this information to utilize the relevant recordings in an interactive consumption scenario. In this paper we present SWAPUGC, a browser-based platform for building applications that use the accompanying geospatial data to dynamically select the streams for watching an event (or any spatiotemporal reference point). The view selection can be performed either manually, or automatically by a predefined algorithm that switches to the most suitable stream according to the recording characteristics. SWAPUGC is a research tool for testing such adaptation algorithms and is provided as an open-source project, accompanied by an example demo application and references to a compatible dataset and recorder. In this paper, we explain and then demonstrate the capabilities of the platform through an example implementation and examine future prospects and extensions.
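A stream-selection rule of the kind described, picking the closest recording that covers the reference time and roughly faces the reference point, might be sketched as follows (the field-of-view threshold and data layout are assumptions, not SWAPUGC's actual API):

```python
import math

def select_stream(streams, event_xy, t):
    """Pick the active recording whose camera is closest to the event point.
    Each stream is (stream_id, (x, y), facing_deg, (t_start, t_end)); a
    stream qualifies if it covers time t and roughly faces the event."""
    def faces_event(pos, facing_deg, target, fov_deg=90.0):
        bearing = math.degrees(math.atan2(target[1] - pos[1], target[0] - pos[0]))
        diff = abs((bearing - facing_deg + 180) % 360 - 180)  # smallest angle
        return diff <= fov_deg / 2

    candidates = [
        (sid, pos) for sid, pos, facing, (t0, t1) in streams
        if t0 <= t <= t1 and faces_event(pos, facing, event_xy)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: math.dist(c[1], event_xy))[0]
```

An automatic switcher would simply re-run this selection whenever the geospatial metadata of the active recordings changes.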
Citations: 1
ISIFT
Pub Date : 2018-06-12 DOI: 10.1145/3204949.3210549
Benjamin J Hamlin, Ryan Feng, Wu-chi Feng
In computer vision, scale-invariant feature transform (SIFT) remains one of the most commonly used algorithms for feature extraction, but its high computational cost makes it hard to deploy in real-time applications. In this paper, we introduce a novel technique to restructure the inter-octave and intra-octave dependencies of SIFT's keypoint detection and description processes, allowing it to be stopped early and produce approximate results in proportion to the time for which it was allowed to run. If our algorithm is run to completion (about 0.7% longer than traditional SIFT), its results and SIFT's converge. Unlike previous approaches to real-time SIFT, we require no special hardware and make no compromises in keypoint quality, making our technique ideal for real-time and near-real-time applications on resource-constrained systems. We use standard data sets and metrics to analyze the performance of our algorithm and the quality of the generated keypoints.
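The early-stopping behaviour described, returning an approximate result in proportion to the time allowed, follows the general anytime-algorithm pattern, which can be sketched as below (`process_octave` is a hypothetical stand-in for the per-octave detection and description step, not ISIFT's implementation):

```python
import time

def anytime_pyramid(process_octave, num_octaves, deadline_s):
    """Anytime-style sketch: process octaves in order, stop when the time
    budget runs out, and return whatever keypoints were found so far (an
    approximation that improves the longer it is allowed to run)."""
    start = time.monotonic()
    keypoints = []
    completed = 0
    for octave in range(num_octaves):
        if time.monotonic() - start > deadline_s:
            break  # return the partial, approximate result
        keypoints.extend(process_octave(octave))
        completed += 1
    return keypoints, completed
```

Run to completion, such a scheme converges to the full result; interrupted early, it still returns usable keypoints, which is the property the paper exploits.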
Citations: 1
Combining skeletal poses for 3D human model generation using multiple kinects
Pub Date : 2018-06-12 DOI: 10.1145/3204949.3204958
Kevin Desai, B. Prabhakaran, S. Raghuraman
RGB-D cameras, such as the Microsoft Kinect, provide the 3D information, color and depth, associated with a scene. Interactive 3D Tele-Immersion (i3DTI) systems use such RGB-D cameras to capture the person present in the scene in order to collaborate with other remote users and interact with the virtual objects present in the environment. With a single camera, it is difficult to estimate an accurate skeletal pose and a complete 3D model of the person, especially when the person is not fully in the camera's view. With multiple cameras, even with partial views, it is possible to obtain a more accurate estimate of the person's skeleton, leading to a better and more complete 3D model. In this paper, we present a real-time skeletal pose identification approach that leverages the inaccurate skeletons of the individual Kinects and provides a combined, optimized skeleton. We estimate the Probability of an Accurate Joint (PAJ) for each joint from all of the Kinect skeletons. We determine the correct direction of the person and assign the correct joint sides for each skeleton. We then use a greedy consensus approach to combine the highly probable and accurate joints to estimate the combined skeleton. Using the individual skeletons, we segment the point clouds from all the cameras. We use the already computed PAJ values to obtain the Probability of an Accurate Bone (PAB). The individual point clouds are then combined one segment after another using the calculated PAB values. The generated combined point cloud is a complete and accurate 3D representation of the person present in the scene. We validate our estimated skeleton against two well-known methods by computing the error distance between the best-view Kinect skeleton and the estimated skeleton. An exhaustive analysis is performed using around 500,000 skeletal frames in total, captured with 7 users and 7 cameras.
Visual analysis is performed by checking whether the estimated skeleton is completely present within the human model. We also develop a 3D Holo-Bubble game to showcase the real-time performance of the combined skeleton and point cloud. Our results show that our method performs better than the state-of-the-art approaches that use multiple Kinects, in terms of objective error, visual quality and real-time user performance.
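The greedy consensus step can be illustrated with a simple weighted blend: for one joint, keep only the camera estimates whose PAJ clears a threshold, then average them weighted by PAJ (the threshold and weighting scheme here are assumptions for illustration, not the authors' exact procedure):

```python
def combine_joint(estimates, threshold=0.5):
    """Consensus sketch for a single joint. `estimates` is a list of
    ((x, y, z), paj) pairs, one per Kinect; estimates below the PAJ
    threshold are discarded and the rest are blended, weighted by PAJ."""
    trusted = [(pos, paj) for pos, paj in estimates if paj >= threshold]
    if not trusted:
        return None  # no camera saw this joint reliably
    total = sum(paj for _, paj in trusted)
    return tuple(
        sum(pos[i] * paj for pos, paj in trusted) / total for i in range(3)
    )

# two confident cameras outvote one unreliable estimate
print(combine_joint([((0, 0, 0), 0.9), ((1, 5, 5), 0.1), ((2, 0, 0), 0.9)]))
```

Running this per joint over all skeletons yields the combined skeleton; the analogous per-bone probabilities (PAB) then drive the segment-by-segment point-cloud merge.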
Citations: 7
Subdiv17
Pub Date : 2018-06-12 DOI: 10.1145/3204949.3208122
Maia Rohm, B. Ionescu, A. Gînsca, Rodrygo L. T. Santos, H. Müller
In this paper, we present a new dataset that facilitates the comparison of approaches aiming at the diversification of image search results. The dataset was explicitly designed for general-purpose, multi-topic queries and provides multiple ground-truth annotations to allow for the exploration of the subjectivity aspect of the general diversification task. The dataset provides images and their metadata retrieved from Flickr for around 200 complex queries. Additionally, to encourage experimentation (and cooperation) across different communities, such as information retrieval and multimedia retrieval, a broad range of pre-computed descriptors is provided. The proposed dataset was successfully validated during the MediaEval 2017 Retrieving Diverse Social Images task using 29 submitted runs.
Citations: 1
SGF
Pub Date : 2018-06-12 DOI: 10.1145/3204949.3208120
J. Heuschkel, Alexander Frömmgen
This paper presents a crowdsourced dataset of a large-scale event with more than 1000 measuring participants. The detailed dataset consists of various location data and network measurements of all national carriers, collected during a four-day event. The concentrated samples for this short time period enable detailed analysis, e.g., by correlating movement patterns with experienced network conditions.
Citations: 0
Improving response time interval in networked event-based mulsemedia systems
Pub Date : 2018-06-12 DOI: 10.1145/3204949.3204965
Estêvão Bissoli Saleme, Celso A. S. Santos, G. Ghinea
Human perception is inherently multisensory, involving sight, hearing, smell, touch, and taste. Mulsemedia systems combine traditional media (text, image, video, and audio) with non-traditional media that stimulate senses beyond sight and hearing. Whilst work has been done on some user-centred aspects that the distribution of mulsemedia data raises, such as synchronisation and jitter, this paper tackles the complementary issues that temporality constraints pose on the distribution of mulsemedia effects. It aims at improving the response time interval in networked event-based mulsemedia systems, based upon prior findings in this context. Thus, we reshaped the communication strategy of an open distributed mulsemedia platform called PlaySEM to work more efficiently with other event-based applications, such as games, VR/AR software, and interactive applications that wish to stimulate other senses to increase user immersion. Moreover, we added lightweight communication protocols to its interface to analyse whether they reduce network overhead. To carry out the experiment, we developed mock applications for different protocols to simulate an interactive application working with PlaySEM, measuring the delay between them.
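Measuring the delay between mock applications, as described, amounts to timing round trips over the candidate transport. A self-contained sketch that times a small message over a loopback UDP echo server (a generic stand-in for comparing lightweight protocols, not PlaySEM's actual stack):

```python
import socket
import threading
import time

def measure_udp_rtt(payload: bytes, runs: int = 50) -> float:
    """Mean round-trip time of a small message over a loopback UDP echo."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # pick a free port
    addr = server.getsockname()

    def echo():
        for _ in range(runs):
            data, client_addr = server.recvfrom(2048)
            server.sendto(data, client_addr)

    threading.Thread(target=echo, daemon=True).start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        client.sendto(payload, addr)
        client.recvfrom(2048)
        total += time.perf_counter() - start
    server.close()
    client.close()
    return total / runs
```

Repeating the measurement with different transports (or with effect metadata pre-processed versus serialized on the fly) exposes the response-time differences the paper investigates.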
Citations: 5
Fast and easy live video service setup using lightweight virtualization
Pub Date : 2018-06-12 DOI: 10.1145/3204949.3208112
A. Heikkinen, P. Pääkkönen, Marko Viitanen, Jarno Vanne, Tommi Riikonen, K. Bakanoglu
The service broker provides service providers with virtualized services that can be initialized rapidly and scaled up or down on demand. This demonstration paper describes how a service provider can set up a new video distribution service for end users with minimal effort. Our proposal makes use of Docker lightweight virtualization technologies that pack services in containers. This makes it possible to implement video coding and content delivery networks that are scalable and consume resources only when needed. The demonstration showcases a scenario where a video service provider sets up a new live video distribution service for end users. After the setup, a live 720p30 camera feed is encoded in real time, streamed in HEVC MPEG-DASH format over a CDN network, and accessed with an HbbTV-compatible set-top box. This end-to-end system illustrates that virtualization causes no significant resource or performance overhead and is a perfect match for online video services.
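A containerized deployment of this kind is often described declaratively. The hypothetical Docker Compose fragment below illustrates the idea of packing an encoder and a delivery edge into containers sharing a segment volume (service names, images, and ports are invented for illustration, not the demonstrated system's configuration):

```yaml
# Hypothetical sketch only: names, image tags, and ports are illustrative.
version: "3"
services:
  encoder:
    image: example/hevc-dash-encoder:latest   # real-time HEVC MPEG-DASH packager
    devices:
      - /dev/video0:/dev/video0               # live camera feed
    volumes:
      - dash-segments:/var/www/dash
  cdn-edge:
    image: nginx:alpine                       # serves DASH segments to clients
    ports:
      - "8080:80"
    volumes:
      - dash-segments:/usr/share/nginx/html:ro
volumes:
  dash-segments:
```

Because each service is a container, the broker can start, stop, or replicate the edge independently of the encoder, which is what enables on-demand scaling.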
Citations: 2
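The consume-resources-only-when-needed property highlighted in the abstract above can be sketched as a toy scaling policy: encoder containers are started only when streams arrive and stopped when demand falls. The capacity figure (streams per container) and the function names below are illustrative assumptions, not details from the paper.

```python
# Toy model of demand-driven container scaling for a video encoding service.
# STREAMS_PER_CONTAINER is an assumed capacity, not a figure from the paper.
STREAMS_PER_CONTAINER = 4


def containers_needed(active_streams: int) -> int:
    """Number of encoder containers required for the current load."""
    if active_streams <= 0:
        return 0
    # Ceiling division: each container handles up to 4 concurrent streams.
    return -(-active_streams // STREAMS_PER_CONTAINER)


def scaling_action(running: int, active_streams: int) -> str:
    """Decide whether the service broker should start or stop containers.

    A real deployment would issue `docker run` / `docker stop` here;
    this sketch only reports the decision.
    """
    target = containers_needed(active_streams)
    if target > running:
        return f"start {target - running} container(s)"
    if target < running:
        return f"stop {running - target} container(s)"
    return "no change"
```

For example, `scaling_action(1, 9)` reports that two more containers are needed, while `scaling_action(3, 0)` releases all resources, mirroring the on-demand behavior the demo emphasizes.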
Dynamic input anomaly detection in interactive multimedia services
Pub Date : 2018-06-12 DOI: 10.1145/3204949.3204954
M. Shatnawi, M. Hefeeda
Multimedia services like Skype, WhatsApp, and Google Hangouts have strict Service Level Agreements (SLAs). These services attempt to address the root causes of SLA violations through techniques such as detecting anomalies in their inputs. The key problem with current anomaly detection and handling techniques is that they cannot adapt to service changes in real time. Current techniques use historic data from prior runs of the service to identify anomalies in service inputs, such as the number of concurrent users, and in system states, such as CPU utilization. They do not evaluate the current impact of anomalies on the service, so they may raise alerts and take corrective measures even when the detected anomalies do not cause SLA violations. Alerts are expensive to handle from a systems and engineering-support perspective and should be raised only when necessary. We propose a dynamic approach for handling service input and system state anomalies in multimedia services in real time, by evaluating the impact of anomalies, independently and associatively, on the service outputs. Our proposed approach raises alerts and takes corrective measures, such as capacity allocation, only if the detected anomalies result in SLA violations. We implement our approach in a large-scale operational multimedia service, and show that it increases anomaly detection accuracy by 31%, reduces anomaly alerting false positives by 71% and false negatives by 69%, and enhances media sharing quality by 14%.
{"title":"Dynamic input anomaly detection in interactive multimedia services","authors":"M. Shatnawi, M. Hefeeda","doi":"10.1145/3204949.3204954","DOIUrl":"https://doi.org/10.1145/3204949.3204954","abstract":"Multimedia services like Skype, WhatsApp, and Google Hangouts have strict Service Level Agreements (SLAs). These services attempt to address the root causes of SLA violations through techniques such as detecting anomalies in the inputs of the services. The key problem with current anomaly detection and handling techniques is that they can't adapt to service changes in real-time. In current techniques, historic data from prior runs of the service are used to identify anomalies in the service inputs like number of concurrent users, and system states like CPU utilization. These techniques do not evaluate the current impact of anomalies on the service. Thus, they may raise alerts and take corrective measures even if the detected anomalies do not cause SLA violations. Alerts are expensive to handle from a system and engineering support perspectives, and should be raised only if necessary. We propose a dynamic approach for handling service input and system state anomalies in multimedia services in real-time, by evaluating the impact of anomalies, independently and associatively, on the service outputs. Our proposed approach alerts and takes corrective measures like capacity allocations if the detected anomalies result in SLA violations. We implement our approach in a large-scale operational multimedia service, and show that it increases anomaly detection accuracy by 31%, reduces anomaly alerting false positives by 71%, false negatives by 69%, and enhances media sharing quality by 14%.","PeriodicalId":141196,"journal":{"name":"Proceedings of the 9th ACM Multimedia Systems Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130442557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
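The core idea of the abstract above — detect an input anomaly, but alert only when it actually degrades the service output — can be illustrated with a minimal sketch. The z-score detector, the thresholds, and the latency SLA below are assumptions chosen for illustration, not the paper's actual implementation.

```python
# Illustrative sketch of impact-aware anomaly handling: an input anomaly
# (e.g. a spike in concurrent users) is flagged with a simple z-score test,
# but an alert is raised only if the service output (here, response latency)
# actually violates the SLA. All thresholds are assumed values.
from statistics import mean, stdev

Z_THRESHOLD = 3.0        # input anomaly threshold (assumed)
SLA_LATENCY_MS = 200.0   # SLA bound on output latency (assumed)


def is_input_anomaly(history: list[float], value: float) -> bool:
    """Flag `value` as anomalous if it lies far outside the historical range."""
    if len(history) < 2:
        return False  # not enough data to judge
    s = stdev(history)
    if s == 0:
        return value != history[0]
    return abs(value - mean(history)) / s > Z_THRESHOLD


def should_alert(history: list[float], concurrent_users: float,
                 observed_latency_ms: float) -> bool:
    """Alert only when the input anomaly has real impact on the output."""
    return (is_input_anomaly(history, concurrent_users)
            and observed_latency_ms > SLA_LATENCY_MS)
```

With a stable history around 100 concurrent users, a spike to 500 users triggers an alert only if the observed latency also exceeds the SLA bound; an anomalous input with healthy latency is suppressed, which is how false positives are avoided in this scheme.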
Citations: 2