
Latest publications from the 2020 IEEE International Symposium on Multimedia (ISM)

REP-Model: A deep learning framework for replacing ad billboards in soccer videos
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00032
V. Ghassab, Kamal Maanicshah, N. Bouguila, Paul Green
In this paper, we propose a novel framework for automatically replacing advertisement content in soccer videos using deep learning strategies. We begin by applying UNET (an image segmentation convolutional neural network) for content segmentation and detection. Subsequently, after reconstructing the segmented content in the video frames (to compensate for apparent losses in detection), we replace the unwanted content with new content using a homography mapping procedure. Furthermore, the replacement key points in each frame are tracked into the following frames, accounting for camera zoom-in and zoom-out. Since moving objects in the video can disrupt the alignment between frames and thereby make the homography matrix calculation erroneous, we use Mask R-CNN to mask and remove the moving objects from the scene. This framework is named REP-Model, short for replacement model.
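As a rough sketch of the homography-mapping step described in this abstract (not the authors' implementation), the snippet below warps a replacement advertisement onto a detected billboard quadrilateral with OpenCV; the corner coordinates, image paths, and function name are hypothetical.

```python
import cv2
import numpy as np

def replace_billboard(frame, new_ad, billboard_corners):
    """Warp new_ad onto the quadrilateral billboard_corners (TL, TR, BR, BL) in frame."""
    h, w = new_ad.shape[:2]
    src = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    dst = np.float32(billboard_corners)

    # Homography from the ad image plane to the billboard region in the frame.
    H, _ = cv2.findHomography(src, dst)
    warped = cv2.warpPerspective(new_ad, H, (frame.shape[1], frame.shape[0]))

    # Mask out the billboard region in the frame, then paste the warped ad.
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, dst.astype(np.int32), 255)
    background = cv2.bitwise_and(frame, frame, mask=cv2.bitwise_not(mask))
    foreground = cv2.bitwise_and(warped, warped, mask=mask)
    return cv2.add(background, foreground)

# Hypothetical usage: corners would come from the segmentation/tracking stages.
# frame = cv2.imread("frame.png"); ad = cv2.imread("new_ad.png")
# out = replace_billboard(frame, ad, [(100, 50), (300, 60), (300, 120), (100, 110)])
```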
Citations: 0
Adaptive Multi-View Live Video Streaming for Teledriving Using a Single Hardware Encoder
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00008
M. Hofbauer, Christopher B. Kuhn, G. Petrovic, E. Steinbach
Teleoperated driving (TOD) is a possible solution to cope with failures of autonomous vehicles. In TOD, the human operator perceives the traffic situation via video streams of multiple cameras from a remote location. Adaptation mechanisms are needed in order to match the available transmission resources and provide the operator with the best possible situation awareness. This includes the adjustment of individual camera video streams according to the current traffic situation. The limited video encoding hardware in vehicles requires the combination of individual camera frames into a larger superframe video. While this enables the encoding of multiple camera views with a single encoder, it does not allow for rate/quality adaptation of the individual views. To this end, we propose a novel concept that uses preprocessing filters to enable individual rate/quality adaptations in the superframe video. The proposed preprocessing filters allow for the usage of existing multidimensional adaptation models in the same way as for individual video streams using multiple encoders. Our experiments confirm that the proposed concept is able to control the spatial, temporal and quality resolution of individual segments in the superframe video. Additionally, we demonstrate the usability of the proposed method by applying it in a multi-view teledriving scenario. We compare our approach to individually encoded video streams and a multiplexing solution without preprocessing. The results show that the proposed approach produces bitrates for the individual video streams which are comparable to the bitrates achieved with separate encoders. While achieving a similar bitrate for the most important views, our approach requires a total bitrate that is 40% smaller compared to the multiplexing approach without preprocessing.
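A minimal sketch of the superframe idea under assumed parameters (not the authors' system): each camera view is pre-filtered individually, here by reducing its spatial resolution, and the views are tiled into a single frame so that one hardware encoder can encode them together. The layout, tile size, and scale factors are illustrative assumptions.

```python
import numpy as np
import cv2

def build_superframe(views, scales, tile_size=(480, 270)):
    """Apply a per-view downscaling filter, then tile all views into one superframe."""
    tiles = []
    for view, scale in zip(views, scales):
        # Per-view preprocessing filter: reduce spatial resolution, then resize back
        # to the fixed tile size so the superframe layout stays constant for the encoder.
        small = cv2.resize(view, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
        tile = cv2.resize(small, tile_size, interpolation=cv2.INTER_LINEAR)
        tiles.append(tile)
    rows = [np.hstack(tiles[i:i + 2]) for i in range(0, len(tiles), 2)]
    return np.vstack(rows)

# Hypothetical usage with four camera views; less important views get stronger filtering.
views = [np.zeros((270, 480, 3), np.uint8) for _ in range(4)]
superframe = build_superframe(views, scales=[1.0, 0.75, 0.5, 0.5])
```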
Citations: 4
Closing-the-Loop: A Data-Driven Framework for Effective Video Summarization
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00042
Ran Xu, Haoliang Wang, Stefano Petrangeli, Viswanathan Swaminathan, S. Bagchi
Today, videos are the primary way in which information is shared over the Internet. Given the huge popularity of video sharing platforms, it is imperative to make videos engaging for end-users. Content creators rely on their own experience to create engaging short videos starting from the raw content. Several approaches have been proposed in the past to assist creators in the summarization process. However, it is hard to quantify the effect of these edits on end-user engagement. Moreover, the availability of video consumption data has opened up the possibility of predicting the effectiveness of a video before it is published. In this paper, we propose a novel framework to close the feedback loop between automatic video summarization and its data-driven evaluation. Our Closing-The-Loop framework is composed of two main steps that are repeated iteratively. Given an input video, we first generate a set of initial video summaries. Second, we predict the effectiveness of the generated variants based on a data-driven model trained on users' video consumption data. We employ a genetic algorithm to search the space of possible summaries (i.e., adding/removing shots to the video) in an efficient way, where only those variants with the highest predicted performance are allowed to survive and generate new variants in their place. Our results show that the proposed framework can improve the effectiveness of the generated summaries with minimal computation overhead compared to a baseline solution: 28.3% more video summaries fall in the highest effectiveness class than in the baseline.
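The search loop described above can be sketched as follows, with an assumed stand-in for the data-driven effectiveness model (this is an illustration, not the authors' code): candidate summaries are bit vectors over shots, and only the highest-scoring variants survive and mutate.

```python
import random

def predict_effectiveness(summary):
    # Stand-in for the data-driven model trained on consumption data (assumption):
    # rewards including shots but penalises summaries far from a target length of 8 shots.
    return sum(summary) / len(summary) - abs(sum(summary) - 8) * 0.05

def evolve_summaries(num_shots, population=20, generations=50, keep=5):
    """Search over shot-inclusion bit vectors, keeping the highest-scoring variants."""
    pool = [[random.randint(0, 1) for _ in range(num_shots)] for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=predict_effectiveness, reverse=True)
        survivors = pool[:keep]
        children = []
        while len(children) < population - keep:
            child = random.choice(survivors)[:]
            flip = random.randrange(num_shots)   # mutation: add or remove one shot
            child[flip] ^= 1
            children.append(child)
        pool = survivors + children
    return max(pool, key=predict_effectiveness)

best_summary = evolve_summaries(num_shots=30)
```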
Citations: 3
CooPEC: Cooperative Prefetching and Edge Caching for Adaptive 360° Video Streaming
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00019
A. Mahzari, Aliehsan Samiei, R. Prakash
Dynamic Adaptive Streaming over HTTP (DASH) has emerged as the de facto solution for streaming 360° videos. Viewers of 360° videos view only a fraction of each video segment, i.e., the part that corresponds to their Field of View (FoV). To facilitate FoV-adaptive streaming, a segment can be divided into multiple tiles, with the FoV corresponding to a subset of tiles. Streaming each segment in its entirety from the video server to a client can incur high communication overheads both in terms of bandwidth and latency. Caching at the network edge can reduce these overheads. However, as edge cache capacity is limited, only a subset of tiles encoded at a subset of supported resolutions may be present in the cache. A viewer, depending on its FoV, may experience a cache hit and low download latency for some segments, and a cache miss resulting in high download latency from the video server for other segments. This can cause the DASH client to trigger unnecessary quality switches for the following reason: a low-latency download from the edge cache may be misinterpreted as a high network throughput estimate, and a high-latency download from the video server as a low one. In this paper, we propose CooPEC (COOperative Prefetching and Edge Caching), a prefetching and complementary caching solution which uses viewers' FoV entropy to: (i) enable bitrate oscillation-free video streaming, (ii) reduce core network bandwidth consumption, and (iii) enhance QoE for users.
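The throughput misinterpretation mentioned above can be illustrated with a small sketch (an illustration of the problem setting, not CooPEC itself): if the client pools all segment downloads into a single throughput estimate, fast edge-cache hits inflate it, whereas tagging each sample with its download source keeps the origin-path estimate separate.

```python
from collections import defaultdict

class SourceAwareThroughput:
    """Keep separate throughput estimates for edge-cache hits and origin fetches."""
    def __init__(self):
        self.samples = defaultdict(list)   # source -> list of throughput samples (Mbit/s)

    def add(self, source, size_bits, seconds):
        self.samples[source].append(size_bits / seconds / 1e6)

    def estimate(self, source, window=5):
        recent = self.samples[source][-window:]
        if not recent:
            return None
        # Harmonic mean is conservative against occasional very fast downloads.
        return len(recent) / sum(1.0 / s for s in recent)

est = SourceAwareThroughput()
est.add("edge", 4e6, 0.2)     # cache hit: looks like 20 Mbit/s
est.add("origin", 4e6, 1.0)   # cache miss: 4 Mbit/s on the server path
print(est.estimate("edge"), est.estimate("origin"))
```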
Citations: 4
AR40ER: A Semantic Platform for Open Educational Augmented Reality Resources
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00047
Christian Grévisse, C. Gomes, S. Rothkugel
Tablet computers are gaining in presence in modern-day classrooms, enabling the use of a variety of apps for purposes such as note-taking or assessment. Augmented Reality (AR) experiences in the classroom, made possible by current hardware, permit new ways of interaction and visualization, as well as increase student motivation and engagement. They also remove the need for potentially expensive hardware required for experiments in certain scientific domains. The movement of Open Educational Resources (OER) has enabled the sharing of heterogeneous learning resources. Their retrieval can be improved by enriching their metadata using Semantic Web technologies. In this paper, we present AR40ER, a semantic platform for heterogeneous AR experiences provided as OER. We showcase four AR scenarios from different school subjects. These scenarios can be integrated through loose coupling in third-party apps. Apart from describing how this integration works, we demonstrate how a note-taking app can benefit from these scenarios.
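As an illustration of enriching OER metadata with Semantic Web technologies (the namespace, vocabulary, and resource names below are placeholders, not the AR40ER schema), an AR scenario could be described as an RDF resource annotated with the concept it teaches:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

EX = Namespace("http://example.org/ar4oer/")   # placeholder namespace, not AR40ER's
g = Graph()
scenario = EX["pendulum-experiment"]           # hypothetical AR scenario resource

g.add((scenario, RDF.type, EX.ARScenario))
g.add((scenario, DCTERMS.title, Literal("Pendulum experiment in AR")))
g.add((scenario, DCTERMS.subject, URIRef("http://dbpedia.org/resource/Pendulum")))

# Serialise the enriched metadata so third-party apps can discover the scenario.
print(g.serialize(format="turtle"))
```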
Citations: 0
Redefine the A in ABR for 360-degree Videos: A Flexible ABR Framework
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00020
Kuan-Ying Lee, Andrew Yoo, Jounsup Park, K. Nahrstedt
360-degree video has been popular due to the immersive experience it provides to the viewer. While watching, the viewer can control the field of view (FoV) within a range of 360° by 180° (in this paper, we use viewport interchangeably with FoV). As this trend continues, adaptive bitrate (ABR) streaming is becoming a prevalent issue. Most existing ABR algorithms for 360 videos (360 ABR algorithms) require real-time head traces and certain computation resources from the client for streaming, which largely constrains the range of audience. Also, while more 360 ABR algorithms rely upon machine learning (ML) for viewport prediction, ML and ABR are research topics that grow mostly independently. In this paper, we propose a two-fold ABR algorithm for 360 video streaming that utilizes 1) an off-the-shelf ABR algorithm for ordinary videos, and 2) an off-the-shelf viewport prediction model. Our algorithm requires neither real-time head traces nor additional computation on the viewing device. In addition, it adapts easily to the newest developments in viewport prediction and ABR. As a consequence, the proposed method fits nicely into the existing streaming framework, and any advancement in viewport prediction and ABR can enhance its performance. With quantitative experiments, we demonstrate that the proposed method achieves twice the quality of experience (QoE) of the baseline.
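A schematic sketch of the two-fold idea with assumed interfaces (not the paper's implementation): an ordinary ABR module picks a single quality level for the next segment, an off-the-shelf viewport predictor marks the tiles likely to be visible, and those tiles receive the chosen level while the remaining tiles receive the lowest one.

```python
def allocate_tile_qualities(abr_level, predicted_viewport, all_tiles, lowest_level=0):
    """Give the ABR-selected level to predicted-viewport tiles, the lowest level elsewhere."""
    return {tile: (abr_level if tile in predicted_viewport else lowest_level)
            for tile in all_tiles}

# Hypothetical plug-ins: any off-the-shelf ABR and viewport predictor could feed this step.
abr_level = 3                                   # e.g. level chosen by a throughput-based ABR
viewport = {(2, 1), (2, 2), (3, 1), (3, 2)}     # tiles predicted to be visible next segment
tiles = {(x, y) for x in range(6) for y in range(4)}
plan = allocate_tile_qualities(abr_level, viewport, tiles)
```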
Citations: 0
Live Demonstration: Interactive Quality of Experience Evaluation in Kvazzup Video Call
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00011
Joni Räsänen, Aaro Altonen, Alexandre Mercat, Jarno Vanne
This paper presents an interactive demonstration setup, which allows users to configure the video coding parameters of Kvazzup open-source video call software at runtime and evaluate their impact on Quality of Service (QoS) and Quality of Experience (QoE). The demonstration is carried out by implementing a new Kvazzup control panel for video call parameterization and visual quality, bit rate, latency, and frame rate evaluation.
Citations: 0
Audiovisual, Genre, Neural and Topical Textual Embeddings for TV Programme Content Representation
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00041
Saba Nazir, Taner Cagali, M. Sadrzadeh, Chris Newell
TV programmes have their contents described by multiple means: textual subtitles, audiovisual files, and metadata such as genres. In order to represent these contents, we develop vectorial representations for their low-level multimodal features, group them with simple clustering techniques, and combine them using middle and late fusion. For textual features, we use LSI and Doc2Vec neural embeddings; for audio, MFCCs and Bags of Audio Words; for visual, SIFT and Bags of Visual Words. We apply our model to a dataset of BBC TV programmes and use a standard recommender and pairwise similarity matrices of content vectors to estimate viewers' behaviours. The late fusion of genre, audio and video vectors with both of the textual embeddings significantly increases the precision and diversity of the results.
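As a hedged sketch of the bag-of-audio-words pipeline mentioned above (the codebook size and MFCC settings are assumptions, not the paper's configuration), MFCC frames can be quantised with k-means and each programme represented by its normalised histogram of codewords:

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

def mfcc_frames(path, n_mfcc=13):
    """Load an audio track and return its MFCC frames as a (frames, n_mfcc) array."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def bag_of_audio_words(frame_sets, codebook_size=64):
    """Fit a codebook on all MFCC frames, then return one codeword histogram per programme."""
    km = KMeans(n_clusters=codebook_size, n_init=10, random_state=0)
    km.fit(np.vstack(frame_sets))
    histograms = []
    for frames in frame_sets:
        words = km.predict(frames)
        hist = np.bincount(words, minlength=codebook_size).astype(float)
        histograms.append(hist / hist.sum())
    return histograms

# Hypothetical usage: paths would point to the programmes' audio tracks.
# hists = bag_of_audio_words([mfcc_frames(p) for p in ["ep1.wav", "ep2.wav"]])
```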
Citations: 0
Llama - Low Latency Adaptive Media Algorithm
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00027
Tomasz Lyko, M. Broadbent, N. Race, M. Nilsson, Paul Farrow, S. Appleby
In recent years, HTTP Adaptive Bit Rate (ABR) streaming, including Dynamic Adaptive Streaming over HTTP (DASH), has become the most popular technology for video streaming over the Internet. The client device requests segments of content using HTTP, with an ABR algorithm selecting the quality at which to request each segment to trade off video quality against the risk of stalling. This introduces high latency compared to traditional broadcast methods, mostly in the client buffer, which needs to hold enough data to absorb any changes in network conditions. Clients employ an ABR algorithm which monitors network conditions and adjusts the quality at which segments are requested to maximise the user's Quality of Experience. The size of the client buffer depends on the ABR algorithm's capability to respond to changes in network conditions in a timely manner; hence, low-latency live streaming requires an ABR algorithm that can perform well with a small client buffer. In this paper, we present Llama, a new ABR algorithm specifically designed to operate in such scenarios. Our new ABR algorithm employs the novel idea of using two independent throughput measurements made over different timescales. We have evaluated Llama by comparing it against four popular ABR algorithms in terms of multiple QoE metrics, across multiple client settings, and in various network scenarios based on CDN logs of a commercial live TV service. Llama outperforms the other ABR algorithms, improving the P.1203 Mean Opinion Score (MOS) as well as reducing rebuffering by 33% when using DASH, and by 68% with CMAF in the lowest latency scenario.
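A minimal sketch of the stated core idea, two independent throughput measurements over different timescales; the window lengths and the rule for combining them are assumptions for illustration, not the published Llama algorithm.

```python
def harmonic_mean(samples):
    return len(samples) / sum(1.0 / s for s in samples)

def select_bitrate(throughput_samples, bitrates, short_window=3, long_window=10):
    """Pick the highest bitrate supported by both a short-term and a long-term estimate."""
    short_est = harmonic_mean(throughput_samples[-short_window:])
    long_est = harmonic_mean(throughput_samples[-long_window:])
    safe = min(short_est, long_est)   # react quickly to drops, stay cautious on short spikes
    candidates = [b for b in bitrates if b <= safe]
    return max(candidates) if candidates else min(bitrates)

samples = [8.0, 7.5, 3.0, 3.2, 3.1]   # measured Mbit/s per downloaded segment
print(select_bitrate(samples, bitrates=[1.0, 2.5, 5.0, 8.0]))
```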
Citations: 3
Real-Time Detection of Events in Soccer Videos using 3D Convolutional Neural Networks
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00030
Olav A. Norgård Rongved, S. Hicks, Vajira Lasantha Thambawita, H. Stensland, E. Zouganeli, Dag Johansen, M. Riegler, P. Halvorsen
In this paper, we present an algorithm for automatically detecting events in soccer videos using 3D convolutional neural networks. The algorithm uses a sliding window approach to scan over a given video to detect events such as goals, yellow/red cards, and player substitutions. We test the method on three different datasets from SoccerNet, the Swedish Allsvenskan, and the Norwegian Eliteserien. Overall, the results show that we can detect events with high recall, low latency, and accurate time estimation. The trade-off is a slightly lower precision compared to the current state-of-the-art, which has higher latency and performs better when a less accurate time estimation can be accepted. In addition to the presented algorithm, we perform an extensive ablation study on how the different parts of the training pipeline affect the final results.
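The sliding-window step can be sketched as follows with a small stand-in 3D CNN (not the trained model from the paper): clips of consecutive frames are scored, and windows whose event probability exceeds a threshold are flagged as candidate detections.

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Stand-in 3D CNN that scores a clip of frames as 'background' vs 'event'."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, clip):                      # clip: (batch, 3, frames, H, W)
        x = self.features(clip).flatten(1)
        return self.classifier(x)

def sliding_window_detect(video, model, window=16, stride=8, threshold=0.8):
    """Scan a frame tensor (3, T, H, W) and return start indices of likely events."""
    events = []
    for start in range(0, video.shape[1] - window + 1, stride):
        clip = video[:, start:start + window].unsqueeze(0)
        prob = torch.softmax(model(clip), dim=1)[0, 1].item()
        if prob > threshold:
            events.append(start)
    return events

video = torch.rand(3, 64, 112, 112)               # dummy 64-frame clip
print(sliding_window_detect(video, Tiny3DCNN().eval()))
```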
Citations: 11