
Latest publications: 2020 IEEE International Symposium on Multimedia (ISM)

Automatic Identification of Keywords in Lecture Video Segments
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00035
Raga Shalini Koka, Farah Naz Chowdhury, Mohammad Rajiur Rahman, T. Solorio, J. Subhlok
Lecture video is an increasingly important learning resource. However, the challenge of quickly finding the content of interest in a long lecture video is a critical limitation of this format. This paper introduces automatic discovery of keywords (or tags) for lecture video segments to improve navigation. A lecture video is divided into topical segments based on the frame-to-frame similarity of content. A user navigates the lecture video assisted by visual summaries and keywords for the segments. Keywords provide an overview of the content discussed in the segment to improve navigation. The input to the keyword identification algorithm is the text extracted from the video frames by OCR. Automatically discovering keywords is challenging, as the suitability of an N-gram to be a keyword depends on a variety of factors, including its frequency in a segment, its relative frequency with respect to the full video, font size, time on screen, and its presence in domain and language dictionaries. This paper explores how these factors are quantified and combined to identify good keywords. The key scientific contribution of this paper is the design, implementation, and evaluation of a keyword selection algorithm for lecture video segments. Evaluation is performed by comparing the keywords generated by the algorithm with the tags chosen by experts on 121 segments of 11 videos from STEM courses.
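The abstract lists the factors that feed keyword selection without giving the exact formula; the sketch below shows one hedged way such factors could be quantified and combined into a single score. The dataclass fields, normalizations, and weights are illustrative assumptions, not the authors' algorithm.

```python
from dataclasses import dataclass

@dataclass
class NgramStats:
    """OCR-derived statistics for one N-gram candidate (hypothetical fields)."""
    seg_freq: int        # occurrences within the segment
    video_freq: int      # occurrences in the full video
    font_size: float     # mean font height in pixels from OCR boxes
    screen_time: float   # seconds the N-gram is visible in the segment
    in_dictionary: bool  # found in a domain or language dictionary

def keyword_score(s: NgramStats, seg_duration: float,
                  weights=(0.4, 0.2, 0.2, 0.2)) -> float:
    """Combine the factors named in the abstract into one score;
    weights and normalizations are illustrative assumptions."""
    relative_freq = s.seg_freq / max(s.video_freq, 1)    # segment vs. full video
    font_factor = min(s.font_size / 40.0, 1.0)           # ~40 px taken as title size
    visibility = min(s.screen_time / seg_duration, 1.0)  # fraction of segment on screen
    dict_bonus = 1.0 if s.in_dictionary else 0.5
    w1, w2, w3, w4 = weights
    return w1 * relative_freq + w2 * font_factor + w3 * visibility + w4 * dict_bonus

candidates = {
    "convex hull": NgramStats(9, 12, 38.0, 55.0, True),
    "for example": NgramStats(7, 40, 22.0, 30.0, True),
}
ranked = sorted(candidates, key=lambda k: keyword_score(candidates[k], 120.0),
                reverse=True)
print(ranked)  # domain terms outrank frequent filler phrases
```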
Citations: 7
Between the Frames - Evaluation of Various Motion Interpolation Algorithms to Improve 360° Video Quality
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00017
S. Fremerey, Frank Hofmeyer, Steve Göring, Dominik Keller, A. Raake
With the increasing availability of 360° video content, it becomes important to provide smoothly playing videos of high quality for end users. For this reason, we compare the influence of different Motion Interpolation (MI) algorithms on 360° video quality. After conducting a pre-test with 12 video experts in [3], we found that MI is a useful tool to increase the QoE (Quality of Experience) of omnidirectional videos. As a result of the pre-test, we selected three suitable MI algorithms, namely ffmpeg Motion Compensated Interpolation (MCI), Butterflow, and Super-SloMo. Subsequently, we interpolated 15 entertaining and real-world omnidirectional videos with a duration of 20 seconds from 30 fps (original framerate) to 90 fps, the native refresh rate of the HMD used, the HTC Vive Pro. To assess QoE, we conducted two subjective tests with 24 and 27 participants. In the first test we used a Modified Paired Comparison (M-PC) method, and in the second test the Absolute Category Rating (ACR) approach. In the M-PC test, 45 stimuli were used, and in the ACR test 60. Results show that for most of the 360° videos, the interpolated versions obtained significantly higher quality scores than the lower-framerate source videos, validating our hypothesis that motion interpolation can improve the overall video quality for 360° video. As expected, we observed that the relative comparisons in the M-PC test result in larger differences in terms of quality. Generally, the ACR method led to similar results, while reflecting a more realistic viewing situation. In addition, we compared the different MI algorithms and conclude that, with sufficient available computing power, Super-SloMo should be preferred for interpolation of omnidirectional videos, while MCI also shows good performance.
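The MCI algorithm named above is exposed in ffmpeg as the `minterpolate` filter; the following minimal sketch shows how a 30 fps source could be interpolated to 90 fps with it. The filenames are placeholders, and the exact filter options used in the study are not stated in the abstract.

```python
import subprocess

def interpolate_mci(src: str, dst: str, target_fps: int = 90) -> None:
    """Interpolate a video to target_fps using ffmpeg's motion-compensated
    interpolation (mi_mode=mci of the minterpolate filter)."""
    cmd = [
        "ffmpeg", "-i", src,
        # mi_mode=mci selects motion-compensated interpolation;
        # fps sets the output frame rate, other options keep their defaults.
        "-vf", f"minterpolate=fps={target_fps}:mi_mode=mci",
        "-c:a", "copy",
        dst,
    ]
    subprocess.run(cmd, check=True)

# Placeholder filenames for a 20-second equirectangular clip.
interpolate_mci("equirect_30fps.mp4", "equirect_90fps.mp4")
```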
Citations: 2
On Subpicture-based Viewport-dependent 360-degree Video Streaming using VVC
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00021
Maryam Homayouni, A. Aminlou, M. Hannuksela
Virtual reality applications create an immersive experience using 360° video with high resolution and frame rate. However, since the user only views a portion of the 360° video according to his/her current viewport, streaming the whole content at high resolution wastes bandwidth. To address this issue, viewport-dependent approaches have been proposed in which only the part of the video that falls within the user's current viewport is transmitted in high quality, while the rest of the content is transmitted in lower quality. The selection of high- and low-quality parts is constantly adapted according to the user's head motion, which requires frequent intra-coded frames at switching points, leading to an increase in the overall streaming bitrate. In this paper, a viewport-adaptive streaming scheme is introduced that avoids intra frames at switching points by introducing a long intra period for non-changing parts of the content during head motion. This scheme is realized using the mixed Video Coding Layer (VCL) Network Abstraction Layer (NAL) unit feature of the Versatile Video Coding (VVC) standard. The method reduces bitrate significantly, especially for sequences with either no or only slow camera motion, which is common for 360° video capturing.
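As an illustration of the viewport-dependent selection step described above, here is a minimal geometric sketch that picks the tiles of an equirectangular frame to fetch in high quality for a given viewing direction. The 8x4 grid, field of view, and function names are assumptions for illustration; the VVC subpicture signaling itself is not modeled.

```python
import math

def visible_tiles(yaw_deg: float, pitch_deg: float,
                  fov_deg: float = 110.0, cols: int = 8, rows: int = 4):
    """Return the (col, row) tiles of an equirectangular frame whose centers
    fall inside the viewport; these would be streamed in high quality."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    half_fov = math.radians(fov_deg) / 2.0
    selected = []
    for c in range(cols):
        for r in range(rows):
            # Tile center in spherical coordinates (longitude, latitude).
            lon = math.radians((c + 0.5) / cols * 360.0 - 180.0)
            lat = math.radians(90.0 - (r + 0.5) / rows * 180.0)
            # Great-circle distance between view direction and tile center.
            cos_d = (math.sin(pitch) * math.sin(lat) +
                     math.cos(pitch) * math.cos(lat) * math.cos(lon - yaw))
            if math.acos(max(-1.0, min(1.0, cos_d))) <= half_fov:
                selected.append((c, r))
    return selected

# Tiles to fetch in high quality for a user looking slightly left and up.
print(visible_tiles(yaw_deg=-30.0, pitch_deg=10.0))
```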
Citations: 4
Better Look Twice - Improving Visual Scene Perception Using a Two-Stage Approach
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00013
Christopher B. Kuhn, M. Hofbauer, G. Petrovic, E. Steinbach
Accurate visual scene perception plays an important role in fields such as medical imaging or autonomous driving. Recent advances in computer vision allow for accurate image classification, object detection and even pixel-wise semantic segmentation. Human vision has repeatedly been used as an inspiration for developing new machine vision approaches. In this work, we propose to adapt the “zoom lens model” from psychology for semantic scene segmentation. According to this model, humans first distribute their attention evenly across the entire field of view at low processing power. Then, they follow visual cues to look at a few smaller areas with increased attention. By looking twice, it is possible to refine the initial scene understanding without requiring additional input. We propose to perform semantic segmentation the same way. To obtain visual cues for deciding where to look twice, we use a failure region prediction approach based on a state-of-the-art failure prediction method. Then, the second, focused look is performed by a dedicated classifier that reclassifies the most challenging patches. Finally, pixels predicted to be errors are updated in the original semantic prediction. While focusing only on areas with the highest predicted failure probability, we achieve a classification accuracy of over 63% for the predicted failure regions. After updating the initial semantic prediction of 4000 test images from a large-scale driving data set, we reduce the absolute pixel-wise error of 232 road participants by 10% or more.
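The following sketch traces the two-stage control flow described above: segment once, predict failure regions, reclassify the most suspicious patches, and update the label map. Stub models stand in for the networks; all model objects, the patch size, and the threshold are placeholders, not the paper's implementation.

```python
import numpy as np

def look_twice(image, segmenter, failure_predictor, patch_classifier,
               patch=64, thresh=0.5):
    """Two-stage perception: segment once, predict where the segmentation
    likely failed, reclassify those patches, and patch the label map."""
    labels = segmenter(image)              # first look: dense label map (H, W)
    fail_prob = failure_predictor(image)   # per-pixel failure probability (H, W)
    h, w = labels.shape
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            window = (slice(y, y + patch), slice(x, x + patch))
            # Second, focused look only where failure is likely.
            if fail_prob[window].mean() > thresh:
                labels[window] = patch_classifier(image[window])
    return labels

# Stub models so the sketch runs end to end.
rng = np.random.default_rng(0)
img = rng.random((256, 256))
seg = lambda im: np.zeros(im.shape, dtype=int)     # initial segmentation stub
fail = lambda im: rng.random(im.shape)             # failure-prediction stub
recls = lambda crop: np.ones(crop.shape, dtype=int)  # dedicated classifier stub
print(np.unique(look_twice(img, seg, fail, recls)))
```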
Citations: 5
Bonsai Style Classification: a new database and baseline results
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00025
Guilherme H. S. Nakahata, A. A. Constantino, Yandre M. G. Costa
Bonsai is an ancient art aimed at mimicking a tree in miniature. Despite originating and being most popular on the Asian continent, Bonsai has become widespread in several parts of the world. There are many techniques for styling the plants, classifying them into different patterns widely known by people who appreciate this art. In this work, we introduce a new database specially created for the development of research on Bonsai style classification. The database is composed of 700 samples, equally distributed among the seven following classes: Formal Upright, Informal Upright, Slanting, Cascade, Semi Cascade, Literati, and Wind Swept. The classes selected to compose the database were chosen considering the five basic styles plus two more styles with characteristics distinct from the others. The database was created by the authors themselves, using images available on the Pinterest platform, which were subjected to pre-processing to remove similar photos and resize them. The baseline results presented here were obtained using deep models (CNN architectures) successfully used to address image classification tasks in different application domains: VGG, Xception, DenseNet, and InceptionV3. These models were trained on ImageNet, and we used transfer learning to adapt them to the current proposal. In order to avoid overfitting, data augmentation was performed during training, along with the dropout method. Experimental results showed that the VGG19 model obtained the highest accuracy rate, reaching 89%. In addition, we used the DeconvNet and Deep Taylor methods to find a proper explanation for the obtained results. We noted that the VGG19 model better captured the most important aspects of the classification task investigated here, with better performance in discriminating and generalizing patterns in the task of classifying Bonsai styles.
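The abstract names VGG19 pretrained on ImageNet, transfer learning, data augmentation, and dropout; a minimal Keras sketch of that setup for the seven classes follows. The augmentation parameters, dropout rate, and learning rate are assumptions, not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Frozen VGG19 backbone pretrained on ImageNet (transfer learning).
base = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
x = layers.RandomFlip("horizontal")(inputs)   # data augmentation,
x = layers.RandomRotation(0.1)(x)             # active only during training
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)                    # dropout against overfitting
outputs = layers.Dense(7, activation="softmax")(x)  # the seven Bonsai styles
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```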
Citations: 0
Measuring Driver Situation Awareness Using Region-of-Interest Prediction and Eye Tracking
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00022
M. Hofbauer, Christopher B. Kuhn, Lukas Püttner, G. Petrovic, E. Steinbach
With increasing progress in autonomous driving, the human does not have to be in control of the vehicle for the entire drive. A human driver takes control of the vehicle in case of an autonomous system failure or when the vehicle encounters an unknown traffic situation it cannot handle on its own. A critical part of this transition to human control is ensuring sufficient driver situation awareness. Currently, no direct method to explicitly estimate driver awareness exists. In this paper, we propose a novel system to explicitly measure the situation awareness of the driver. Our approach is inspired by methods used in aviation. However, in contrast to aviation, situation awareness in driving is determined by the detection and understanding of dynamically changing and previously unknown situation elements. Our approach uses machine learning to define the best possible situation awareness. We also propose to measure the actual situation awareness of the driver using eye tracking. Comparing the actual awareness to the target awareness allows us to accurately assess the driver's awareness of the current traffic situation. To test our approach, we conducted a user study and measured the situation awareness score of our model for 8 unique traffic scenarios. The results experimentally validate the accuracy of the proposed driver awareness model.
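A minimal sketch of the comparison step described above: the regions of interest a fully aware driver should attend to (the target awareness) are matched against gaze fixations from the eye tracker (the actual awareness). The data structures and the scoring rule are illustrative assumptions, not the paper's model.

```python
def awareness_score(target_rois, fixations, radius=50.0):
    """Fraction of target regions of interest (x, y) that received at least
    one gaze fixation within `radius` pixels: actual vs. target awareness."""
    def fixated(roi):
        return any((roi[0] - fx) ** 2 + (roi[1] - fy) ** 2 <= radius ** 2
                   for fx, fy in fixations)
    if not target_rois:
        return 1.0  # nothing in the scene required attention
    return sum(fixated(r) for r in target_rois) / len(target_rois)

# Targets from an ROI-prediction model vs. gaze samples from the tracker.
targets = [(120, 340), (660, 310), (900, 420)]   # e.g. pedestrian, two cars
gaze = [(125, 338), (640, 180), (905, 415)]
print(f"situation awareness: {awareness_score(targets, gaze):.2f}")  # 0.67
```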
Citations: 12
Automatic Sparsity-Aware Recognition for Keypoint Detection
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00029
Yurui Xie, L. Guan
We present a novel Sparsity-Aware Keypoint Detector (SAKD) that localizes a set of discriminative keypoints via optimization of group-sparse coding. Unlike most current handcrafted keypoint detectors, which are limited by manually defined local structures, the proposed method has the flexibility to exploit diverse structures through combinations of visual atoms from a vocabulary. Another valuable attribute is that its group-sparsity nature concentrates on jointly discovering sharable structural patterns across keypoints within an image. This merit helps localize repeatable keypoints and resist distractors when the image undergoes various transformations. Extensive experiments on four challenging benchmark datasets demonstrate that the proposed method achieves favorable performance compared with the state of the art in the literature.
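Group-sparse coding, on which the detector is built, can be illustrated with a short proximal-gradient solver for the group-lasso objective min_a 0.5*||x - Da||^2 + lam * sum_g ||a_g||_2, which drives entire groups of atom coefficients to zero. The dictionary, grouping, and parameters below are toy assumptions, not the SAKD detector itself.

```python
import numpy as np

def group_sparse_code(x, D, groups, lam=0.1, steps=200):
    """Proximal gradient descent for min_a 0.5*||x - D a||^2 + lam*sum_g ||a_g||_2.
    `groups` is a list of index arrays partitioning the atoms (columns) of D."""
    a = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(steps):
        grad = D.T @ (D @ a - x)           # gradient of the smooth data term
        z = a - grad / L
        for g in groups:                   # block soft-thresholding (prox step)
            norm = np.linalg.norm(z[g])
            z[g] = 0.0 if norm == 0 else max(0.0, 1 - lam / (L * norm)) * z[g]
        a = z
    return a

rng = np.random.default_rng(1)
D = rng.standard_normal((32, 12))          # 12 visual atoms in 4 groups of 3
D /= np.linalg.norm(D, axis=0)
groups = [np.arange(i, i + 3) for i in range(0, 12, 3)]
x = D[:, groups[1]] @ np.array([1.0, -0.5, 0.8])   # signal built from group 1 only
a = group_sparse_code(x, D, groups, lam=0.05)
print([round(float(np.linalg.norm(a[g])), 3) for g in groups])  # group 1 dominates
```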
Citations: 1
A comparative study of RTC applications
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00007
A. Nisticò, Dena Markudova, Martino Trevisan, M. Meo, G. Carofiglio
Real-Time Communication (RTC) applications have become ubiquitous and are nowadays fundamental for people to communicate with friends and relatives, as well as for enterprises to allow remote working and save travel costs. Countless competing platforms differ in ease of use, the features they implement, supported user equipment, and targeted audience (consumer or business). However, there is no standard protocol or interoperability mechanism. This picture complicates traffic management, making it hard to isolate RTC traffic for prioritization or obstruction. Moreover, undocumented operation could result in the traffic being blocked at firewalls or middleboxes. In this paper, we analyze 13 popular RTC applications, from widespread consumer apps, like Skype and Whatsapp, to business platforms dedicated to enterprises - Microsoft Teams and Webex Teams. We collect packet traces under different conditions and illustrate similarities and differences in their use of the network. We find that most applications employ the well-known RTP protocol, but we observe a few cases of different (and even undocumented) approaches. The majority of applications allow peer-to-peer communication during calls with only two participants. Six of them send redundant data for Forward Error Correction or encode the user video at different bitrates. In addition, we notice that many of them are easy to identify by looking at the destination servers or the domain names resolved via DNS. The packet traces we collected, along with the metadata we extract, are made available to the community.
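RTP, which the study finds in most of the applications, has a fixed 12-byte header (RFC 3550); the sketch below shows the kind of single-packet heuristic that can flag a UDP payload as likely RTP. Real classifiers inspect whole flows, and the checks here are illustrative.

```python
import struct

def looks_like_rtp(payload: bytes) -> bool:
    """Heuristic check of an RTP fixed header (RFC 3550) in a UDP payload."""
    if len(payload) < 12:
        return False
    version = payload[0] >> 6          # must be 2 for RTP
    payload_type = payload[1] & 0x7F   # 0-127; 72-76 collide with RTCP types
    return version == 2 and not (72 <= payload_type <= 76)

def parse_rtp_header(payload: bytes):
    """Extract sequence number, timestamp and SSRC from the fixed header."""
    seq, ts, ssrc = struct.unpack("!HII", payload[2:12])
    return {"pt": payload[1] & 0x7F, "seq": seq, "timestamp": ts, "ssrc": ssrc}

# A synthetic packet: version 2, dynamic payload type 96, then seq/ts/SSRC.
pkt = bytes([0x80, 0x60]) + struct.pack("!HII", 4242, 160000, 0xDEADBEEF)
if looks_like_rtp(pkt):
    print(parse_rtp_header(pkt))
```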
Citations: 15
Computational Method for Optimal Attack Play Consisting of Run Plays and Hand-pass Plays for Seven-a-side Rugby
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00031
Kotaro Yashiro, Yohei Nakada
Providing explanatory information during broadcasts of team sports is becoming important to make rules, plays, tactics, and developments easier to understand for viewers, particularly beginners. Against this background, this paper presents a computational method for selecting the optimal attack play for a try, considering run and hand-pass plays. In this method, attack plays consisting of runs and hand-passes are simulated from the current player position and speed data, based on motion models for each player and the ball. We then evaluate the simulated attack plays. The optimal attack play is obtained using the branch-and-bound algorithm based on the evaluation results. As an initial validation, the proposed method is tested on four synthetic formation examples from seven-a-side rugby. Displaying the optimal attack plays computed by the proposed method can help viewers understand game developments more easily.
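A minimal sketch of the branch-and-bound search mentioned above: enumerate short sequences of run and hand-pass actions, score complete plays with an evaluation function, and prune any branch whose optimistic bound cannot beat the best play found so far. The toy evaluation and bound stand in for the paper's motion-model-based simulation.

```python
def best_attack_play(actions, depth, evaluate, upper_bound):
    """Branch and bound over action sequences (e.g. 'run', 'hand-pass').
    `evaluate` scores a complete play; `upper_bound` gives an optimistic
    score for any completion of a partial play, enabling pruning."""
    best_score, best_play = float("-inf"), None

    def search(prefix):
        nonlocal best_score, best_play
        if len(prefix) == depth:
            score = evaluate(prefix)
            if score > best_score:
                best_score, best_play = score, prefix
            return
        if upper_bound(prefix) <= best_score:
            return  # prune: this branch cannot beat the incumbent play
        for action in actions:
            search(prefix + [action])

    search([])
    return best_score, best_play

def evaluate(play):
    """Toy score: runs gain ground, hand-passes stretch the defense,
    but repeated passing has diminishing returns."""
    return sum(2.0 if a == "run" else 1.5 for a in play) \
           - 0.5 * play.count("hand-pass") ** 2

def upper_bound(prefix):
    """Optimistic completion for depth-3 plays: remaining actions score as runs."""
    return evaluate(prefix) + 2.0 * (3 - len(prefix))

print(best_attack_play(["run", "hand-pass"], 3, evaluate, upper_bound))
```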
Citations: 3