
Latest publications: 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)

Natural Scene Statistics for Detecting Adversarial Examples in Deep Neural Networks
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287056
Anouar Kherchouche, Sid Ahmed Fezza, W. Hamidouche, O. Déforges
Deep neural networks (DNNs) have been adopted in a wide spectrum of applications. However, it has been demonstrated that they are vulnerable to adversarial examples (AEs): carefully crafted perturbations added to a clean input image. These AEs fool DNNs into classifying them incorrectly. Therefore, it is imperative to develop detection methods for AEs that allow DNNs to be defended. In this paper, we propose to characterize adversarial perturbations through the use of natural scene statistics. We demonstrate that these statistical properties are altered by the presence of adversarial perturbations. Based on this finding, we design a classifier that exploits these scene statistics to determine whether an input is adversarial. The proposed method has been evaluated against four prominent adversarial attacks and on three standard datasets. The experimental results show that the proposed detection method achieves high detection accuracy, even against strong attacks, while providing a low false positive rate.
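The natural-scene-statistics idea can be illustrated with mean-subtracted contrast-normalized (MSCN) coefficients, a standard NSS feature from BRISQUE-style quality models. The paper's exact feature set is not reproduced here, so the helper below (`mscn`, with its window size and stabilizing constant) is a hypothetical sketch of one plausible choice.

```python
import numpy as np

def mscn(image, k=3, C=1.0):
    # Mean-subtracted contrast-normalized (MSCN) coefficients:
    # normalize each pixel by a local mean and local standard
    # deviation computed over a k x k neighborhood.
    img = np.asarray(image, dtype=np.float64)
    pad = k // 2
    padded = np.pad(img, pad, mode='reflect')
    # All k x k windows centered on each pixel of the original image.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    mu = windows.mean(axis=(-1, -2))
    sigma = windows.std(axis=(-1, -2))
    # C stabilizes the division in flat regions.
    return (img - mu) / (sigma + C)
```

For natural images these coefficients are known to follow a near-Gaussian distribution whose shape parameters shift under perturbation, which is the kind of statistical signature a detector can exploit.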
Citations: 7
Spectrogram-Based Classification Of Spoken Foul Language Using Deep CNN
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287133
A. Wazir, H. A. Karim, Mohd Haris Lye Abdullah, Sarina Mansor, Nouar Aldahoul, M. F. A. Fauzi, John See
Excessive profanity in audio and video files has been shown to shape one's character and behavior. Currently, conventional methods of manual detection and censorship are used. Manual censorship is time-consuming and prone to misdetection of foul language. This paper proposes an intelligent model for foul-language censorship through automated and robust detection by deep convolutional neural networks (CNNs). A dataset of foul language was collected and processed to compute audio spectrogram images, which serve as the input for classifying foul language. The proposed model was first tested on a 2-class (foul vs. normal) classification problem; the foul class was then further decomposed into a 10-class classification problem for exact detection of profanity. Experimental results show the viability of the proposed system, demonstrating high performance in curse-word classification with a 1.24-2.71 Error Rate (ER) for the 2-class problem and a 5.49-8.30 F1-score. The proposed ResNet50 architecture outperforms other models in terms of accuracy, sensitivity, specificity, and F1-score.
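The spectrogram-image input stage can be sketched as a short-time Fourier transform of the audio followed by a log compression. The STFT parameters below (sample rate, window, hop) are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def log_spectrogram(signal, fs=16000, win=256, hop=128):
    # Frame the signal, apply a Hann window, take the magnitude
    # FFT of each frame, and log-compress the result.
    n = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop:i * hop + win] for i in range(n)])
    frames = frames * np.hanning(win)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(spec).T  # shape: (freq bins, time frames)
```

The resulting 2-D array can be saved as an image and fed to a CNN like any other picture, which is what makes image-classification architectures such as ResNet50 applicable to audio.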
Citations: 7
Low-Complexity Angular Intra-Prediction Convolutional Neural Network for Lossless HEVC
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287067
H. Huang, I. Schiopu, A. Munteanu
The paper proposes a novel low-complexity Convolutional Neural Network (CNN) architecture for block-wise angular intra-prediction in lossless video coding. The proposed CNN architecture is designed around an efficient patch-processing layer structure. The CNN-based prediction method processes an input patch containing the causal neighborhood of the current block in order to directly generate the predicted block. The trained models are integrated into the HEVC video coding standard to perform CNN-based angular intra-prediction and to compete with the conventional HEVC prediction. The proposed CNN architecture contains a reduced number of parameters, equivalent to only 37% of that of the state-of-the-art reference CNN architecture. Experimental results show that the inference runtime is also reduced by around 5.5% compared with that of the reference method. At the same time, the proposed coding systems yield 83% to 91% of the compression performance of the reference method. The results demonstrate the potential of structural and complexity optimizations in CNN-based intra-prediction for lossless HEVC.
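The causal-neighborhood input can be sketched as follows: the network only sees samples above and to the left of the current block, since those are the only ones already decoded. The helper name `causal_patch`, the block size, and the context width below are hypothetical illustrations, not the paper's exact configuration.

```python
import numpy as np

def causal_patch(frame, r, c, block=4, ctx=4):
    # Extract a patch covering the current block plus its causal
    # context (rows above, columns to the left). The block region
    # itself is zeroed because it is unknown at decode time and is
    # exactly what the CNN must predict.
    patch = frame[r - ctx:r + block, c - ctx:c + block].astype(np.float64).copy()
    patch[ctx:, ctx:] = 0.0  # mask the block to be predicted
    return patch
```

Feeding only causal samples keeps encoder and decoder in sync: the decoder can rebuild the identical input patch and therefore the identical prediction.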
Citations: 1
Defining Embedding Distortion for Sample Adaptive Offset-Based HEVC Video Steganography
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287075
Yabing Cui, Yuanzhi Yao, Nenghai Yu
As an in-loop filtering technique newly added in High Efficiency Video Coding (HEVC), sample adaptive offset (SAO) can be utilized to embed messages for video steganography. This paper presents a novel SAO-based HEVC video steganographic scheme. The main principle is to design a suitable distortion function that expresses the embedding impact on offsets, based on minimizing the embedding distortion. Two factors, the sample rate-distortion cost fluctuation and the sample statistical characteristics, are considered in the embedding-distortion definition. Adaptive message embedding is implemented using syndrome-trellis codes (STC). Experimental results demonstrate the merits of the proposed scheme in terms of undetectability and video coding performance.
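A toy sketch of the additive-distortion idea: combine the two named factors into a per-offset cost, then embed where the cost is lowest. The weights and the greedy selection below are illustrative stand-ins only; the paper uses syndrome-trellis codes, which perform near-optimal cost minimization rather than this simple sort.

```python
import numpy as np

def embedding_cost(rd_fluct, sample_stat, alpha=0.7, beta=0.3):
    # Hypothetical additive distortion combining the two factors the
    # paper names: rate-distortion cost fluctuation and a sample
    # statistical characteristic. alpha/beta are illustrative weights.
    return alpha * np.asarray(rd_fluct, dtype=float) + \
           beta * np.asarray(sample_stat, dtype=float)

def embed_greedy(costs, n_bits):
    # Toy stand-in for STC: modify the n_bits offsets whose
    # embedding cost is smallest.
    return np.argsort(costs)[:n_bits]
```

The point of the distortion function is precisely this ranking: it tells the coder which SAO offsets can carry payload with the least detectable impact.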
Citations: 1
Blind reverberation time estimation from ambisonic recordings
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287128
A. Pérez-López, A. Politis, E. Gómez
Reverberation time is an important room acoustic parameter, useful for many acoustic signal processing applications. Most of the existing work on blind reverberation time estimation focuses on the single-channel case. However, recent developments and interest in immersive audio have brought a number of spherical microphone arrays to market, together with the adoption of ambisonics as a standard spatial audio convention. This work presents a novel blind reverberation time estimation method that specifically targets ambisonic recordings, a setting that, to the best of our knowledge, remained unexplored. Experimental validation on a synthetic reverberant dataset shows that the proposed algorithm outperforms state-of-the-art methods under most evaluation criteria in low-noise conditions.
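For context, the classical non-blind estimate from a measured impulse response uses Schroeder backward integration of the energy decay curve (EDC) and a line fit over a decay range, extrapolated to -60 dB. The blind method in the paper works without access to the impulse response; this sketch only shows the quantity being estimated, and the -5..-25 dB fitting range is one common convention.

```python
import numpy as np

def rt60_schroeder(ir, fs):
    # Schroeder backward integration: EDC(t) = integral of ir^2 from t
    # to the end, normalized and expressed in dB.
    edc = np.cumsum(ir[::-1] ** 2)[::-1]
    edc_db = 10 * np.log10(edc / edc[0])
    t = np.arange(len(ir)) / fs
    # Fit a line on the -5..-25 dB portion and extrapolate to -60 dB.
    mask = (edc_db <= -5) & (edc_db >= -25)
    slope, intercept = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope
```

A blind estimator has to infer this same decay rate from reverberant signal statistics alone, which is what makes the problem hard.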
Citations: 1
Generalized Operational Classifiers for Material Identification
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287058
Xiaoyue Jiang, Ding Wang, D. Tran, S. Kiranyaz, M. Gabbouj, Xiaoyi Feng
Material is one of the intrinsic features of objects, and consequently material recognition plays an important role in image understanding. The same material may have various shapes and appearances while keeping the same physical characteristics. This poses great challenges for material recognition. Besides suitable features, a powerful classifier can also improve overall recognition performance. Due to the limitations of the classical linear neurons used in all shallow and deep neural networks, such as CNNs, we propose to apply generalized operational neurons to construct a classifier adaptively. These generalized operational perceptrons (GOPs) contain a set of linear and nonlinear neurons and possess a structure that can be built progressively. This makes the GOP classifier more compact and able to easily discriminate complex classes. The experiments demonstrate that GOP networks trained on a small portion of the data (4%) can achieve performance comparable to state-of-the-art models trained on much larger portions of the dataset.
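A single generalized operational neuron can be sketched as a choice of nodal operator, pooling operator, and activation; with multiplication, summation, and an identity activation it reduces to the classical linear neuron. The operator sets below are illustrative, not the paper's exact library.

```python
import numpy as np

# Candidate nodal operators applied elementwise to (input, weight) pairs.
NODAL = {
    'mul': lambda x, w: x * w,                 # classical linear neuron
    'sin': lambda x, w: np.sin(x * w),         # oscillatory nonlinearity
    'gauss': lambda x, w: np.exp(-(x * w) ** 2),
}
# Candidate pooling operators that aggregate the nodal outputs.
POOL = {'sum': np.sum, 'max': np.max}

def gop_neuron(x, w, nodal='mul', pool='sum', act=np.tanh):
    # One generalized operational neuron: nodal op -> pool -> activation.
    x, w = np.asarray(x, float), np.asarray(w, float)
    return act(POOL[pool](NODAL[nodal](x, w)))
```

Searching over these operator choices per neuron (while growing the network progressively) is what lets a GOP fit complex class boundaries with far fewer units than a purely linear-neuron network.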
Citations: 3
MMSP 2020 Index
Pub Date : 2020-09-21 DOI: 10.1109/mmsp48831.2020.9287137
Citations: 0
Study on viewing completion ratio of video streaming
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287091
Pierre R. Lebreton, Kazuhisa Yamagishi
In this paper, a model is investigated for optimizing the encoding of adaptive-bitrate video streaming. To this end, the relationship between quality, content duration, and acceptability, measured using the completion ratio, is studied. This work is based on intensive subjective testing performed in a laboratory environment and shows the importance of stimulus duration in acceptance studies. A model to predict the completion ratio of videos is provided and shows good accuracy. Using this model, quality requirements can be derived on the basis of the target abandonment rate and content duration. This work will help video streaming providers define suitable coding conditions, when preparing content to be broadcast on their platforms, that maintain user engagement.
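The kind of relationship studied can be sketched as a logistic acceptance model in which completion probability rises with quality and falls with content duration. The functional form, the function name, and the coefficients below are purely illustrative assumptions, not the paper's fitted model.

```python
import math

def completion_ratio(quality_mos, duration_s, a=1.5, b=0.002, c=-2.0):
    # Hypothetical logistic model: higher quality (MOS scale) pushes
    # the completion probability up, longer duration pushes it down.
    # a, b, c are illustrative coefficients, not fitted values.
    z = c + a * quality_mos - b * duration_s
    return 1.0 / (1.0 + math.exp(-z))
```

Inverting such a model is what turns a target abandonment rate into a minimum quality requirement for a given content duration.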
Citations: 0
Automated Genre Classification for Gaming Videos
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287122
Steve Goering, Robert Steger, Rakesh Rao Ramachandra Rao, A. Raake
Besides classical videos, videos of gaming matches, entire tournaments, or individual sessions are streamed and viewed all over the world. The increased popularity of Twitch or YoutubeGaming shows the importance of additional research on gaming videos. One important precondition for live or offline encoding of gaming videos is knowledge of game-specific properties. Knowing or automatically predicting the genre of a gaming video enables a more advanced and optimized encoding pipeline for streaming providers, especially because gaming videos of different genres differ greatly from classical 2D video, e.g., with respect to CGI content, textures, or camera motion. We describe several computer-vision-based features, optimized for speed and motivated by characteristics of popular games, to automatically predict the genre of a gaming video. Our prediction system uses random forests and gradient-boosted trees as the underlying machine-learning techniques, combined with feature selection. For the evaluation of our approach, we use a dataset that was built as part of this work and consists of recorded gaming sessions for 6 genres from Twitch. In total, 351 different videos are considered. We show that our prediction approach performs well in terms of F1-score. Besides evaluating different machine-learning approaches, we additionally investigate the influence of the algorithms' hyper-parameters.
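Fast per-video features of the kind described can be sketched as simple statistics over decoded frames; the two features and the function name below are hypothetical examples for illustration, not the paper's actual feature set.

```python
import numpy as np

def video_features(frames):
    # Two illustrative per-video features: a colorfulness statistic
    # (opponent-color variances, BGR/RGB channels assumed in the last
    # axis) and a mean frame-difference "motion energy".
    frames = np.asarray(frames, dtype=np.float64)  # (T, H, W, 3)
    rg = frames[..., 0] - frames[..., 1]
    yb = 0.5 * (frames[..., 0] + frames[..., 1]) - frames[..., 2]
    colorfulness = np.sqrt(rg.var() + yb.var())
    motion = np.abs(np.diff(frames, axis=0)).mean()
    return np.array([colorfulness, motion])
```

A feature vector like this, computed per video, is what a random forest or gradient-boosted tree classifier would consume downstream.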
Citations: 4
Object-Oriented Motion Estimation using Edge-Based Image Registration
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287129
Md. Asikuzzaman, Deepak Rajamohan, M. Pickering
Video data storage and transmission costs can be reduced by minimizing temporally redundant information among frames using an appropriate motion-compensated prediction technique. In the current video coding standard, neighbouring frames are exploited to predict the motion of the current frame using global motion estimation-based approaches. However, the global motion estimate of a frame may not capture the actual motion of the individual objects in that frame, as each object usually has its own motion. In this paper, an edge-based motion estimation technique is presented that finds the motion of each object in the frame rather than the global motion of the frame. In the proposed method, edge position difference (EPD) similarity-based image registration between the two frames is applied to register each object in the frame. A superpixel search is then applied to segment the registered object. Finally, the proposed edge-based image registration technique and the Demons algorithm are applied to predict the objects in the current frame. Our experimental analysis demonstrates that the proposed algorithm estimates the motions of individual objects in the current frame accurately compared with existing global motion estimation-based approaches.
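The EPD similarity can be sketched as an overlap measure between binary edge maps: registration parameters that bring the edges of an object in one frame onto the edges in the other score highly. This definition is an illustrative assumption, as the paper's exact formulation is not reproduced here.

```python
import numpy as np

def edge_position_difference(edges_a, edges_b):
    # Illustrative EPD-style similarity: the fraction of edge pixels
    # in map A that coincide with an edge pixel in map B. A value of
    # 1.0 means every edge in A lands on an edge in B.
    a = np.asarray(edges_a, dtype=bool)
    b = np.asarray(edges_b, dtype=bool)
    return np.logical_and(a, b).sum() / max(int(a.sum()), 1)
```

In a registration loop, one would transform the edge map of the reference frame under candidate motion parameters and keep the parameters that maximize this score for each object.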
Citations: 0