
2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP): Latest Publications

Multi-Plane Image Video Compression
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287083
Scott Janus, J. Boyce, S. Bhatia, J. Tanner, Atul Divekar, Penne Lee
Multiplane Images (MPI) are a new approach to storing volumetric content. An MPI represents a 3D scene within a view frustum, typically with 32 planes of texture and transparency information per camera. MPI literature to date has focused on still images, but applying MPI to video will require substantial compression in order to be viable for real-world productions. In this paper, we describe several techniques for compressing MPI video sequences by reducing pixel rate while maintaining acceptable visual quality. We focus on traditional video compression codecs such as HEVC. While a new codec algorithm specifically tailored to MPI would likely achieve very good results, no devices exist today that support such a hypothetical MPI codec. By comparison, hundreds of millions of real-time HEVC decoders are already present in laptops and TVs.
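To make the plane-based representation concrete: an MPI view can be rendered by over-compositing its planes back to front, and the pixel rate that the paper seeks to reduce grows with the number of planes. The NumPy sketch below illustrates that compositing under assumed array shapes; it is not the authors' implementation.

```python
import numpy as np

def render_mpi(colors: np.ndarray, alphas: np.ndarray) -> np.ndarray:
    """Composite an MPI layer stack into a single view.

    colors: (P, H, W, 3) per-plane RGB texture, plane 0 = farthest from camera.
    alphas: (P, H, W)    per-plane transparency in [0, 1].
    Returns an (H, W, 3) image using standard back-to-front "over" compositing.
    """
    P, H, W, _ = colors.shape
    out = np.zeros((H, W, 3), dtype=np.float64)
    for p in range(P):                      # back to front
        a = alphas[p][..., None]            # (H, W, 1)
        out = colors[p] * a + out * (1.0 - a)
    return out

# Toy example: 32 planes of random texture/transparency for one camera.
# Raw pixel rate per camera per frame is P * H * W, which is what the
# compression techniques in the paper aim to reduce.
rng = np.random.default_rng(0)
colors = rng.random((32, 64, 64, 3))
alphas = rng.random((32, 64, 64))
image = render_mpi(colors, alphas)
print(image.shape)  # (64, 64, 3)
```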
Citations: 0
Joint Cross-Component Linear Model For Chroma Intra Prediction
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287167
R. G. Youvalari, J. Lainema
The Cross-Component Linear Model (CCLM) is an intra prediction technique adopted into the upcoming Versatile Video Coding (VVC) standard. CCLM attempts to reduce the inter-channel correlation by using a linear model. For that, the parameters of the model are calculated from the reconstructed samples in the luma channel as well as the neighboring samples of the chroma coding block. In this paper, we propose a new method, called Joint Cross-Component Linear Model (J-CCLM), to improve the prediction efficiency of the tool. The proposed J-CCLM technique predicts the samples of the coding block with a multi-hypothesis approach that combines two intra prediction modes. To that end, the final prediction of the block is obtained by combining the conventional CCLM mode with an angular mode derived from the co-located luma block. Experiments conducted with the VTM-8.0 test model of VVC show that the proposed method provides on average more than 1.0% BD-rate gain in the chroma channels. Furthermore, weighted YCbCr bitrate savings of 0.24% and 0.54% are achieved in the 4:2:0 and 4:4:4 color formats, respectively.
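As a rough illustration of the cross-component idea: CCLM predicts chroma samples as a linear function of co-located reconstructed luma, with parameters derived from neighboring samples, and J-CCLM blends that prediction with an angular mode. The sketch below uses a least-squares fit and an equal-weight blend purely for illustration; the actual VVC parameter derivation uses a simplified min/max rule, and the paper's combination weights are not given here.

```python
import numpy as np

def cclm_params(neigh_luma: np.ndarray, neigh_chroma: np.ndarray):
    """Derive the linear model pred_C = a * rec_L + b from neighboring
    reconstructed samples. A least-squares fit is used here for clarity;
    the actual VVC CCLM derivation uses a simplified min/max rule."""
    a, b = np.polyfit(neigh_luma.ravel(), neigh_chroma.ravel(), deg=1)
    return a, b

def jcclm_predict(rec_luma_block, neigh_luma, neigh_chroma, angular_pred, w=0.5):
    """Multi-hypothesis chroma prediction in the spirit of J-CCLM: blend the
    CCLM prediction with an angular prediction derived from the co-located
    luma block. The blending weight w is an assumption, not the paper's."""
    a, b = cclm_params(neigh_luma, neigh_chroma)
    cclm_pred = a * rec_luma_block + b
    return w * cclm_pred + (1.0 - w) * angular_pred

# Toy 4x4 example with synthetic 10-bit neighboring samples.
rng = np.random.default_rng(1)
neigh_luma = rng.integers(0, 1024, 16).astype(float)
neigh_chroma = 0.4 * neigh_luma + 100 + rng.normal(0, 2, 16)
rec_luma_block = rng.integers(0, 1024, (4, 4)).astype(float)
angular_pred = np.full((4, 4), 512.0)        # placeholder angular-mode prediction
print(jcclm_predict(rec_luma_block, neigh_luma, neigh_chroma, angular_pred))
```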
Citations: 5
Mesh Coding Extensions to MPEG-I V-PCC
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287057
Esmaeil Faramarzi, R. Joshi, M. Budagavi
Dynamic point clouds and meshes are used in a wide variety of applications such as gaming, visualization, medicine, and more recently AR/VR/MR. This paper presents two extensions of the MPEG-I Video-based Point Cloud Compression (V-PCC) standard to support mesh coding. The extensions are based on the Edgebreaker and TFAN mesh connectivity coding algorithms, implemented in the Google Draco software and the MPEG SC3DMC mesh coding software, respectively. Lossless results for the proposed frameworks on top of version 8.0 of the MPEG-I V-PCC test model (TMC2) are presented and compared with Draco for dense meshes.
Citations: 7
Improving Automatic Speech Recognition Utilizing Audio-codecs for Data Augmentation
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287127
N. Hailu, Ingo Siegert, A. Nürnberger
Training end-to-end automatic speech recognition models requires a large amount of labeled speech data. This goal is challenging for languages with fewer resources. In contrast to the commonly used feature-level data augmentation, we propose to expand the training set by using different audio codecs at the data level. The augmentation method consists of applying different audio codecs with changed bit rate, sampling rate, and bit depth. The change introduces variation in the input data without drastically affecting the audio quality. Moreover, the audio remains perceptible to humans, and any feature extraction is still possible afterwards. To demonstrate the general applicability of the proposed augmentation technique, we evaluated it in an end-to-end automatic speech recognition architecture in four languages. After applying the method to the Amharic, Dutch, Slovenian, and Turkish datasets, we achieved an average improvement of 1.57 in character error rate (CER) without integrating language models. The result is comparable to the baseline, showing CER improvements of 2.78, 1.25, 1.21, and 1.05 for the respective languages. On the Amharic dataset, we reached a syllable error rate reduction of 6.12 compared to the baseline result.
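A minimal sketch of the data-level augmentation idea, assuming FFmpeg is available on the system and using an illustrative grid of codecs, bit rates, and sampling rates (not the exact configuration evaluated in the paper):

```python
import itertools
import subprocess
from pathlib import Path

def augment_with_codecs(wav_path: str, out_dir: str) -> None:
    """Create augmented copies of a training utterance by round-tripping it
    through different audio codecs, bit rates, and sampling rates with FFmpeg.
    The codec/parameter grid below is illustrative only, and it assumes an
    FFmpeg build that includes these encoders."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    codecs = [("libmp3lame", "mp3"), ("aac", "m4a"), ("libvorbis", "ogg")]
    bitrates = ["32k", "64k", "128k"]
    sample_rates = ["8000", "16000"]
    for (codec, ext), br, sr in itertools.product(codecs, bitrates, sample_rates):
        coded = out / f"{Path(wav_path).stem}_{codec}_{br}_{sr}.{ext}"
        decoded = coded.with_suffix(".wav")
        # Encode with the lossy codec at the chosen bit rate and sampling rate...
        subprocess.run(["ffmpeg", "-y", "-i", wav_path, "-c:a", codec,
                        "-b:a", br, "-ar", sr, str(coded)], check=True)
        # ...then decode back to 16-bit PCM so the ASR front end sees WAV input.
        subprocess.run(["ffmpeg", "-y", "-i", str(coded), "-c:a", "pcm_s16le",
                        str(decoded)], check=True)

# Example usage (hypothetical file names):
# augment_with_codecs("utt0001.wav", "augmented/")
```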
Citations: 5
Deep Learning-based Point Cloud Geometry Coding with Resolution Scalability
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287060
André F. R. Guarda, Nuno M. M. Rodrigues, F. Pereira
Point clouds are a 3D visual representation format that has recently become fundamentally important for immersive and interactive multimedia applications. Considering the high number of points in practically relevant point clouds, and their increasing market demand, efficient point cloud coding has become a vital research topic. In addition, scalability is an important feature for point cloud coding, especially for real-time applications, where fast and rate-efficient access to a decoded point cloud matters; however, this issue is still rather unexplored in the literature. In this context, this paper proposes a novel deep learning-based point cloud geometry coding solution with resolution scalability via interlaced sub-sampling. As additional layers are decoded, the number of points in the reconstructed point cloud increases, as does the overall quality. Experimental results show that the proposed scalable point cloud geometry coding solution outperforms the recent MPEG Geometry-based Point Cloud Compression standard, which is much less scalable.
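The abstract does not detail the interlacing scheme, but the general idea of splitting a voxelized point cloud into layers that add points (and thus resolution) as they are decoded can be sketched as follows; the 2x2x2 coordinate-parity interlacing used here is an assumption for illustration only, not the paper's construction.

```python
import numpy as np

def interlaced_layers(points: np.ndarray):
    """Split a voxelized point cloud of (N, 3) integer coordinates into eight
    scalability layers by 2x2x2 coordinate-parity interlacing. Illustrative
    only; the paper's actual layer construction may differ."""
    parity = (points[:, 0] % 2) * 4 + (points[:, 1] % 2) * 2 + (points[:, 2] % 2)
    return [points[parity == k] for k in range(8)]

def reconstruct(layers, up_to: int) -> np.ndarray:
    """Decoding more layers adds points, increasing the effective resolution."""
    return np.concatenate(layers[: up_to + 1], axis=0)

rng = np.random.default_rng(2)
pts = rng.integers(0, 1024, size=(10000, 3))   # toy voxelized geometry
layers = interlaced_layers(pts)
print([len(layer) for layer in layers])        # roughly 1250 points per layer
print(reconstruct(layers, 3).shape)            # coarse (half-resolution) decode
```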
Citations: 15
Skeleton-based motion estimation for Point Cloud Compression
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287165
Chao Cao, C. Tulvan, M. Preda, T. Zaharia
With the rapid development of point cloud acquisition technologies, high-quality human-shape point clouds are increasingly used in VR/AR applications and in 3D graphics in general. To achieve near-realistic quality, such content usually contains an extremely high number of points (over 0.5 million points per 3D object per frame) and associated attributes (such as color). For this reason, efficient, dedicated 3D Point Cloud Compression (3DPCC) methods become mandatory. This requirement is even stronger for dynamic content, where the coordinates and attributes of the 3D points evolve over time. In this paper, we propose a novel skeleton-based 3DPCC approach dedicated to the specific case of dynamic point clouds representing humanoid avatars. The method relies on multi-view 2D human pose estimation of 3D dynamic point clouds. Using the DensePose neural network, we first extract the body parts from projected 2D images. The obtained 2D segmentation information is back-projected and aggregated into 3D space. This procedure makes it possible to partition the 3D point cloud into a set of 3D body parts. For each part, a 3D affine transform is estimated between every two consecutive frames and used for 3D motion compensation. The proposed approach has been integrated into the Video-based Point Cloud Compression (V-PCC) test model of MPEG. Experimental results show that, in the particular case of body motion with small amplitudes, the proposed method outperforms the V-PCC test model under lossy inter-coding by up to 83% in terms of bitrate reduction at low bit rates. Meanwhile, the proposed framework holds the potential of supporting various features such as regions of interest and levels of detail.
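The per-part motion model can be illustrated in isolation: given corresponding points of one segmented body part in two consecutive frames, a 3D affine transform is fit by least squares and used to predict (motion-compensate) the part in the next frame. The sketch below assumes point correspondences are already available, whereas the paper derives the parts from its DensePose-based segmentation.

```python
import numpy as np

def fit_affine_3d(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 3D affine transform mapping src -> dst.

    src, dst: (N, 3) corresponding points of one body part in two consecutive
    frames. Returns a 4x3 matrix A such that [x y z 1] @ A approximates the
    corresponding point in the next frame."""
    src_h = np.hstack([src, np.ones((len(src), 1))])      # homogeneous coords
    A, *_ = np.linalg.lstsq(src_h, dst, rcond=None)       # (4, 3)
    return A

def motion_compensate(src: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Predict the part's position in the next frame from the affine model."""
    src_h = np.hstack([src, np.ones((len(src), 1))])
    return src_h @ A

# Toy example: a slight scale-and-shift motion of one "body part".
rng = np.random.default_rng(3)
part_t0 = rng.random((500, 3))
true_A = np.vstack([np.eye(3) * 0.98, [0.05, -0.02, 0.10]])
part_t1 = np.hstack([part_t0, np.ones((500, 1))]) @ true_A
A = fit_affine_3d(part_t0, part_t1)
print(np.abs(motion_compensate(part_t0, A) - part_t1).max())  # ~0: exact fit
```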
Citations: 4
DEMI: Deep Video Quality Estimation Model using Perceptual Video Quality Dimensions
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287080
Saman Zadtootaghaj, Nabajeet Barman, Rakesh Rao Ramachandra Rao, Steve Göring, M. Martini, A. Raake, S. Möller
Existing works in the field of quality assessment focus separately on gaming and non-gaming content. Along with traditional modeling approaches, deep learning-based approaches have been used to develop quality models due to their high prediction accuracy. In this paper, we present a deep learning-based quality estimation model considering both gaming and non-gaming videos. The model is developed in three phases. First, a convolutional neural network (CNN) is trained on an objective metric, which allows the CNN to learn video artifacts such as blurriness and blockiness. Next, the model is fine-tuned on a small image quality dataset using blockiness and blurriness ratings. Finally, a Random Forest is used to pool frame-level predictions and temporal information of videos in order to predict the overall video quality. The lightweight, low-complexity nature of the model makes it suitable for real-time applications covering both gaming and non-gaming content while achieving performance similar to the existing state-of-the-art model NDNetGaming. The model implementation for testing is available on GitHub.
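The third stage, pooling frame-level CNN predictions with a Random Forest, can be sketched as follows; the pooled features, synthetic training data, and hyperparameters here are illustrative placeholders, not those used for DEMI.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pooled_features(frame_scores: np.ndarray) -> np.ndarray:
    """Summarize per-frame CNN quality predictions of one video into a
    fixed-length feature vector. Mean/std/percentiles are a plausible
    stand-in for the temporal features used by DEMI."""
    return np.array([frame_scores.mean(), frame_scores.std(),
                     np.percentile(frame_scores, 10),
                     np.percentile(frame_scores, 90)])

# Toy training set: frame-level scores for 200 synthetic videos with known MOS.
rng = np.random.default_rng(4)
mos = rng.uniform(1, 5, 200)
X = np.stack([pooled_features(m + rng.normal(0, 0.5, 120)) for m in mos])
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, mos)

# Predict the overall quality of a new video from its pooled frame scores.
new_video_frames = 3.7 + rng.normal(0, 0.5, 120)
print(rf.predict(pooled_features(new_video_frames).reshape(1, -1)))
```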
Citations: 13
Towards Fast and Efficient VVC Encoding
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287093
J. Brandenburg, A. Wieckowski, Tobias Hinz, Anastasia Henkel, Valeri George, Ivan Zupancic, C. Stoffers, B. Bross, H. Schwarz, D. Marpe
Versatile Video Coding (VVC) is a new international video coding standard to be finalized in July 2020. It is designed to provide around 50% bit-rate saving at the same subjective visual quality over its predecessor, High Efficiency Video Coding (H.265/HEVC). During the standard development, objective bit-rate savings of around 40% have been reported for the VVC reference software (VTM) compared to the HEVC reference software (HM). The unoptimized VTM encoder is around 9x, and the decoder around 2x, slower than HM. This paper discusses the VVC encoder complexity in terms of software runtime. The modular design of the standard allows a VVC encoder to trade off bit-rate savings and encoder runtime. Based on a detailed tradeoff analysis, results for different operating points are reported. Additionally, initial work on software and algorithm optimization is presented. With the optimized software algorithms, an operating point with an over 22x faster single-threaded encoder runtime than VTM can be achieved, i.e. around 2.5x faster than HM, while still providing more than 30% bit-rate savings over HM. Finally, our experiments demonstrate the flexibility of VVC and its potential for optimized software encoder implementations.
Citations: 14
Open-Source RTP Library for High-Speed 4K HEVC Video Streaming
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287162
Aaro Altonen, Joni Räsänen, Jaakko Laitinen, Marko Viitanen, Jarno Vanne
Efficient transport technologies for High Efficiency Video Coding (HEVC) are key enablers for economical 4K video transmission in current telecommunication networks. This paper introduces a novel open-source Real-time Transport Protocol (RTP) library called uvgRTP for high-speed 4K HEVC video streaming. Our library supports the latest RFC 3550 specification for RTP and the associated RFC 7798 RTP payload format for HEVC. It is written in C++ under a permissive 2-clause BSD license and runs on both Linux and Windows operating systems with a user-friendly interface. Our experiments on an Intel Core i7-4770 CPU show that uvgRTP is able to stream HEVC video at 5.0 Gb/s over a local 10 Gb/s network. It attains 4.4 times the peak goodput of the state-of-the-art FFmpeg multimedia framework with 92.1% lower latency. It also outperforms LIVE555 with over double the goodput and 82.3% lower latency. These results indicate that uvgRTP is currently the fastest open-source RTP library for 4K HEVC video streaming.
Citations: 7
MultiANet: a Multi-Attention Network for Defocus Blur Detection
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287072
Zeyu Jiang, Xun Xu, Chao Zhang, Ce Zhu
Defocus blur detection is a challenging task because of obscure homogeneous regions and interference from background clutter. Most existing deep learning-based methods focus on building wider or deeper networks to capture multi-level features, neglecting the feature relationships of intermediate layers, which limits the discriminative ability of the network. Moreover, fusing features at different levels has been demonstrated to be effective. However, directly integrating them without distinction is not optimal, because low-level features focus on fine details only and can be distracted by background clutter. To address these issues, we propose the Multi-Attention Network for stronger discriminative learning and spatially guided low-level feature learning. Specifically, a channel-wise attention module is applied to both high-level and low-level feature maps to capture channel-wise global dependencies. In addition, a spatial attention module is applied to the low-level feature maps to emphasize effective detailed information. Experimental results show that the performance of our network is superior to state-of-the-art algorithms.
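A minimal PyTorch sketch of generic channel-wise and spatial attention blocks of the kind described (global-pooling channel reweighting plus a convolutional spatial mask); MultiANet's exact layer configuration is not reproduced here, and the kernel sizes and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel-wise attention: global average pooling followed by a small
    bottleneck MLP that rescales each channel (squeeze-and-excitation style)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # (B, C) channel weights
        return x * w[:, :, None, None]

class SpatialAttention(nn.Module):
    """Spatial attention: a 7x7 conv over channel-pooled maps produces a
    per-pixel weight that emphasizes detailed regions in low-level features."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

feat = torch.randn(2, 64, 32, 32)                # a toy low-level feature map
print(SpatialAttention()(ChannelAttention(64)(feat)).shape)  # (2, 64, 32, 32)
```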
Citations: 2