
Latest Publications: 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)

A Large-scale Evaluation of the bitstream-based video-quality model ITU-T P.1204.3 on Gaming Content
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287055
Rakesh Rao Ramachandra Rao, Steve Göring, Robert Steger, Saman Zadtootaghaj, Nabajeet Barman, S. Fremerey, S. Möller, A. Raake
The streaming of gaming content, both passive and interactive, has increased manifold in recent years. Gaming contents bring with them some peculiarities which are normally not seen in traditional 2D videos, such as the artificial and synthetic nature of the contents or the repetition of objects in a game. In addition, the perception of gaming content by the user is different from that of traditional 2D videos due to these peculiarities and also the fact that users may not often watch such content. Hence, it becomes imperative to evaluate whether the existing video quality models, usually designed for traditional 2D videos, are applicable to gaming content. In this paper, we evaluate the applicability of the recently standardized bitstream-based video-quality model ITU-T P.1204.3 on gaming content. To analyze the performance of this model, we used 4 different gaming datasets (3 publicly available + 1 internal) not previously used for model training, and compared it with the existing state-of-the-art models. We found that the ITU-T P.1204.3 model performs well out of the box on these unseen datasets, with an RMSE ranging between 0.38 and 0.45 on the 5-point absolute category rating scale and a Pearson correlation between 0.85 and 0.93 across all 4 databases. We further propose a full-HD variant of the P.1204.3 model, since the original model was trained and validated targeting a resolution of 4K/UHD-1. A 50:50 split across all databases is used to train and validate this variant so as to ensure that the proposed model is applicable to various conditions.
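The evaluation criteria cited above reduce to two standard statistics. Below is a minimal sketch of how a model's predicted scores could be compared against subjective mean opinion scores on a 5-point ACR scale; the function name and the toy score values are illustrative, not taken from the paper:

```python
import numpy as np
from scipy.stats import pearsonr

def evaluate_model(predicted_mos, subjective_mos):
    """RMSE and Pearson correlation between predictions and subjective MOS."""
    p = np.asarray(predicted_mos, dtype=float)
    s = np.asarray(subjective_mos, dtype=float)
    rmse = np.sqrt(np.mean((p - s) ** 2))
    plcc, _ = pearsonr(p, s)
    return rmse, plcc

# Toy scores on a 5-point ACR scale for a handful of hypothetical sequences.
pred = [3.8, 2.1, 4.5, 3.0, 1.9]
subj = [4.0, 2.3, 4.4, 3.2, 2.0]
rmse, plcc = evaluate_model(pred, subj)
print(f"RMSE = {rmse:.2f}, PLCC = {plcc:.2f}")
```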
Citations: 5
Automatic Gain Control for Enhanced HDR Performance on Audio
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287160
D. Garcia, J. Hernandez, Steve Mann
We introduce a method to enhance the performance of the high dynamic range (HDR) technique on audio signals by automatically controlling the gains of the individual signal channels. Automatic gain control (AGC) compensates for the receiver's limited dynamic range by keeping the incoming signal within the desired range, while HDR uses these multi-channel gains to extend the dynamic range of the composited signal. The results validate that the benefits of the two methods compound when they are used together. In effect, we produce a dynamic high dynamic range (DHDR) composite signal. The HDR AGC method is simulated to show performance gains under various conditions. The method is then implemented on a custom PCB with a microcontroller to demonstrate feasibility in real-world, real-time applications.
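The core idea — several copies of the same signal captured at different gains, then composited by favoring whichever copy is neither clipped nor buried in noise — can be sketched as follows. This is a hedged illustration of certainty-weighted HDR compositing plus a naive AGC step, not the authors' exact algorithm; the function names and the weighting heuristic are assumptions:

```python
import numpy as np

def agc_update(gain, level, target=0.5, rate=0.1):
    """One AGC step: nudge a channel's gain toward a target input level."""
    return gain * (1.0 + rate * (target - level))

def hdr_composite(channels, gains, clip_level=1.0):
    """Composite several gain-staged recordings of the same source.

    channels: list of 1-D float arrays, each recorded with the matching gain.
    Samples near the clip level get low weight, so the composite prefers
    whichever channel is neither saturated nor buried in noise.
    """
    channels = [np.asarray(c, dtype=float) for c in channels]
    est = np.zeros_like(channels[0])
    weight_sum = np.zeros_like(channels[0])
    for ch, g in zip(channels, gains):
        w = np.clip(1.0 - np.abs(ch) / clip_level, 0.0, 1.0)  # certainty weight
        est += w * (ch / g)          # undo the channel gain before mixing
        weight_sum += w
    return est / np.maximum(weight_sum, 1e-12)  # ~0 where every copy clipped
```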
Citations: 4
High Frame-Rate Virtual View Synthesis Based on Low Frame-Rate Input
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287076
K. Wegner, J. Stankowski, O. Stankiewicz, Hubert Żabiński, K. Klimaszewski, T. Grajek
In this paper we investigated methods of obtaining high-resolution, high frame-rate virtual views from low frame-rate cameras for use in high-performance multiview systems. We demonstrated how to set up synchronization for multiview acquisition systems to record the required data, and then how to process the data to create virtual views at a higher frame rate while preserving the high resolution of the views. We analyzed various ways of combining time frame interpolation with an alternative side-view synthesis technique, which allows us to create the required high frame-rate video of a virtual viewpoint. The results prove that the proposed methods are capable of delivering the expected high-quality, high-resolution and high frame-rate virtual views.
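To make the frame-rate side of the pipeline concrete, here is a minimal sketch of temporal frame interpolation that doubles the frame rate of a sequence. It uses a plain cross-fade purely for illustration; the paper's analysis concerns motion-aware interpolation combined with side-view synthesis, which a linear blend does not capture:

```python
import numpy as np

def interpolate_midframe(frame_a, frame_b, alpha=0.5):
    """Naive temporal interpolation: cross-fade two neighboring frames."""
    blend = (1.0 - alpha) * frame_a.astype(float) + alpha * frame_b.astype(float)
    return blend.astype(frame_a.dtype)

def double_frame_rate(frames):
    """Insert one interpolated frame between each pair of input frames."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a)
        out.append(interpolate_midframe(a, b))
    out.append(frames[-1])
    return out
```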
Citations: 0
A Multi-Criteria Contrast Enhancement Evaluation Measure using Wavelet Decomposition
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287051
Zohaib Amjad Khan, Azeddine Beghdadi, F. A. Cheikh, M. Kaaniche, Muhammad Ali Qureshi
An effective contrast enhancement method should not only improve the perceptual quality of an image but should also avoid adding artifacts or affecting the naturalness of the image. This makes Contrast Enhancement Evaluation (CEE) a challenging task, in the sense that both the improvement in image quality and the unwanted side-effects need to be checked for. Currently, there is no single CEE metric that works well for all kinds of enhancement criteria. In this paper, we propose a new Multi-Criteria CEE (MCCEE) measure which combines different metrics effectively to give a single quality score. To fully exploit the potential of these metrics, we further propose applying them to the image decomposed with the wavelet transform. This new metric has been tested on two natural-image contrast enhancement databases as well as on medical Computed Tomography (CT) images. The results show a substantial improvement over existing evaluation metrics. The code for the metric is available at: https://github.com/zakopz/MCCEE-Contrast-Enhancement-Metric
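The decompose-then-pool pattern the abstract describes might look like the following sketch, which uses PyWavelets. The per-sub-band measure here (a sub-band energy ratio) is a stand-in; the actual MCCEE combines several dedicated CEE metrics, for which see the linked repository:

```python
import numpy as np
import pywt  # PyWavelets

def subband_scores(original, enhanced, wavelet="db2", level=2):
    """Apply a quality measure per wavelet sub-band and pool the scores.

    Stand-in measure: contrast change as the ratio of sub-band deviations.
    """
    dec_o = pywt.wavedec2(original.astype(float), wavelet, level=level)
    dec_e = pywt.wavedec2(enhanced.astype(float), wavelet, level=level)
    scores = []
    for bands_o, bands_e in zip(dec_o[1:], dec_e[1:]):  # skip approximation
        for so, se in zip(bands_o, bands_e):            # (cH, cV, cD) per level
            scores.append(np.std(se) / (np.std(so) + 1e-12))
    return float(np.mean(scores))                       # single pooled score
```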
Citations: 4
Localization and Categorization of Early Reflections for Estimating Acoustic Reflection Coefficients
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287099
Robert Hupke, Sebastian Lauster, Nils Poschadel, Marcel Nophut, Stephan Preihs, J. Peissig
Knowledge of room acoustic parameters such as frequency- and direction-dependent reflection coefficients, room volume, or geometric characteristics is important for the modeling of acoustic environments, e.g., to improve the plausibility of immersive audio in mixed reality applications or to transfer a physical acoustic environment into a completely virtual one. This paper presents a method for detecting first-order reflections in three dimensions from spatial room impulse responses recorded with a spherical microphone array. Using geometric relations, the estimated direction of arrival (DOA), and the time difference of arrival (TDOA), the order of each mirror sound source is determined and assigned to the individual walls of the room. The detected DOA and TDOA of the first-order mirror sound sources are used to estimate the frequency-dependent reflection coefficients of the respective walls using a null-steering beamformer directed at the estimated DOA. Analysis in terms of DOA and TDOA indicates accurate estimation for simulated and measured data. The estimation of the reflection coefficients shows a relative error of 3.5% between 500 Hz and 4 kHz for simulated data. Furthermore, experimental challenges are discussed, such as the evaluation of reflection coefficient estimation in real acoustic environments.
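Two of the geometric ingredients — converting a reflection's TDOA into a path length, and assigning a detected mirror source to a wall from its DOA — can be sketched as below. The shoebox-room assumption and the axis-based wall labels are simplifications; the paper uses the full geometric relations between source, array and walls:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def reflection_path_length(tdoa_s, direct_distance_m):
    """Total path length of a reflection from its delay re the direct sound."""
    return direct_distance_m + tdoa_s * SPEED_OF_SOUND

def categorize_wall(doa_unit_vector):
    """Assign a detected mirror source to a wall by its dominant DOA axis.

    Assumes a shoebox room with walls aligned to the array axes.
    """
    walls = ["wall x+", "wall x-", "wall y+", "wall y-", "ceiling", "floor"]
    axis = int(np.argmax(np.abs(doa_unit_vector)))
    sign = 0 if doa_unit_vector[axis] >= 0 else 1
    return walls[2 * axis + sign]

# Example: a reflection arriving 5 ms after a 2 m direct path.
print(reflection_path_length(0.005, 2.0))            # 3.715 m total path
print(categorize_wall(np.array([0.1, -0.2, 0.97])))  # "ceiling"
```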
Citations: 0
Local Luminance Patterns for Point Cloud Quality Assessment
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287154
Rafael Diniz, P. Freitas, Mylène C. Q. Farias
In recent years, Point Clouds (PCs) have grown in popularity as the preferred data structure for representing 3D visual contents. Examples of PC applications range from 3D representations of small objects up to large maps. The growing adoption of PCs has triggered the development of new coding, transmission, and presentation methodologies, and, along with these, novel methods for evaluating the visual quality of PC contents. This paper presents a new objective full-reference visual quality metric for PC contents, which uses a proposed descriptor entitled Local Luminance Patterns (LLP). It extracts statistics of the luminance information of the reference and test PCs and compares them to assess the perceived quality of the test PC. The proposed PC quality assessment method can be applied to both large- and small-scale PCs. Using publicly available PC quality datasets, we compared the proposed method with current state-of-the-art PC quality metrics, obtaining competitive results.
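As a rough illustration of the compare-luminance-statistics idea, the sketch below converts per-point RGB attributes to luminance, summarizes them, and scores the distance between reference and test summaries. The BT.709 weights, the 32-bin histogram, and the pooling are assumptions; the actual LLP descriptor encodes local patterns around each point rather than global statistics:

```python
import numpy as np

def luminance_stats(points_rgb):
    """Summary statistics of per-point luminance (BT.709 weights, RGB in [0,1])."""
    y = points_rgb @ np.array([0.2126, 0.7152, 0.0722])
    hist, _ = np.histogram(y, bins=32, range=(0.0, 1.0), density=True)
    return y.mean(), y.std(), hist

def luminance_quality(ref_rgb, test_rgb):
    """Crude full-reference score: distance between luminance statistics."""
    m_r, s_r, h_r = luminance_stats(ref_rgb)
    m_t, s_t, h_t = luminance_stats(test_rgb)
    return abs(m_r - m_t) + abs(s_r - s_t) + np.abs(h_r - h_t).mean()
```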
Citations: 22
Multichannel Singing Voice Separation by Deep Neural Network Informed DOA Constrained CMNMF
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287068
A. Muñoz-Montoro, A. Politis, K. Drossos, J. Carabias-Orti
This work addresses the problem of multichannel source separation by combining two powerful approaches: multichannel spectral factorization and recent monophonic deep learning (DL) based spectrum inference. Individual source spectra at different channels are estimated with a Masker-Denoiser twin network, which is able to model long-term temporal patterns of a musical piece. The monophonic source spectrograms are used within a spatial covariance mixing model based on complex-valued multichannel non-negative matrix factorization (CMNMF) that predicts the spatial characteristics of each source. The proposed framework is evaluated on the task of singing voice separation with a large multichannel dataset. Experimental results show that our joint DL+CMNMF method outperforms both the individual monophonic DL-based separation and the multichannel CMNMF baseline methods.
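The front end of such a pipeline — applying a network-estimated vocal magnitude as a soft mask to each channel of the mixture STFT before any spatial modeling — might look like the sketch below. This shows only the masking step, under assumed array shapes; the paper's contribution is feeding these estimates into a DOA-constrained CMNMF spatial covariance model, which is not reproduced here:

```python
import numpy as np

def apply_masks_per_channel(mix_stft, voice_mag_est, eps=1e-12):
    """Soft-mask each channel of the mixture with an estimated vocal magnitude.

    mix_stft:      complex array, shape (channels, freq, time)
    voice_mag_est: real array of the same shape, e.g. the output of the
                   Masker-Denoiser network (a stand-in here).
    """
    mask = np.clip(voice_mag_est / (np.abs(mix_stft) + eps), 0.0, 1.0)
    return mask * mix_stft  # per-channel vocal STFT estimate
```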
Citations: 0
Graph-based skeleton data compression
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287103
Pratyusha Das, Antonio Ortega
With the advancement of reliable, fast, portable acquisition systems, human motion capture data is becoming widely used in many industrial, medical, and surveillance applications. These systems can track multiple people simultaneously, providing full-body skeletal keypoints as well as more detailed landmarks in face, hands and feet. This leads to a huge amount of skeleton data to be transmitted or stored. In this paper, we introduce Graph-based Skeleton Compression (GSC), an efficient graph-based method for nearly lossless compression. We use a separable spatio-temporal graph transform along with non-uniform quantization, coefficient scanning and entropy coding with run-length codes for nearly lossless compression. We evaluate the compression performance of the proposed method on the large NTU-RGB activity dataset. Our method outperforms a 1D discrete cosine transform method applied along temporal direction. In near-lossless mode our proposed compression does not affect action recognition performance.
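A minimal sketch of the spatial half of such a scheme — a graph Fourier transform over the skeleton graph, quantization, and run-length coding of zero runs — is given below. The uniform quantizer and the omission of the temporal transform and the entropy coder are simplifications of the method described above:

```python
import numpy as np

def gft_basis(adjacency):
    """Graph Fourier basis: eigenvectors of the combinatorial Laplacian."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs

def compress_frame(joint_coords, basis, step=0.01):
    """Transform skeleton joint coordinates, quantize, run-length encode."""
    coeffs = basis.T @ joint_coords            # (joints, 3) spectral coeffs
    q = np.round(coeffs / step).astype(int)    # uniform quantization (toy)
    flat, rle, i = q.flatten(), [], 0
    while i < len(flat):                       # run-length code zero runs
        if flat[i] == 0:
            run = 0
            while i < len(flat) and flat[i] == 0:
                run += 1
                i += 1
            rle.append(("Z", run))
        else:
            rle.append(("V", int(flat[i])))
            i += 1
    return rle
```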
Citations: 7
Single depth map super-resolution via joint non-local and local modeling
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287120
Yingying Zhang, Chao Ren, Honggang Chen, Ce Zhu
Depth maps are widely used in 3D imaging techniques owing to the advent of consumer depth cameras. However, the practical application of depth maps is limited by their poor image quality. In this paper, we propose a novel framework for single depth map super-resolution that jointly exploits local and non-local constraints in the depth map. For the non-local constraint, we use group-based sparse representation to exploit the non-local self-similarity of the depth map. For the local constraint, we first estimate gradient images in different directions of the desired high-resolution (HR) depth map, and then build a multi-directional gradient-guided regularizer from these estimated gradient images to describe depth gradients with different orientations. Finally, the two complementary regularizers are cast into a unified optimization framework to obtain the desired HR image. The experimental results show that the proposed method achieves better depth super-resolution performance than state-of-the-art methods.
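The unified data-term-plus-regularizer formulation can be made concrete with a toy gradient-descent solver. In the sketch below the paper's two priors are deliberately replaced by a single total-variation-like smoothness term, and boundaries are handled loosely, so this only illustrates the optimization structure, not the proposed regularizers:

```python
import numpy as np

def tv_grad(d):
    """Subgradient of an anisotropic total-variation smoothness term
    (wrap-around boundary handling, acceptable for a toy example)."""
    sx = np.sign(np.diff(d, axis=1, append=d[:, -1:]))
    sy = np.sign(np.diff(d, axis=0, append=d[-1:, :]))
    return (np.roll(sx, 1, axis=1) - sx) + (np.roll(sy, 1, axis=0) - sy)

def super_resolve(d_lr, scale=2, lam=0.05, iters=200, step=0.1):
    """Toy depth SR: least-squares data term + a simple smoothness prior."""
    h, w = d_lr.shape
    d = np.kron(d_lr, np.ones((scale, scale)))       # initialize by upsampling
    for _ in range(iters):
        down = d.reshape(h, scale, w, scale).mean(axis=(1, 3))
        resid = down - d_lr                          # data-term residual
        grad_data = np.kron(resid, np.ones((scale, scale))) / scale**2
        d -= step * (grad_data + lam * tv_grad(d))   # gradient descent step
    return d
```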
Citations: 0
Viewport Margins for 360-Degree Immersive Video
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287078
I. Curcio, Saba Ahsan
Immersive 360-degree video delivery is increasingly widespread. New use cases are constantly emerging, making it a promising video technology for Extended Reality applications. Viewport Dependent Delivery (VDD) is an established technique for saving network bit rate when transmitting omnidirectional video. One of the hardest challenges in VDD of 360-degree video is ensuring that the video quality in the user's viewport is always the highest possible, independent of the user's head motion speed and span of motion. This paper introduces the concept of viewport margins, which can be understood as an extra high-quality spatial safety area around the user's viewport. Viewport margins provide a better user experience for the receiver by reducing the Motion to High Quality Delay and the percentage of low-quality viewport seen by the user. We provide simulation results that show the advantage of using viewport margins for real-time, low-delay VDD of 360-degree video. In particular, for a head motion of 90 degrees, using 10-30% margins can reduce the percentage of viewport at low quality by 5-10%, and using 30% margins reduces the Motion to High Quality Delay to zero for head speeds up to 360 degrees per second when the viewport feedback is sent every 33 ms.
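The worst-case arithmetic behind that last claim is easy to check. The sketch below assumes the margin must cover the angle the head can sweep within one feedback interval, that margins are expressed as a fraction of a 90-degree viewport, and that network delay beyond the 33 ms feedback period is negligible — all assumptions for illustration:

```python
def required_margin_deg(head_speed_dps, feedback_interval_s):
    """Angle the head can sweep before the next viewport update arrives,
    i.e. the margin needed to keep the view inside the high-quality area."""
    return head_speed_dps * feedback_interval_s

# Numbers from the simulation setup above: 33 ms feedback, 360 deg/s head speed.
worst_case = required_margin_deg(360.0, 0.033)   # ~11.9 degrees
viewport_deg = 90.0                              # assumed viewport width
margin = 0.30 * viewport_deg                     # 30% margin -> 27 degrees
print(f"needed {worst_case:.1f} deg, have {margin:.1f} deg -> "
      f"{'zero added delay' if margin >= worst_case else 'quality drop'}")
```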
Citations: 4