
2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP): Latest Publications

Optimizing Video Quality Estimation Across Resolutions
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287116
Abhinau K. Venkataramanan, Chengyang Wu, A. Bovik
Many algorithms have been developed to evaluate the perceptual quality of images and videos, based on models of picture statistics and visual perception. These algorithms attempt to capture user experience better than simple metrics like the peak signal-to-noise ratio (PSNR) and are widely utilized on streaming service platforms and in social networking applications to improve users’ Quality of Experience. The growing demand for high-resolution streams and rapid increases in user-generated content (UGC) sharpens interest in the computation involved in carrying out perceptual quality measurements. In this direction, we propose a suite of methods to efficiently predict the structural similarity index (SSIM) of high-resolution videos distorted by scaling and compression, from computations performed at lower resolutions. We show the effectiveness of our algorithms by testing on a large corpus of videos and on subjective data.
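As a rough illustration of predicting quality from reduced-resolution computations, the sketch below (not the authors' method; all function names are illustrative, using only NumPy) evaluates a simplified single-window SSIM on block-downscaled frames as a cheap proxy for the full-resolution score:

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    # Single-window SSIM over the whole image (a simplification of the
    # usual locally windowed SSIM).
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)))

def block_downscale(img, f):
    # Downscale by f x f block averaging.
    h, w = (img.shape[0] // f) * f, (img.shape[1] // f) * f
    return img[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def ssim_proxy(ref, dist, f=4):
    # Evaluate SSIM at 1/f of the original resolution as a cheap stand-in
    # for the full-resolution score.
    return global_ssim(block_downscale(ref, f), block_downscale(dist, f))
```

Computing the metric on 1/16 of the pixels (f=4) is the kind of saving the paper targets; its contribution is mapping such low-resolution computations to accurate high-resolution SSIM predictions.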
Citations: 1
Online Multiple Object Tracking Using Single Object Tracker and Maximum Weight Clique Graph
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287090
Yujie Hu, Xiang Zhang, Yexin Li, Ran Tian
Tracking multiple objects is a challenging task in time-critical video analysis systems. In the popular tracking-by-detection framework, the core problems of a tracker are the quality of the employed input detections and the effectiveness of the data association. Towards this end, we propose a multiple object tracking method which employs a single object tracker to improve the results of unreliable detection and data association simultaneously. In addition, we utilize a maximum weight clique graph algorithm to handle the optimal assignment in an online mode. In our method, a robust single object tracker is used to connect previously tracked objects to tackle the current noise detection and improve the data association as a motion cue. Furthermore, we use a person re-identification network to learn the historical appearances of the tracklets in order to promote the tracker’s identification ability. We conduct extensive experiments on the MOT benchmark to demonstrate the effectiveness of our tracker.
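The maximum weight clique formulation of data association can be pictured with a small brute-force sketch (a hypothetical, exhaustive illustration that is only practical for tiny instances; the paper's actual solver is not reproduced here): each node is a candidate (track, detection) pair, two nodes are compatible if they share neither track nor detection, and the heaviest clique yields the assignment.

```python
from itertools import combinations

def associate(sim):
    # sim[t][d] = similarity between track t and detection d.
    # Nodes are candidate pairs with positive similarity.
    nodes = [(t, d, sim[t][d])
             for t in range(len(sim)) for d in range(len(sim[0]))
             if sim[t][d] > 0.0]

    def compatible(a, b):
        # Two pairs may coexist only if they use distinct tracks and detections.
        return a[0] != b[0] and a[1] != b[1]

    best, best_w = [], 0.0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            # A clique: every pair of chosen nodes must be compatible.
            if all(compatible(a, b) for a, b in combinations(subset, 2)):
                w = sum(n[2] for n in subset)
                if w > best_w:
                    best, best_w = [(t, d) for t, d, _ in subset], w
    return best, best_w
```

For realistic numbers of tracks and detections, a dedicated maximum-weight-clique solver replaces this exhaustive search.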
Citations: 0
MMSP 2020 List Reviewer Page
Pub Date: 2020-09-21 DOI: 10.1109/mmsp48831.2020.9287101
Citations: 0
Decoding-Energy Optimal Video Encoding For x265
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287054
Christian Herglotz, Marco Bader, Kristian Fischer, A. Kaup
This paper presents optimal x265-encoder configurations and an enhanced optimization algorithm for minimizing the software decoding energy of HEVC-coded videos. We reach this goal with two contributions. First, we perform a detailed analysis of the influence of various encoder settings on the decoding energy. Second, we include an enhanced version of an algorithm called decoding-energy-rate-distortion optimization into x265, which we optimize for fast and efficient encoding. This algorithm introduces the estimated decoding energy as an additional optimization criterion into the rate-distortion cost function. We evaluate the extended encoder in terms of bitrate, distortion, and decoding energy, performing energy measurements to demonstrate the superior energy efficiency. We find that combining the ‘fastdecode’ tuning option of x265 with the enhanced decoding-energy-rate-distortion optimization leads to decoding energy savings of 27.2% and 26.0% for OpenHEVC and HM decoding, respectively. At the same time, compression efficiency losses of 38.2% and a negligible decrease in encoder runtime of 0.39% can be observed.
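The idea of extending the Lagrangian rate-distortion cost J = D + λR with an energy term μE can be sketched as follows (a minimal illustration with made-up mode numbers, not the actual x265 integration):

```python
def derdo_cost(distortion, rate, energy, lam, mu):
    # Decoding-energy-rate-distortion cost: J = D + lambda * R + mu * E.
    # mu = 0 recovers conventional rate-distortion optimization.
    return distortion + lam * rate + mu * energy

def best_mode(candidates, lam, mu):
    # candidates: list of (name, D, R, E) tuples for one coding decision.
    return min(candidates, key=lambda c: derdo_cost(c[1], c[2], c[3], lam, mu))
```

With a nonzero μ, a mode that costs slightly more bits or distortion can still win if it is cheaper to decode, which is exactly the trade-off the measured savings reflect.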
Citations: 4
On Maximum A Posteriori Approximation of Hidden Markov Models for Proportional Data
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287112
Samr Ali, N. Bouguila
Hidden Markov models (HMM) have recently risen as a key generative machine learning approach for time series data study and analysis. While early works focused only on applying HMMs for speech recognition, HMMs are now prominent in various fields such as video classification and genomics. In this paper, we develop a Maximum A Posteriori framework for learning the Generalized Dirichlet HMMs that have been proposed recently as an efficient way for modeling sequential proportional data. In contrast to the conventional Baum Welch algorithm, commonly used for learning HMMs, the proposed algorithm places priors for the learning of the desired parameters; hence, regularizing the estimation process. We validate our proposed approach on a challenging video processing application; namely, dynamic texture classification.
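The MAP mechanics can be illustrated on the HMM transition matrix with an ordinary symmetric Dirichlet prior (the paper's Generalized Dirichlet emission model is more involved; this sketch only shows how placing a prior regularizes the Baum-Welch-style count normalization):

```python
import numpy as np

def map_transition_matrix(counts, alpha):
    # MAP estimate of HMM transition probabilities under a symmetric
    # Dirichlet(alpha) prior on each row: A[i, j] is proportional to
    # counts[i, j] + alpha - 1. With alpha = 1 this reduces to the
    # maximum-likelihood (Baum-Welch) update; the clip guards alpha < 1.
    unnorm = np.clip(counts + alpha - 1.0, 1e-12, None)
    return unnorm / unnorm.sum(axis=1, keepdims=True)
```

The prior pulls sparse rows away from degenerate zero/one probabilities, which is the regularization effect the abstract refers to.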
Citations: 1
Reverberant Audio Blind Source Separation via Local Convolutive Independent Vector Analysis
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287144
Fangchen Feng, Azeddine Beghdadi
In this paper, we propose a new formulation of the blind source separation problem for audio signals with convolutive mixtures to improve the separation performance of Independent Vector Analysis (IVA). The proposed method benefits from both the recently investigated convolutive approximation model and the IVA approaches that take advantage of cross-band information to avoid permutation alignment. We first exploit the link between the IVA and the Sparse Component Analysis (SCA) methods through structured sparsity. We then propose a new framework combining the convolutive narrowband approximation and the Windowed-Group-Lasso (WGL). The optimisation of the model is based on an alternating optimisation approach in which the convolutive kernel and the source components are jointly optimised.
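A core building block of group-lasso-style regularization such as WGL is the proximal operator of the group-lasso penalty, i.e. group-wise soft-thresholding; a minimal sketch (rows as groups), not the paper's full alternating optimisation:

```python
import numpy as np

def group_soft_threshold(Z, lam):
    # Proximal operator of the group-lasso penalty: shrink each row (group)
    # of Z toward zero by lam in l2 norm; groups with norm below lam are
    # set exactly to zero, which is what induces structured sparsity.
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return scale * Z
```

In an alternating scheme, a step like this on the source components would be interleaved with updates of the convolutive kernel.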
Citations: 0
Binaural Rendering From Distributed Microphone Signals Considering Loudspeaker Distance in Measurements
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287157
Naoto Iijima, Shoichi Koyama, H. Saruwatari
A method of binaural rendering from distributed microphone recordings is proposed that takes into consideration the loudspeaker distance used when measuring the head-related transfer function (HRTF). In general, to reproduce binaural signals from the signals captured by multiple microphones in the recording area, the captured sound field is represented by plane-wave decomposition. Thus, in binaural rendering, the HRTF is approximated as a transfer function from a plane-wave source. To incorporate the distance into HRTF measurements, we propose a method based on the spherical-wave decomposition of a sound field, in which the HRTF is assumed to be measured from a point source. Results of experiments using HRTFs calculated by the boundary element method indicate that the accuracy of binaural signal reproduction by the proposed method based on spherical-wave decomposition is higher than that of the plane-wave-decomposition-based method. We also evaluate the performance of signal conversion from distributed microphone measurements into binaural signals.
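The difference between the two source models can be sketched in a few lines: a spherical-wave (point-source) model has a 1/r amplitude decay and an r/c propagation delay, while a plane-wave model keeps the amplitude distance-independent (a free-field illustration with hypothetical function names, not the paper's decomposition itself):

```python
import numpy as np

def spherical_wave(src, mic, c=343.0):
    # Point-source (spherical-wave) model: amplitude decays as 1/r and the
    # propagation delay is r/c, so source distance matters.
    r = float(np.linalg.norm(np.asarray(src, float) - np.asarray(mic, float)))
    return 1.0 / r, r / c

def plane_wave(direction, mic, c=343.0):
    # Plane-wave model: distance-independent unit amplitude; only the
    # relative delay along the propagation direction matters.
    d = np.asarray(direction, float)
    d = d / np.linalg.norm(d)
    return 1.0, float(np.dot(d, np.asarray(mic, float))) / c
```

When an HRTF is measured with a loudspeaker at finite distance, the spherical-wave assumption matches the measurement geometry that the plane-wave approximation ignores.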
Citations: 2
Wavelet Scattering Transform and CNN for Closed Set Speaker Identification
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287061
Wajdi Ghezaiel, L. Brun, O. Lézoray
In real-world applications, the performance of speaker identification systems degrades due to reductions in both the amount and the quality of speech utterances. For that reason, we propose a speaker identification system in which short utterances with few training examples are used for person identification. Therefore, only a very small amount of data, involving a sentence of 2-4 seconds, is used. To achieve this, we propose a novel raw-waveform end-to-end convolutional neural network (CNN) for text-independent speaker identification. We use the wavelet scattering transform as a fixed initialization of the first layers of the CNN, and learn the remaining layers in a supervised manner. Our experiments show that this hybrid architecture combining the wavelet scattering transform and a CNN can successfully perform efficient feature extraction for speaker identification, even with a small number of short-duration training samples.
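A first-order scattering front end (modulus of wavelet convolutions followed by average pooling) can be sketched as below; the crude difference filters stand in for a real wavelet filter bank, which the abstract does not specify:

```python
import numpy as np

def scatter_order1(x, wavelets, pool):
    # First-order scattering features: modulus of each wavelet convolution,
    # then average pooling -- a fixed (non-learned) front end ahead of the
    # trainable CNN layers.
    feats = []
    for psi in wavelets:
        u = np.abs(np.convolve(x, psi, mode="same"))
        n = len(u) // pool * pool          # trim so the length divides evenly
        feats.append(u[:n].reshape(-1, pool).mean(axis=1))
    return np.stack(feats)

# Crude high-pass filters standing in for actual wavelet filters:
filter_bank = [np.array([1.0, -1.0]), np.array([1.0, 0.0, -1.0])]
```

Because this stage is fixed rather than learned, it needs no training data, which is what makes it attractive when only a few short utterances are available.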
Citations: 9
Real-Time Frequency Selective Reconstruction through Register-Based Argmax Calculation
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287071
Andy Regensky, Simon Grosche, Jürgen Seiler, A. Kaup
Frequency Selective Reconstruction (FSR) is a state-of-the-art algorithm for solving diverse image reconstruction tasks in which a subset of pixel values in the image is missing. However, it entails a high computational complexity due to its iterative, blockwise procedure for reconstructing the missing pixel values. Although the complexity of FSR can be considerably decreased by performing its computations in the frequency domain, the reconstruction procedure still takes multiple seconds up to multiple minutes depending on the parameterization. However, FSR has the potential for massive parallelization, which can greatly improve its reconstruction time. In this paper, we introduce a novel, highly parallelized formulation of FSR adapted to the capabilities of modern GPUs and propose a considerably accelerated version of the inherent argmax calculation. Altogether, we achieve a 100-fold speed-up, which enables the use of FSR in real-time applications.
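The parallel argmax can be pictured as a pairwise reduction tree, the pattern that register-based GPU reductions implement; the sequential sketch below mirrors that tree on the CPU (an illustration of the reduction pattern only, not the paper's CUDA kernel):

```python
def argmax_reduce(values):
    # Pairwise reduction tree: each step halves the number of
    # (index, value) candidates, which is how threads in a GPU warp
    # combine partial argmax results held in registers.
    pairs = list(enumerate(values))
    while len(pairs) > 1:
        nxt = [max(pairs[i], pairs[i + 1], key=lambda p: p[1])
               for i in range(0, len(pairs) - 1, 2)]
        if len(pairs) % 2:      # odd leftover carries over unchanged
            nxt.append(pairs[-1])
        pairs = nxt
    return pairs[0][0]
```

On a GPU, the log2(n) reduction steps run in parallel across threads, turning the linear scan into the bottleneck the paper removes.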
Citations: 3
An Evolutionary-based Generative Approach for Audio Data Augmentation
Pub Date: 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287156
Silvan Mertes, Alice Baird, Dominik Schiller, Björn Schuller, E. André
In this paper, we introduce a novel framework for augmenting raw audio data for machine learning classification tasks. In the first part of our framework, we employ a generative adversarial network (GAN) to create new variants of the audio samples that already exist in our source dataset for the classification task. In the second step, we utilize an evolutionary algorithm to search the input domain space of the previously trained GAN with respect to predefined characteristics of the generated audio. In this way, we are able to generate audio in a controlled manner that improves the classification performance of the original task. To validate our approach, we test it on the task of soundscape classification. We show that our approach leads to a substantial improvement in classification results compared to training without data augmentation and training with uncontrolled GAN-based data augmentation.
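The second step, searching the GAN's latent space with an evolutionary algorithm, can be sketched as a simple (1+λ)-style strategy; `generate` and `fitness` are placeholders for the trained generator and the target-characteristic score, and the paper's actual evolutionary operators are not specified here:

```python
import numpy as np

def evolve_latents(generate, fitness, dim=8, pop=16, gens=20, sigma=0.5, seed=0):
    # (1+lambda)-style evolutionary search over the GAN latent space:
    # mutate the current best latent vector with Gaussian noise and keep
    # the child whose generated audio scores highest on the fitness.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(dim)
    best = fitness(generate(z))
    for _ in range(gens):
        children = z + sigma * rng.standard_normal((pop, dim))
        scores = [fitness(generate(c)) for c in children]
        i = int(np.argmax(scores))
        if scores[i] > best:    # accept only improvements
            z, best = children[i], scores[i]
    return z, best
```

Because only the latent vector is evolved, the generator stays fixed, which is what makes the generation "controlled" with respect to the predefined audio characteristics.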
Citations: 12