
2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP): Latest Publications

A comparative study of example-guided audio source separation approaches based on nonnegative matrix factorization
A. Ozerov, Srdan Kitic, P. Pérez
We consider example-guided audio source separation approaches, where the audio mixture to be separated is supplied with source examples that are assumed to match the sources in the mixture in both frequency and time. These approaches have been successfully applied to tasks such as source separation by humming, score-informed music source separation, and music source separation guided by covers. Most of the proposed methods are based on nonnegative matrix factorization (NMF) and its variants, including methods that use NMF models pre-trained from examples as an initialization of the mixture NMF decomposition, methods that use those models as hyperparameters of priors of the mixture NMF decomposition, and methods that use coupled NMF models. Moreover, these methods differ in the choice of the NMF divergence and the NMF prior. However, there is no systematic comparison of all these methods. In this work, we compare existing methods and some new variants on the score-informed and cover-guided source separation tasks.
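As an illustration of one of the strategies listed in the abstract (NMF bases pre-trained on source examples and used to initialize the mixture decomposition), here is a minimal NumPy sketch. The Euclidean-distance multiplicative updates, component count, random placeholder spectrograms, and Wiener-style masking are assumptions for illustration, not the authors' exact configuration.

```python
import numpy as np

def nmf(V, W_init, H_init, n_iter=200, update_W=True):
    """Euclidean-distance NMF via multiplicative updates (Lee & Seung style)."""
    W, H = W_init.copy(), H_init.copy()
    eps = 1e-12
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if update_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(0)
F, T, K = 257, 100, 8          # frequency bins, frames, components per source

# Magnitude spectrograms: example of source 1, example of source 2, and the mixture.
# Random placeholders here; in practice these would be |STFT| matrices.
V_ex1 = np.abs(rng.standard_normal((F, T)))
V_ex2 = np.abs(rng.standard_normal((F, T)))
V_mix = np.abs(rng.standard_normal((F, T)))

# Pre-train per-source spectral bases on the examples.
W1, _ = nmf(V_ex1, np.abs(rng.standard_normal((F, K))), np.abs(rng.standard_normal((K, T))))
W2, _ = nmf(V_ex2, np.abs(rng.standard_normal((F, K))), np.abs(rng.standard_normal((K, T))))

# Use the pre-trained bases to initialize the mixture decomposition
# (one of the strategies the abstract mentions); keep W fixed, adapt H only.
W0 = np.hstack([W1, W2])
_, H = nmf(V_mix, W0, np.abs(rng.standard_normal((2 * K, T))), update_W=False)

# Wiener-style mask to reconstruct each source's magnitude estimate.
V_hat1 = W1 @ H[:K]
V_hat2 = W2 @ H[K:]
mask1 = V_hat1 / (V_hat1 + V_hat2 + 1e-12)
S1_mag = mask1 * V_mix
print(S1_mag.shape)
```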
{"title":"A comparative study of example-guided audio source separation approaches based on nonnegative matrix factorization","authors":"A. Ozerov, Srdan Kitic, P. Pérez","doi":"10.1109/MLSP.2017.8168196","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168196","url":null,"abstract":"We consider example-guided audio source separation approaches, where the audio mixture to be separated is supplied with source examples that are assumed matching the sources in the mixture both in frequency and time. These approaches were successfully applied to the tasks such as source separation by humming, score-informed music source separation, and music source separation guided by covers. Most of proposed methods are based on nonnegative matrix factorization (NMF) and its variants, including methods using NMF models pre-trained from examples as an initialization of mixture NMF decomposition, methods using those models as hyperparameters of priors of mixture NMF decomposition, and methods using coupled NMF models. Moreover, those methods differ by the choice of the NMF divergence and the NMF prior. However, there is no systematic comparison of all these methods. In this work, we compare existing methods and some new variants on the score-informed and cover-guided source separation tasks.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"18 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81634470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Text to image generative model using constrained embedding space mapping
Subhajit Chaudhury, Sakyasingha Dasgupta, Asim Munawar, Md. A. Salam Khan, Ryuki Tachibana
We present a conditional generative method that maps low-dimensional embeddings of images and natural language to a common latent space, thereby extracting semantic relationships between them. The embedding specific to each modality is first extracted, and a constrained optimization procedure is then performed to project the two embedding spaces onto a common manifold. Based on this, we present a method to learn the conditional probability distribution of the two embedding spaces: first, by mapping them to a shared latent space and generating the individual embeddings back from this common space. However, in order to enable independent conditional inference for separately extracting the corresponding embeddings from the common latent-space representation, we deploy a proxy-variable trick, wherein the single shared latent space is replaced by two separate latent spaces. We design an objective function such that, during training, we can force these separate spaces to lie close to each other by minimizing the Euclidean distance between their distribution functions. Experimental results demonstrate that the learned joint model can generalize to learning concepts of double MNIST digits with additional color attributes, thereby enabling the generation of specific colored images from the respective text data.
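The toy sketch below illustrates the shape of the objective described above: two separate latent spaces, per-modality reconstruction terms, and a Euclidean penalty pulling paired latent codes together. The linear encoders/decoders, dimensions, and trade-off weight are assumptions; the actual model is a learned deep network, not this linear stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_img, d_txt, d_z = 64, 128, 64, 32   # batch size, embedding dims, latent dim (assumed)

# Pretend modality-specific embeddings (e.g., from an image encoder and a text encoder).
e_img = rng.standard_normal((n, d_img))
e_txt = rng.standard_normal((n, d_txt))

# Linear encoders/decoders standing in for the learned mappings (assumption).
A_img, B_img = rng.standard_normal((d_img, d_z)) * 0.1, rng.standard_normal((d_z, d_img)) * 0.1
A_txt, B_txt = rng.standard_normal((d_txt, d_z)) * 0.1, rng.standard_normal((d_z, d_txt)) * 0.1

# Two separate latent spaces (the "proxy variable trick" in the abstract).
z_img = e_img @ A_img
z_txt = e_txt @ A_txt

# Objective: reconstruct each embedding from its own latent space, and pull the two
# latent spaces together by penalizing the Euclidean distance between paired codes
# (a stand-in for the distance between their distribution functions).
recon = np.mean((z_img @ B_img - e_img) ** 2) + np.mean((z_txt @ B_txt - e_txt) ** 2)
align = np.mean(np.sum((z_img - z_txt) ** 2, axis=1))
loss = recon + 0.5 * align   # 0.5 is an assumed trade-off weight
print(f"loss = {loss:.3f}")
```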
{"title":"Text to image generative model using constrained embedding space mapping","authors":"Subhajit Chaudhury, Sakyasingha Dasgupta, Asim Munawar, Md. A. Salam Khan, Ryuki Tachibana","doi":"10.1109/MLSP.2017.8168111","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168111","url":null,"abstract":"We present a conditional generative method that maps low-dimensional embeddings of image and natural language to a common latent space hence extracting semantic relationships between them. The embedding specific to a modality is first extracted and subsequently a constrained optimization procedure is performed to project the two embedding spaces to a common manifold. Based on this, we present a method to learn the conditional probability distribution of the two embedding spaces; first, by mapping them to a shared latent space and generating back the individual embeddings from this common space. However, in order to enable independent conditional inference for separately extracting the corresponding embeddings from the common latent space representation, we deploy a proxy variable trick — wherein, the single shared latent space is replaced by two separate latent spaces. We design an objective function, such that, during training we can force these separate spaces to lie close to each other, by minimizing the Euclidean distance between their distribution functions. Experimental results demonstrate that the learned joint model can generalize to learning concepts of double MNIST digits with additional attributes of colors, thereby enabling the generation of specific colored images from the respective text data.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"62 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85222872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Dictionary learning for pitch estimation in speech signals
F. Huang, P. Balázs
This paper presents an automatic approach to parameter training for a previously published sparsity-based pitch estimation method. For this pitch estimation method, the harmonic dictionary is a key parameter that must be carefully prepared beforehand. In the original method, extensive human supervision and involvement are required to construct and label the dictionary. In this study, we propose to employ dictionary learning algorithms to learn the dictionary directly from training data. We apply and compare three typical dictionary learning algorithms, namely the method of optimized directions (MOD), K-SVD, and online dictionary learning (ODL), and propose a post-processing method to label and adapt a learned dictionary for pitch estimation. Results show that MOD and properly initialized ODL (pi-ODL) can lead to dictionaries that exhibit the desired harmonic structures for pitch estimation, and that the post-processing method can significantly improve the performance of the learned dictionaries in pitch estimation. The dictionary obtained with pi-ODL and post-processing attained pitch estimation accuracy close to the optimal performance of the manual dictionary. These results show that dictionary learning is feasible and promising for this application.
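As a concrete illustration of one of the compared algorithms, below is a minimal MOD (method of optimized directions) sketch in NumPy: it alternates a sparse-coding step with the closed-form dictionary update. The crude hard-thresholding sparse coder and the random toy data are assumptions, not the study's exact setup.

```python
import numpy as np

def mod_dictionary_learning(X, n_atoms, sparsity, n_iter=30, seed=0):
    """Minimal MOD: alternate sparse coding (hard thresholding of a least-squares
    code) with the closed-form dictionary update D = X A^T (A A^T)^-1."""
    rng = np.random.default_rng(seed)
    n_feat, _ = X.shape
    D = rng.standard_normal((n_feat, n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Crude sparse-coding step: least squares, then keep the largest coefficients.
        A = np.linalg.lstsq(D, X, rcond=None)[0]
        idx = np.argsort(np.abs(A), axis=0)[:-sparsity, :]
        np.put_along_axis(A, idx, 0.0, axis=0)
        # MOD dictionary update, followed by column renormalization.
        D = X @ A.T @ np.linalg.pinv(A @ A.T)
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    return D

# Toy training data: columns are magnitude-spectrum frames (random placeholders).
X = np.abs(np.random.default_rng(1).standard_normal((129, 500)))
D = mod_dictionary_learning(X, n_atoms=40, sparsity=3)
print(D.shape)  # (129, 40): one learned atom per column
```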
{"title":"Dictionary learning for pitch estimation in speech signals","authors":"F. Huang, P. Balázs","doi":"10.1109/MLSP.2017.8168173","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168173","url":null,"abstract":"This paper presents an automatic approach for parameter training for a sparsity-based pitch estimation method that has been previously published. For this pitch estimation method, the harmonic dictionary is a key parameter that needs to be carefully prepared beforehand. In the original method, extensive human supervision and involvement are required to construct and label the dictionary. In this study, we propose to employ dictionary learning algorithms to learn the dictionary directly from training data. We apply and compare 3 typical dictionary learning algorithms, i.e., the method of optimized directions (MOD), K-SVD and online dictionary learning (ODL), and propose a post-processing method to label and adapt a learned dictionary for pitch estimation. Results show that MOD and properly initialized ODL (pi-ODL) can lead to dictionaries that exhibit the desired harmonic structures for pitch estimation, and the post-processing method can significantly improve performance of the learned dictionaries in pitch estimation. The dictionary obtained with pi-ODL and post-processing attained pitch estimation accuracy close to the optimal performance of the manual dictionary. It is positively shown that dictionary learning is feasible and promising for this application.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"49 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88947433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Automatic plant identification using stem automata
Kan Li, Ying Ma, J. Príncipe
In this paper, we propose a novel approach to automatically identify plant species using the dynamics of plant growth and development, or spatiotemporal evolution model (STEM). The online kernel adaptive autoregressive-moving-average (KAARMA) algorithm, a discrete-time dynamical system in the reproducing kernel Hilbert space (RKHS), is used to learn plant-development syntactic patterns from feature-vector sequences automatically extracted from 2D plant images generated by stochastic L-systems. Results show that the multiclass KAARMA STEM can automatically identify plant species based on growth patterns. Furthermore, finite state machines extracted from the trained KAARMA STEM retain competitive performance and are robust to noise. Automatically constructing an L-system or formal grammar to replicate a spatiotemporal structure is an open problem. This is an important first step not only to identify plants but also to generate realistic plant models automatically from observations.
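For readers unfamiliar with stochastic L-systems (the mechanism used here to generate the 2D plant images), the sketch below grows plant-like strings by probabilistic rewriting; the axiom and production rules are illustrative assumptions, not the grammars used in the paper.

```python
import random

# A minimal stochastic L-system sketch (assumed rules, not the paper's):
# "F" draws a segment, "[" / "]" push and pop the turtle state, "+" / "-" turn.
RULES = {
    "F": [("F[+F]F[-F]F", 0.6),   # branch both ways
          ("F[+F]F", 0.2),        # branch left only
          ("F[-F]F", 0.2)],       # branch right only
}

def rewrite(symbol):
    """Replace a symbol by one of its productions, sampled by probability."""
    if symbol not in RULES:
        return symbol
    options, r, acc = RULES[symbol], random.random(), 0.0
    for production, p in options:
        acc += p
        if r <= acc:
            return production
    return options[-1][0]

def grow(axiom="F", iterations=3, seed=42):
    random.seed(seed)
    s = axiom
    for _ in range(iterations):
        s = "".join(rewrite(c) for c in s)
    return s

print(grow()[:80], "...")   # prefix of a generated plant description string
```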
{"title":"Automatic plant identification using stem automata","authors":"Kan Li, Ying Ma, J. Príncipe","doi":"10.1109/MLSP.2017.8168147","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168147","url":null,"abstract":"In this paper, we propose a novel approach to automatically identify plant species using dynamics of plant growth and development or spatiotemporal evolution model (STEM). The online kernel adaptive autoregressive-moving-average (KAARMA) algorithm, a discrete-time dynamical system in the kernel reproducing Hilbert space (RKHS), is used to learn plant-development syntactic patterns from feature-vector sequences automatically extracted from 2D plant images, generated by stochastic L-systems. Results show multiclass KAARMA STEM can automatically identify plant species based on growth patterns. Furthermore, finite state machines extracted from trained KAARMA STEM retains competitive performance and are robust to noise. Automatically constructing an L-system or formal grammar to replicate a spatiotemporal structure is an open problem. This is an important first step to not only identify plants but also to generate realistic plant models automatically from observations.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"21 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78059955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Texture classification from single uncalibrated images: Random matrix theory approach
E. Nadimi, J. Herp, M. M. Buijs, V. Blanes-Vidal
We studied the problem of classifying textured materials from their single-image appearance, under general viewing and illumination conditions, using the theory of random matrices. To evaluate the performance of our algorithm, two distinct image databases were used: the CUReT database and our database of colorectal polyp images collected from patients undergoing colon capsule endoscopy for early cancer detection. During the learning stage, our classifier algorithm established the universality laws for the empirical spectral density of the largest singular value and the normalized largest singular value of the image intensity matrix, adapted to the eigenvalues of the information-plus-noise model. We showed that these two densities converge to the generalized extreme value (GEV-Frechet) and Gaussian G1 distributions, respectively, at rate O(N^{1/2}). To validate the algorithm, we introduced a set of unseen images to the algorithm. A misclassification rate of approximately 1%–6%, depending on the database, was obtained, which is superior to the values of 5%–45% reported in previous studies.
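The sketch below illustrates the general recipe implied by the abstract: collect the largest singular values of image intensity matrices for a class, fit an extreme-value law to their empirical density, and score new images by likelihood. SciPy's generic GEV fit and the random stand-in images are assumptions, not the authors' estimator.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)

def largest_sv(img):
    """Largest singular value of an image intensity matrix."""
    return np.linalg.svd(img, compute_uv=False)[0]

# Random stand-ins for texture patches of one class (e.g., drawn from CUReT).
train_imgs = [rng.standard_normal((64, 64)) for _ in range(200)]
svs = np.array([largest_sv(im) for im in train_imgs])

# Fit a generalized-extreme-value law to the empirical largest-singular-value density,
# in the spirit of the universality result the abstract refers to.
shape, loc, scale = genextreme.fit(svs)

# Score a new image by its likelihood under the fitted class-conditional law;
# classification would pick the class with the highest likelihood (sketch only).
test_img = rng.standard_normal((64, 64))
loglik = genextreme.logpdf(largest_sv(test_img), shape, loc=loc, scale=scale)
print(f"log-likelihood under this class: {loglik:.3f}")
```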
{"title":"Texture classification from single uncalibrated images: Random matrix theory approach","authors":"E. Nadimi, J. Herp, M. M. Buijs, V. Blanes-Vidal","doi":"10.1109/MLSP.2017.8168115","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168115","url":null,"abstract":"We studied the problem of classifying textured-materials from their single-imaged appearance, under general viewing and illumination conditions, using the theory of random matrices. To evaluate the performance of our algorithm, two distinct databases of images were used: The CUReT database and our database of colorectal polyp images collected from patients undergoing colon capsule endoscopy for early cancer detection. During the learning stage, our classifier algorithm established the universality laws for the empirical spectral density of the largest singular value and normalized largest singular value of the image intensity matrix adapted to the eigenvalues of the information-plus-noise model. We showed that these two densities converge to the generalized extreme value (GEV-Frechet) and Gaussian G1 distribution with rate O(N1/2), respectively. To validate the algorithm, we introduced a set of unseen images to the algorithm. Misclassification rate of approximately 1%–6%, depending on the database, was obtained, which is superior to the reported values of 5%–45% in previous research studies.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"34 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87945035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Correntropy induced metric based common spatial patterns
J. Dong, Badong Chen, N. Lu, Haixian Wang, Nanning Zheng
Common spatial patterns (CSP) is a widely used method in the field of electroencephalogram (EEG) signal processing. The goal of CSP is to find spatial filters that maximize the ratio between the variances of two classes. The conventional CSP is, however, sensitive to outliers because it is based on the L2-norm. Inspired by the correntropy induced metric (CIM), we propose in this work a new algorithm, called CIM-based CSP (CSP-CIM), to improve the robustness of CSP with respect to outliers. CSP-CIM searches for the optimal solution with a simple gradient-based iterative algorithm. A toy example and a real EEG dataset are used to demonstrate the desirable performance of the new method.
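For reference, the sketch below shows conventional (L2-based) CSP via a generalized eigendecomposition together with the CIM between two vectors under a Gaussian kernel. The gradient-based CSP-CIM optimization itself is not reproduced here, and the kernel bandwidth, filter count, and toy trials are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(X1, X2, n_filters=2):
    """Conventional (L2-based) CSP via a generalized eigendecomposition.
    X1, X2: trials of shape (n_trials, n_channels, n_samples)."""
    C1 = np.mean([x @ x.T / np.trace(x @ x.T) for x in X1], axis=0)
    C2 = np.mean([x @ x.T / np.trace(x @ x.T) for x in X2], axis=0)
    w, V = eigh(C1, C1 + C2)                      # eigenvalues in ascending order
    order = np.argsort(w)
    # Keep filters for the extreme eigenvalues (most discriminative variance ratios).
    return np.hstack([V[:, order[:n_filters]], V[:, order[-n_filters:]]])

def cim(x, y, sigma=1.0):
    """Correntropy induced metric between two vectors (Gaussian kernel, kappa(0)=1)."""
    k = np.exp(-((x - y) ** 2) / (2 * sigma ** 2))
    return np.sqrt(np.mean(1.0 - k))

rng = np.random.default_rng(0)
X1 = rng.standard_normal((20, 8, 256))   # class-1 EEG trials (toy data)
X2 = rng.standard_normal((20, 8, 256))   # class-2 EEG trials
W = csp_filters(X1, X2)
print(W.shape, cim(rng.standard_normal(100), rng.standard_normal(100)))
```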
{"title":"Correntropy induced metric based common spatial patterns","authors":"J. Dong, Badong Chen, N. Lu, Haixian Wang, Nanning Zheng","doi":"10.1109/MLSP.2017.8168132","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168132","url":null,"abstract":"Common spatial patterns (CSP) is a widely used method in the field of electroencephalogram (EEG) signal processing. The goal of CSP is to find spatial filters that maximize the ratio between the variances of two classes. The conventional CSP is however sensitive to outliers because it is based on the L2-norm. Inspired by the correntropy induced metric (CIM), we propose in this work a new algorithm, called CIM based CSP (CSP-CIM), to improve the robustness of CSP with respect to outliers. The CSP-CIM searches the optimal solution by a simple gradient based iterative algorithm. A toy example and a real EEG dataset are used to demonstrate the desirable performance of the new method.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"14 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81882011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Improving image classification with frequency domain layers for feature extraction
J. Stuchi, M. A. Angeloni, R. F. Pereira, L. Boccato, G. Folego, Paulo V. S. Prado, R. Attux
Machine learning has seen increasingly widespread use in recent years. Major improvements, especially in deep neural networks, have helped to boost the achievable performance in computer vision and signal processing applications. Although many different techniques have been applied in deep architectures, the frequency domain has not been thoroughly explored in this field. In this context, this paper presents a new method for extracting discriminative features based on Fourier analysis. The proposed frequency extractor layer can be combined with deep architectures in order to improve image classification. Computational experiments were performed on the face liveness detection problem, yielding better results than those reported in the literature for the grandtest protocol of the Replay-Attack Database. This paper also aims to raise the discussion on how frequency-domain layers can be used in deep architectures to further improve network performance.
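A minimal sketch of what a hand-crafted frequency-domain feature extractor can look like is given below: radially averaged energies of the 2D Fourier spectrum, which could feed a downstream classifier. The ring layout and log compression are assumptions and not the paper's proposed layer.

```python
import numpy as np

def frequency_features(img, n_bands=8):
    """Toy frequency-domain feature extractor: radially averaged energy of the
    2D Fourier magnitude spectrum, split into n_bands rings (an assumption)."""
    F = np.fft.fftshift(np.fft.fft2(img))
    mag = np.abs(F)
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)     # radius of each frequency bin
    r_max = r.max()
    feats = []
    for b in range(n_bands):
        ring = (r >= b * r_max / n_bands) & (r < (b + 1) * r_max / n_bands)
        feats.append(mag[ring].mean())       # mean spectral energy in this ring
    return np.log1p(np.array(feats))         # log compression of band energies

img = np.random.default_rng(0).standard_normal((64, 64))
print(frequency_features(img))               # 8-dim feature vector for a classifier
```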
{"title":"Improving image classification with frequency domain layers for feature extraction","authors":"J. Stuchi, M. A. Angeloni, R. F. Pereira, L. Boccato, G. Folego, Paulo V. S. Prado, R. Attux","doi":"10.1109/MLSP.2017.8168168","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168168","url":null,"abstract":"Machine learning has been increasingly used in current days. Great improvements, especially in deep neural networks, helped to boost the achievable performance in computer vision and signal processing applications. Although different techniques were applied for deep architectures, the frequency domain has not been thoroughly explored in this field. In this context, this paper presents a new method for extracting discriminative features according to the Fourier analysis. The proposed frequency extractor layer can be combined with deep architectures in order to improve image classification. Computational experiments were performed on face liveness detection problem, yielding better results than those presented in the literature for the grandtest protocol of Replay-Attack Database. This paper also aims to raise the discussion on how frequency domain layers can be used in deep architectures to further improve the network performance.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90542996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Semi-Blind speech enhancement based on recurrent neural network for source separation and dereverberation
Masaya Wake, Yoshiaki Bando, M. Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
This paper describes a semi-blind speech enhancement method using a semi-blind recurrent neural network (SB-RNN) for human-robot speech interaction. When a robot interacts with a human using speech signals, the robot receives not only audio signals recorded by its own microphone but also the speech signals produced by the robot itself, which can be used for semi-blind speech enhancement. The SB-RNN consists of two cascaded modules: a semi-blind source separation module and a blind dereverberation module. Each module has a recurrent layer to capture the temporal correlations of speech signals. The SB-RNN is trained in a multi-task learning manner, i.e., isolated echoic speech signals are used as teacher signals for the output of the separation module, in addition to isolated unechoic signals for the output of the dereverberation module. Experimental results showed that the source-to-distortion ratio was improved by 2.30 dB on average compared to a conventional method based on semi-blind independent component analysis. The results also showed the effectiveness of the modularization of the network, multi-task learning, the recurrent structure, and semi-blind source separation.
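A minimal PyTorch sketch of the cascaded two-module architecture described above is shown below; the GRU and linear layer sizes, ReLU output activations, and spectrogram dimensions are assumptions intended only to illustrate the two-module, multi-output (multi-task) structure.

```python
import torch
import torch.nn as nn

class SemiBlindEnhancer(nn.Module):
    """Cascade sketch: a separation module followed by a dereverberation module,
    each with a recurrent layer. Inputs are the mixture spectrogram and the
    robot's own (known) playback spectrogram; layer sizes are assumptions."""
    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.sep_rnn = nn.GRU(2 * n_freq, hidden, batch_first=True)
        self.sep_out = nn.Linear(hidden, n_freq)       # echoic target speech
        self.derev_rnn = nn.GRU(n_freq, hidden, batch_first=True)
        self.derev_out = nn.Linear(hidden, n_freq)     # dereverberated speech

    def forward(self, mixture, robot_speech):
        x = torch.cat([mixture, robot_speech], dim=-1)  # semi-blind: known playback appended
        h, _ = self.sep_rnn(x)
        echoic = torch.relu(self.sep_out(h))            # intermediate (multi-task) output
        h2, _ = self.derev_rnn(echoic)
        clean = torch.relu(self.derev_out(h2))
        return echoic, clean

model = SemiBlindEnhancer()
mix = torch.rand(4, 100, 257)      # (batch, frames, frequency bins)
ref = torch.rand(4, 100, 257)
echoic, clean = model(mix, ref)    # both outputs would receive a loss during training
print(echoic.shape, clean.shape)
```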
{"title":"Semi-Blind speech enhancement basedon recurrent neural network for source separation and dereverberation","authors":"Masaya Wake, Yoshiaki Bando, M. Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara","doi":"10.1109/MLSP.2017.8168191","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168191","url":null,"abstract":"This paper describes a semi-blind speech enhancement method using a semi-blind recurrent neural network (SB-RNN) for human-robot speech interaction. When a robot interacts with a human using speech signals, the robot inputs not only audio signals recorded by its own microphone but also speech signals made by the robot itself, which can be used for semi-blind speech enhancement. The SB-RNN consists of cascaded two modules: a semi-blind source separation module and a blind dereverberation module. Each module has a recurrent layer to capture the temporal correlations of speech signals. The SB-RNN is trained in a manner of multi-task learning, i.e., isolated echoic speech signals are used as teacher signals for the output of the separation module in addition to isolated unechoic signals for the output of the dereverberation module. Experimental results showed that the source to distortion ratio was improved by 2.30 dB on average compared to a conventional method based on a semi-blind independent component analysis. The results also showed the effectiveness of modularization of the network, multi-task learning, the recurrent structure, and semi-blind source separation.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"98 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76982569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
A Bayesian forecasting and anomaly detection framework for vehicular monitoring networks
Maria Scalabrin, Matteo Gadaleta, Riccardo Bonetto, M. Rossi
In this paper, we are concerned with the automated and runtime analysis of vehicular data from large-scale traffic monitoring networks. This problem is tackled through localized, small-size Bayesian networks (BNs), which are utilized to capture the spatio-temporal relationships underpinning traffic data from nearby road links. A dedicated BN is set up, trained, and tested for each road in the monitored geographical map. The joint probability distribution between the cause nodes and the effect node in the BN is tracked through a Gaussian mixture model (GMM), whose parameters are estimated via Bayesian variational inference (BVI). Forecasting and anomaly detection are performed on statistical measures derived at runtime by the trained GMMs. Our design choices lead to several advantages: the approach is scalable, as a small-size BN is associated with and independently trained for each road, and the localized nature of the framework allows flagging atypical behaviors at their point of origin in the monitored geographical map. The effectiveness of the proposed framework is tested using a large dataset from a real network deployment, comparing its prediction performance with that of selected regression algorithms from the literature, while also quantifying its anomaly detection capabilities.
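The sketch below illustrates the forecasting/anomaly-detection idea on toy data, using scikit-learn's variational Gaussian mixture as a stand-in for the GMM-plus-BVI machinery of the paper; the feature layout (two cause links, one effect link), component count, and quantile threshold are assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)

# Toy "cause -> effect" traffic features: speeds on two upstream links (causes)
# and on the monitored link (effect). Real data would come from the sensor network.
n = 2000
causes = rng.normal(50, 8, size=(n, 2))
effect = 0.5 * causes.sum(axis=1, keepdims=True) + rng.normal(0, 3, size=(n, 1))
X = np.hstack([causes, effect])

# Fit the joint density of cause and effect nodes with a variational GMM
# (component count and default priors are assumptions).
gmm = BayesianGaussianMixture(n_components=5, covariance_type="full", random_state=0)
gmm.fit(X)

# Runtime anomaly rule: flag samples whose joint log-likelihood falls below a
# low quantile of the training scores.
train_ll = gmm.score_samples(X)
threshold = np.quantile(train_ll, 0.01)
new_sample = np.array([[55.0, 48.0, 10.0]])     # implausibly low effect speed
print("anomaly:", gmm.score_samples(new_sample)[0] < threshold)
```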
{"title":"A Bayesian forecasting and anomaly detection framework for vehicular monitoring networks","authors":"Maria Scalabrin, Matteo Gadaleta, Riccardo Bonetto, M. Rossi","doi":"10.1109/MLSP.2017.8168151","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168151","url":null,"abstract":"In this paper, we are concerned with the automated and runtime analysis of vehicular data from large scale traffic monitoring networks. This problem is tackled through localized and small-size Bayesian networks (BNs), which are utilized to capture the spatio-temporal relationships underpinning traffic data from nearby road links. A dedicated BN is set up, trained, and tested for each road in the monitored geographical map. The joint probability distribution between the cause nodes and the effect node in the BN is tracked through a Gaussian Mixture Model (GMM), whose parameters are estimated via Bayesian Variational Inference (BVI). Forecasting and anomaly detection are performed on statistical measures derived at runtime by the trained GMMs. Our design choices lead to several advantages: the approach is scalable as a small-size BN is associated with and independently trained for each road and the localized nature of the framework allows flagging atypical behaviors at their point of origin in the monitored geographical map. The effectiveness of the proposed framework is tested using a large dataset from a real network deployment, comparing its prediction performance with that of selected regression algorithms from the literature, while also quantifying its anomaly detection capabilities.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"123 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83473001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Unsupervised domain adaptation with copula models
Cuong D. Tran, Ognjen Rudovic, V. Pavlovic
We study the task of unsupervised domain adaptation, where no labeled data from the target domain are provided at training time. To deal with the potential discrepancy between the source and target distributions, in both features and labels, we exploit a copula-based regression framework. The benefits of this approach are two-fold: (a) it allows us to model a broader range of conditional predictive densities beyond the common exponential family; (b) we show how to leverage Sklar's theorem, the essence of the copula formulation relating the joint density to the copula dependency functions, to find effective feature mappings that mitigate the domain mismatch. By transforming the data to a copula domain, we show, on a number of benchmark datasets (including human emotion estimation) and using different regression models for prediction, that we can achieve a more robust and accurate estimation of target labels compared to recently proposed feature transformation (adaptation) methods.
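The sketch below shows the generic probability-integral ("copula domain") transform that motivates this family of methods: after per-feature rank transforms, two domains with mismatched marginals but a shared dependence structure look alike. It is a sketch on assumed toy data, not the paper's copula regression framework.

```python
import numpy as np
from scipy.stats import rankdata, norm

def to_copula_domain(X):
    """Probability-integral transform per feature: map each column to its empirical
    CDF values (the copula domain), then to Gaussian scores. A generic sketch of
    the 'transform to a copula domain' idea."""
    n = X.shape[0]
    U = np.column_stack([rankdata(X[:, j]) / (n + 1) for j in range(X.shape[1])])
    return norm.ppf(U)

rng = np.random.default_rng(0)
# Source and target share the dependence structure but have shifted/scaled marginals.
Z = rng.standard_normal((500, 2)) @ np.array([[1.0, 0.7], [0.0, 0.7]])
X_src = Z * [1.0, 2.0] + [0.0, 1.0]
X_tgt = Z * [3.0, 0.5] - [2.0, 0.0]

# After the copula-domain transform, the two domains exhibit nearly identical
# dependence, which is what lets a source-trained predictor transfer better.
print(np.corrcoef(to_copula_domain(X_src).T)[0, 1],
      np.corrcoef(to_copula_domain(X_tgt).T)[0, 1])
```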
{"title":"Unsupervised domain adaptation with copula models","authors":"Cuong D. Tran, Ognjen Rudovic, V. Pavlovic","doi":"10.1109/MLSP.2017.8168131","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168131","url":null,"abstract":"We study the task of unsupervised domain adaptation, where no labeled data from the target domain is provided during training time. To deal with the potential discrepancy between the source and target distributions, both in features and labels, we exploit a copula-based regression framework. The benefits of this approach are two-fold: (a) it allows us to model a broader range of conditional predictive densities beyond the common exponential family; (b) we show how to leverage Sklar's theorem, the essence of the copula formulation relating the joint density to the copula dependency functions, to find effective feature mappings that mitigate the domain mismatch. By transforming the data to a copula domain, we show on a number of benchmark datasets (including human emotion estimation), and using different regression models for prediction, that we can achieve a more robust and accurate estimation of target labels, compared to recently proposed feature transformation (adaptation) methods.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"64 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76760070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3