
2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP): Latest Publications

Classical quadrature rules via Gaussian processes
T. Karvonen, S. Särkkä
In an extension to some previous work on the topic, we show how all classical polynomial-based quadrature rules can be interpreted as Bayesian quadrature rules if the covariance kernel is selected suitably. As the resulting Bayesian quadrature rules have zero posterior integral variance, the results of this article are mostly of theoretical interest in clarifying the relationship between the two different approaches to numerical integration.
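As a reading aid, the sketch below shows the generic Bayesian quadrature computation the abstract refers to: with a covariance kernel k and nodes x_1, ..., x_n, the quadrature weights are w = K^{-1} z, where K is the Gram matrix and z collects the kernel means, and the posterior integral variance is the double integral of k minus z^T K^{-1} z. The Gaussian kernel, the uniform measure on [-1, 1], and the grid-based evaluation of the kernel means are illustrative assumptions; the paper's point is that suitably chosen kernels reproduce the classical polynomial rules exactly, which this generic sketch does not attempt.

```python
import numpy as np

def bq_weights(nodes, kernel, grid):
    """Bayesian-quadrature weights for the uniform measure on [-1, 1],
    with the kernel means approximated on a fine grid.

    nodes  : (n,) quadrature nodes
    kernel : callable k(x, y), broadcasting elementwise over arrays
    grid   : (m,) dense grid on [-1, 1] used to approximate the integrals
    """
    K = kernel(nodes[:, None], nodes[None, :])                     # Gram matrix
    # kernel means z_i = (1/2) * \int_{-1}^{1} k(x, x_i) dx
    z = np.trapz(kernel(grid[:, None], nodes[None, :]), grid, axis=0) / 2.0
    w = np.linalg.solve(K, z)                                      # BQ weights
    # posterior integral variance = \iint k dmu dmu' - z^T K^{-1} z
    kk = np.trapz(np.trapz(kernel(grid[:, None], grid[None, :]), grid, axis=0), grid) / 4.0
    return w, kk - z @ w

if __name__ == "__main__":
    rbf = lambda x, y: np.exp(-0.5 * (x - y) ** 2 / 0.4 ** 2)      # assumed kernel
    nodes = np.linspace(-1.0, 1.0, 7)
    grid = np.linspace(-1.0, 1.0, 2001)
    w, var = bq_weights(nodes, rbf, grid)
    f = np.cos(nodes)                                              # integrand samples
    print("BQ estimate:", w @ f, "posterior variance:", var)
    print("true value :", np.sin(1.0))   # (1/2) * \int_{-1}^{1} cos(x) dx = sin(1)
```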
{"title":"Classical quadrature rules via Gaussian processes","authors":"T. Karvonen, S. Särkkä","doi":"10.1109/MLSP.2017.8168195","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168195","url":null,"abstract":"In an extension to some previous work on the topic, we show how all classical polynomial-based quadrature rules can be interpreted as Bayesian quadrature rules if the covariance kernel is selected suitably. As the resulting Bayesian quadrature rules have zero posterior integral variance, the results of this article are mostly of theoretical interest in clarifying the relationship between the two different approaches to numerical integration.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"162 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75777821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 24
Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR
Tsubasa Ochiai, Shinji Watanabe, S. Katagiri
Recently we proposed a novel multichannel end-to-end speech recognition architecture that integrates the components of multichannel speech enhancement and speech recognition into a single neural-network-based architecture and demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the proposed integrated system remains insufficiently clarified. An open question is whether the speech enhancement component really gains speech enhancement (noise suppression) ability, because it is optimized based on end-to-end ASR objectives instead of speech enhancement objectives. In this paper, we address this question by conducting systematic evaluation experiments using the CHiME-4 corpus. We first show that the integrated end-to-end architecture successfully obtains adequate speech enhancement ability that is superior to that of a conventional alternative (a delay-and-sum beamformer) by observing two signal-level measures: the signal-to-distortion ratio and the perceptual evaluation of speech quality. Our findings suggest that to further increase the performance of the integrated system, we must boost the power of the latter-stage speech recognition component. However, an insufficient amount of multichannel noisy speech data is available. Given this situation, we next investigate the effect of using a large amount of single-channel clean speech data, e.g., the WSJ corpus, for additional training of the speech recognition component. We also show that our approach with clean speech significantly improves the total performance of the multichannel end-to-end architecture in multichannel noisy ASR tasks.
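For context, the conventional alternative mentioned above (a delay-and-sum beamformer) can be sketched in a few lines. This is a hedged illustration, not the proposed end-to-end architecture; the microphone geometry and steering delays below are assumed rather than taken from CHiME-4.

```python
import numpy as np

def delay_and_sum(mic_signals, delays_sec, fs):
    """Minimal delay-and-sum beamformer: time-align each channel by its
    steering delay (fractional delays applied in the frequency domain)
    and average the aligned channels.

    mic_signals : (channels, samples) time-domain signals
    delays_sec  : (channels,) steering delays toward the target source
    fs          : sampling rate in Hz
    """
    n_ch, n_samp = mic_signals.shape
    spectra = np.fft.rfft(mic_signals, axis=1)
    freqs = np.fft.rfftfreq(n_samp, d=1.0 / fs)
    # advance each channel by its delay: multiply by exp(+j*2*pi*f*tau)
    phase = np.exp(2j * np.pi * freqs[None, :] * np.asarray(delays_sec)[:, None])
    aligned = np.fft.irfft(spectra * phase, n=n_samp, axis=1)
    return aligned.mean(axis=0)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    clean = np.sin(2 * np.pi * 440 * t)
    rng = np.random.default_rng(0)
    delays = np.array([0.0, 5.0 / fs])        # assumed geometry: 2nd mic hears source 5 samples later
    mics = np.stack([np.roll(clean, int(d * fs)) for d in delays])
    mics += 0.3 * rng.standard_normal(mics.shape)
    enhanced = delay_and_sum(mics, delays, fs)
    print(enhanced.shape)
```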
{"title":"Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR","authors":"Tsubasa Ochiai, Shinji Watanabe, S. Katagiri","doi":"10.1109/MLSP.2017.8168188","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168188","url":null,"abstract":"Recently we proposed a novel multichannel end-to-end speech recognition architecture that integrates the components of multichannel speech enhancement and speech recognition into a single neural-network-based architecture and demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the proposed integrated system remains insufficiently clarified. An open question is whether the speech enhancement component really gains speech enhancement (noise suppression) ability, because it is optimized based on end-to-end ASR objectives instead of speech enhancement objectives. In this paper, we solve this question by conducting systematic evaluation experiments using the CHiME-4 corpus. We first show that the integrated end-to-end architecture successfully obtains adequate speech enhancement ability that is superior to that of a conventional alternative (a delay-and-sum beamformer) by observing two signal-level measures: the signal-todistortion ratio and the perceptual evaluation of speech quality. Our findings suggest that to further increase the performances of an integrated system, we must boost the power of the latter-stage speech recognition component. However, an insufficient amount of multichannel noisy speech data is available. Based on these situations, we next investigate the effect of using a large amount of single-channel clean speech data, e.g., the WSJ corpus, for additional training of the speech recognition component. We also show that our approach with clean speech significantly improves the total performance of multichannel end-to-end architecture in the multichannel noisy ASR tasks.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"40 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76271262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Inferring room semantics using acoustic monitoring
Muhammad A Shah, B. Raj, Khaled A. Harras
Knowledge of the environmental context of the user, i.e., the user's indoor location and the semantics of their environment, can facilitate the development of many location-aware applications. In this paper, we propose an acoustic monitoring technique that infers semantic knowledge about an indoor space over time, using audio recordings from it. Our technique uses the impulse response of these spaces as well as the ambient sounds produced in them in order to determine a semantic label for them. As we process more recordings, we update our confidence in the assigned label. We evaluate our technique on a dataset of single-speaker human speech recordings obtained in different types of rooms at three university buildings. In our evaluation, the confidence for the true label generally outstripped the confidence for all other labels and in some cases converged to 100% with fewer than 30 samples.
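The confidence-update step described above can be illustrated with a minimal Bayesian accumulation over candidate room labels; the room classes and the Gaussian scores standing in for the impulse-response and ambient-sound models are hypothetical.

```python
import numpy as np

LABELS = ["office", "lecture_hall", "corridor", "cafeteria"]   # hypothetical room classes

def update_confidence(log_posterior, log_likelihoods):
    """One Bayesian update: fold the per-label log-likelihood of a new
    recording into the running log-posterior over room labels."""
    log_posterior = log_posterior + log_likelihoods
    return log_posterior - np.logaddexp.reduce(log_posterior)   # renormalise

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    log_post = np.full(len(LABELS), -np.log(len(LABELS)))       # uniform prior
    for _ in range(30):
        # stand-in for the acoustic models scoring one recording;
        # the true room is LABELS[0], so its score is slightly boosted
        scores = rng.normal(size=len(LABELS))
        scores[0] += 0.5
        log_post = update_confidence(log_post, scores)
    print(dict(zip(LABELS, np.exp(log_post).round(3))))
```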
{"title":"Inferring room semantics using acoustic monitoring","authors":"Muhammad A Shah, B. Raj, Khaled A. Harras","doi":"10.1109/MLSP.2017.8168153","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168153","url":null,"abstract":"Having knowledge of the environmental context of the user i.e. the knowledge of the users' indoor location and the semantics of their environment, can facilitate the development of many of location-aware applications. In this paper, we propose an acoustic monitoring technique that infers semantic knowledge about an indoor space over time, using audio recordings from it. Our technique uses the impulse response of these spaces as well as the ambient sounds produced in them in order to determine a semantic label for them. As we process more recordings, we update our confidence in the assigned label. We evaluate our technique on a dataset of single-speaker human speech recordings obtained in different types of rooms at three university buildings. In our evaluation, the confidence for the true label generally outstripped the confidence for all other labels and in some cases converged to 100% with less than 30 samples.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"18 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73853891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Hankel subspace method for efficient gesture representation
B. Gatto, Anna Bogdanova, L. S. Souza, E. M. Santos
Gesture recognition technology provides multiple opportunities for direct human-computer interaction, without the use of additional external devices. As such, it has been an appealing research area in the field of computer vision. Many of its challenges are related to the complexity of human gestures, which may produce nonlinear distributions under different viewpoints. In this paper, we introduce a novel framework for gesture recognition, which achieves high discrimination of spatial and temporal information while significantly decreasing the computational cost. The proposed method consists of four stages. First, we generate an ordered subset of images from a gesture video, filtering out those that do not contribute to the recognition task. Second, we express spatial and temporal gesture information in a compact trajectory matrix. Then, we represent the obtained matrix as a subspace, achieving discriminative information, as the trajectory matrices derived from different gestures generate dissimilar clusters in a low-dimensional space. Finally, we apply soft weights to find the optimal dimension of each gesture subspace. We demonstrate practical and theoretical gains of our compact representation through experimental evaluation using two publicly available gesture datasets.
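A minimal sketch of the trajectory-matrix-to-subspace step follows: per-frame features are stacked into a Hankel-structured matrix, its leading left singular vectors define the gesture subspace, and subspaces are compared via canonical correlations. The feature dimensions, window length, and random stand-in features are assumptions; the ordered-subset filtering and soft dimension weighting from the paper are not reproduced.

```python
import numpy as np

def hankel_trajectory(features, window):
    """Stack `window` consecutive frame-feature vectors into each column,
    producing a Hankel-structured trajectory matrix that couples spatial
    (per-frame) and temporal (across-frame) information.

    features : (n_frames, dim) ordered per-frame feature vectors
    window   : number of consecutive frames per column
    """
    n_frames, dim = features.shape
    cols = [features[t:t + window].reshape(-1) for t in range(n_frames - window + 1)]
    return np.stack(cols, axis=1)                      # (window*dim, n_cols)

def subspace_basis(traj, n_dims):
    """Orthonormal basis of the leading left singular subspace."""
    u, _, _ = np.linalg.svd(traj, full_matrices=False)
    return u[:, :n_dims]

def subspace_similarity(basis_a, basis_b):
    """Mean squared canonical correlation between two gesture subspaces."""
    s = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(s ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gesture_a = rng.standard_normal((40, 16))          # hypothetical frame features
    gesture_b = rng.standard_normal((40, 16))
    sa = subspace_basis(hankel_trajectory(gesture_a, window=4), n_dims=5)
    sb = subspace_basis(hankel_trajectory(gesture_b, window=4), n_dims=5)
    print("similarity:", subspace_similarity(sa, sb))
```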
{"title":"Hankel subspace method for efficient gesture representation","authors":"B. Gatto, Anna Bogdanova, L. S. Souza, E. M. Santos","doi":"10.1109/MLSP.2017.8168114","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168114","url":null,"abstract":"Gesture recognition technology provides multiple opportunities for direct human-computer interaction, without the use of additional external devices. As such, it had been an appealing research area in the field of computer vision. Many of its challenges are related to the complexity of human gestures, which may produce nonlinear distributions under different viewpoints. In this paper, we introduce a novel framework for gesture recognition, which achieves high discrimination of spatial and temporal information while significantly decreasing the computational cost. The proposed method consists of four stages. First, we generate an ordered subset of images from a gesture video, filtering out those that do not contribute to the recognition task. Second, we express spatial and temporal gesture information in a compact trajectory matrix. Then, we represent the obtained matrix as a subspace, achieving discriminative information, as the trajectory matrices derived from different gestures generate dissimilar clusters in a low dimension space. Finally, we apply soft weights to find the optimal dimension of each gesture subspace. We demonstrate practical and theoretical gains of our compact representation through experimental evaluation using two publicity available gesture datasets.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"37 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75278315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Unsupervised multiview learning with partial distribution information
Shashini De Silva, Jinsub Kim, R. Raich
We consider a training data collection mechanism wherein, instead of annotating each training instance with a class label, additional features drawn from a known class-conditional distribution are acquired concurrently. Considering true labels as latent variables, a maximum likelihood approach is proposed to train a classifier based on these unlabeled training data. Furthermore, the case of correlated training instances is considered, wherein latent label variables for subsequently collected training instances form a first-order Markov chain. A convex optimization approach and expectation-maximization algorithms are presented to train classifiers. The efficacy of the proposed approach is validated using experiments with the iris data and the MNIST handwritten digit data.
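One way to make the described setting concrete is an EM loop in which the E-step combines the current classifier with the known class-conditional likelihood of the auxiliary features, and the M-step reweights a softmax classifier. This is a hedged sketch of the i.i.d. case only, not the paper's convex formulation or its Markov-chain extension, and the two-class Gaussian data and 0.8/0.2 auxiliary likelihoods are invented for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def em_train(X, aux_lik, n_iter=200, lr=0.5):
    """EM for a multinomial logistic classifier with latent labels, where each
    instance carries auxiliary evidence with a *known* class-conditional
    likelihood: aux_lik[i, c] = p(y_i | class c).

    E-step: responsibilities combine the current classifier with aux_lik.
    M-step: one gradient-ascent step on the responsibility-weighted
    log-likelihood (a generalized EM update).
    """
    n, d = X.shape
    W = np.zeros((d, aux_lik.shape[1]))
    for _ in range(n_iter):
        post = softmax(X @ W)                       # p(c | x; W)
        resp = post * aux_lik
        resp /= resp.sum(axis=1, keepdims=True)     # E-step
        W += lr * X.T @ (resp - post) / n           # M-step
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # two Gaussian classes; labels are latent during training
    X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)), rng.normal(1.0, 1.0, (200, 2))])
    true = np.r_[np.zeros(200, int), np.ones(200, int)]
    # auxiliary evidence: a noisy label that is correct 80% of the time,
    # with the class-conditional likelihood p(y | c) assumed known
    y = np.where(rng.random(400) < 0.8, true, 1 - true)
    aux_lik = np.where(np.arange(2)[None, :] == y[:, None], 0.8, 0.2)
    Xb = np.c_[X, np.ones(400)]                     # add a bias column
    pred = softmax(Xb @ em_train(Xb, aux_lik)).argmax(axis=1)
    print("accuracy:", max((pred == true).mean(), ((1 - pred) == true).mean()))
```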
{"title":"Unsupervised multiview learning with partial distribution information","authors":"Shashini De Silva, Jinsub Kim, R. Raich","doi":"10.1109/MLSP.2017.8168138","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168138","url":null,"abstract":"We consider a training data collection mechanism wherein, instead of annotating each training instance with a class label, additional features drawn from a known class-conditional distribution are acquired concurrently. Considering true labels as latent variables, a maximum likelihood approach is proposed to train a classifier based on these unlabeled training data. Furthermore, the case of correlated training instances is considered, wherein latent label variables for subsequently collected training instances form a first-order Markov chain. A convex optimization approach and expectation-maximization algorithms are presented to train classifiers. The efficacy of the proposed approach is validated using the experiments with the iris data and the MNIST handwritten digit data.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74547237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Blind source separation for nonstationary tensor-valued time series
Joni Virta, K. Nordhausen
Two standard assumptions of the classical blind source separation (BSS) theory are frequently violated by modern data sets. First, the majority of the existing methodology assumes vector-valued signals, while data exhibiting a natural tensor structure are frequently observed. Second, many typical BSS applications exhibit serial dependence, which is usually modeled using second-order stationarity assumptions; this is, however, often quite unrealistic. To address these two issues, we extend three existing methods of nonstationary blind source separation to tensor-valued time series. The resulting methods naturally factor in the tensor form of the observations without resorting to vectorization of the signals. Additionally, the methods allow for two types of nonstationarity: either the source series are blockwise second-order weakly stationary or their variances change smoothly in time. A simulation study and an application to video data show that the proposed extensions outperform their vectorial counterparts and successfully identify source series of interest.
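The blockwise second-order idea can be illustrated in the ordinary vector-valued case: estimate covariances on two time blocks, whiten with the first, and diagonalize the second in the whitened coordinates. The sketch below is that textbook two-block solution under an assumed 2x2 mixing matrix; the paper's tensor-valued (mode-wise) extensions and its smoothly varying-variance model are not shown.

```python
import numpy as np

def two_block_unmix(X, split):
    """Separate sources whose variances differ between two time blocks:
    whiten with the first-block covariance, then diagonalise the
    second-block covariance in the whitened coordinates.

    X     : (channels, samples) observed mixtures
    split : sample index separating the two blocks
    """
    C1 = np.cov(X[:, :split])
    C2 = np.cov(X[:, split:])
    d, E = np.linalg.eigh(C1)
    W1 = E @ np.diag(d ** -0.5) @ E.T          # symmetric inverse square root of C1
    _, V = np.linalg.eigh(W1 @ C2 @ W1.T)
    return V.T @ W1                            # unmixing matrix (up to scale/permutation)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4000
    # two sources whose variances swap between the blocks (the nonstationarity)
    s1 = np.r_[rng.normal(0, 1.0, n // 2), rng.normal(0, 3.0, n // 2)]
    s2 = np.r_[rng.normal(0, 3.0, n // 2), rng.normal(0, 1.0, n // 2)]
    S = np.vstack([s1, s2])
    A = np.array([[1.0, 0.6], [0.4, 1.0]])     # assumed mixing matrix
    X = A @ S
    W = two_block_unmix(X, split=n // 2)
    # the product W @ A should be close to a scaled permutation matrix
    print(np.round(W @ A, 2))
```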
{"title":"Blind source separation for nonstationary tensor-valued time series","authors":"Joni Virta, K. Nordhausen","doi":"10.1109/MLSP.2017.8168122","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168122","url":null,"abstract":"Two standard assumptions of the classical blind source separation (BSS) theory are frequently violated by modern data sets. First, the majority of the existing methodology assumes vector-valued signals while data exhibiting a natural tensor structure is frequently observed. Second, many typical BSS applications exhibit serial dependence which is usually modeled using second order stationarity assumptions, which is however often quite unrealistic. To address these two issues we extend three existing methods of nonstationary blind source separation to tensor-valued time series. The resulting methods naturally factor in the tensor form of the observations without resorting to vectorization of the signals. Additionally, the methods allow for two types of nonstationarity, either the source series are blockwise second order weak stationary or their variances change smoothly in time. A simulation study and an application to video data show that the proposed extensions outperform their vectorial counterparts and successfully identify source series of interest.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"62 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84017047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Neural network alternatives to convolutive audio models for source separation
Shrikant Venkataramani, Cem Subakan, P. Smaragdis
The convolutive non-negative matrix factorization (NMF) model factorizes a given audio spectrogram using frequency templates with a temporal dimension. In this paper, we present a convolutional auto-encoder model that acts as a neural network alternative to convolutive NMF. Using the modeling flexibility granted by neural networks, we also explore the idea of using a recurrent neural network in the encoder. Experimental results on speech mixtures from the TIMIT dataset indicate that the convolutive architecture provides a significant improvement in separation performance in terms of BSS Eval metrics.
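A hedged sketch of such a convolutional auto-encoder is given below (PyTorch): frequency bins act as channels and the convolution runs along time, echoing the cross-frame templates of convolutive NMF. The layer sizes, ReLU non-negativity, and mean-squared-error objective are assumptions, and the recurrent-encoder variant mentioned above is not shown.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Convolutional auto-encoder over a magnitude spectrogram: frequency bins
    are treated as input channels and the convolution runs along time."""
    def __init__(self, n_freq=257, n_basis=20, kernel_frames=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_freq, n_basis, kernel_size=kernel_frames, padding=kernel_frames // 2),
            nn.ReLU(),                       # non-negative activations, as in NMF
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(n_basis, n_freq, kernel_size=kernel_frames, padding=kernel_frames // 2),
            nn.ReLU(),                       # non-negative reconstruction
        )

    def forward(self, spec):                 # spec: (batch, n_freq, n_frames)
        codes = self.encoder(spec)
        return self.decoder(codes), codes

if __name__ == "__main__":
    model = ConvAutoencoder()
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)
    spec = torch.rand(8, 257, 100)           # stand-in for magnitude spectrograms
    for _ in range(5):                       # a few illustrative training steps
        recon, _ = model(spec)
        loss = nn.functional.mse_loss(recon, spec)
        optim.zero_grad()
        loss.backward()
        optim.step()
    print(loss.item())
```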
{"title":"Neural network alternatives toconvolutive audio models for source separation","authors":"Shrikant Venkataramani, Cem Subakan, P. Smaragdis","doi":"10.1109/MLSP.2017.8168108","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168108","url":null,"abstract":"Convolutive Non-Negative Matrix Factorization model factorizes a given audio spectrogram using frequency templates with a temporal dimension. In this paper, we present a convolutional auto-encoder model that acts as a neural network alternative to convolutive NMF. Using the modeling flexibility granted by neural networks, we also explore the idea of using a Recurrent Neural Network in the encoder. Experimental results on speech mixtures from TIMIT dataset indicate that the convolutive architecture provides a significant improvement in separation performance in terms of BSS eval metrics.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"65 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85488090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
Deep convolutional neural networks for interpretable analysis of EEG sleep stage scoring
A. Vilamala, Kristoffer Hougaard Madsen, L. K. Hansen
Sleep studies are important for diagnosing sleep disorders such as insomnia, narcolepsy or sleep apnea. They rely on manual scoring of sleep stages from raw polysomnography signals, which is a tedious visual task requiring the workload of highly trained professionals. Consequently, research efforts to pursue automatic stage scoring based on machine learning techniques have been carried out over the last years. In this work, we resort to multitaper spectral analysis to create visually interpretable images of sleep patterns from EEG signals as inputs to a deep convolutional network trained to solve visual recognition tasks. As a working example of transfer learning, a system able to accurately classify sleep stages in new unseen patients is presented. Evaluations on a widely used, publicly available dataset compare favourably to state-of-the-art results, while providing a framework for visual interpretation of outcomes.
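The multitaper step can be sketched as follows: each window is multiplied by several DPSS (Slepian) tapers and the resulting periodograms are averaged, yielding the low-variance time-frequency images that serve as network inputs. The window length, time-bandwidth product, and taper count below are illustrative, and the transfer-learning CNN itself is not shown.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrogram(x, fs, win_sec=2.0, step_sec=1.0, nw=3.0, n_tapers=5):
    """Average the spectrogram over several DPSS tapers.

    x        : 1-D EEG signal
    fs       : sampling rate in Hz
    win_sec  : window length in seconds
    step_sec : hop between windows in seconds
    nw       : time-bandwidth product
    n_tapers : number of tapers (typically <= 2*nw - 1)
    """
    win = int(win_sec * fs)
    step = int(step_sec * fs)
    tapers = dpss(win, nw, n_tapers)                 # (n_tapers, win)
    frames = []
    for start in range(0, len(x) - win + 1, step):
        seg = x[start:start + win]
        spectra = np.abs(np.fft.rfft(tapers * seg[None, :], axis=1)) ** 2
        frames.append(spectra.mean(axis=0))          # average across tapers
    S = np.array(frames).T                           # (freq_bins, time_frames)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    return 10.0 * np.log10(S + 1e-12), freqs

if __name__ == "__main__":
    fs = 100
    t = np.arange(30 * fs) / fs
    eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.default_rng(0).standard_normal(t.size)
    S_db, freqs = multitaper_spectrogram(eeg, fs)
    print(S_db.shape, freqs.shape)
```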
{"title":"Deep convolutional neural networks for interpretable analysis of EEG sleep stage scoring","authors":"A. Vilamala, Kristoffer Hougaard Madsen, L. K. Hansen","doi":"10.1109/MLSP.2017.8168133","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168133","url":null,"abstract":"Sleep studies are important for diagnosing sleep disorders such as insomnia, narcolepsy or sleep apnea. They rely on manual scoring of sleep stages from raw polisomnography signals, which is a tedious visual task requiring the workload of highly trained professionals. Consequently, research efforts to purse for an automatic stage scoring based on machine learning techniques have been carried out over the last years. In this work, we resort to multitaper spectral analysis to create visually interpretable images of sleep patterns from EEG signals as inputs to a deep convolutional network trained to solve visual recognition tasks. As a working example of transfer learning, a system able to accurately classify sleep stages in new unseen patients is presented. Evaluations in a widely-used publicly available dataset favourably compare to state-of-the-art results, while providing a framework for visual interpretation of outcomes.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74506339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 127
Deep divergence-based clustering
Michael C. Kampffmeyer, Sigurd Løkse, F. Bianchi, L. Livi, A. Salberg, R. Jenssen
A promising direction in deep learning research is to learn representations and simultaneously discover cluster structure in unlabeled data by optimizing a discriminative loss function. Contrary to supervised deep learning, this line of research is in its infancy and the design and optimization of a suitable loss function with the aim of training deep neural networks for clustering is still an open challenge. In this paper, we propose to leverage the discriminative power of information theoretic divergence measures, which have experienced success in traditional clustering, to develop a new deep clustering network. Our proposed loss function incorporates explicitly the geometry of the output space, and facilitates fully unsupervised training end-to-end. Experiments on real datasets show that the proposed algorithm achieves competitive performance with respect to other state-of-the-art methods.
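As a rough illustration of turning a divergence measure into a clustering loss, the sketch below computes a Cauchy-Schwarz-style separation term between soft cluster assignments through a Gaussian kernel over the representations. It is not the paper's exact objective (which adds further geometric terms on the output space and is minimized end-to-end through the network); the kernel bandwidth and toy data are assumptions.

```python
import numpy as np

def gaussian_kernel(H, sigma):
    """Pairwise Gaussian kernel matrix over hidden representations H (n, d)."""
    sq = np.sum(H ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * H @ H.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def cs_cluster_loss(A, K):
    """Cauchy-Schwarz-style separation term between soft cluster assignments.

    A : (n, k) soft assignment matrix (rows sum to one)
    K : (n, n) kernel matrix over the representations
    Smaller values mean the clusters are better separated in kernel space.
    """
    k = A.shape[1]
    loss = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            num = A[:, i] @ K @ A[:, j]
            den = np.sqrt((A[:, i] @ K @ A[:, i]) * (A[:, j] @ K @ A[:, j]) + 1e-12)
            loss += num / den
    return loss / (k * (k - 1) / 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
    K = gaussian_kernel(H, sigma=1.0)
    good = np.zeros((100, 2))                 # assignments matching the two blobs
    good[:50, 0] = 1
    good[50:, 1] = 1
    bad = np.full((100, 2), 0.5)              # uninformative assignments
    print("separated:", cs_cluster_loss(good, K), "mixed:", cs_cluster_loss(bad, K))
```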
{"title":"Deep divergence-based clustering","authors":"Michael C. Kampffmeyer, Sigurd Løkse, F. Bianchi, L. Livi, A. Salberg, R. Jenssen","doi":"10.1109/MLSP.2017.8168158","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168158","url":null,"abstract":"A promising direction in deep learning research is to learn representations and simultaneously discover cluster structure in unlabeled data by optimizing a discriminative loss function. Contrary to supervised deep learning, this line of research is in its infancy and the design and optimization of a suitable loss function with the aim of training deep neural networks for clustering is still an open challenge. In this paper, we propose to leverage the discriminative power of information theoretic divergence measures, which have experienced success in traditional clustering, to develop a new deep clustering network. Our proposed loss function incorporates explicitly the geometry of the output space, and facilitates fully unsupervised training end-to-end. Experiments on real datasets show that the proposed algorithm achieves competitive performance with respect to other state-of-the-art methods.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"7 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85449374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Visualizing and improving scattering networks
Fergal Cotter, N. Kingsbury
Scattering Transforms (or ScatterNets) introduced by Mallat in [1] are a promising start toward creating a well-defined feature extractor for pattern recognition and image classification tasks. They are of particular interest due to their architectural similarity to Convolutional Neural Networks (CNNs), while requiring no parameter learning and still performing very well (particularly in constrained classification tasks). In this paper we visualize what the deeper layers of a ScatterNet are sensitive to, using a 'DeScatterNet'. We show that the higher orders of ScatterNets are sensitive to complex, edge-like patterns (checker-boards and rippled edges). These complex patterns may be useful for texture classification, but are quite dissimilar from the patterns visualized in the second and third layers of Convolutional Neural Networks (CNNs), the current state-of-the-art image classifiers. We propose that this may be the source of the current gaps in performance between ScatterNets and CNNs (83% vs 93% on CIFAR-10 for ScatterNet+SVM vs ResNet). We then use these visualization tools to propose possible enhancements to the ScatterNet design, which show they have the power to extract features more closely resembling CNNs, while still being well-defined and having the invariance properties fundamental to ScatterNets.
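A toy 1-D scattering-like cascade (wavelet modulus followed by averaging) is sketched below to make the 'higher orders' concrete; the Gabor filter bank, pooling size, and 1-D setting are simplifying assumptions and do not reproduce the 2-D ScatterNet architectures analyzed in the paper.

```python
import numpy as np

def gabor_bank(n_filters, length, base_freq=0.05):
    """Small bank of complex Gabor filters (a stand-in for a proper wavelet
    filter bank) with dyadically spaced centre frequencies."""
    t = np.arange(length) - length // 2
    bank = []
    for j in range(n_filters):
        f = base_freq * 2 ** j
        sigma = 2.0 / (np.pi * f)                 # bandwidth scales with frequency
        g = np.exp(-0.5 * (t / sigma) ** 2) * np.exp(2j * np.pi * f * t)
        bank.append(g / np.abs(g).sum())
    return bank

def scatter(x, bank, pool=32):
    """First- and second-order scattering-like coefficients: |x * psi_j|, then
    ||x * psi_j| * psi_k| with psi_k at a lower centre frequency, each followed
    by average pooling (the low-pass step that gives local invariance)."""
    def lowpass(u):
        n = (len(u) // pool) * pool
        return u[:n].reshape(-1, pool).mean(axis=1)

    first, second = [], []
    for j, psi_j in enumerate(bank):
        u1 = np.abs(np.convolve(x, psi_j, mode="same"))
        first.append(lowpass(u1))
        for k in range(j):                        # second filter at a lower frequency
            u2 = np.abs(np.convolve(u1, bank[k], mode="same"))
            second.append(lowpass(u2))
    return np.concatenate(first), np.concatenate(second)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024)
    s1, s2 = scatter(x, gabor_bank(3, 128))
    print(s1.shape, s2.shape)
```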
{"title":"Visualizing and improving scattering networks","authors":"Fergal Cotter, N. Kingsbury","doi":"10.1109/MLSP.2017.8168136","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168136","url":null,"abstract":"Scattering Transforms (or ScatterNets) introduced by Mallat in [1] are a promising start into creating a well-defined feature extractor to use for pattern recognition and image classification tasks. They are of particular interest due to their architectural similarity to Convolutional Neural Networks (CNNs), while requiring no parameter learning and still performing very well (particularly in constrained classification tasks). In this paper we visualize what the deeper layers of a ScatterNet are sensitive to using a ‘DeScatterNet’. We show that the higher orders of ScatterNets are sensitive to complex, edge-like patterns (checker-boards and rippled edges). These complex patterns may be useful for texture classification, but are quite dissimilar from the patterns visualized in second and third layers of Convolutional Neural Networks (CNNs) — the current state of the art Image Classifiers. We propose that this may be the source of the current gaps in performance between ScatterNets and CNNs (83% vs 93% on CIFAR-10 for ScatterNet+SVM vs ResNet). We then use these visualization tools to propose possible enhancements to the ScatterNet design, which show they have the power to extract features more closely resembling CNNs, while still being well-defined and having the invariance properties fundamental to ScatterNets.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"59 4 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79763828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14