Pub Date: 2017-12-07. DOI: 10.1109/MLSP.2017.8168195
T. Karvonen, S. Särkkä
Extending previous work on the topic, we show that all classical polynomial-based quadrature rules can be interpreted as Bayesian quadrature rules when the covariance kernel is selected suitably. As the resulting Bayesian quadrature rules have zero posterior integral variance, the results of this article are mostly of theoretical interest, clarifying the relationship between the two approaches to numerical integration.
{"title":"Classical quadrature rules via Gaussian processes","authors":"T. Karvonen, S. Särkkä","doi":"10.1109/MLSP.2017.8168195","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168195","url":null,"abstract":"In an extension to some previous work on the topic, we show how all classical polynomial-based quadrature rules can be interpreted as Bayesian quadrature rules if the covariance kernel is selected suitably. As the resulting Bayesian quadrature rules have zero posterior integral variance, the results of this article are mostly of theoretical interest in clarifying the relationship between the two different approaches to numerical integration.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"162 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75777821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-12-05. DOI: 10.1109/MLSP.2017.8168188
Tsubasa Ochiai, Shinji Watanabe, S. Katagiri
We recently proposed a novel multichannel end-to-end speech recognition architecture that integrates multichannel speech enhancement and speech recognition into a single neural-network-based architecture, and demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the integrated system remains insufficiently understood. An open question is whether the speech enhancement component actually acquires speech enhancement (noise suppression) ability, because it is optimized with end-to-end ASR objectives rather than speech enhancement objectives. In this paper, we address this question through systematic evaluation experiments on the CHiME-4 corpus. We first show that the integrated end-to-end architecture obtains adequate speech enhancement ability, superior to that of a conventional alternative (a delay-and-sum beamformer), as measured by two signal-level metrics: the signal-to-distortion ratio and the perceptual evaluation of speech quality. Our findings suggest that further improving the integrated system requires boosting the latter-stage speech recognition component, yet only a limited amount of multichannel noisy speech data is available. Motivated by this, we next investigate the effect of using a large amount of single-channel clean speech data, e.g., the WSJ corpus, for additional training of the speech recognition component. We show that this use of clean speech significantly improves the overall performance of the multichannel end-to-end architecture on multichannel noisy ASR tasks.
{"title":"Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR","authors":"Tsubasa Ochiai, Shinji Watanabe, S. Katagiri","doi":"10.1109/MLSP.2017.8168188","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168188","url":null,"abstract":"Recently we proposed a novel multichannel end-to-end speech recognition architecture that integrates the components of multichannel speech enhancement and speech recognition into a single neural-network-based architecture and demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the proposed integrated system remains insufficiently clarified. An open question is whether the speech enhancement component really gains speech enhancement (noise suppression) ability, because it is optimized based on end-to-end ASR objectives instead of speech enhancement objectives. In this paper, we solve this question by conducting systematic evaluation experiments using the CHiME-4 corpus. We first show that the integrated end-to-end architecture successfully obtains adequate speech enhancement ability that is superior to that of a conventional alternative (a delay-and-sum beamformer) by observing two signal-level measures: the signal-todistortion ratio and the perceptual evaluation of speech quality. Our findings suggest that to further increase the performances of an integrated system, we must boost the power of the latter-stage speech recognition component. However, an insufficient amount of multichannel noisy speech data is available. Based on these situations, we next investigate the effect of using a large amount of single-channel clean speech data, e.g., the WSJ corpus, for additional training of the speech recognition component. We also show that our approach with clean speech significantly improves the total performance of multichannel end-to-end architecture in the multichannel noisy ASR tasks.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"40 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76271262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-09-01. DOI: 10.1109/MLSP.2017.8168153
Muhammad A Shah, B. Raj, Khaled A. Harras
Knowledge of a user's environmental context, i.e., the user's indoor location and the semantics of the surrounding environment, can facilitate the development of many location-aware applications. In this paper, we propose an acoustic monitoring technique that infers semantic knowledge about an indoor space over time from audio recordings made in it. Our technique uses both the impulse response of a space and the ambient sounds produced in it to determine a semantic label for the space. As more recordings are processed, we update our confidence in the assigned label. We evaluate our technique on a dataset of single-speaker human speech recordings obtained in different types of rooms at three university buildings. In our evaluation, the confidence for the true label generally outstripped the confidence for all other labels and in some cases converged to 100% with fewer than 30 samples.
{"title":"Inferring room semantics using acoustic monitoring","authors":"Muhammad A Shah, B. Raj, Khaled A. Harras","doi":"10.1109/MLSP.2017.8168153","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168153","url":null,"abstract":"Having knowledge of the environmental context of the user i.e. the knowledge of the users' indoor location and the semantics of their environment, can facilitate the development of many of location-aware applications. In this paper, we propose an acoustic monitoring technique that infers semantic knowledge about an indoor space over time, using audio recordings from it. Our technique uses the impulse response of these spaces as well as the ambient sounds produced in them in order to determine a semantic label for them. As we process more recordings, we update our confidence in the assigned label. We evaluate our technique on a dataset of single-speaker human speech recordings obtained in different types of rooms at three university buildings. In our evaluation, the confidence for the true label generally outstripped the confidence for all other labels and in some cases converged to 100% with less than 30 samples.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"18 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73853891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-09-01. DOI: 10.1109/MLSP.2017.8168114
B. Gatto, Anna Bogdanova, L. S. Souza, E. M. Santos
Gesture recognition technology provides multiple opportunities for direct human-computer interaction without additional external devices, and it has therefore been an appealing research area in computer vision. Many of its challenges stem from the complexity of human gestures, which may produce nonlinear distributions under different viewpoints. In this paper, we introduce a novel framework for gesture recognition that achieves high discrimination of spatial and temporal information while significantly decreasing the computational cost. The proposed method consists of four stages. First, we generate an ordered subset of images from a gesture video, filtering out those that do not contribute to the recognition task. Second, we express spatial and temporal gesture information in a compact trajectory matrix. Third, we represent the obtained matrix as a subspace, which yields discriminative information, as the trajectory matrices derived from different gestures generate dissimilar clusters in a low-dimensional space. Finally, we apply soft weights to find the optimal dimension of each gesture subspace. We demonstrate the practical and theoretical gains of our compact representation through experimental evaluation on two publicly available gesture datasets.
{"title":"Hankel subspace method for efficient gesture representation","authors":"B. Gatto, Anna Bogdanova, L. S. Souza, E. M. Santos","doi":"10.1109/MLSP.2017.8168114","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168114","url":null,"abstract":"Gesture recognition technology provides multiple opportunities for direct human-computer interaction, without the use of additional external devices. As such, it had been an appealing research area in the field of computer vision. Many of its challenges are related to the complexity of human gestures, which may produce nonlinear distributions under different viewpoints. In this paper, we introduce a novel framework for gesture recognition, which achieves high discrimination of spatial and temporal information while significantly decreasing the computational cost. The proposed method consists of four stages. First, we generate an ordered subset of images from a gesture video, filtering out those that do not contribute to the recognition task. Second, we express spatial and temporal gesture information in a compact trajectory matrix. Then, we represent the obtained matrix as a subspace, achieving discriminative information, as the trajectory matrices derived from different gestures generate dissimilar clusters in a low dimension space. Finally, we apply soft weights to find the optimal dimension of each gesture subspace. We demonstrate practical and theoretical gains of our compact representation through experimental evaluation using two publicity available gesture datasets.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"37 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75278315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-09-01. DOI: 10.1109/MLSP.2017.8168138
Shashini De Silva, Jinsub Kim, R. Raich
We consider a training data collection mechanism wherein, instead of annotating each training instance with a class label, additional features drawn from a known class-conditional distribution are acquired concurrently. Treating the true labels as latent variables, we propose a maximum likelihood approach to train a classifier on these unlabeled training data. Furthermore, we consider the case of correlated training instances, wherein the latent label variables of consecutively collected training instances form a first-order Markov chain. A convex optimization approach and expectation-maximization algorithms are presented to train the classifiers. The efficacy of the proposed approach is validated through experiments on the Iris data and the MNIST handwritten digit data.
{"title":"Unsupervised multiview learning with partial distribution information","authors":"Shashini De Silva, Jinsub Kim, R. Raich","doi":"10.1109/MLSP.2017.8168138","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168138","url":null,"abstract":"We consider a training data collection mechanism wherein, instead of annotating each training instance with a class label, additional features drawn from a known class-conditional distribution are acquired concurrently. Considering true labels as latent variables, a maximum likelihood approach is proposed to train a classifier based on these unlabeled training data. Furthermore, the case of correlated training instances is considered, wherein latent label variables for subsequently collected training instances form a first-order Markov chain. A convex optimization approach and expectation-maximization algorithms are presented to train classifiers. The efficacy of the proposed approach is validated using the experiments with the iris data and the MNIST handwritten digit data.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74547237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-09-01. DOI: 10.1109/MLSP.2017.8168122
Joni Virta, K. Nordhausen
Two standard assumptions of classical blind source separation (BSS) theory are frequently violated by modern data sets. First, most existing methodology assumes vector-valued signals, whereas data with a natural tensor structure are frequently observed. Second, many typical BSS applications exhibit serial dependence, which is usually modeled under second-order stationarity assumptions that are often quite unrealistic. To address these two issues, we extend three existing methods of nonstationary blind source separation to tensor-valued time series. The resulting methods naturally factor in the tensor form of the observations without resorting to vectorization of the signals. Additionally, the methods allow for two types of nonstationarity: either the source series are blockwise second-order weakly stationary, or their variances change smoothly in time. A simulation study and an application to video data show that the proposed extensions outperform their vectorial counterparts and successfully identify source series of interest.
{"title":"Blind source separation for nonstationary tensor-valued time series","authors":"Joni Virta, K. Nordhausen","doi":"10.1109/MLSP.2017.8168122","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168122","url":null,"abstract":"Two standard assumptions of the classical blind source separation (BSS) theory are frequently violated by modern data sets. First, the majority of the existing methodology assumes vector-valued signals while data exhibiting a natural tensor structure is frequently observed. Second, many typical BSS applications exhibit serial dependence which is usually modeled using second order stationarity assumptions, which is however often quite unrealistic. To address these two issues we extend three existing methods of nonstationary blind source separation to tensor-valued time series. The resulting methods naturally factor in the tensor form of the observations without resorting to vectorization of the signals. Additionally, the methods allow for two types of nonstationarity, either the source series are blockwise second order weak stationary or their variances change smoothly in time. A simulation study and an application to video data show that the proposed extensions outperform their vectorial counterparts and successfully identify source series of interest.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"62 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84017047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-09-01. DOI: 10.1109/MLSP.2017.8168108
Shrikant Venkataramani, Cem Subakan, P. Smaragdis
The convolutive non-negative matrix factorization (NMF) model factorizes a given audio spectrogram using frequency templates with a temporal dimension. In this paper, we present a convolutional auto-encoder model that acts as a neural network alternative to convolutive NMF. Using the modeling flexibility granted by neural networks, we also explore the idea of using a recurrent neural network in the encoder. Experimental results on speech mixtures from the TIMIT dataset indicate that the convolutive architecture provides a significant improvement in separation performance in terms of BSS Eval metrics.
{"title":"Neural network alternatives toconvolutive audio models for source separation","authors":"Shrikant Venkataramani, Cem Subakan, P. Smaragdis","doi":"10.1109/MLSP.2017.8168108","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168108","url":null,"abstract":"Convolutive Non-Negative Matrix Factorization model factorizes a given audio spectrogram using frequency templates with a temporal dimension. In this paper, we present a convolutional auto-encoder model that acts as a neural network alternative to convolutive NMF. Using the modeling flexibility granted by neural networks, we also explore the idea of using a Recurrent Neural Network in the encoder. Experimental results on speech mixtures from TIMIT dataset indicate that the convolutive architecture provides a significant improvement in separation performance in terms of BSS eval metrics.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"65 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85488090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-09-01. DOI: 10.1109/MLSP.2017.8168133
A. Vilamala, Kristoffer Hougaard Madsen, L. K. Hansen
Sleep studies are important for diagnosing sleep disorders such as insomnia, narcolepsy, and sleep apnea. They rely on manual scoring of sleep stages from raw polysomnography signals, a tedious visual task requiring highly trained professionals. Consequently, research efforts to pursue automatic stage scoring based on machine learning techniques have been carried out in recent years. In this work, we resort to multitaper spectral analysis to create visually interpretable images of sleep patterns from EEG signals, which serve as inputs to a deep convolutional network trained on visual recognition tasks. As a working example of transfer learning, we present a system able to accurately classify sleep stages in new, unseen patients. Evaluations on a widely used, publicly available dataset compare favourably to state-of-the-art results, while providing a framework for visual interpretation of the outcomes.
{"title":"Deep convolutional neural networks for interpretable analysis of EEG sleep stage scoring","authors":"A. Vilamala, Kristoffer Hougaard Madsen, L. K. Hansen","doi":"10.1109/MLSP.2017.8168133","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168133","url":null,"abstract":"Sleep studies are important for diagnosing sleep disorders such as insomnia, narcolepsy or sleep apnea. They rely on manual scoring of sleep stages from raw polisomnography signals, which is a tedious visual task requiring the workload of highly trained professionals. Consequently, research efforts to purse for an automatic stage scoring based on machine learning techniques have been carried out over the last years. In this work, we resort to multitaper spectral analysis to create visually interpretable images of sleep patterns from EEG signals as inputs to a deep convolutional network trained to solve visual recognition tasks. As a working example of transfer learning, a system able to accurately classify sleep stages in new unseen patients is presented. Evaluations in a widely-used publicly available dataset favourably compare to state-of-the-art results, while providing a framework for visual interpretation of outcomes.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74506339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-09-01. DOI: 10.1109/MLSP.2017.8168158
Michael C. Kampffmeyer, Sigurd Løkse, F. Bianchi, L. Livi, A. Salberg, R. Jenssen
A promising direction in deep learning research is to learn representations and simultaneously discover cluster structure in unlabeled data by optimizing a discriminative loss function. In contrast to supervised deep learning, this line of research is in its infancy, and the design and optimization of a suitable loss function for training deep neural networks for clustering is still an open challenge. In this paper, we propose to leverage the discriminative power of information-theoretic divergence measures, which have been successful in traditional clustering, to develop a new deep clustering network. Our proposed loss function explicitly incorporates the geometry of the output space and facilitates fully unsupervised end-to-end training. Experiments on real datasets show that the proposed algorithm achieves competitive performance with respect to other state-of-the-art methods.
{"title":"Deep divergence-based clustering","authors":"Michael C. Kampffmeyer, Sigurd Løkse, F. Bianchi, L. Livi, A. Salberg, R. Jenssen","doi":"10.1109/MLSP.2017.8168158","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168158","url":null,"abstract":"A promising direction in deep learning research is to learn representations and simultaneously discover cluster structure in unlabeled data by optimizing a discriminative loss function. Contrary to supervised deep learning, this line of research is in its infancy and the design and optimization of a suitable loss function with the aim of training deep neural networks for clustering is still an open challenge. In this paper, we propose to leverage the discriminative power of information theoretic divergence measures, which have experienced success in traditional clustering, to develop a new deep clustering network. Our proposed loss function incorporates explicitly the geometry of the output space, and facilitates fully unsupervised training end-to-end. Experiments on real datasets show that the proposed algorithm achieves competitive performance with respect to other state-of-the-art methods.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"7 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85449374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-09-01. DOI: 10.1109/MLSP.2017.8168136
Fergal Cotter, N. Kingsbury
Scattering transforms (or ScatterNets), introduced by Mallat in [1], are a promising approach to creating a well-defined feature extractor for pattern recognition and image classification tasks. They are of particular interest due to their architectural similarity to convolutional neural networks (CNNs), while requiring no parameter learning and still performing very well (particularly in constrained classification tasks). In this paper we visualize what the deeper layers of a ScatterNet are sensitive to using a 'DeScatterNet'. We show that the higher orders of ScatterNets are sensitive to complex, edge-like patterns (checkerboards and rippled edges). These complex patterns may be useful for texture classification, but are quite dissimilar from the patterns visualized in the second and third layers of CNNs, the current state-of-the-art image classifiers. We propose that this may be the source of the current performance gap between ScatterNets and CNNs (83% vs. 93% on CIFAR-10 for ScatterNet+SVM vs. ResNet). We then use these visualization tools to propose possible enhancements to the ScatterNet design, and show that the enhanced networks can extract features more closely resembling those of CNNs, while remaining well-defined and retaining the invariance properties fundamental to ScatterNets.
{"title":"Visualizing and improving scattering networks","authors":"Fergal Cotter, N. Kingsbury","doi":"10.1109/MLSP.2017.8168136","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168136","url":null,"abstract":"Scattering Transforms (or ScatterNets) introduced by Mallat in [1] are a promising start into creating a well-defined feature extractor to use for pattern recognition and image classification tasks. They are of particular interest due to their architectural similarity to Convolutional Neural Networks (CNNs), while requiring no parameter learning and still performing very well (particularly in constrained classification tasks). In this paper we visualize what the deeper layers of a ScatterNet are sensitive to using a ‘DeScatterNet’. We show that the higher orders of ScatterNets are sensitive to complex, edge-like patterns (checker-boards and rippled edges). These complex patterns may be useful for texture classification, but are quite dissimilar from the patterns visualized in second and third layers of Convolutional Neural Networks (CNNs) — the current state of the art Image Classifiers. We propose that this may be the source of the current gaps in performance between ScatterNets and CNNs (83% vs 93% on CIFAR-10 for ScatterNet+SVM vs ResNet). We then use these visualization tools to propose possible enhancements to the ScatterNet design, which show they have the power to extract features more closely resembling CNNs, while still being well-defined and having the invariance properties fundamental to ScatterNets.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"59 4 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79763828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}