Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854018
C. Hegde, Aswin C. Sankaranarayanan, Richard Baraniuk
We consider the efficient acquisition, parameter estimation, and recovery of signal ensembles that lie on a low-dimensional manifold in a high-dimensional ambient signal space. Our particular focus is on randomized, compressive acquisition of signals from the manifold generated by the transformation of a base signal by operators from a Lie group. Such manifolds figure prominently in a number of applications, including radar and sonar array processing, camera arrays, and video processing. Leveraging the fact that Lie group manifolds admit a convenient analytical characterization, we develop new theory and algorithms for: (1) estimating the Lie operator parameters from compressive measurements, and (2) recovering the base signal from compressive measurements. We validate our approach with several numerical simulations, including the reconstruction of an affine-transformed video sequence from compressive measurements.
Title: Lie operators for compressive sensing. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2342-2346. https://doi.org/10.1109/ICASSP.2014.6854018
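To make the parameter-estimation problem in the abstract above concrete, the following minimal Python sketch measures a circularly shifted pulse with a random Gaussian matrix and recovers the shift by exhaustive search. Everything here is an illustrative assumption rather than the authors' method: integer circular shifts stand in for a one-parameter transformation group, the base signal is assumed known, the measurements are noiseless, and brute-force residual matching replaces the algorithms developed in the paper. The names `base`, `phi`, and `estimate_shift` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 128, 32                      # ambient dimension and number of measurements (assumed)
base = np.exp(-0.5 * ((np.arange(n) - 40) / 3.0) ** 2)  # assumed base signal: a Gaussian pulse
true_shift = 17                     # the transformation parameter to be estimated

# Random Gaussian measurement matrix, a common choice in compressive sensing.
phi = rng.standard_normal((m, n)) / np.sqrt(m)

# Compressive measurements of the transformed (shifted) base signal.
y = phi @ np.roll(base, true_shift)


def estimate_shift(y, phi, base):
    """Return the shift whose compressed, shifted base signal best matches y.

    Brute-force search over all integer circular shifts; this is a stand-in
    for a proper estimator, not the method of the paper.
    """
    residuals = [np.linalg.norm(y - phi @ np.roll(base, s)) for s in range(len(base))]
    return int(np.argmin(residuals))


estimated_shift = estimate_shift(y, phi, base)
print(estimated_shift)
```

The search never reconstructs the shifted signal itself: the parameter is estimated directly in the 32-dimensional measurement space, which is the setting the abstract describes.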
Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854108
K. Priya, K. Manasa, Sumohana S. Channappayya
Sparsity-based Distance Measure (SDM), a sparse reconstruction-based image similarity measure, was recently proposed and shown to have promising applications in image classification, clustering, and retrieval. In this paper, we present a statistical evaluation of SDM's performance as an image quality assessment (IQA) algorithm. This evaluation is carried out on the LIVE image database. We show that SDM performs fairly well in comparison with the state of the art while possessing several attractive properties. Specifically, we demonstrate its robustness to rotation (90°, 180°), scaling, and combinations of distortions, properties that are highly desirable in any IQA algorithm.
Title: A statistical evaluation of Sparsity-based Distance Measure (SDM) as an image quality assessment algorithm. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2789-2792. https://doi.org/10.1109/ICASSP.2014.6854108
Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854053
J. Dennis, T. H. Dat, Haizhou Li, Chng Eng Siong
Despite recent advances in the use of Artificial Neural Network (ANN) architectures for automatic speech recognition (ASR), relatively little attention has been given to using feature inputs beyond MFCCs in such systems. In this paper, we propose an alternative to conventional MFCC or filterbank features, using an approach based on the Generalised Hough Transform (GHT). The GHT is a common approach used in the field of image processing for the task of object detection, where the idea is to learn the spatial distribution of a codebook of feature information relative to the location of the target class. During recognition, a simple weighted summation of the codebook activations is commonly used to detect the presence of the target classes. Here we propose to learn the weighting discriminatively in an ANN, where the aim is to optimise the static phone classification error at the output of the network. As such an ANN is common to hybrid ASR architectures, the output activations from the GHT can be considered as a novel feature for ASR. Experimental results on the TIMIT phoneme recognition task demonstrate the state-of-the-art performance of the approach.
Title: A discriminatively trained Hough Transform for frame-level phoneme recognition. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2514-2518. https://doi.org/10.1109/ICASSP.2014.6854053
Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854643
V. Schwarz, Gabor Hannak, G. Matz
Average consensus (AC) is a well-studied method for distributed averaging. The convergence properties of average consensus depend on the averaging weights. Examples of commonly used weight designs are Metropolis-Hastings (MH) weights and constant weights. In this paper, we provide a complete convergence analysis for a generalized MH weight design that encompasses conventional MH as a special case. More specifically, we formulate necessary and sufficient conditions for convergence. A main conclusion is that AC with MH weights is guaranteed to converge unless the underlying network is a regular bipartite graph.
Title: On the convergence of average consensus with generalized metropolis-hasting weights. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5442-5446. https://doi.org/10.1109/ICASSP.2014.6854643
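The main conclusion of the abstract above can be illustrated numerically. The sketch below is a minimal Python example, not the paper's analysis: it assumes the conventional MH weight definition w_ij = 1 / max(d_i, d_j) for neighboring nodes i and j (with the self-weight chosen so that each row sums to one), and it compares a path graph with a 4-cycle, which is both regular and bipartite. The function names `mh_weights` and `run_consensus` are hypothetical.

```python
import numpy as np


def mh_weights(adj):
    """Build an MH weight matrix, assuming w_ij = 1 / max(d_i, d_j) on edges."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    n = len(deg)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j] > 0:
                W[i, j] = 1.0 / max(deg[i], deg[j])
        # Self-weight makes the row sum to one (W[i, i] is still zero here).
        W[i, i] = 1.0 - W[i].sum()
    return W


def run_consensus(W, x0, iters=200):
    """Iterate the linear consensus update x <- W x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = W @ x
    return x


# Path graph 0-1-2-3: bipartite but not regular.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])

# 4-cycle: every node has degree 2 (regular) and the graph is bipartite.
cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])

x0 = np.array([1.0, 0.0, 0.0, 0.0])   # initial node values; their average is 0.25

path_result = run_consensus(mh_weights(path), x0)
cycle_result = run_consensus(mh_weights(cycle), x0)
print(path_result)    # all entries approach the average 0.25
print(cycle_result)   # oscillates between two states and never reaches 0.25
```

On the 4-cycle all self-weights are zero under this definition, so the values simply swap between the two sides of the bipartition at every step, which is the failure case the abstract names.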
Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854930
Xiao-Lei Zhang
The mismatch between the training and test speech corpora hinders the practical use of machine-learning-based voice activity detection (VAD). In this paper, we address this problem with unsupervised domain adaptation techniques, which try to find a shared feature subspace between the mismatched corpora. A denoising deep neural network is used as the learning machine. Three domain adaptation techniques are analyzed. Experimental results show that unsupervised domain adaptation is a promising approach to the mismatch problem in VAD.
Title: Unsupervised domain adaptation for deep neural network based voice activity detection. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6864-6868. https://doi.org/10.1109/ICASSP.2014.6854930
Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854160
Tetsuya Kawase, Masanori Takehara, S. Tamura, S. Hayamizu, Ryuhei Tenmoku, T. Kurata
In this paper, we propose to use staying-area data for estimating customer serving time. Classifying utterances enables us to estimate the type of conversation between speakers. However, classification performance degrades in real environments. To solve this problem, we propose a method that combines area data with sound data. We also propose a method to estimate conversation types using decision trees. Both methods were tested on data recorded in a Japanese restaurant. In the utterance classification experiment, the proposed method performed better than the method using only sound data. In the conversation-type estimation experiment, we succeeded in recovering 70% of the misclassified conversations by using both sound and area data.
Title: Improvement of utterance clustering by using employees' sound and area data. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3047-3051. https://doi.org/10.1109/ICASSP.2014.6854160
Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854040
Enming Luo, Stanley H. Chan, Truong Q. Nguyen
Classical image denoising algorithms based on single noisy images and generic image databases will soon reach their performance limits. In this paper, we propose to denoise images using targeted external image databases. Formulating denoising as an optimal filter design problem, we utilize the targeted databases to (1) determine the basis functions of the optimal filter by means of group sparsity; (2) determine the spectral coefficients of the optimal filter by means of localized priors. For a variety of scenarios such as text images, multiview images, and face images, we demonstrate superior denoising results over existing algorithms.
Title: Image denoising by targeted external databases. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2450-2454. https://doi.org/10.1109/ICASSP.2014.6854040
Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854578
Irene Camino, U. Zölzer
Many image processing applications are of high interest to maritime authorities for security reasons. Depending on the application, different kinds of images are employed. The extraction of ship silhouettes requires high-resolution images in order to obtain accurate results. However, when the characteristics of the naval environment are visible, the background complexity increases greatly and automatic approaches fail. To overcome these difficulties, we propose an automatic initialization for graph segmentation techniques. A comparative study of previously suggested initializations for different graph segmentation techniques is also presented. It shows that, under such unfavorable image conditions, finding a proper initialization automatically is not trivial. Yet the precision and recall achieved by our initialization are considerably higher regardless of the graph segmentation technique. Furthermore, performance is greatly improved, since the best results are obtained after only the first iteration.
Title: Automatic initialization for naval application of graph segmentation techniques: A comparative study. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5120-5124. https://doi.org/10.1109/ICASSP.2014.6854578
Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6854454
Kehuang Li, Zhen Huang, You-Chi Cheng, Chin-Hui Lee
We propose a maximal figure-of-merit (MFoM) learning framework to directly maximize mean average precision (MAP), a key performance metric in many multi-class classification tasks. Conventional classifiers based on support vector machines cannot easily be adapted to optimize the MAP metric. On the other hand, classifiers based on deep neural networks (DNNs) have recently been shown to deliver strong discrimination capability in automatic speech recognition and image classification. However, DNNs are usually optimized with the minimum cross entropy criterion. In contrast to most conventional classification methods, our proposed approach can be formulated to embed DNNs and MAP into the objective function to be optimized during training. The combination of the proposed maximum MAP (MMAP) technique and DNNs introduces nonlinearity to the linear discriminant function (LDF) in order to increase the flexibility and discriminant power of the original MFoM-trained LDF-based classifiers. Tested on both automatic image annotation and audio event classification, the experimental results show consistent improvements in MAP on both datasets when compared with other state-of-the-art classifiers that do not use MMAP.
Title: A maximal figure-of-merit learning approach to maximizing mean average precision with deep neural network based classifiers. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4503-4507. https://doi.org/10.1109/ICASSP.2014.6854454
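For readers unfamiliar with the metric targeted in the abstract above, the following minimal Python sketch computes mean average precision (MAP) for a small multi-label example. It shows only the standard evaluation metric, not the MFoM training procedure; the function names and the toy scores and labels are illustrative assumptions.

```python
def average_precision(scores, labels):
    """Average precision for one class: mean of precision at each relevant rank."""
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """MAP over classes; rows are samples and columns are classes."""
    n_classes = len(score_matrix[0])
    per_class = []
    for c in range(n_classes):
        scores = [row[c] for row in score_matrix]
        labels = [row[c] for row in label_matrix]
        per_class.append(average_precision(scores, labels))
    return sum(per_class) / len(per_class)


# Toy data (assumed): three samples, two classes.
toy_scores = [[0.9, 0.2],
              [0.8, 0.7],
              [0.1, 0.6]]
toy_labels = [[1, 0],
              [0, 1],
              [1, 1]]

map_value = mean_average_precision(toy_scores, toy_labels)
print(round(map_value, 4))  # 0.9167: class 0 has AP 5/6, class 1 has AP 1
```

Because the value depends on the scores only through their ranking, MAP is piecewise constant in the classifier outputs, which is one reason it cannot be plugged directly into gradient-based training without some form of approximation.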
Pub Date : 2014-07-14 DOI: 10.1109/ICASSP.2014.6853932
J. Goldberger
We propose a new detection algorithm for MIMO communication systems employing a two-dimensional marginal of the Gaussian approximation of the exact discrete distribution of the transmitted data given the received data. From the 2D distributions we derive one-dimensional marginals by averaging all the 2D joint distributions related to a single input symbol. We prove that this strategy to obtain a 1D distribution from a set of not necessarily consistent 2D distributions is optimal (for a specified criterion). The improved performance of the proposed algorithm is demonstrated on several instances of the problem of MIMO detection.
Title: MIMO detection based on averaging Gaussian projections. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1916-1920. https://doi.org/10.1109/ICASSP.2014.6853932
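The averaging step described in the abstract above can be sketched as follows. This minimal Python example is one plausible reading, not the paper's algorithm: it assumes the pairwise 2D distributions are already available as small probability tables (the Gaussian approximation that produces them is not shown), marginalizes each table onto the symbol of interest, and averages the results. The function name `averaged_marginals` and the toy tables are hypothetical.

```python
import numpy as np


def averaged_marginals(pairwise, n_inputs):
    """Average the 1D marginals implied by each pairwise joint distribution.

    pairwise maps an index pair (i, j) to a table P[a, b] that approximates
    p(x_i = a, x_j = b). The tables need not be mutually consistent.
    """
    alphabet_size = next(iter(pairwise.values())).shape[0]
    totals = np.zeros((n_inputs, alphabet_size))
    counts = np.zeros(n_inputs)
    for (i, j), joint in pairwise.items():
        joint = joint / joint.sum()          # normalize defensively
        totals[i] += joint.sum(axis=1)       # marginal of x_i from this pair
        totals[j] += joint.sum(axis=0)       # marginal of x_j from this pair
        counts[i] += 1
        counts[j] += 1
    return totals / counts[:, None]


# Toy pairwise tables (assumed) for three binary symbols.
pairwise_tables = {
    (0, 1): np.array([[0.4, 0.1], [0.2, 0.3]]),
    (0, 2): np.array([[0.3, 0.3], [0.2, 0.2]]),
    (1, 2): np.array([[0.25, 0.25], [0.25, 0.25]]),
}

marginals = averaged_marginals(pairwise_tables, n_inputs=3)
print(marginals)                    # one row of symbol probabilities per input
decisions = marginals.argmax(axis=1)  # per-symbol decision from the averaged marginal
```

In the toy tables the pairs (0, 1) and (0, 2) disagree about the marginal of symbol 0 (0.5 versus 0.6 for the first value), and the averaged result of 0.55 reconciles the two.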