Pub Date: 2014-07-14, DOI: 10.1109/ICASSP.2014.6854018
C. Hegde, Aswin C. Sankaranarayanan, Richard Baraniuk
We consider the efficient acquisition, parameter estimation, and recovery of signal ensembles that lie on a low-dimensional manifold in a high-dimensional ambient signal space. Our particular focus is on randomized, compressive acquisition of signals from the manifold generated by the transformation of a base signal by operators from a Lie group. Such manifolds feature prominently in a number of applications, including radar and sonar array processing, camera arrays, and video processing. Leveraging the fact that Lie group manifolds admit a convenient analytical characterization, we develop new theory and algorithms for: (1) estimating the Lie operator parameters from compressive measurements, and (2) recovering the base signal from compressive measurements. We validate our approach with several numerical simulations, including the reconstruction of an affine-transformed video sequence from compressive measurements.
Lie operators for compressive sensing. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2342-2346.
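To make the parameter-estimation step concrete, here is a minimal sketch, not the authors' algorithm. It assumes the simplest one-parameter group (circular shifts), a known base signal, noiseless Gaussian measurements, and a brute-force search over all shifts. The names `shift`, `measure`, and `estimate_shift`, and the sizes `n` and `m`, are hypothetical choices for illustration.

```python
import math
import random

def shift(signal, k):
    """Circularly shift a list by k samples (a one-parameter translation group)."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]

def measure(phi, signal):
    """Apply the measurement matrix phi (a list of rows) to a signal."""
    return [sum(p * s for p, s in zip(row, signal)) for row in phi]

def estimate_shift(phi, y, base):
    """Return the shift whose compressive measurements are closest to y."""
    def residual(k):
        z = measure(phi, shift(base, k))
        return sum((a - b) ** 2 for a, b in zip(y, z))
    # Exhaustive search over the parameter; an assumption made for simplicity.
    return min(range(len(base)), key=residual)

rng = random.Random(0)
n, m = 64, 8  # ambient dimension and number of measurements (illustrative)
base = [math.exp(-0.5 * ((i - 10) / 2.0) ** 2) for i in range(n)]  # Gaussian pulse
phi = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(n)] for _ in range(m)]

true_shift = 23
y = measure(phi, shift(base, true_shift))  # only m numbers are observed
estimated = estimate_shift(phi, y, base)
print(true_shift, estimated)
```

The point of the sketch is that the parameter is estimated directly from the `m` compressive measurements, without first reconstructing the length-`n` signal. A continuous parameter or a multi-parameter group such as the affine group would call for a gradient-based search rather than enumeration.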
Pub Date: 2014-07-14, DOI: 10.1109/ICASSP.2014.6854108
K. Priya, K. Manasa, Sumohana S. Channappayya
Sparsity-based Distance Measure (SDM), a sparse reconstruction-based image similarity measure, was recently proposed and shown to have promising applications in image classification, clustering and retrieval. In this paper, we present a statistical evaluation of SDM's performance as an image quality assessment (IQA) algorithm. This evaluation is carried out on the LIVE image database. We show that SDM performs fairly well in comparison with the state of the art while possessing several attractive properties. Specifically, we demonstrate its robustness to rotation (90°, 180°), scaling, and combinations of distortions, properties that are highly desirable in any IQA algorithm.
A statistical evaluation of Sparsity-based Distance Measure (SDM) as an image quality assessment algorithm. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2789-2792.
Pub Date: 2014-07-14, DOI: 10.1109/ICASSP.2014.6854053
J. Dennis, T. H. Dat, Haizhou Li, Chng Eng Siong
Despite recent advances in the use of Artificial Neural Network (ANN) architectures for automatic speech recognition (ASR), relatively little attention has been given to using feature inputs beyond MFCCs in such systems. In this paper, we propose an alternative to conventional MFCC or filterbank features, using an approach based on the Generalised Hough Transform (GHT). The GHT is a common approach used in the field of image processing for the task of object detection, where the idea is to learn the spatial distribution of a codebook of feature information relative to the location of the target class. During recognition, a simple weighted summation of the codebook activations is commonly used to detect the presence of the target classes. Here we propose to learn the weighting discriminatively in an ANN, where the aim is to optimise the static phone classification error at the output of the network. As such an ANN is common to hybrid ASR architectures, the output activations from the GHT can be considered as a novel feature for ASR. Experimental results on the TIMIT phoneme recognition task demonstrate the state-of-the-art performance of the approach.
A discriminatively trained Hough Transform for frame-level phoneme recognition. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2514-2518.
Pub Date: 2014-07-14, DOI: 10.1109/ICASSP.2014.6854643
V. Schwarz, Gabor Hannak, G. Matz
Average consensus (AC) is a well-studied method for distributed averaging. The convergence properties of average consensus depend on the averaging weights. Examples of commonly used weight designs are Metropolis-Hastings (MH) weights and constant weights. In this paper, we provide a complete convergence analysis for a generalized MH weight design that encompasses conventional MH as a special case. More specifically, we formulate necessary and sufficient conditions for convergence. A main conclusion is that AC with MH weights is guaranteed to converge unless the underlying network is a regular bipartite graph.
On the convergence of average consensus with generalized Metropolis-Hasting weights. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5442-5446.
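The regular-bipartite exception can be seen in a small experiment. The sketch below assumes the conventional MH form, w_ij = 1 / max(d_i, d_j) on edges with the self-weight taking up the remainder. The abstract does not spell out the generalized design, so that formula and the function names are assumptions for illustration only. It runs the iteration on a 3-node path (bipartite but not regular) and on a 4-node cycle (regular and bipartite).

```python
def mh_weights(adj):
    """Weight matrix with w_ij = 1 / max(d_i, d_j) on edges (assumed MH form)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                w[i][j] = 1.0 / max(deg[i], deg[j])
        # The self-weight makes each row sum to one.
        w[i][i] = 1.0 - sum(w[i])
    return w

def run_consensus(w, x, steps):
    """Iterate x <- W x for a fixed number of steps."""
    for _ in range(steps):
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    return x

def spread(x):
    """Disagreement between nodes; zero once consensus is reached."""
    return max(x) - min(x)

path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]                           # not regular
cycle = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]  # 2-regular, bipartite

path_state = run_consensus(mh_weights(path), [3.0, 0.0, 0.0], 60)
cycle_state = run_consensus(mh_weights(cycle), [4.0, 0.0, 0.0, 0.0], 60)
print(spread(path_state), spread(cycle_state))
```

On the cycle every self-weight is zero, so the state alternates between the two sides of the bipartition and the spread never shrinks. On the path the end nodes keep a positive self-weight and the states settle at the average, 1.0.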
Pub Date: 2014-07-14, DOI: 10.1109/ICASSP.2014.6854930
Xiao-Lei Zhang
Mismatch between the training and test speech corpora hinders the practical use of machine-learning-based voice activity detection (VAD). In this paper, we address this problem with unsupervised domain adaptation techniques, which seek a shared feature subspace between the mismatched corpora. A denoising deep neural network is used as the learning machine. Three domain adaptation techniques are analyzed. Experimental results show that unsupervised domain adaptation is a promising approach to the mismatch problem in VAD.
Unsupervised domain adaptation for deep neural network based voice activity detection. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6864-6868.
Pub Date: 2014-07-14, DOI: 10.1109/ICASSP.2014.6854160
Tetsuya Kawase, Masanori Takehara, S. Tamura, S. Hayamizu, Ryuhei Tenmoku, T. Kurata
In this paper, we propose to use staying-area data for estimating the time employees spend serving customers. Classifying utterances enables us to estimate the type of conversation between speakers. However, classification performance degrades in real environments. We propose a method that combines area data with sound data to address this problem. We also propose a method to estimate conversation types using decision trees. Both methods were tested on data recorded in a Japanese restaurant. In the utterance classification experiment, the proposed method performed better than the method using only sound data. In the conversation-type estimation experiment, we succeeded in recovering 70% of the misclassified conversations by using both sound and area data.
Improvement of utterance clustering by using employees' sound and area data. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3047-3051.
Pub Date: 2014-07-14, DOI: 10.1109/ICASSP.2014.6854040
Enming Luo, Stanley H. Chan, Truong Q. Nguyen
Classical image denoising algorithms based on single noisy images and generic image databases will soon reach their performance limits. In this paper, we propose to denoise images using targeted external image databases. Formulating denoising as an optimal filter design problem, we utilize the targeted databases to (1) determine the basis functions of the optimal filter by means of group sparsity; (2) determine the spectral coefficients of the optimal filter by means of localized priors. For a variety of scenarios such as text images, multiview images, and face images, we demonstrate superior denoising results over existing algorithms.
Image denoising by targeted external databases. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2450-2454.
Pub Date: 2014-07-14, DOI: 10.1109/ICASSP.2014.6854578
Irene Camino, U. Zölzer
Many image processing applications are of high interest to maritime authorities for security reasons. Depending on the application, different kinds of images are employed. The extraction of ship silhouettes requires high-resolution images in order to obtain accurate results. However, when the characteristics of the naval environment are visible, the background complexity increases greatly and automatic approaches fail. To overcome these difficulties, we propose an automatic initialization for graph segmentation techniques. A comparative study of previously suggested initializations for different graph segmentation techniques is also presented. It shows that, under such unfavorable image conditions, finding a proper initialization automatically is not trivial. Yet the precision and recall achieved by our initialization are considerably higher regardless of the graph segmentation technique. Furthermore, the performance is greatly increased, since the best results are obtained after only the first iteration.
Automatic initialization for naval application of graph segmentation techniques: A comparative study. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5120-5124.
Pub Date: 2014-07-14, DOI: 10.1109/ICASSP.2014.6853932
J. Goldberger
We propose a new detection algorithm for MIMO communication systems employing a two-dimensional marginal of the Gaussian approximation of the exact discrete distribution of the transmitted data given the received data. From the 2D distributions we derive one-dimensional marginals by averaging all the 2D joint distributions related to a single input symbol. We prove that this strategy to obtain a 1D distribution from a set of not necessarily consistent 2D distributions is optimal (for a specified criterion). The improved performance of the proposed algorithm is demonstrated on several instances of the problem of MIMO detection.
MIMO detection based on averaging Gaussian projections. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1916-1920.
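The marginalize-then-average step can be illustrated on a toy discrete case. This sketch is not the paper's detector: the two joint tables are made-up numbers, the symbols are binary, and no Gaussian approximation is involved. It only shows how several 2D joints that disagree about a symbol can be reduced to one 1D distribution by arithmetic averaging. The arithmetic mean is known to minimize the summed Kullback-Leibler divergence from the individual marginals to the combined one; whether that is the criterion meant in the abstract is an assumption.

```python
def marginal_first(joint):
    """Marginalise a 2D table p(a, b) over b, returning p(a)."""
    return [sum(row) for row in joint]

def average_marginal(joints):
    """Average the 1D marginals of the first variable implied by each joint."""
    marginals = [marginal_first(j) for j in joints]
    k = len(marginals)
    return [sum(m[a] for m in marginals) / k for a in range(len(marginals[0]))]

# Hypothetical, mutually inconsistent joints over binary symbols.
joint_12 = [[0.5, 0.2], [0.2, 0.1]]  # p(x1, x2), implies p(x1) = [0.7, 0.3]
joint_13 = [[0.3, 0.2], [0.1, 0.4]]  # p(x1, x3), implies p(x1) = [0.5, 0.5]

p_x1 = average_marginal([joint_12, joint_13])
# Hard decision: pick the symbol value with the largest averaged probability.
decision = max(range(len(p_x1)), key=lambda a: p_x1[a])
print(p_x1, decision)
```

The averaged distribution remains normalized because each input marginal is normalized, so it can be used directly for a hard or soft decision on the symbol.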
Pub Date: 2014-07-14, DOI: 10.1109/ICASSP.2014.6854095
F. Verde, A. Scaglione, D. Darsena, G. Gelli
In this paper, we propose an opportunistic amplify-and-forward relaying scheme for a cognitive radio network, which is aimed at allowing a secondary user (SU) to transmit over the same time-frequency slot of a primary user (PU). In our scheme, the SU amplifies and transmits the PU signal it receives, by using as relaying gain the information symbols that the SU wishes to transmit towards its own secondary receiver. The information theoretic limits of the proposed protocol are investigated by showing that, in some operative conditions of practical interest, the SU can embed its information symbols in the PU signal, without violating the cognitive radio principle of protecting the PU transmission and, at the same time, by attaining low transmission rates.
An amplify-and-forward scheme for cognitive radios. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2724-2728.