Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518827
R. Narayanan
This paper examines the results of our research on the use of ultra-wideband noise waveforms for imaging objects behind walls. The advantages of using thermally generated noise as a probing signal are introduced. The technique of heterodyne correlation, used to inject coherence into the random noise probing signal and to collapse the wideband reflected signal into a single frequency, is presented. We address issues related to locating, detecting, and tracking humans behind walls, using the Hilbert-Huang transform approach for human activity characterization. The results indicate that noise radar technology combined with modern signal processing approaches is a viable technique for covert high-resolution imaging of obscured stationary and moving targets.
Title: Through wall radar imaging using UWB noise waveforms. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4517723
F. Luisier, T. Blu
We propose an extension of the recently devised SURE-LET grayscale denoising approach for multichannel images. Assuming additive Gaussian white noise, the unknown linear parameters of a transform-domain pointwise multichannel thresholding are globally optimized by minimizing Stein's unbiased MSE estimate (SURE) in the image domain. Using the undecimated wavelet transform, we demonstrate the efficiency of this approach for denoising color images by comparing our results with two other state-of-the-art denoising algorithms.
Title: SURE-LET multichannel image denoising: undecimated wavelet thresholding. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
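The idea of choosing denoising parameters by minimizing SURE instead of the unknown true MSE can be illustrated with a much simpler estimator than the one in the abstract. The sketch below is an assumption-laden stand-in: it uses plain scalar soft thresholding with a single threshold, not the multichannel LET parametrization, and the function names are hypothetical. For i.i.d. Gaussian noise of known standard deviation `sigma`, the quantity returned by `sure_soft` is an unbiased estimate of the total squared error of the soft-thresholded coefficients.

```python
import math


def soft_threshold(y, t):
    """Shrink every coefficient toward zero by t (plain soft thresholding)."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in y]


def sure_soft(y, t, sigma):
    """Stein's unbiased estimate of the total squared error of soft_threshold(y, t).

    Assumes y = x + n with n i.i.d. Gaussian, zero mean, standard deviation sigma.
    """
    n = len(y)
    zeroed = sum(1 for v in y if abs(v) <= t)        # coefficients set to zero
    clipped = sum(min(abs(v), t) ** 2 for v in y)    # data-dependent term
    return n * sigma ** 2 - 2 * sigma ** 2 * zeroed + clipped


def best_threshold(y, sigma, candidates):
    """Pick the candidate threshold with the smallest SURE value."""
    return min(candidates, key=lambda t: sure_soft(y, t, sigma))
```

The point of the sketch is that `best_threshold` never needs the clean signal: the selection is driven entirely by the noisy coefficients and the noise level, which is the property SURE-based methods rely on.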
In this paper, a 3-layer Bayesian hierarchical detection framework (BHDF) is proposed for robust parking space detection. In practice, the challenges of the parking space detection problem come from luminance variations, inter-occlusions among cars, and occlusions caused by environmental obstacles. Instead of determining the status of parking spaces one by one, the proposed BHDF framework models the inter-occluded patterns as semantic knowledge and couples local classifiers with adjacency constraints to determine the status of parking spaces in a row-by-row manner. By applying the BHDF to the parking space detection problem, the detection of available parking spaces and the labeling of parked cars can be achieved in a robust and efficient manner. Furthermore, this BHDF framework is generic enough to be used for various kinds of detection and segmentation applications.
Title: A Bayesian hierarchical detection framework for parking space detection. Authors: Chingchun Huang, Sheng-Jyh Wang, Yao-Jen Chang, Tsuhan Chen. Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518055. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4517911
Kai Steinert, M. Schönle, C. Beaugeant, T. Fingscheidt
Echo cancellation and noise reduction for hands-free systems are challenging tasks in speech signal processing. The presence of strong local speech and noise and a changing acoustic enclosure may severely impair the performance of the algorithms. Additional constraints, such as a low signal delay, are usually also required for real-time implementation. We present a hands-free system consisting of a delayless subband adaptive filter with a low-delay echo and noise suppression postfilter. All parameters are estimated in the subband domain, whereas the filtering takes place in the time domain. Thus, our system has a significantly lower processing delay than similar proposals. We compare its performance with respect to echo and noise attenuation and speech distortion with a state-of-the-art hands-free system in a simulated car environment.
Title: Hands-free system with low-delay subband acoustic echo control and noise reduction. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4517554
S. Araki, M. Fujimoto, K. Ishizuka, H. Sawada, S. Makino
This paper presents a speaker indexing method that uses a small number of microphones to estimate who spoke when. Our proposed speaker indexing is realized by using a noise-robust voice activity detector (VAD), a GCC-PHAT based direction of arrival (DOA) estimator, and a DOA classifier. Using the estimated speaker indexing information, we can also enhance the utterances of each speaker with a maximum signal-to-noise-ratio (MaxSNR) beamformer. This paper applies our system to real meetings / conversations recorded in a room with a reverberation time of 350 ms, and evaluates the performance by a standard measure: the diarization error rate (DER). Even for the real conversations, which have many speaker turn-takings and overlaps, the speaker error time was very small with our proposed system. We are planning to demonstrate a real-time speaker indexing system at ICASSP2008.
Title: Speaker indexing and speech enhancement in real meetings / conversations. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
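The GCC-PHAT delay estimate that underlies a DOA estimator of the kind named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses two microphones, a far-field source, integer-sample lags, and a naive O(n^2) DFT so that it needs only the standard library. The function names and the default speed of sound are hypothetical choices made for the example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (adequate for short illustrative frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def gcc_phat_delay(sig, ref, max_lag):
    """Return the lag in samples by which `sig` trails `ref` (negative if it leads)."""
    n = len(sig) + len(ref)                      # zero-pad to avoid wrap-around
    X = dft(list(sig) + [0.0] * (n - len(sig)))
    Y = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [a * b.conjugate() for a, b in zip(X, Y)]
    # PHAT weighting: discard the magnitude and keep only the phase.
    phat = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in cross]
    cc = [v.real for v in dft(phat, inverse=True)]
    return max(range(-max_lag, max_lag + 1), key=lambda lag: cc[lag % n])


def delay_to_angle(lag, fs, mic_distance, c=343.0):
    """Convert a lag to a far-field arrival angle in degrees (two-microphone model)."""
    s = c * lag / (fs * mic_distance)
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))
```

Because the PHAT weighting whitens the cross-spectrum, the correlation peak stays sharp for broadband signals such as speech, which is why this weighting is a common choice in reverberant rooms. A practical system would use an FFT and sub-sample peak interpolation.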
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518840
Francesco Nicolo, N. Schmid
The ability of practical biometric systems to recognize a large number of subjects is constrained by a variety of factors that include a choice of a source encoding technique, quality of images, complexity and variability of underlying patterns and of collected data. Given a source encoding technique, the remaining factors can be attributed to distortions due to a biometric recognition channel. In this work, we define empirical mutual information and recognition rate and evaluate empirical recognition capacity of biometric systems under the constraint of two global encoding techniques: principal component analysis (PCA) and independent component analysis (ICA). The empirical capacity of biometric systems is numerically evaluated as a point of intersection of the empirical mutual information rate curve plotted as a function of the recognition rate and the diagonal line bisecting the first quadrant. The developed methodology is applied to find the empirical capacity of different recognition channels formed during acquisition of different iris and face databases.
Title: Empirical capacity of a biometric channel under the constraint of global PCA and ICA encoding. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
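The graphical construction described in the abstract above, locating the point where the empirical mutual information rate curve meets the diagonal of the first quadrant, amounts to solving I(R) = R. A minimal sketch, assuming the curve is available as sampled points with increasing rates and using linear interpolation between samples (the function name and the sampled-curve representation are assumptions made for the example):

```python
def diagonal_crossing(rates, info):
    """Return the rate R at which the sampled curve info(R) crosses the line info = R.

    `rates` must be increasing and `info` holds the curve values at those rates.
    Returns None if the curve never meets the diagonal on the sampled range.
    """
    points = list(zip(rates, info))
    for (r0, i0), (r1, i1) in zip(points, points[1:]):
        d0, d1 = i0 - r0, i1 - r1          # signed distance above the diagonal
        if d0 == 0:
            return r0
        if d0 * d1 < 0:                    # sign change: crossing inside (r0, r1)
            return r0 + (r1 - r0) * d0 / (d0 - d1)
    if points and points[-1][1] == points[-1][0]:
        return points[-1][0]
    return None
```

With a curve that is flat at 1.5 and rates sampled at 0, 1, 2, 3, the function returns 1.5, the rate at which the curve value equals the rate itself.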
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518602
Shinji Watanabe, Atsushi Nakamura
Incremental adaptation techniques for speech recognition are aimed at adjusting acoustic models quickly and stably to time-variant acoustic characteristics due to temporal changes of speaker, speaking style, noise source, etc. We proposed a novel incremental adaptation framework based on a macroscopic time evolution system, which models the time-variant characteristics by successively updating posterior distributions of acoustic model parameters. In this paper, we provide a unified interpretation of the proposal and the two major conventional approaches: indirect adaptation via transformation parameters (e.g. maximum likelihood linear regression (MLLR)) and direct adaptation of classifier parameters (e.g. maximum a posteriori (MAP)). We show analytically and experimentally that the proposed incremental adaptation encompasses both conventional approaches as well as their combination, and simultaneously possesses their quick and stable adaptation characteristics.
Title: A unified interpretation of adaptation approaches based on a macroscopic time evolution system and indirect/direct adaptation approaches. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4517715
Shuicheng Yan, Ming Liu, Thomas S. Huang
Motivated by the fact that age information can often be observed from local evidence on the human face, we contribute to the age estimation problem in two aspects. On the one hand, we present a new feature descriptor, called spatially flexible patch (SFP), which encodes the local appearance and position information simultaneously. SFP has the potential to alleviate the problem of insufficient samples, because SFPs that are similar in appearance yet slightly different in position can still provide similar confidence for age estimation. On the other hand, the SFP associated with an age label is modeled with a Gaussian mixture model, and age estimation is then conducted by maximizing the sum of likelihoods from all the SFPs associated with the hypothesized age. Experiments are conducted on the YAMAHA database with 8,000 face images and ages ranging from 0 to 93. Compared with the latest reported results, our new algorithm brings an encouraging reduction in mean absolute error for age estimation.
Title: Extracting age information from local spatially flexible patches. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4517796
Dongbo Min, K. Sohn
This paper presents a novel method for stereo matching with occlusion handling. In order to estimate the optimal cost, we define an energy function and solve the resulting iterative equation numerically. We improve performance and convergence rate by using several acceleration techniques. The proposed method is computationally efficient since it does not use color segmentation or any global optimization techniques. For occlusion handling, which has not been performed effectively by any conventional cost aggregation approaches, we combine the occlusion problem with the proposed minimization scheme. Asymmetric information is used so that little additional computation is necessary. Experimental results show that performance is comparable to that of many state-of-the-art methods.
Title: Stereo matching with asymmetric occlusion handling in weighted least square framework. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518034
Anchalee Puengnim, N. Thomas, J. Tourneret, Herve Guillon
This paper studies a Bayesian classifier which recognizes Gaussian minimum shift keying (GMSK) modulated signals with different bandwidths. We focus on identifying two different GMSK signals with BT = 0.25 and BT = 0.5 standardized by the Consultative Committee for Space Data Systems (CCSDS) for future space missions. The main idea of the proposed classifier is to compute the posterior probability of the observation sequence given each possible model by a modified Baum-Welch (BW) algorithm. The received GMSK signals are then classified according to the maximum a posteriori (MAP) rule.
Title: Classification of GMSK signals with different bandwidths. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
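The final MAP decision described in the abstract above can be sketched independently of how the per-model scores are obtained. The example below assumes that a log-likelihood of the observation sequence under each candidate model is already available (in the paper this comes from a modified Baum-Welch algorithm, which is not reproduced here). The function name, the dictionary interface, and the numeric scores in the usage note are hypothetical.

```python
import math


def map_classify(log_likelihoods, priors):
    """Return (best_label, posteriors) under the maximum a posteriori rule.

    log_likelihoods: dict mapping model label -> log p(observations | model)
    priors:          dict mapping model label -> prior probability of that model
    """
    log_joint = {m: ll + math.log(priors[m]) for m, ll in log_likelihoods.items()}
    # Log-sum-exp normalization keeps very negative log-likelihoods from underflowing.
    peak = max(log_joint.values())
    log_evidence = peak + math.log(sum(math.exp(v - peak) for v in log_joint.values()))
    posteriors = {m: math.exp(v - log_evidence) for m, v in log_joint.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors
```

For instance, with equal priors and illustrative log-likelihoods of -1050.0 for the BT = 0.25 model and -1047.0 for the BT = 0.5 model, the rule selects BT = 0.5, since a difference of 3 in log-likelihood dominates when the priors are equal.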