Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518336
G. N. Karystinos, A. Liavas
Cooperative communications is a rapidly evolving research area. Most of the cooperative protocols that have appeared in the literature assume slow flat fading channels and Gaussian codebooks. In many cases the relays must fully decode their input. It is well known that cooperation is most effective at low SNR, where binary input is optimal. Furthermore, energy and cost effectiveness make simple relays most attractive. Motivated by these two facts, we consider a half-duplex orthogonal cooperation protocol with binary input and relays that simply forward their symbol-by-symbol decisions to the destination, which performs algebraic decoding; we call it demodulate-and-forward (DmF). We assume independent slow Rayleigh flat fading channels with full channel state information (CSI) at the destination and compute an upper bound for the outage capacity of the DmF protocol. For low SNR and small outage probability, we derive a simple approximation to this bound. For comparison purposes, we compute the outage capacity of direct binary transmission and a simple low-SNR small-outage-probability approximation. We observe that for very small outage probability the DmF protocol significantly outperforms direct transmission. However, for (relatively) high outage probability, the opposite may happen.
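As a point of reference for the outage-capacity quantities mentioned in this abstract, the sketch below evaluates the textbook closed form for direct transmission over a slow Rayleigh flat fading channel. It assumes a Gaussian codebook and a unit-mean exponential channel gain, so it is not the binary-input result derived in the paper; the function names and the linear-scale `snr` argument are illustrative assumptions.

```python
import math


def outage_probability(snr, rate):
    """P[log2(1 + g * snr) < rate] for a channel gain g ~ Exp(1) (Rayleigh fading)."""
    return 1.0 - math.exp(-(2.0 ** rate - 1.0) / snr)


def outage_capacity(snr, eps):
    """Largest rate (bits/channel use) whose outage probability does not exceed eps."""
    # Solve 1 - exp(-(2^R - 1) / snr) = eps for R.
    return math.log2(1.0 - snr * math.log1p(-eps))


def outage_capacity_low_snr(snr, eps):
    """First-order approximation, valid for small snr and small eps."""
    return snr * eps / math.log(2.0)
```

For example, `outage_capacity(0.1, 0.01)` and `outage_capacity_low_snr(0.1, 0.01)` differ by less than one percent, which illustrates why a simple low-SNR, small-outage-probability approximation can be useful.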
Title: Outage capacity of a cooperative scheme with binary input and a simple relay. Venue: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518351
Celso H. H. Ribas, J. Bermudez, N. Bershad
This paper presents a novel implementation for identifying sparse telephone network echo channels. The new scheme follows the approach used in [1] in that the location of the channel response peak is estimated in the wavelet domain. A short time-domain adaptive filter is then centered on the estimated peak to identify the sparse response. The primary purpose of this paper is to present an efficient design of such a system. The use of a new block wavelet transform reduces computational complexity by 70% and improves peak detection. A new robust time-domain adaptive filtering scheme is also proposed, which significantly reduces the jitter problem in [1]. Monte Carlo simulations show excellent echo cancellation for a typical ITU-T channel.
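The second stage described in this abstract, a short time-domain adaptive filter placed around an estimated peak, can be sketched as follows. This is a minimal illustration using plain NLMS, which is an assumption: the paper's robust adaptive filter and its wavelet-domain peak detector are not reproduced here, and `peak_index`, the window width, and the synthetic channel are hypothetical.

```python
import random


def nlms_windowed(x, d, start, taps, mu=0.5, eps=1e-8):
    """Identify echo-path coefficients at delays start .. start+taps-1 with NLMS."""
    w = [0.0] * taps
    for n in range(start + taps - 1, len(x)):
        # Regressor: input samples at the delays covered by the short filter.
        u = [x[n - start - k] for k in range(taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))
        e = d[n] - y  # residual echo
        norm = eps + sum(uk * uk for uk in u)
        w = [wk + mu * e * uk / norm for wk, uk in zip(w, u)]
    return w


# Synthetic sparse echo path: 64 taps, only four of them active.
random.seed(0)
h = [0.0] * 64
h[30:34] = [0.2, 1.0, -0.5, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h)))) for n in range(len(x))]

peak_index = 31  # hypothetical output of a peak detector
start = peak_index - 3
w_hat = nlms_windowed(x, d, start=start, taps=8)
```

Because the filter adapts only 8 coefficients instead of 64, each update costs far less than a full-length filter, at the price of depending on an accurate peak estimate.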
Title: Low-complexity robust sparse channel identification using partial block wavelet transforms - analysis and implementation. Venue: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518617
Sha Meng, YU Peng, Jia Liu, F. Seide
We examine the task of spoken term detection in Chinese spontaneous speech with a lattice-based approach. We first compare lattices generated with different units (word, character, tonal syllable, and toneless syllable), as well as lattices converted from one unit to another. Then we combine lattices from multiple systems into a single lattice. By fully exploiting the redundant information in the combined lattice with time-based node/arc merging, we obtain a compact lattice index whose accuracy improves to 79.2%, from 73.9% for the best subsystem.
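The idea of time-based node/arc merging can be illustrated with a small sketch. The abstract does not spell out the merging rule, so the version below is an assumption: nodes whose timestamps fall within a tolerance `tol` of a cluster's first timestamp are merged, and arcs that then share start node, end node, and label are collapsed, keeping the highest posterior. The arc tuple layout and the example labels are hypothetical.

```python
def merge_lattices(arcs, tol=0.05):
    """Merge arcs (start_time, end_time, label, posterior) from several systems.

    tol should be smaller than the shortest arc, otherwise an arc may end up
    with the same merged node at both ends.
    """
    times = sorted({t for s, e, _, _ in arcs for t in (s, e)})
    node_of = {}
    anchor = None
    node_id = -1
    for t in times:
        # Open a new merged node when t is too far from the cluster's first time.
        if anchor is None or t - anchor > tol:
            node_id += 1
            anchor = t
        node_of[t] = node_id
    merged = {}
    for s, e, label, p in arcs:
        key = (node_of[s], node_of[e], label)
        merged[key] = max(merged.get(key, 0.0), p)  # keep the best posterior
    return merged


system_a = [(0.00, 0.50, "zhong1", 0.6)]
system_b = [(0.02, 0.51, "zhong1", 0.8), (0.02, 0.51, "zhong4", 0.3)]
index = merge_lattices(system_a + system_b)
# index == {(0, 1, "zhong1"): 0.8, (0, 1, "zhong4"): 0.3}
```

Three arcs over four distinct timestamps collapse into two arcs between two nodes, which is the kind of compaction the combined index relies on.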
Title: Fusing multiple systems into a compact lattice index for Chinese spoken term detection. Venue: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518290
S. Serbetli
In OFDM systems, Doppler spreading due to the mobility of the receiver distorts the orthogonality among the subcarriers and causes intercarrier interference that degrades performance. In this paper, we investigate how Doppler spreading can be mitigated by using multiple antennas. In this context, we propose a novel antenna combining framework that exploits the correlation among the time-varying channels seen by the multiple antennas. Depending on the computational complexity requirements, the scheme can take the form of beamforming, beamforming with frequency offset correction, or simple time-varying combining. We derive the optimum combining scheme in each context and show that the proposed combining schemes can greatly enhance the performance of mobile OFDM systems.
Title: A simple antenna combining framework for Doppler compensation in mobile OFDM systems. Venue: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518223
Paschalis Tsiaflakis, M. Moonen
Modern DSL networks suffer from crosstalk between different lines in the same cable bundle. By carefully choosing the transmit power spectra, the impact of crosstalk can be minimized, leading to spectacular performance gains. This is also referred to as dynamic spectrum management (DSM). This paper presents three novel low-complexity DSM algorithms that require different levels of message passing, ranging from fully autonomous and distributed to semi-centralized execution. Simulations show good performance compared to existing state-of-the-art DSM algorithms.
Title: Low-complexity dynamic spectrum management algorithms for digital subscriber lines. Venue: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518711
Jun Du, Qiang Huo
This paper presents a new feature compensation approach to noisy speech recognition by using piecewise linear approximation (PLA) of an explicit model of environmental distortions. Two traditional approaches, namely the vector Taylor series (VTS) and MAX approximations, are special cases of our proposed approach. Formulations for maximum likelihood (ML) estimation of noise model parameters and minimum mean square error (MMSE) estimation of clean speech are derived. A hybrid approach that uses different approximations for different types of noisy speech segments is also proposed. Experimental results on the Aurora2 and Aurora3 databases demonstrate that the proposed approaches achieve consistently significant improvements in recognition accuracy compared to the traditional VTS-based feature compensation approach.
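To make the relationship between the approximations named in this abstract concrete, the sketch below uses the commonly used log-spectral distortion model for additive noise, y = x + log(1 + exp(n - x)), and compares it with the MAX approximation, a first-order VTS expansion, and a chord-based piecewise linear approximation. The model choice (additive noise only, no channel term), the knot positions, and all function names are assumptions for illustration; the paper's own PLA formulation may differ.

```python
import math


def softplus(z):
    """log(1 + exp(z)), computed in a numerically stable way."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def distort(x, n):
    """Noisy log-spectrum y for clean speech x and noise n (same log domain)."""
    return x + softplus(n - x)


def distort_max(x, n):
    """MAX approximation: the louder source dominates."""
    return max(x, n)


def distort_vts(x, n, x0, n0):
    """First-order Taylor (VTS) expansion of distort around (x0, n0)."""
    g = 1.0 / (1.0 + math.exp(n0 - x0))  # partial derivative dy/dx at (x0, n0)
    return distort(x0, n0) + g * (x - x0) + (1.0 - g) * (n - n0)


def distort_pla(x, n, knots=(-6.0, -3.0, -1.0, 0.0, 1.0, 3.0, 6.0)):
    """Piecewise linear approximation of softplus between hypothetical knots."""
    z = n - x
    if z <= knots[0]:
        return x  # speech dominates
    if z >= knots[-1]:
        return n  # noise dominates
    for a, b in zip(knots, knots[1:]):
        if a <= z <= b:
            t = (z - a) / (b - a)
            return x + (1.0 - t) * softplus(a) + t * softplus(b)
```

With these knots the piecewise linear version stays within about 0.1 of the exact model everywhere, whereas the MAX approximation is off by log 2 when speech and noise have equal level, and the VTS expansion is accurate only near its expansion point.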
Title: A feature compensation approach using piecewise linear approximation of an explicit distortion model for noisy speech recognition. Venue: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518202
J. Jaldén, D. Seethaler, G. Matz
Lattice reduction by means of the LLL algorithm has previously been suggested as a powerful preprocessing tool that improves the performance of suboptimal detectors and reduces the complexity of optimal MIMO detectors. The complexity of the LLL algorithm is often cited as polynomial in the dimension of the lattice. In this paper we argue that this statement is not correct when made in the MIMO context. Specifically, we demonstrate that in typical communication scenarios the worst-case complexity of the LLL algorithm is not even finite. For i.i.d. Rayleigh fading channels, we further prove that the average LLL complexity is polynomial and that the probability of an atypically large number of LLL iterations decays exponentially.
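For readers unfamiliar with the algorithm whose complexity is analyzed here, the following is a minimal textbook sketch of LLL reduction. It assumes an integer (or rational) basis given as rows, exact rational arithmetic, and the usual parameter delta = 3/4; MIMO detectors instead work with real or complex channel matrices in floating point. The returned swap count is only an illustrative proxy for the iteration count, and the Gram-Schmidt data is recomputed from scratch for clarity rather than speed.

```python
from fractions import Fraction


def dot(u, v):
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))


def gram_schmidt(B):
    """Return the orthogonalized vectors B* and the coefficients mu[i][j]."""
    n = len(B)
    Bs = []
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(a) for a in B[i]]
        for j in range(i):
            mu[i][j] = dot(B[i], Bs[j]) / dot(Bs[j], Bs[j])
            v = [a - mu[i][j] * b for a, b in zip(v, Bs[j])]
        Bs.append(v)
    return Bs, mu


def lll(B, delta=Fraction(3, 4)):
    """LLL-reduce the rows of B; return (reduced basis, number of swaps)."""
    B = [[Fraction(a) for a in row] for row in B]
    n = len(B)
    k, swaps = 1, 0
    Bs, mu = gram_schmidt(B)
    while k < n:
        # Size reduction: make |mu[k][j]| <= 1/2 for all j < k.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q:
                B[k] = [a - q * b for a, b in zip(B[k], B[j])]
                Bs, mu = gram_schmidt(B)
        # Lovasz condition decides between advancing and swapping.
        if dot(Bs[k], Bs[k]) >= (delta - mu[k][k - 1] ** 2) * dot(Bs[k - 1], Bs[k - 1]):
            k += 1
        else:
            B[k], B[k - 1] = B[k - 1], B[k]
            Bs, mu = gram_schmidt(B)
            swaps += 1
            k = max(k - 1, 1)
    return B, swaps


reduced, swaps = lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]])
```

The number of swaps depends on the input basis rather than on the dimension alone, which is the kind of dependence the worst-case argument in the abstract concerns.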
Title: Worst- and average-case complexity of LLL lattice reduction in MIMO wireless systems. Venue: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4517946
Zhenhua Qu, Weiqi Luo, Jiwu Huang
Artifacts introduced by JPEG recompression have been shown to be useful in passive image authentication. In this paper, we focus on the shifted double JPEG problem, aiming to identify whether a given JPEG image has been compressed twice with inconsistent block segmentation. We formulate shifted double JPEG (SD-JPEG) compression as a noisy convolutive mixing model of the kind mostly studied in blind source separation (BSS). In the noise-free condition, the model can be solved by directly applying independent component analysis (ICA), with a minor constraint on the contents of natural images. To achieve robust identification in the noisy condition, the asymmetry of the independent value map (IVM) is exploited to obtain a normalized criterion of independence. We generate a total of 13 features to represent the asymmetric characteristic of the IVM and feed them to a support vector machine (SVM) classifier. Experimental results on a set of 1000 images, with various parameter settings, demonstrate the effectiveness of our method.
Title: A convolutive mixing model for shifted double JPEG compression with application to passive image authentication. Venue: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518568
Wade Shen, D. Reynolds
In this paper we describe the application of a feature-space transform based on constrained maximum likelihood linear regression (CMLLR), for unsupervised compensation of channel and speaker variability, to the language recognition problem. We show that such transforms can improve baseline GMM-based language recognition performance on the 2005 NIST Language Recognition Evaluation (LRE05) task by 38%. Furthermore, gains from CMLLR are additive with other modeling enhancements such as vocal tract length normalization (VTLN). Further improvement is obtained using discriminative training, and it is shown that a system using only CMLLR adaptation produces state-of-the-art accuracy at lower test-time computational cost than systems using VTLN.
Title: Improved GMM-based language recognition using constrained MLLR transforms. Venue: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4517932
T. Bocklet, A. Maier, Josef G. Bauer, F. Burkhardt, E. Nöth
This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian mixture models (GMMs) with universal background models (UBMs), which are well known from speaker identification/verification. The training is performed by the EM algorithm or MAP adaptation, respectively. In the second approach, a GMM is trained for each speaker of the test and training sets. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different polynomials), an RBF kernel, and a linear GMM distance kernel based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001), which is in the same range as human performance.
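The supervector construction described in this abstract, and one common form of a KL-divergence-based linear kernel between two adapted GMMs, can be sketched as below. The kernel form (UBM weights and diagonal UBM variances used to scale the means) follows the form widely used in speaker verification and is an assumption here; the abstract does not give the exact expression, and all names and numbers are illustrative.

```python
def gmm_supervector(means):
    """Concatenate the component mean vectors of one speaker's GMM."""
    return [value for mean in means for value in mean]


def kl_linear_kernel(means_a, means_b, weights, variances):
    """Sum over components of w_i * mu_a_i^T * diag(var_i)^-1 * mu_b_i."""
    total = 0.0
    for mu_a, mu_b, w, var in zip(means_a, means_b, weights, variances):
        total += w * sum(a * b / v for a, b, v in zip(mu_a, mu_b, var))
    return total


# Two speakers, each with a 2-component GMM over 2-dimensional features.
means_a = [[1.0, 2.0], [3.0, 4.0]]
means_b = [[0.0, 1.0], [1.0, 0.0]]
ubm_weights = [0.5, 0.5]
ubm_variances = [[1.0, 4.0], [1.0, 1.0]]

sv_a = gmm_supervector(means_a)  # [1.0, 2.0, 3.0, 4.0]
k_ab = kl_linear_kernel(means_a, means_b, ubm_weights, ubm_variances)  # 1.75
```

A kernel of this form is an inner product between scaled supervectors, so it can be handed to a standard SVM as a linear kernel after scaling each mean by the square root of its weight divided by its standard deviation.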
Title: Age and gender recognition for telephone applications based on GMM supervectors and support vector machines. Venue: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.