2008 IEEE International Conference on Acoustics, Speech and Signal Processing: Latest Papers

Outage capacity of a cooperative scheme with binary input and a simple relay
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518336
G. N. Karystinos, A. Liavas
Cooperative communications is a rapidly evolving research area. Most of the cooperative protocols that have appeared in the literature assume slow flat fading channels and Gaussian codebooks. In many cases the relays must fully decode their input. It is well known that cooperation is most effective at low SNR where binary input is optimal. Furthermore, energy and cost effectiveness make simple relays most attractive. Motivated by these two facts, we consider a half-duplex orthogonal cooperation protocol with binary input and relays that simply forward their symbol-by-symbol decisions to the destination which performs algebraic decoding; we call it demodulate-and-forward (DmF). We assume independent slow Rayleigh flat fading channels with full channel state information (CSI) at the destination and compute an upper bound for the outage capacity of the DmF protocol. For low SNR and small outage probability, we derive a simple approximation to this bound. For comparison purposes, we compute the outage capacity of direct binary transmission and a simple low-SNR small-outage-probability approximation. We observe that for very small outage probability the DmF protocol significantly outperforms direct transmission. However, for (relatively) high outage probability, the opposite may happen.
Citations: 4
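One way to get a feel for the demodulate-and-forward relay path in the Karystinos and Liavas abstract above: if the relay and the destination both make hard BPSK decisions, the source-relay and relay-destination hops behave like two cascaded binary symmetric channels. The sketch below computes the capacity of that cascade for given instantaneous SNRs. It is a simplification and not the paper's bound: it ignores the direct source-destination link, the Rayleigh fading statistics, and the CSI-aware decoding at the destination. All function names are illustrative.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binary_entropy(p: float) -> float:
    """Binary entropy in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bpsk_error_prob(snr: float) -> float:
    """Hard-decision BPSK error probability at a linear (not dB) SNR."""
    return q_function(math.sqrt(2.0 * snr))


def cascade_crossover(p1: float, p2: float) -> float:
    """Crossover probability of two BSCs in series (exactly one hop flips the bit)."""
    return p1 * (1.0 - p2) + p2 * (1.0 - p1)


def relay_path_capacity(snr_sr: float, snr_rd: float) -> float:
    """Capacity (bits per symbol) of the source-relay-destination BSC cascade.

    Assumption: hard decisions at both the relay and the destination.
    """
    p = cascade_crossover(bpsk_error_prob(snr_sr), bpsk_error_prob(snr_rd))
    return 1.0 - binary_entropy(p)
```

Under these assumptions the weaker hop dominates: when one hop is nearly error-free, the cascade capacity is essentially that of the other hop.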
Low-complexity robust sparse channel identification using partial block wavelet transforms-analysis and implementation
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518351
Celso H. H. Ribas, J. Bermudez, N. Bershad
This paper presents a novel implementation for identifying sparse telephone network echo channels. The new scheme follows the approach used in [1] in that the location of the channel response peak is estimated in the wavelet domain. A short time-domain adaptive filter is then placed around the estimated peak to identify the sparse response. The primary purpose of this paper is to present an efficient design of such a system. The use of a new block wavelet transform yields both a 70% reduction in computational complexity and improved peak detection. A new robust time-domain adaptive filter is also proposed, which significantly reduces the jitter problem in [1]. Monte Carlo simulations show excellent echo cancellation for a typical ITU-T channel.
Citations: 4
Fusing multiple systems into a compact lattice index for Chinese spoken term detection
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518617
Sha Meng, YU Peng, Jia Liu, F. Seide
We examine the task of spoken term detection in Chinese spontaneous speech with a lattice-based approach. We first compare lattices generated with different units (word, character, tonal and toneless syllables), as well as lattices converted from one unit to another. Then we combine lattices from multiple systems into a single lattice. By fully exploiting the redundant information in the combined lattice with time-based node/arc merging, we obtain a compact lattice index whose accuracy improves to 79.2%, compared with 73.9% for the best single subsystem.
Citations: 32
A simple antenna combining framework for Doppler compensation in mobile OFDM systems
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518290
S. Serbetli
In OFDM systems, Doppler spreading due to the mobility of the receiver distorts the orthogonality among the subcarriers, and results in intercarrier interference that degrades the performance. In this paper, we investigate how Doppler spreading can be mitigated by using multiple antennas. In this context, we propose a novel antenna combining framework exploiting the correlation among the time varying channels seen by the multiple antennas. Depending on the computational complexity requirements, the scheme can take the form of beamforming, beamforming with frequency offset correction and simple time-varying combining schemes. We derive the optimum combining scheme in each context, and show that by using the proposed combining schemes, the performance of the mobile OFDM systems can be greatly enhanced.
Citations: 3
Low-complexity dynamic spectrum management algorithms for digital subscriber lines
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518223
Paschalis Tsiaflakis, M. Moonen
Modern DSL networks suffer from crosstalk between different lines in the same cable bundle. By carefully choosing the transmit power spectra, the impact of crosstalk can be minimized, leading to spectacular performance gains. This is referred to as dynamic spectrum management (DSM). This paper presents three novel low-complexity DSM algorithms that differ in the level of message passing they require, ranging from fully autonomous and distributed to semi-centralized execution. Simulations show good performance compared to existing state-of-the-art DSM algorithms.
Citations: 7
A feature compensation approach using piecewise linear approximation of an explicit distortion model for noisy speech recognition
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518711
Jun Du, Qiang Huo
This paper presents a new feature compensation approach to noisy speech recognition by using piecewise linear approximation (PLA) of an explicit model of environmental distortions. Two traditional approaches, namely vector Taylor series (VTS) and MAX approximations, are two special cases of our proposed approach. Formulations for maximum likelihood (ML) estimation of noise model parameters and minimum mean square error (MMSE) estimation of clean speech are derived. A hybrid approach of using different approximations for different types of noisy speech segments is also proposed. Experimental results on Aurora2 and Aurora3 databases demonstrate that the proposed approaches achieve consistently significant improvements in recognition accuracy compared to the traditional VTS-based feature compensation approach.
Citations: 2
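The Du and Huo abstract above does not spell out its distortion model. The sketch below assumes the log-spectral relation commonly used in this literature, y = x + log(1 + exp(n - x)), for clean speech x and additive noise n. Under that assumption, the MAX approximation and a first-order Taylor expansion (the core of VTS) are two ways of linearizing the same nonlinearity f(d) = log(1 + exp(d)), and a piecewise linear interpolant over several knots sits between them. The knot placement and all function names are illustrative choices, not taken from the paper.

```python
import math


def softplus(d: float) -> float:
    """f(d) = log(1 + exp(d)), computed in a numerically stable way."""
    return max(d, 0.0) + math.log1p(math.exp(-abs(d)))


def max_approx(d: float) -> float:
    """MAX approximation: f(d) ~ max(0, d), i.e. y ~ max(x, n)."""
    return max(d, 0.0)


def taylor_approx(d: float, d0: float) -> float:
    """First-order Taylor expansion of f around d0 (a single tangent line)."""
    slope = 1.0 / (1.0 + math.exp(-d0))  # f'(d0) is the logistic sigmoid
    return softplus(d0) + slope * (d - d0)


def pla_approx(d: float, knots: list[float]) -> float:
    """Piecewise linear interpolation of f between sorted knots.

    Outside the knot range the MAX approximation is used, which leaves a
    small jump at the outermost knots when they are far from zero.
    """
    if d <= knots[0] or d >= knots[-1]:
        return max_approx(d)
    for left, right in zip(knots, knots[1:]):
        if left <= d <= right:
            t = (d - left) / (right - left)
            return (1.0 - t) * softplus(left) + t * softplus(right)
    return max_approx(d)  # not reached for sorted knots


def noisy_log_spectrum(x: float, n: float, f=softplus) -> float:
    """Assumed distortion model: y = x + f(n - x)."""
    return x + f(n - x)
```

Passing `max_approx` or `pla_approx` bound to a knot list as `f` to `noisy_log_spectrum` shows how the choice of linearization changes the predicted noisy feature, most visibly when x and n are close.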
Worst- and average-case complexity of LLL lattice reduction in MIMO wireless systems
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518202
J. Jaldén, D. Seethaler, G. Matz
Lattice reduction by means of the LLL algorithm has previously been suggested as a powerful preprocessing tool that improves the performance of suboptimal detectors and reduces the complexity of optimal MIMO detectors. The complexity of the LLL algorithm is often cited as polynomial in the dimension of the lattice. In this paper we argue that this statement is not correct when made in the MIMO context. Specifically, we demonstrate that in typical communication scenarios the worst-case complexity of the LLL algorithm is not even finite. For i.i.d. Rayleigh fading channels, we further prove that the average LLL complexity is polynomial and that the probability of an atypically large number of LLL iterations decays exponentially.
Citations: 106
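For readers unfamiliar with the algorithm discussed in the Jaldén, Seethaler and Matz abstract above, here is a minimal textbook LLL reduction in pure Python that also counts basis-vector swaps. The swap count is used as a simple stand-in for the iteration count whose behaviour the abstract analyses. This is the generic algorithm for real-valued, linearly independent row bases, not the authors' analysis or a MIMO-specific variant. It recomputes the Gram-Schmidt data at every step for clarity rather than speed, and the function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return the orthogonalized vectors b*_i and the coefficients mu[i][j]."""
    n = len(basis)
    ortho = []
    mu = [[0.0] * n for _ in range(n)]
    for i in range(n):
        v = list(basis[i])
        for j in range(i):
            mu[i][j] = dot(basis[i], ortho[j]) / dot(ortho[j], ortho[j])
            v = [a - mu[i][j] * b for a, b in zip(v, ortho[j])]
        ortho.append(v)
    return ortho, mu


def lll_reduce(basis, delta=0.75):
    """LLL-reduce the rows of `basis`; return (reduced_basis, number_of_swaps)."""
    basis = [[float(x) for x in row] for row in basis]
    n = len(basis)
    swaps = 0
    k = 1
    while k < n:
        # Size reduction: make |mu[k][j]| <= 1/2 for j = k-1, ..., 0.
        for j in range(k - 1, -1, -1):
            _, mu = gram_schmidt(basis)
            q = round(mu[k][j])
            if q != 0:
                basis[k] = [a - q * b for a, b in zip(basis[k], basis[j])]
        ortho, mu = gram_schmidt(basis)
        lhs = dot(ortho[k], ortho[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
        if lhs >= rhs:  # Lovasz condition holds: move on
            k += 1
        else:  # otherwise swap and step back
            basis[k], basis[k - 1] = basis[k - 1], basis[k]
            swaps += 1
            k = max(k - 1, 1)
    return basis, swaps


def is_lll_reduced(basis, delta=0.75, tol=1e-9):
    """Check size reduction and the Lovasz condition for every index."""
    ortho, mu = gram_schmidt(basis)
    n = len(basis)
    if any(abs(mu[i][j]) > 0.5 + tol for i in range(n) for j in range(i)):
        return False
    for k in range(1, n):
        lhs = dot(ortho[k], ortho[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
        if lhs < rhs - tol:
            return False
    return True
```

Running `lll_reduce` on randomly drawn channel matrices and recording `swaps` is one way to see empirically that the iteration count depends on the particular basis and not only on its dimension.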
A convolutive mixing model for shifted double JPEG compression with application to passive image authentication
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4517946
Zhenhua Qu, Weiqi Luo, Jiwu Huang
The artifacts introduced by JPEG recompression have been demonstrated to be useful in passive image authentication. In this paper, we focus on the shifted double JPEG problem, aiming to identify whether a given JPEG image has been compressed twice with inconsistent block segmentation. We formulate shifted double JPEG compression (SD-JPEG) as a noisy convolutive mixing model of the kind mostly studied in blind source separation (BSS). In the noise-free condition, the model can be solved by directly applying the independent component analysis (ICA) method, with minor constraints on the contents of natural images. To achieve robust identification in the noisy condition, the asymmetry of the independent value map (IVM) is exploited to obtain a normalized criterion of independence. We generate a total of 13 features to fully represent the asymmetric characteristic of the independent value map and then feed them to a support vector machine (SVM) classifier. Experimental results on a set of 1000 images, with various parameter settings, demonstrate the effectiveness of our method.
Citations: 78
Improved GMM-based language recognition using constrained MLLR transforms
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4518568
Wade Shen, D. Reynolds
In this paper we describe the application to the language recognition problem of a feature-space transform based on constrained maximum likelihood linear regression (CMLLR) for unsupervised compensation of channel and speaker variability. We show that use of such transforms can improve baseline GMM-based language recognition performance on the 2005 NIST Language Recognition Evaluation (LRE05) task by 38%. Furthermore, gains from CMLLR are additive with other modeling enhancements such as vocal tract length normalization (VTLN). Further improvement is obtained using discriminative training, and it is shown that a system using only CMLLR adaptation produces state-of-the-art accuracy at a lower test-time computational cost than systems using VTLN.
Citations: 19
Age and gender recognition for telephone applications based on GMM supervectors and support vector machines
Pub Date : 2008-05-12 DOI: 10.1109/ICASSP.2008.4517932
T. Bocklet, A. Maier, Josef G. Bauer, F. Burkhardt, E. Nöth
This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian mixture models (GMMs) with universal background models (UBMs), which are well known from the task of speaker identification/verification. The training is performed by the EM algorithm or MAP adaptation, respectively. In the second approach, a GMM is trained for each speaker of the test and training sets. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different polynomial orders), an RBF kernel and a linear GMM distance kernel based on the KL divergence. With the SVM approach we improved the recognition rate to 74% (p < 0.001), which is in the same range as human performance.
Citations: 119