
2012 8th International Symposium on Chinese Spoken Language Processing: Latest Publications

Noise-robust whispered speech recognition using a non-audible-murmur microphone with VTS compensation
Pub Date : 2012-12-04 DOI: 10.1109/ISCSLP.2012.6423522
Chen-Yu Yang, Georgina Brown, Liang Lu, J. Yamagishi, Simon King
In this paper, we introduce a newly-created corpus of whispered speech simultaneously recorded via a close-talking microphone and a non-audible murmur (NAM) microphone in both clean and noisy conditions. To benchmark the corpus, which has been freely released recently, experiments on automatic recognition of continuous whispered speech were conducted. When training and test conditions are matched, the NAM microphone is found to be more robust against background noise than the close-talking microphone. In mismatched conditions (noisy data, models trained on clean speech), we found that Vector Taylor Series (VTS) compensation is particularly effective for the NAM signal.
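VTS compensation linearizes the relationship between clean speech, noise, and noisy observations around the current model means. As a rough illustration of the idea (not the paper's implementation), the standard mismatch function in the log-mel domain shifts each clean-speech mean toward the noise mean:

```python
import numpy as np

def vts_compensate_means(mu_x, mu_n):
    """VTS-style compensation of clean log-mel means (zeroth-order term).

    mu_x: clean-speech log-mel means, mu_n: noise log-mel means.
    Returns compensated noisy-domain means mu_y = mu_x + log(1 + exp(mu_n - mu_x)).
    """
    return mu_x + np.log1p(np.exp(mu_n - mu_x))

# Noise far below the speech level barely shifts the mean;
# noise at the same level shifts it up by log(2).
mu_x = np.array([10.0, 10.0])
quiet = vts_compensate_means(mu_x, np.array([0.0, 0.0]))
loud = vts_compensate_means(mu_x, np.array([10.0, 10.0]))
```

The full method also compensates variances and dynamic features; this sketch shows only the static-mean term.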
Citations: 28
Efficient feature extraction of speaker identification using phoneme mean F-ratio for Chinese
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423485
Chen Zhao, Hongcui Wang, Songgun Hyon, Jianguo Wei, J. Dang
Features used for speaker recognition should carry as much speaker-specific information as possible while attenuating linguistic information. To discard linguistic information effectively, this paper employs the phoneme mean F-ratio method to investigate the contributions of different frequency regions from the viewpoint of Chinese phonemes, and applies it to speaker identification. We find that phoneme-dependent speaker-specific information is distributed across different frequency regions of the speech signal. Based on the contribution rates, we extract new features and combine them with a GMM model. Speaker identification experiments are conducted on a King-ASR Chinese database. Compared with the MFCC feature, the identification error rate with the proposed feature is reduced by 32.94%. The results confirm the effectiveness of the phoneme mean F-ratio method for improving speaker recognition performance for Chinese.
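The F-ratio underlying this criterion contrasts between-speaker and within-speaker variance per feature dimension; dimensions with a high ratio carry more speaker-discriminative information. A minimal sketch of the generic F-ratio (not the phoneme-conditioned version used in the paper):

```python
import numpy as np

def f_ratio(features_by_speaker):
    """Per-dimension F-ratio: variance of the per-speaker means (between)
    divided by the average within-speaker variance. Higher values mark
    dimensions that separate speakers well."""
    means = np.array([f.mean(axis=0) for f in features_by_speaker])
    within = np.array([f.var(axis=0) for f in features_by_speaker]).mean(axis=0)
    between = means.var(axis=0)
    return between / within

rng = np.random.default_rng(0)
# dim 0 carries speaker-dependent offsets; dim 1 is identical for all speakers
spk = [rng.normal([m, 0.0], 1.0, size=(500, 2)) for m in (-3.0, 0.0, 3.0)]
ratios = f_ratio(spk)
```

In the paper the ratio is computed per phoneme and per frequency region, and the resulting contribution rates weight the feature extraction.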
Citations: 10
A new confidence measure combining Hidden Markov Models and Artificial Neural Networks of phonemes for effective keyword spotting
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423455
S. Leow, T. S. Lau, Alvina Goh, Han Meng Peh, Teck Khim Ng, S. Siniscalchi, Chin-Hui Lee
In this paper, we present an acoustic keyword spotter that operates in two stages, detection and verification. In the detection stage, keywords are detected in the utterances, and in the verification stage, confidence measures are used to verify the detected keywords and reject false alarms. A new confidence measure, based on phoneme models trained on an Artificial Neural Network, is used in the verification stage to reduce false alarms. We have found that this ANN-based confidence, together with existing HMM-based confidence measures, is very effective in rejecting false alarms. Experiments are performed on two Mandarin databases and our results show that the proposed method is able to significantly reduce the number of false alarms.
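The two-stage scheme can be pictured as a score fusion at verification time: an HMM likelihood-ratio confidence and an ANN posterior-based confidence are combined, and detections below a threshold are rejected as false alarms. A toy sketch of such a fusion, where the sigmoid squashing, weighting, and threshold are illustrative assumptions rather than the paper's formulas:

```python
import math

def combined_confidence(hmm_llr, ann_posteriors, weight=0.5):
    """Fuse an HMM-based confidence (log-likelihood ratio of keyword vs.
    filler model, squashed to (0,1)) with an ANN-based confidence (mean
    frame-level phoneme posterior). All names here are illustrative."""
    hmm_conf = 1.0 / (1.0 + math.exp(-hmm_llr))           # sigmoid of LLR
    ann_conf = sum(ann_posteriors) / len(ann_posteriors)  # mean posterior
    return weight * hmm_conf + (1.0 - weight) * ann_conf

def accept(hmm_llr, ann_posteriors, threshold=0.6):
    """Keep a detection only if the fused confidence clears the threshold."""
    return combined_confidence(hmm_llr, ann_posteriors) >= threshold

good = accept(2.0, [0.9, 0.8, 0.95])  # strong on both measures: kept
bad = accept(2.0, [0.1, 0.2, 0.15])   # HMM fooled, ANN rejects: discarded
```

The point of the fusion is that the two measures fail differently, so false alarms that pass one check are often caught by the other.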
Citations: 3
Perceptual similarity between audio clips and feature selection for its measurement
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423476
Qinghua Wu, Xiao-Lei Zhang, Ping Lv, Ji Wu
In this paper, we explore the retrieval of perceptually similar audio, i.e. finding sounds according to human perception. Such retrieval is therefore more "human-centered" [1] than earlier audio retrieval approaches, which aim to find homologous sounds. We make comprehensive use of various acoustic features to measure perceptual similarity. Since some acoustic features may be redundant or even detrimental to the similarity measurement, we propose finding a complementary and effective combination of acoustic features via the Sequential Floating Forward Selection (SFFS) method. Experimental results show that LSP, MFCC and PLP are the three most effective acoustic features. Moreover, the optimal feature combination improves similarity classification accuracy by about 2% over the best single acoustic feature.
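SFFS alternates greedy forward additions with conditional backward removals, which lets it escape the nesting problem of plain sequential selection. A generic sketch over an arbitrary scoring function; the `toy_score` values are invented, whereas the paper scores feature subsets by similarity-classification performance:

```python
def sffs(candidates, score, k):
    """Sequential Floating Forward Selection sketch: greedily add the
    feature that most improves score(subset), then try backward removals
    for as long as removing a feature improves the score."""
    selected = []
    while len(selected) < k:
        # forward step: best single addition
        best = max((f for f in candidates if f not in selected),
                   key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
        # floating step: drop any feature whose removal improves the score
        improved = True
        while improved and len(selected) > 1:
            improved = False
            for f in list(selected):
                reduced = [g for g in selected if g != f]
                if score(frozenset(reduced)) > score(frozenset(selected)):
                    selected = reduced
                    improved = True
                    break
    return selected

# toy score: LSP, MFCC and PLP complement each other, the rest add nothing
def toy_score(subset):
    useful = {"LSP": 0.4, "MFCC": 0.3, "PLP": 0.2}
    return sum(v for f, v in useful.items() if f in subset)

chosen = sffs(["LSP", "MFCC", "PLP", "ZCR", "energy"], toy_score, 3)
```

With a monotone toy score the floating step never fires; its value shows up when features interact, so that an early pick becomes redundant once later features join.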
Citations: 3
Cross validation and Minimum Generation Error for improved model clustering in HMM-based TTS
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423459
Fenglong Xie, Yi-Jian Wu, F. Soong
In HMM-based speech synthesis, context-dependent hidden Markov models (HMMs) are widely used for their capability to synthesize highly intelligible and fairly smooth speech. However, training HMMs well for all possible contexts is difficult, or even impossible, due to the intrinsic problem of insufficient training-data coverage. As a result, the trained models may overfit, and their capability to predict unseen contexts at test time is highly restricted. Recently, cross-validation (CV) has been explored and applied to decision-tree-based clustering with the Maximum-Likelihood (ML) criterion, showing improved robustness in TTS synthesis. In this paper, we generalize CV-based decision-tree clustering to a different criterion, Minimum Generation Error (MGE). Experimental results show that the generalization to MGE yields better TTS synthesis performance than the baseline systems.
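The benefit of CV here is that a candidate split of contexts is judged on held-out data rather than on the data used to fit it, so splits that only fit noise score poorly. A simplified illustration with a held-out log-likelihood criterion; the paper's actual criterion is MGE, and the clustering operates on HMM state distributions rather than raw vectors:

```python
import numpy as np

def heldout_loglik(train, test):
    """Log-likelihood of held-out data under a diagonal Gaussian fit on train."""
    mu, var = train.mean(axis=0), train.var(axis=0) + 1e-6
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (test - mu) ** 2 / var)))

def cv_split_gain(a, b):
    """Cross-validated gain of modelling groups a and b separately versus
    pooled: positive when the split generalizes, near zero when it only
    partitions noise."""
    ha, hb = len(a) // 2, len(b) // 2
    pooled = heldout_loglik(np.vstack([a[:ha], b[:hb]]), np.vstack([a[ha:], b[hb:]]))
    split = heldout_loglik(a[:ha], a[ha:]) + heldout_loglik(b[:hb], b[hb:])
    return split - pooled

rng = np.random.default_rng(0)
same = rng.normal(0.0, 1.0, size=(400, 3))
far = rng.normal(4.0, 1.0, size=(400, 3))
gain_real = cv_split_gain(same, far)                   # genuinely different groups
gain_spurious = cv_split_gain(same[:200], same[200:])  # split of identical data
```

Training-data likelihood would reward both splits; the held-out score rewards only the real one.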
Citations: 1
TDOA information based VAD for robust speech recognition in directional and diffuse noise field
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423514
Kuan-Lang Huang, T. Chi
A two-microphone algorithm is proposed to improve automatic speech recognition (ASR) rates when target speech is corrupted by directional interferences and diffuse noise simultaneously. The algorithm adopts the time difference of arrival (TDOA) to suppress directional interferences and a TDOA-information based voice activity detector (VAD) to suppress diffuse noise. Simulation results show the proposed algorithm is effective in improving ASR rates in a sound field mixed with a directional interference and diffuse noise. Compared with the phase difference (PD) algorithm, the proposed method gives comparable recognition rates when facing a directional interference and much higher and more robust recognition rates when diffuse noise emerges.
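TDOA between two microphones is commonly estimated with the generalized cross-correlation with phase transform (GCC-PHAT), whose whitened peak is robust to coloration; the estimated lag can then drive both directional suppression and the VAD decision. A sketch of the TDOA estimation step alone (the VAD logic built on top of it is not shown, and this is a generic method, not necessarily the paper's exact estimator):

```python
import numpy as np

def gcc_phat_tdoa(x1, x2, fs):
    """Estimate the TDOA between two microphone signals with GCC-PHAT:
    the cross-spectrum is whitened to unit magnitude, and the peak of its
    inverse transform gives the lag (positive when x2 lags x1)."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting
    cc = np.fft.irfft(cross, n)
    max_lag = len(x1) - 1
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return (int(np.argmax(np.abs(cc))) - max_lag) / fs

fs = 16000
rng = np.random.default_rng(1)
s = rng.normal(size=1024)
delay = 5                                    # x2 lags x1 by 5 samples
tdoa = gcc_phat_tdoa(s, np.concatenate((np.zeros(delay), s[:-delay])), fs)
```

A directional source yields a stable TDOA across frames while diffuse noise does not, which is what makes the lag usable as VAD evidence.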
Citations: 1
Power-normalized PLP (PNPLP) feature for robust speech recognition
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423529
Lichun Fan, Dengfeng Ke, Xiaoyin Fu, Shixiang Lu, Bo Xu
In this paper, we first review several feature extraction algorithms for robust speech recognition, e.g. Mel-frequency cepstral coefficients (MFCC) [1], perceptual linear prediction (PLP) [2] and power-normalized cepstral coefficients (PNCC) [3]. We then propose a new feature extraction algorithm for noise-robust speech recognition, in which medium-time processing serves as a noise suppression module. The algorithm is described in detail to show its advantages. Experimental results show that the proposed method significantly outperforms state-of-the-art algorithms.
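The "power-normalized" family of front ends replaces the log compression of conventional MFCC/PLP with a power-law nonlinearity (PNCC uses an exponent of 1/15), which stays bounded as filterbank energies approach zero and is one reason for its noise robustness. A minimal illustration of the compression stage alone:

```python
import numpy as np

def power_law_compress(filterbank_energies, power=1.0 / 15.0):
    """PNCC-style power-law compression: e**(1/15) instead of log(e).
    Unlike the log, it does not diverge toward -inf for near-zero
    energies, which commonly occur in noise-suppressed bands."""
    return filterbank_energies ** power

e = np.array([1e-10, 1e-2, 1.0, 1e2])  # energies spanning ten decades
p = power_law_compress(e)               # bounded, monotone compression
l = np.log(e)                           # log blows up for tiny energies
```

In a PNPLP-style front end this stage would sit between the (medium-time processed) filterbank and the cepstral analysis; the surrounding stages are not sketched here.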
Citations: 2
Break index labeling of Mandarin text via syntactic-to-prosodic tree mapping
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423468
Xiaotian Zhang, Yao Qian, Hai Zhao, F. Soong
In this study, we investigate the break index labeling problem via syntactic-to-prosodic structure conversion. The statistical relationship between the syntactic tree structures and prosodic tree structures of sentences in the training set is used to generate a Synchronous Tree Substitution Grammar (STSG), which describes the probabilistic mapping (substitution) rules between them. For a given test sentence and its parsed syntactic tree, the generated STSG can then statistically convert the syntactic tree into a prosodic tree. We compare the labeling results with other approaches and show that the probabilistic mapping indeed benefits break index labeling performance.
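Conceptually, an STSG rule attaches a probability distribution over prosodic outcomes to a syntactic pattern, and conversion picks high-probability substitutions. The following toy rule table and lookup are entirely invented to illustrate the mechanism; the real grammar is learned from aligned syntactic and prosodic trees:

```python
# each rule maps a syntactic pattern (constituent label, boundary position)
# to a distribution over break indices; the probabilities are invented
RULES = {
    ("NP", "after"): {0: 0.2, 1: 0.5, 2: 0.3},
    ("VP", "after"): {0: 0.1, 1: 0.3, 2: 0.6},
    ("IP", "after"): {0: 0.05, 1: 0.15, 2: 0.8},
}

def label_break(constituent, position="after", default=0):
    """Pick the most probable break index for a syntactic boundary:
    a toy stand-in for applying STSG substitution rules."""
    dist = RULES.get((constituent, position))
    if dist is None:
        return default        # unseen pattern: back off to no break
    return max(dist, key=dist.get)

b = label_break("IP")         # a clause boundary favors a major break
```

A real STSG would match whole subtree fragments rather than single labels, and the conversion would maximize the product of rule probabilities over the tree.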
Citations: 8
Discriminant local information distance preserving projection for text-independent speaker recognition
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423466
Liang He, Jia Li
A novel method based on a statistical manifold is presented for text-independent speaker recognition. After feature extraction, speaker recognition becomes a sequence classification problem. By discarding time information, the core task reduces to comparing multiple sample sets. Each set is assumed to be governed by a probability density function (PDF). We estimate the PDFs and place the estimated statistical models on a statistical manifold. The Fisher information distance is applied to compute distances between adjacent PDFs. A discriminant local preserving projection is used to push adjacent PDFs belonging to different classes apart, further improving recognition accuracy. Experiments were carried out on the NIST SRE08 tel-tel database, where the presented method gave excellent performance.
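The Fisher information distance between two densities has no closed form in general, but for nearby densities it is commonly approximated through the symmetrized KL divergence, d ≈ sqrt(2 · KL_sym). A sketch for diagonal-covariance Gaussians; the Gaussian assumption is mine, made so the KL terms have closed forms:

```python
import numpy as np

def kl_diag_gauss(mu0, var0, mu1, var1):
    """KL divergence KL(N(mu0, var0) || N(mu1, var1)) for diagonal Gaussians."""
    return 0.5 * float(np.sum(np.log(var1 / var0)
                              + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0))

def approx_fisher_distance(mu0, var0, mu1, var1):
    """Approximate Fisher information distance via the symmetrized KL:
    d ~= sqrt(2 * KL_sym), valid for nearby densities."""
    kl_sym = 0.5 * (kl_diag_gauss(mu0, var0, mu1, var1)
                    + kl_diag_gauss(mu1, var1, mu0, var0))
    return float(np.sqrt(2.0 * kl_sym))

mu, var = np.zeros(2), np.ones(2)
d_same = approx_fisher_distance(mu, var, mu, var)         # identical PDFs
d_shift = approx_fisher_distance(mu, var, mu + 1.0, var)  # shifted mean
```

For distant PDFs one would chain such local distances along a path on the manifold rather than apply the approximation directly.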
Citations: 1
A comparative study of fMPE and RDLT approaches to LVCSR
Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423511
Jian Xu, Zhijie Yan, Qiang Huo
This paper presents a comparative study of two discriminatively trained feature transform approaches for large vocabulary continuous speech recognition (LVCSR): feature-space minimum phone error (fMPE) and region-dependent linear transform (RDLT). Experiments are performed on an LVCSR task of conversational telephone speech transcription using about 2,000 hours of training data. Starting from a maximum-likelihood (ML) trained GMM-HMM baseline system, the recognition accuracy and run-time efficiency of different variants of the two methods are evaluated, and a specific RDLT approach is identified and recommended for deployment in LVCSR applications.
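An RDLT partitions acoustic space into regions (e.g. with a GMM) and applies a posterior-weighted affine correction per region; fMPE can be viewed as a member of the same feature-space transform family. A sketch of the transform's forward pass with given region parameters; in practice the transforms A_r, b_r are trained discriminatively, which is not shown here:

```python
import numpy as np

def region_posteriors(x, means, var=1.0):
    """Posterior of each region under spherical Gaussians with uniform priors."""
    logp = -0.5 * np.sum((x - means) ** 2, axis=1) / var
    p = np.exp(logp - logp.max())
    return p / p.sum()

def rdlt(x, means, A, b):
    """Region-dependent linear transform:
    y = x + sum_r gamma_r(x) * (A_r x + b_r)."""
    gamma = region_posteriors(x, means)
    correction = sum(g * (Ar @ x + br) for g, Ar, br in zip(gamma, A, b))
    return x + correction

# two regions: region 1 applies no correction, region 2 applies x + 1
means = np.array([[-5.0, -5.0], [5.0, 5.0]])
A = [np.zeros((2, 2)), np.eye(2)]
b = [np.zeros(2), np.ones(2)]
y = rdlt(np.array([5.0, 5.0]), means, A, b)  # dominated by region 2
```

Because the correction is a soft mixture over regions, the overall transform is piecewise-linear in effect but smooth across region boundaries.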
Citations: 1