Bayesian nonparametric language models
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423460
Ying-Lang Chang, Jen-Tzung Chien
Backoff smoothing and topic modeling are crucial issues in n-gram language modeling. This paper presents a Bayesian nonparametric learning approach that tackles both issues. We develop a topic-based language model in which the numbers of topics and n-grams are determined automatically from data. To cope with this model selection problem, we introduce nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM), which exhibits power-law behavior. This model can be simplified to the hierarchical PY (HPY) LM by disregarding the topic information, and further to the modified Kneser-Ney (MKN) LM by also disregarding the Bayesian treatment. In the experiments, the proposed THPY-LM outperforms the state-of-the-art MKN-LM and HPY-LM.
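For context, the backoff structure the abstract refers to can be made concrete. In the standard hierarchical PY language model (Teh, 2006), the predictive probability of word $w$ after context $\mathbf{u}$ interpolates discounted counts with the next-shorter context $\pi(\mathbf{u})$:

\[
P(w \mid \mathbf{u}) = \frac{c_{\mathbf{u}w} - d_{|\mathbf{u}|}\, t_{\mathbf{u}w}}{\theta_{|\mathbf{u}|} + c_{\mathbf{u}\cdot}} + \frac{\theta_{|\mathbf{u}|} + d_{|\mathbf{u}|}\, t_{\mathbf{u}\cdot}}{\theta_{|\mathbf{u}|} + c_{\mathbf{u}\cdot}}\, P(w \mid \pi(\mathbf{u})),
\]

where $c_{\mathbf{u}w}$ are customer counts, $t_{\mathbf{u}w}$ table counts, and $d_{|\mathbf{u}|}$, $\theta_{|\mathbf{u}|}$ the discount and strength parameters of the PY prior. Restricting $t_{\mathbf{u}w} = \min(1, c_{\mathbf{u}w})$ recovers a Kneser-Ney-style estimator, which is the simplification chain (THPY to HPY to MKN) the abstract describes.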
{"title":"Bayesian nonparametric language models","authors":"Ying-Lang Chang, Jen-Tzung Chien","doi":"10.1109/ISCSLP.2012.6423460","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423460","url":null,"abstract":"Backoff smoothing and topic modeling are crucial issues in n-gram language model. This paper presents a Bayesian non-parametric learning approach to tackle these two issues. We develop a topic-based language model where the numbers of topics and n-grams are automatically determined from data. To cope with this model selection problem, we introduce the nonparametric priors for topics and backoff n-grams. The infinite language models are constructed through the hierarchical Dirichlet process compound Pitman-Yor (PY) process. We develop the topic-based hierarchical PY language model (THPY-LM) with power-law behavior. This model can be simplified to the hierarchical PY (HPY) LM by disregarding the topic information and also the modified Kneser-Ney (MKN) LM by further disregarding the Bayesian treatment. In the experiments, the proposed THPY-LM outperforms state-of-art methods using MKN-LM and HPY-LM.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121702641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A real-time tone enhancement method for continuous Mandarin speeches
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423534
Ye Tian, Jia Jia, Yongxin Wang, Lianhong Cai
Mandarin Chinese is a tonal language, and the tone perception ability of people with sensorineural hearing loss (SNHL) is often weaker than that of normal-hearing listeners. To help people with SNHL better perceive and distinguish tone information in Chinese speech, we focus on a real-time tone enhancement method for continuous Mandarin speech. Based on an experimental investigation of the acoustic features most related to tone perception, we propose a practical tone enhancing model that employs unified features independent of Chinese tonal patterns. Using this model, we further implement a real-time tone enhancement method that avoids syllable segmentation and tonal pattern recognition. Tone identification tests with normal-hearing and SNHL listeners under both quiet and noisy backgrounds show that speech enhanced with the proposed method gains an average 5% higher correct rate than the original speech. The time delay of the enhancement method can be kept within 800 ms, so it can be applied in hearing aids to benefit people with SNHL in their daily lives.
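The abstract does not specify the signal operation used for enhancement. Purely as an illustrative sketch of one plausible approach, the F0 contour could be expanded around a slowly varying baseline before resynthesis to make tonal movements more salient; the function name, expansion factor, and median-filter baseline below are all assumptions, not the authors' method.

```python
import numpy as np
from scipy.signal import medfilt

def enhance_f0(f0, expansion=1.5, baseline_win=51):
    """Illustrative tone enhancement: expand F0 excursions around a
    slowly varying baseline. f0 is a frame-level contour in Hz
    (0 = unvoiced); returns the modified contour."""
    voiced = f0 > 0
    out = f0.copy()
    # Work in log-Hz so the expansion is perceptually more uniform.
    logf0 = np.where(voiced, np.log(np.maximum(f0, 1.0)), 0.0)
    # A median-filtered contour stands in for the phrase-level trend.
    baseline = medfilt(logf0, baseline_win)
    out[voiced] = np.exp(baseline[voiced]
                         + expansion * (logf0[voiced] - baseline[voiced]))
    return out
```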
{"title":"A real-time tone enhancement method for continuous Mandarin speeches","authors":"Ye Tian, Jia Jia, Yongxin Wang, Lianhong Cai","doi":"10.1109/ISCSLP.2012.6423534","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423534","url":null,"abstract":"Chinese Mandarin is a tonal language. Tone perception ability of people with sensorineural hearing loss (SNHL) is often weaker than normal people. To help the SNHL people better perceive and distinguish tone information in Chinese speech, we focus on real-time tone enhancement method for mandarin continuous speeches. In this paper, based on the experimental investigation on the acoustic features most related to tone perception, we propose a practical tone enhancing model which employs the unified features independent of Chinese tonal patterns. Using this model, we further implement a real-time tone enhancement method which can avoid syllable segmentation and tonal pattern recognition. By the tone identification test for the normal and SNHL people under both quiet and noisy backgrounds, it is found that the enhanced speeches with the proposed method gains an average 5% higher correct rate compared to original speeches. And the time delay of the enhancement method can be controlled within 800ms, which can be further used in hearing aids to benefit the SNHL people in their daily life.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122422139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information allocation and prosodic expressiveness in continuous speech: A Mandarin cross-genre analysis
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423535
Chiu-yu Tseng, Chao-yu Su
Assuming that, in addition to discourse association, the allocation of key information is an important feature of the prosodic expressiveness of continuous speech, we derive the common accentuation patterns across three Mandarin speech genres using four degrees of perceived emphasis. Using frequency counts as an additional control, we find that only six types of emphasis pattern are needed to account for 70% of the speech data regardless of genre. The six emphasis types are further compared with respect to (1) the distribution of discourse units and emphasis tokens by speech genre, (2) emphasis patterns by phrase, and (3) discourse positions, to see whether genre-specific features can be found. The results reveal that genre-dependent features can also be accounted for, and that individual genre properties correlate with phrase length and specific emphasis patterns.
{"title":"Information allocation and prosodic expressiveness in continuous speech: A Mandarin cross-genre analysis","authors":"Chiu-yu Tseng, Chao-yu Su","doi":"10.1109/ISCSLP.2012.6423535","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423535","url":null,"abstract":"In addition to discourse association and assuming that allocation of key information is an important feature of prosodic expressiveness of continuous speech, the common accentuation patterns across 3 Mandarin speech genres through 4 degrees of perceived emphases are derived. Using frequency count as another control, it is found that only 6 types of emphasis patterns are needed account for 70% of the speech data regardless of genre. The 6 emphasis types are further compared for the distribution of (1) discourse units and emphasis tokens by speech genre, (2) emphasis pattern by phrase and (3) with respect to discourse positions to see if genre-specific features could be found. Results reveal that genre-dependent features can also be accounted for. In addition, individual genre properties are found to also be correlated with phrase length and specific emphasis patterns.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124163753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spoken term detection for OOV terms based on triphone confusion matrix
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423480
Yong Xu, Wu Guo, Shan Su, Lirong Dai
This paper addresses the search for out-of-vocabulary (OOV) query terms in the spoken term detection (STD) task. Phone-level fragments with word-position markers are adopted as the speech recognition decoding units. A triphone confusion matrix (TriCM) is then used to expand the query space to compensate for speech recognition errors. We also propose a new approach to constructing the triphone confusion matrix with a smoothing method similar to Katz smoothing, which alleviates the data sparseness problem. Experimental results on the conversational telephone speech (CTS) portion of the NIST STD06 evaluation set indicate that the triphone confusion matrix provides a relative improvement of 12% in actual term-weighted value (ATWV).
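The smoothing formula itself is not given in the abstract. As a hedged sketch of a Katz-style scheme, observed triphone confusion counts can be discounted and unseen confusions backed off to a lower-order distribution (here a center-phone confusion, an assumption for illustration; proper normalization of the backoff weight is omitted for brevity):

```python
def build_confusion_model(tri_counts, center_prob, discount=0.5):
    """Katz-style smoothed triphone confusion model (illustrative).
    tri_counts[ref][hyp]: confusion counts from a phone recognizer.
    center_prob(ref, hyp): backoff probability, e.g. estimated from
    center-phone confusions. Returns p(ref, hyp) = P(hyp | ref)."""
    totals = {r: sum(h.values()) for r, h in tri_counts.items()}

    def p(ref, hyp):
        hyps = tri_counts.get(ref, {})
        if hyp in hyps:
            # Discounted relative frequency for seen confusion pairs.
            return (hyps[hyp] - discount) / totals[ref]
        # Mass freed by discounting is routed to the backoff model.
        alpha = discount * len(hyps) / totals[ref] if hyps else 1.0
        return alpha * center_prob(ref, hyp)

    return p
```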
{"title":"Spoken term detection for OOV terms based on triphone confusion matrix","authors":"Yong Xu, Wu Guo, Shan Su, Lirong Dai","doi":"10.1109/ISCSLP.2012.6423480","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423480","url":null,"abstract":"The search for out of vocabulary (OOV) query terms in spoken term detection (STD) task is addressed in this paper. The phone level fragment with word-position marker is naturally adopted as the speech recognition decoding unit. Then the triphone confusion matrix (TriCM) is used to expand the query space to compensate for speech recognition errors. And we also propose a new approach to construct triphone confusion matrix using a smoothing method similar with the Katz method to solve the data sparseness problem. Experimental result on the NIST STD06 eval-set conversational telephone speech (CTS) corpus indicates that triphone confusion matrix can provide a relative improvement of 12% in actual term weighted value (ATWV).","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127393849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self documentation of endangered languages
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423541
S. Dhakhwa, J. Allwood
Several minority languages in Nepal are on the verge of extinction, especially when they lack a generally accepted writing system and are spoken in areas where Nepali (the official language) predominates. Lohorung, spoken among the Lohorung Rai communities of Sankhuwasabha, a hilly district of eastern Nepal, is one example. Older generations are fluent in Lohorung but have limited ability to read and write English or Nepali. Documenting Lohorung and similarly endangered languages is therefore important. We believe that, given the right tools and techniques, self documentation is one of the best ways to document a language. We have developed an online platform that community members can use to collaboratively self document their language. The platform is a multimodal dictionary authoring and browsing tool, developed with a focus on usability, ease of use, and productivity.
{"title":"Self documentation of endangered languages","authors":"S. Dhakhwa, J. Allwood","doi":"10.1109/ISCSLP.2012.6423541","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423541","url":null,"abstract":"Several minority languages are on the verge of extinction in Nepal, especially when they don't have a generally accepted writing system and occur in an area where Nepali (the official language) is predominantly used. Lohorung is an example, which is spoken among the Lohroung Rai communities of Sankhuwasabha, a hilly district of eastern Nepal. Older generations of Lohorung are experts in Lohorung but they have limitations in reading and writing English or Nepali. The documentation of Lohorung and other similar endangered languages is important. If the right tools and techniques are used, we believe that self documentation is one of the best ways, to document a language. We have developed an online platform which community members can use to collaboratively self document their language. The platform is a multimodal dictionary authoring and browsing tool and it has been developed with a focus on usability, ease of use and productivity.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130474597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
More targets? Simulating emotional intonation of mandarin with PENTA
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423546
Ai-jun Li, Qiang Fang, Yuan Jia, J. Dang
This study evaluates the validity of the PENTA model for simulating the emotional intonation of Mandarin Chinese. Previous analysis of Mandarin emotional intonation suggested that two pitch targets are needed in some conditions, such as for the successive addition boundary tone (SUABT). This is a new encoding scheme in the Chinese PENTA model in which one syllable carries two tonal targets: one for the lexical tone and the other for the expressive tone. Numeric and perceptual assessments of the simulation of Mandarin emotional intonation with the new encoding scheme are reported. The results indicate that two targets are necessary and efficient for simulating boundary tones; in other words, the new encoding scheme, which sets two pitch targets, is required to realize certain emotional boundary tones that convey expressive emotion.
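For reference, PENTA is commonly realized through the quantitative target approximation (qTA) model (Prom-on, Xu & Thipakorn, 2009), in which surface F0 asymptotically approaches a linear pitch target $x(t) = mt + b$ within each syllable:

\[
f_0(t) = (mt + b) + (c_1 + c_2 t + c_3 t^2)\, e^{-\lambda t},
\]

with $\lambda$ the rate of target approximation and $c_1, c_2, c_3$ fixed by continuity of F0 and its derivatives at the syllable onset. Under the two-target scheme discussed here, a SUABT syllable would be split into two consecutive approximation intervals, each with its own target; this reading is our gloss of the abstract, not the paper's notation.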
{"title":"More targets? Simulating emotional intonation of mandarin with PENTA","authors":"Ai-jun Li, Qiang Fang, Yuan Jia, J. Dang","doi":"10.1109/ISCSLP.2012.6423546","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423546","url":null,"abstract":"This study attempts to evaluate the validity of PENTA model in simulating the emotional intonation of Mandarin Chinese. Based on the previous analysis on the Mandarin emotional intonation, it suggested that two target tones are needed in some conditions, such as the condition for the successive addition boundary tone (SUABT). It is a new encoding scheme in Chinese PENTA model, which means one syllable should have two tonal targets, one for lexical tone and the other is for the expressive tone. The numeric and perceptual assessments on the performances of simulating Mandarin Chinese emotional intonation with the new encoding schemes are evaluated. The results indicated that two targets are necessary and efficient to simulate boundary tones, in other words, the new encoding scheme is required to realize some kinds of emotional boundary tones by setting two target tones to convey expressive emotions.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125135184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A study on cross-language knowledge integration in Mandarin LVCSR
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423528
Chen-Yu Chiang, S. Siniscalchi, Yih-Ru Wang, Sin-Horng Chen, Chin-Hui Lee
We present a cross-language knowledge integration framework to improve the performance of large vocabulary continuous speech recognition. Two types of knowledge source are incorporated: manner-of-articulation attributes and prosodic structure. For manner of articulation, cross-lingual attribute detectors trained on an American English corpus (WSJ0) are used to verify and rescore hypothesized Mandarin syllables in word lattices obtained with state-of-the-art systems. For prosodic structure, models trained with an unsupervised joint prosody labeling and modeling technique on a Mandarin corpus (TCC300) are used in lattice rescoring. Experimental results on Mandarin syllable, character, and word recognition with the TCC300 corpus show that the proposed approach significantly outperforms a baseline system that uses neither articulatory nor prosodic information. The results also demonstrate the potential of using cross-lingual attribute detectors as a language-universal front-end for automatic speech recognition.
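At the level described here, such rescoring is typically a log-linear combination over lattice paths; the weights $\lambda$ below are generic notation for illustration, not the paper's formulation:

\[
\hat{W} = \arg\max_{W \in \mathcal{L}} \Big[ \log P_{\mathrm{AM}}(X \mid W) + \lambda_{\mathrm{LM}} \log P_{\mathrm{LM}}(W) + \lambda_{\mathrm{attr}} S_{\mathrm{attr}}(W) + \lambda_{\mathrm{pros}} S_{\mathrm{pros}}(W) \Big],
\]

where $S_{\mathrm{attr}}$ and $S_{\mathrm{pros}}$ are the attribute-detector and prosodic-model scores added to the usual acoustic and language model terms.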
{"title":"A study on cross-language knowledge integration in Mandarin LVCSR","authors":"Chen-Yu Chiang, S. Siniscalchi, Yih-Ru Wang, Sin-Horng Chen, Chin-Hui Lee","doi":"10.1109/ISCSLP.2012.6423528","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423528","url":null,"abstract":"We present a cross-language knowledge integration framework to improve the performance in large vocabulary continuous speech recognition. Two types of knowledge sources, manner attribute and prosodic structure, are incorporated. For manner of articulation, cross-lingual attribute detectors trained with an American English corpus (WSJ0) are utilized to verify and rescore hypothesized Mandarin syllables in word lattices obtained with state-of-the-art systems. For the prosodic structure, models trained with an unsupervised joint prosody labeling and modeling technique using a Mandarin corpus (TCC300) are used in lattice rescoring. Experimental results on Mandarin syllable, character and word recognition with the TCC300 corpus show that the proposed approach significantly outperforms the baseline system that does not use articulatory and prosodic information. It also demonstrates a potential of utilizing results from cross-lingual attribute detectors as a language-universal frontend for automatic speech recognition.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133755937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust voice activity detection using empirical mode decomposition and modulation spectrum analysis
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423519
Y. Kanai, M. Unoki
Voice activity detection (VAD) is used to detect speech/non-speech periods in observed signals. However, current VAD techniques have a serious problem: the accuracy of speech-period detection drops drastically for noisy speech and/or for mixtures of speech and non-speech such as music and environmental sounds. VAD therefore needs to be robust enough to detect speech periods accurately in these situations. This paper proposes a robust VAD approach that uses empirical mode decomposition (EMD) and modulation spectrum analysis (MSA) to resolve these problems: background noise is first reduced with EMD, without estimating the SNR (noise conditions), and speech/non-speech periods are then determined with MSA. Three VAD experiments in real environments were conducted to evaluate the proposed method against typical methods (Otsu's and G.729B). The results demonstrate that the proposed method detects speech periods more accurately than the typical methods.
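A minimal sketch of the described pipeline, assuming the third-party PyEMD package for the decomposition; the choice of retained IMFs, the 4-16 Hz modulation band, and the relative energy threshold are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt
from PyEMD import EMD  # assumed third-party package

def vad_emd_msa(x, fs, frame=0.2, thresh=0.5):
    """Sketch of EMD + modulation-spectrum VAD: drop the first
    (noisiest) IMF, then flag frames whose envelope energy in the
    syllable-rate modulation band exceeds a relative threshold."""
    imfs = EMD().emd(x)
    denoised = imfs[1:].sum(axis=0)        # crude noise reduction
    env = np.abs(hilbert(denoised))        # amplitude envelope
    sos = butter(4, [4, 16], btype="bandpass", fs=fs, output="sos")
    mod = sosfiltfilt(sos, env)            # 4-16 Hz modulations
    hop = int(frame * fs)
    energy = np.array([np.mean(mod[i:i + hop] ** 2)
                       for i in range(0, len(mod) - hop + 1, hop)])
    return energy > thresh * energy.max()  # True = speech frame
```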
{"title":"Robust voice activity detection using empirical mode decomposition and modulation spectrum analysis","authors":"Y. Kanai, M. Unoki","doi":"10.1109/ISCSLP.2012.6423519","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423519","url":null,"abstract":"Voice activity detection (VAD) is used to detect speech/non-speech periods in observed signals. However, the current VAD technique has a serious problem in that the accuracy of detection of speech periods drastically reduces if it is used for noisy speech and/or for mixtures of speech/non-speech such as those in music and environmental sounds. Thus, VAD needs to be robust to enable speech periods to be accurately detected in these situations. This paper proposes an approach to robust VAD using empirical mode decomposition (EMD) and modulation spectrum analysis (MSA) to resolve these problems. This is proposed to reducing background noise by using EMD without estimating SNR (noise conditions), and then to determining speech/non-speech periods by using MSA. Three experiments on VAD in real environments were conducted to evaluate the proposed method by comparing it with typical methods (Otsu's and G.729B). The results demonstrated that the proposed method could accurately detect speech periods more accurately than the typical methods.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"64 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132836777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Two objective measures for speech distortion and noise reduction evaluation of enhanced speech signals
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423488
H. Ding, Tan Lee, I. Soon
Although speech distortion and noise reduction are two key metrics for evaluating enhanced speech quality, few existing objective measures give a specific indication of either. In this paper, two objective measurement tools are proposed to separately evaluate a speech enhancement filter's ability to recover the clean speech and to reduce the noise. Several common speech enhancement algorithms are evaluated with these objective measures as well as with a subjective listening test. The correlations between the objective and subjective results clearly show the effectiveness of the proposed measures in evaluating the quality of enhanced speech signals.
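The abstract does not define the two tools. A common separated pair in the literature (the speech-distortion index and noise-reduction factor of Chen, Benesty et al.) is sketched below, under the assumption that the filter's output can be decomposed into its filtered speech and filtered noise components:

```python
import numpy as np

def speech_distortion_index(clean, filtered_clean):
    """Energy of the filtering error on the speech component,
    relative to clean speech energy (lower = less distortion)."""
    return np.sum((filtered_clean - clean) ** 2) / np.sum(clean ** 2)

def noise_reduction_factor(noise, filtered_noise):
    """Noise energy before vs. after filtering (higher = better)."""
    return np.sum(noise ** 2) / np.sum(filtered_noise ** 2)
```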
{"title":"Two objective measures for speech distortion and noise reduction evaluation of enhanced speech signals","authors":"H. Ding, Tan Lee, I. Soon","doi":"10.1109/ISCSLP.2012.6423488","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423488","url":null,"abstract":"Among all the existing objective measures, few are able to give a clearly specific indication on speech distortion or noise reduction although speech distortion and noise reduction are two key metrics to evaluate the enhanced speech quality. In this paper, two objective measurement tools are proposed to separately evaluate the capability of a speech enhancement filter in terms of recovering the clean speech and reducing the noise. Several common speech enhancement algorithms are evaluated by these objective measures as well as subjective listening test. Correlations between the results of objective measure and subjective measure clearly show the effectiveness of the proposed objective measures in evaluating the quality of enhanced speech signals.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124357906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phonotactic spoken language recognition: Using diversely adapted acoustic models in parallel phone recognizers
Pub Date: 2012-12-01 | DOI: 10.1109/ISCSLP.2012.6423509
C. Leung, B. Ma, Haizhou Li
In phonotactic spoken language recognition systems, acoustic model adaptation prior to phone lattice decoding has been adopted to deal with the mismatch between training and test conditions, and combining diversified phonotactic features is common practice. These observations motivate an in-depth investigation of combining diversified phonotactic features obtained from diversely adapted acoustic models. Our experiments show that the approach achieves an equal error rate (EER) of 1.94% on the 30-second closed-set trials of the 2007 NIST Language Recognition Evaluation (LRE), a 14.9% relative improvement in EER over a sophisticated system that uses parallel phone recognizers, speaker adaptive training (SAT) of acoustic models, and CMLLR adaptation. Moreover, the approach provides consistent and substantial improvements in three different phonotactic systems, each of which uses a single phone recognizer.
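For reference, the reported EER is the operating point at which the miss and false-alarm rates are equal; a standard way to compute it from detection scores (a generic utility, not code from the paper):

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Sweep a threshold over the pooled scores and return the point
    where the miss rate equals the false-alarm rate."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    labels = labels[np.argsort(scores)]
    miss = np.cumsum(labels) / labels.sum()              # targets <= thr
    fa = 1 - np.cumsum(1 - labels) / (1 - labels).sum()  # nontargets > thr
    i = np.argmin(np.abs(miss - fa))
    return (miss[i] + fa[i]) / 2
```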
{"title":"Phonotactic spoken language recognition: Using diversely adapted acoustic models in parallel phone recognizers","authors":"C. Leung, B. Ma, Haizhou Li","doi":"10.1109/ISCSLP.2012.6423509","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423509","url":null,"abstract":"In phonotactic spoken language recognition systems, acoustic model adaptation prior to phone lattice decoding has been adopted to deal with the mismatch between training and test conditions. Moreover, combining diversified phonotactic features is commonly used. These motivate us to have an in-depth investigation of combining diversified phonotactic features from diversely adapted acoustic models. Our experiment shows that our approach achieves an equal error rate (EER) of 1.94% in the 30-second closed-set trials of the 2007 NIST Language Recognition Evaluation (LRE). It represents a 14.9% relative improvement in EER over a sophisticated system, in which parallel phone recognizers, speaker adaptive training (SAT) in acoustic models and CMLLR adaptation are used. Moreover, it is shown that our approach provides consistent and substantial improvements in three different phonotactic systems, in each of which a single phone recognizer is used.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116409523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}