Articulatory feature detection with Support Vector Machines for integration into ASR and phone recognition
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373326
U. Chaudhari, M. Picheny
We study the use of Support Vector Machines (SVM) for detecting the occurrence of articulatory features in speech audio data and using the information contained in the detector outputs to improve phone and speech recognition. Our expectation is that an SVM should be able to appropriately model the separation of the classes, which may have complex distributions in feature space. We show that performance improves markedly when using discriminatively trained speaker-dependent parameters for the SVM inputs, and compares quite well to results in the literature using other classifiers, namely Artificial Neural Networks (ANN). Further, we show that the resulting detector outputs can be successfully integrated into a state-of-the-art speech recognition system, with consequent performance gains. Notably, we test our system on English broadcast news data from dev04f.
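The abstract gives no implementation details, so the following is only a minimal sketch of a per-frame articulatory-feature detector built with an off-the-shelf SVM; the feature matrix X, the labels y and the RBF kernel settings are placeholders, not the discriminatively trained speaker-dependent inputs used in the paper.

```python
# Minimal sketch: one binary SVM detecting one articulatory feature per frame.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 13))                        # placeholder acoustic feature vectors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)          # placeholder "feature present" labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

detector = SVC(kernel="rbf", C=1.0, gamma="scale")     # one such detector per articulatory feature
detector.fit(X_tr, y_tr)

# The signed decision values can serve as detector outputs that a downstream
# ASR system combines with its regular acoustic scores.
scores = detector.decision_function(X_te)
print("frame accuracy:", detector.score(X_te, y_te))
```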
{"title":"Articulatory feature detection with Support Vector Machines for integration into ASR and phone recognition","authors":"U. Chaudhari, M. Picheny","doi":"10.1109/ASRU.2009.5373326","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373326","url":null,"abstract":"We study the use of Support Vector Machines (SVM) for detecting the occurrence of articulatory features in speech audio data and using the information contained in the detector outputs to improve phone and speech recognition. Our expectation is that an SVM should be able to appropriately model the separation of the classes which may have complex distributions in feature space. We show that performance improves markedly when using discriminatively trained speaker dependent parameters for the SVM inputs, and compares quite well to results in the literature using other classifiers, namely Artificial Neural Networks (ANN). Further, we show that the resulting detector outputs can be successfully integrated into a state of the art speech recognition system, with consequent performance gains. Notably, we test our system on English broadcast news data from dev04f.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116095019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Island-driven search using broad phonetic classes
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373547
Tara N. Sainath
Most speech recognizers do not differentiate between reliable and unreliable portions of the speech signal during search. As a result, most of the search effort is concentrated in unreliable areas. Island-driven search addresses this problem by first identifying reliable islands and directing the search out from these islands towards unreliable gaps. In this paper, we develop a technique to detect islands from knowledge of hypothesized broad phonetic classes (BPCs). Using this island/gap knowledge, we explore a method to prune the search space to limit computational effort in unreliable areas. In addition, we also investigate scoring less detailed BPC models in gap regions and more detailed phonetic models in islands. Experiments on both small- and large-scale vocabulary tasks indicate that our island-driven search strategy results in an improvement in recognition accuracy and computation time.
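As a hedged illustration of the island/gap idea (with an invented confidence threshold and minimum run length, not values taken from the paper), reliable islands can be marked wherever the hypothesized BPC posteriors stay confident over a run of frames:

```python
# Rough sketch of island/gap labeling from broad-phonetic-class (BPC) posteriors.
import numpy as np

def islands_and_gaps(bpc_posteriors, conf_threshold=0.8, min_run=5):
    """bpc_posteriors: (n_frames, n_classes) array of BPC posteriors.
    Returns a boolean mask: True = island frame, False = gap frame."""
    confident = bpc_posteriors.max(axis=1) >= conf_threshold
    mask = np.zeros_like(confident)
    run_start = None
    for t, c in enumerate(np.append(confident, False)):   # sentinel closes a final run
        if c and run_start is None:
            run_start = t
        elif not c and run_start is not None:
            if t - run_start >= min_run:                   # keep only sufficiently long runs
                mask[run_start:t] = True
            run_start = None
    return mask

post = np.random.default_rng(1).dirichlet(np.ones(6) * 0.3, size=300)
island_mask = islands_and_gaps(post)
print("island frames:", int(island_mask.sum()), "of", len(island_mask))
```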
{"title":"Island-driven search using broad phonetic classes","authors":"Tara N. Sainath","doi":"10.1109/ASRU.2009.5373547","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373547","url":null,"abstract":"Most speech recognizers do not differentiate between reliable and unreliable portions of the speech signal during search. As a result, most of the search effort is concentrated in unreliable areas. Island-driven search addresses this problem by first identifying reliable islands and directing the search out from these islands towards unreliable gaps. In this paper, we develop a technique to detect islands from knowledge of hypothesized broad phonetic classes (BPCs). Using this island/gap knowledge, we explore a method to prune the search space to limit computational effort in unreliable areas. In addition, we also investigate scoring less detailed BPC models in gap regions and more detailed phonetic models in islands. Experiments on both small and large scale vocabulary tasks indicate that our island-driven search strategy results in an improvement in recognition accuracy and computation time.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130078332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discriminative training of n-gram language models for speech recognition via linear programming
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373248
Vladimir Magdin, Hui Jiang
This paper presents a novel discriminative training algorithm for n-gram language models for use in large vocabulary continuous speech recognition. The algorithm uses Maximum Mutual Information Estimation (MMIE) to build an objective function that involves a metric computed between correct transcriptions and their competing hypotheses, which are encoded as word graphs generated from the Viterbi decoding process. The nonlinear MMIE objective function is approximated by a linear one using an EM-style auxiliary function, thus converting the discriminative training of n-gram language models into a linear programming problem, which can be efficiently solved by many convex optimization tools. Experimental results on the SPINE1 speech recognition corpus have shown that the proposed discriminative training method can outperform the conventional discounting-based maximum likelihood estimation methods. A relative reduction in word error rate of close to 3% has been observed on the SPINE1 speech recognition task.
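The key point is that a hypothesis score is a sum of n-gram log-probabilities, so corrections to those log-probabilities enter the reference-versus-competitor margin linearly. The toy below (with invented n-grams, counts and bounds, and not the paper's exact MMIE-derived objective) shows such a margin being maximized as a linear program with an off-the-shelf solver:

```python
# Toy illustration of the linearity: maximize the reference-vs-competitor
# score margin over bounded corrections x to the n-gram log-probabilities.
import numpy as np
from scipy.optimize import linprog

ngrams = ["<s> the", "the cat", "the hat", "cat sat", "hat sat", "sat </s>"]
# n-gram occurrence counts in the reference and in a competing hypothesis
ref_counts  = np.array([1, 1, 0, 1, 0, 1], dtype=float)
comp_counts = np.array([1, 0, 1, 0, 1, 1], dtype=float)

# Margin gain is (ref_counts - comp_counts) @ x; linprog minimizes, so negate.
c = -(ref_counts - comp_counts)
bounds = [(-0.5, 0.5)] * len(ngrams)     # keep corrections to the log-probs small

res = linprog(c, bounds=bounds, method="highs")
for ng, delta in zip(ngrams, res.x):
    print(f"{ng:10s} delta log-prob = {delta:+.2f}")
```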
{"title":"Discriminative training of n-gram language models for speech recognition via linear programming","authors":"Vladimir Magdin, Hui Jiang","doi":"10.1109/ASRU.2009.5373248","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373248","url":null,"abstract":"This paper presents a novel discriminative training algorithm for n-gram language models for use in large vocabulary continuous speech recognition. The algorithm uses Maximum Mutual Information Estimation (MMIE) to build an objective function that involves a metric computed between correct transcriptions and their competing hypotheses, which are encoded as word graphs generated from the Viterbi decoding process. The nonlinear MMIE objective function is approximated by a linear one using an EM-style auxiliary function, thus converting the discriminative training of n-gram language models into a linear programing problem, which can be efficiently solved by many convex optimization tools. Experimental results on the SPINE1 speech recognition corpus have shown that the proposed discriminative training method can outperform the conventional discounting-based maximum likelihood estimation methods. A relative reduction in word error rate of close to 3% has been observed on the SPINE1 speech recognition task.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122881868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-supervised discriminative training of statistical language models
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373401
Puyang Xu, D. Karakos, S. Khudanpur
A novel self-supervised discriminative training method for estimating language models for automatic speech recognition (ASR) is proposed. Unlike traditional discriminative training methods that require transcribed speech, only untranscribed speech and a large text corpus are required. An exponential form is assumed for the language model, as done in maximum entropy estimation, but the model is trained from the text using a discriminative criterion that targets word confusions actually witnessed in first-pass ASR output lattices. Specifically, model parameters are estimated to maximize the likelihood ratio between words w in the text corpus and w's cohorts in the test speech, i.e. other words that w competes with in the test lattices. Empirical results are presented to demonstrate statistically significant improvements over a 4-gram language model on a large vocabulary ASR task.
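A rough sketch of the training signal, under the simplifying assumption that the exponential model uses only word-identity features and that the cohort sets are already extracted: each observed word's weight is pushed up relative to its lattice cohorts by gradient ascent on a softmax log-likelihood ratio. The vocabulary and cohort lists below are invented for illustration.

```python
# Toy numpy sketch (not the paper's full feature set) of the cohort-based update.
import numpy as np

vocab = ["rain", "reign", "rein", "plain", "plane"]
idx = {w: i for i, w in enumerate(vocab)}
theta = np.zeros(len(vocab))              # exponential-model weights, one per word

# (word observed in the text corpus, cohorts it competed with in the test lattices)
training_events = [("rain", ["reign", "rein"]),
                   ("plane", ["plain"]),
                   ("rain", ["reign"])]

lr = 0.5
for _ in range(50):
    for w, cohorts in training_events:
        members = [idx[w]] + [idx[v] for v in cohorts]
        scores = theta[members]
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        # gradient of log P(w | cohort set): 1 - P(w) for w, -P(v) for each cohort
        grad = -probs
        grad[0] += 1.0
        theta[members] += lr * grad

print({w: round(theta[idx[w]], 2) for w in vocab})
```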
{"title":"Self-supervised discriminative training of statistical language models","authors":"Puyang Xu, D. Karakos, S. Khudanpur","doi":"10.1109/ASRU.2009.5373401","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373401","url":null,"abstract":"A novel self-supervised discriminative training method for estimating language models for automatic speech recognition (ASR) is proposed. Unlike traditional discriminative training methods that require transcribed speech, only untranscribed speech and a large text corpus is required. An exponential form is assumed for the language model, as done in maximum entropy estimation, but the model is trained from the text using a discriminative criterion that targets word confusions actually witnessed in first-pass ASR output lattices. Specifically, model parameters are estimated to maximize the likelihood ratio between words w in the text corpus and w's cohorts in the test speech, i.e. other words that w competes with in the test lattices. Empirical results are presented to demonstrate statistically significant improvements over a 4-gram language model on a large vocabulary ASR task.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115497903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mask estimation employing Posterior-based Representative Mean for missing-feature speech recognition with time-varying background noise
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373398
Wooil Kim, J. Hansen
This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in time-varying background noise conditions. Conventional mask estimation methods based on noise estimates and spectral subtraction fail to reliably estimate the mask. The proposed mask estimation method utilizes a Posterior-based Representative Mean (PRM) vector for determining the reliability of the input speech spectrum, which is obtained as a weighted sum of the mean parameters of the speech model with posterior probabilities. To obtain the noise-corrupted speech model, a model combination method is employed, which was proposed in our previous study for a feature compensation method [1]. Experimental results demonstrate that the proposed mask estimation method is considerably more effective at increasing speech recognition performance in time-varying background noise conditions. By employing the proposed PRM-based mask estimation for missing-feature reconstruction, we obtain +36.29% and +30.45% average relative improvements in WER for speech babble and background music conditions respectively, compared to conventional mask estimation methods.
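A minimal numpy sketch of the PRM computation, assuming a toy diagonal-covariance Gaussian mixture in place of the combined noisy-speech model and an arbitrary reliability threshold: mixture posteriors for the observed log-spectral vector weight the component means, and channels far from that posterior-weighted mean are marked unreliable.

```python
# Posterior-weighted mean (PRM) and a simple reliability mask; model and data invented.
import numpy as np

rng = np.random.default_rng(2)
n_mix, n_chan = 8, 20
means = rng.normal(size=(n_mix, n_chan))           # placeholder model means
var = np.ones((n_mix, n_chan))
weights = np.full(n_mix, 1.0 / n_mix)

obs = rng.normal(size=n_chan)                      # observed log-spectral vector

# Gaussian log-likelihood per mixture component (diagonal covariance)
log_lik = -0.5 * (((obs - means) ** 2) / var + np.log(2 * np.pi * var)).sum(axis=1)
log_post = np.log(weights) + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum()

prm = post @ means                                 # the PRM vector
mask = np.abs(obs - prm) < 1.0                     # True = reliable spectral channel
print("reliable channels:", int(mask.sum()), "of", n_chan)
```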
{"title":"Mask estimation employing Posterior-based Representative Mean for missing-feature speech recognition with time-varying background noise","authors":"Wooil Kim, J. Hansen","doi":"10.1109/ASRU.2009.5373398","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373398","url":null,"abstract":"This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in time-varying background noise conditions. Conventional mask estimation methods based on noise estimates and spectral subtraction fail to reliably estimate the mask. The proposed mask estimation method utilizes a Posterior-based Representative Mean (PRM) vector for determining the reliability of the input speech spectrum, which is obtained as a weighted sum of the mean parameters of the speech model with posterior probabilities. To obtain the noise-corrupted speech model, a model combination method is employed, which was proposed in our previous study for a feature compensation method [1]. Experimental results demonstrate that the proposed mask estimation method is considerably more effective at increasing speech recognition performance in time-varying background noise conditions. By employing the proposed PRM-based mask estimation for missing-feature reconstruction, we obtain +36.29% and +30.45% average relative improvements in WER for speech babble and background music conditions respectively, compared to conventional mask estimation methods.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123704436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local and global models for spontaneous speech segment detection and characterization
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5372928
Richard Dufour, Y. Estève, P. Deléglise, Frédéric Béchet
Processing spontaneous speech is one of the many challenges that automatic speech recognition (ASR) systems have to deal with. The main phenomena characterizing spontaneous speech are disfluencies (filled pauses, repetitions, repairs and false starts), and many studies have focused on the detection and correction of these disfluencies. In this study we define spontaneous speech as unprepared speech, as opposed to prepared speech, in which utterances contain well-formed sentences close to those found in written documents. Disfluencies are of course very good indicators of unprepared speech, but they are not the only ones: ungrammaticality, language register and prosodic patterns are also important. This paper proposes a set of acoustic and linguistic features that can be used for characterizing and detecting spontaneous speech segments in large audio databases. Moreover, we introduce a strategy that takes advantage of a global classification process using a probabilistic model, which significantly improves spontaneous speech detection.
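As a loose illustration of combining local and global evidence (the features, labels and smoothing scheme below are placeholders, not the paper's feature set or probabilistic model), a per-segment classifier can produce local scores that a simple show-level pass then smooths before the final decision:

```python
# Local per-segment classifier followed by a crude "global" smoothing pass.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10))            # per-segment acoustic + linguistic features
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)   # 1 = spontaneous

local = LogisticRegression(max_iter=1000).fit(X[:400], y[:400])
local_scores = local.predict_proba(X[400:])[:, 1]

# Global pass: average each segment's score with its neighbours, reflecting the
# intuition that spontaneous segments tend to cluster within a show.
kernel = np.ones(5) / 5
global_scores = np.convolve(local_scores, kernel, mode="same")
decisions = global_scores > 0.5
print("spontaneous segments:", int(decisions.sum()), "of", len(decisions))
```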
{"title":"Local and global models for spontaneous speech segment detection and characterization","authors":"Richard Dufour, Y. Estève, P. Deléglise, Frédéric Béchet","doi":"10.1109/ASRU.2009.5372928","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5372928","url":null,"abstract":"Processing spontaneous speech is one of the many challenges that automatic speech recognition (ASR) systems have to deal with. The main evidences characterizing spontaneous speech are disfluencies (filled pause, repetition, repair and false start) and many studies have focused on the detection and the correction of these disfluencies. In this study we define spontaneous speech as unprepared speech, in opposition to prepared speech where utterances contain well-formed sentences close to those that can be found in written documents. Disfluencies are of course very good indicators of unprepared speech, however they are not the only ones: ungrammaticality and language register are also important as well as prosodic patterns. This paper proposes a set of acoustic and linguistic features that can be used for characterizing and detecting spontaneous speech segments from large audio databases. More, we introduce a strategy that takes advantage of a global classification procfalseess using a probabilistic model which significantly improves the spontaneous speech detection.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"127 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114035570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Syntactic features for Arabic speech recognition
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373470
H. Kuo, L. Mangu, Ahmad Emami, I. Zitouni, Young-suk Lee
We report word error rate improvements with syntactic features using a neural probabilistic language model through N-best re-scoring. The syntactic features we use include exposed head words and their non-terminal labels both before and after the predicted word. Neural network LMs generalize better to unseen events by modeling words and other context features in continuous space. They are suitable for incorporating many different types of features, including syntactic features, where there is no pre-defined back-off order. We choose an N-best re-scoring framework to be able to take full advantage of the complete parse tree of the entire sentence. Using syntactic features, along with morphological features, improves the word error rate (WER) by up to 5.5% relative, from 9.4% to 8.6%, on the latest GALE evaluation test set.
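The rescoring step itself can be sketched independently of the neural LM: each N-best hypothesis keeps its first-pass score and receives a new LM score that is log-linearly combined with it. The scoring stand-in and interpolation weight below are invented for illustration only.

```python
# Toy N-best rescoring loop with a placeholder syntactic neural LM scorer.
def syntactic_nnlm_logprob(words):
    """Stand-in for a neural LM over exposed head words and their labels."""
    return -2.0 * len(words)               # placeholder: uniform per-word cost

nbest = [
    {"words": ["he", "reads", "the", "book"], "first_pass": -35.2},
    {"words": ["he", "read", "the", "book"],  "first_pass": -34.9},
]

lm_weight = 10.0
for hyp in nbest:
    hyp["rescored"] = hyp["first_pass"] + lm_weight * syntactic_nnlm_logprob(hyp["words"])

best = max(nbest, key=lambda h: h["rescored"])
print("best hypothesis:", " ".join(best["words"]))
```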
{"title":"Syntactic features for Arabic speech recognition","authors":"H. Kuo, L. Mangu, Ahmad Emami, I. Zitouni, Young-suk Lee","doi":"10.1109/ASRU.2009.5373470","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373470","url":null,"abstract":"We report word error rate improvements with syntactic features using a neural probabilistic language model through N-best re-scoring. The syntactic features we use include exposed head words and their non-terminal labels both before and after the predicted word. Neural network LMs generalize better to unseen events by modeling words and other context features in continuous space. They are suitable for incorporating many different types of features, including syntactic features, where there is no pre-defined back-off order. We choose an N-best re-scoring framework to be able to take full advantage of the complete parse tree of the entire sentence. Using syntactic features, along with morphological features, improves the word error rate (WER) by up to 5.5% relative, from 9.4% to 8.6%, on the latest GALE evaluation test set.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115876611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalized cyclic transformations in speaker-independent speech recognition
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373284
Florian Müller, Eugene Belilovsky, A. Mertins
A feature extraction method is presented that is robust against vocal tract length changes. It uses generalized cyclic transformations, primarily known from the field of pattern recognition. Under matched training and testing conditions the resulting accuracies are comparable to those of MFCCs. However, when training and testing conditions are mismatched with respect to the mean vocal tract length, the presented features significantly outperform the MFCCs.
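The abstract does not define the transformations themselves; as a generic, heavily simplified illustration of why cyclic transformations can help, the magnitude of a DFT taken across the subband axis is invariant to circular shifts of the log filter-bank vector, a crude stand-in for the spectral warping caused by vocal tract length differences. This is not necessarily the specific transform used in the paper.

```python
# Illustration: a cyclic-shift-invariant feature from log filter-bank energies.
import numpy as np

def cyclic_magnitude_features(log_fbank):
    """log_fbank: (n_frames, n_bands) log filter-bank energies."""
    return np.abs(np.fft.rfft(log_fbank, axis=1))

frame = np.random.default_rng(5).normal(size=(1, 24))
shifted = np.roll(frame, 3, axis=1)                 # simulate a circular subband shift
a, b = cyclic_magnitude_features(frame), cyclic_magnitude_features(shifted)
print("max difference under cyclic shift:", float(np.abs(a - b).max()))   # ~0
```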
{"title":"Generalized cyclic transformations in speaker-independent speech recognition","authors":"Florian Müller, Eugene Belilovsky, A. Mertins","doi":"10.1109/ASRU.2009.5373284","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373284","url":null,"abstract":"A feature extraction method is presented that is robust against vocal tract length changes. It uses the generalized cyclic transformations primarily used within the field of pattern recognition. In matching training and testing conditions the resulting accuracies are comparable to the ones of MFCCs. However, in mismatching training and testing conditions with respect to the mean vocal tract length the presented features significantly outperform the MFCCs.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"338 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123933384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MLLR/MAP adaptation using pronunciation variation for non-native speech recognition
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373299
Y. Oh, H. Kim
In this paper, we propose an acoustic model adaptation method for non-native speech recognition based on maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) adaptation using pronunciation variations. To this end, we first obtain pronunciation variations using an indirect data-driven approach. We then generate two sets of regression classes: one composed of regression classes for all pronunciations and the other of classes for pronunciation variations. The former are referred to as overall regression classes and the latter as pronunciation variation regression classes. Next, we sequentially apply the two adaptations to non-native speech using the overall regression classes, while the acoustic models associated with the pronunciation variations are adapted using the pronunciation variation regression classes. In the final step, both sets of adapted acoustic models are merged. Thus, the resulting acoustic models can cover the characteristics of non-native speakers as well as the pronunciation variations of non-native speech. Non-native automatic speech recognition experiments on Korean-spoken English continuous speech show that an ASR system employing the proposed adaptation method reduces the average word error rate by a relative 9.43% compared to a traditional MLLR/MAP adaptation method.
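A bare-bones sketch of the two adaptation steps on Gaussian means only, assuming a single regression class, identity covariances, hard frame-to-Gaussian assignments and synthetic data (none of which match the paper's setup exactly): an MLLR affine transform estimated by least squares, followed by a MAP interpolation between the transformed mean and the adaptation-data mean.

```python
import numpy as np

rng = np.random.default_rng(4)
dim, n_gauss = 3, 4
mu = rng.normal(size=(n_gauss, dim))            # prior (speaker-independent) means

# Adaptation data: frames hard-assigned to Gaussians, generated by an arbitrary
# "true" affine mismatch that the transform should roughly recover.
A_true, b_true = np.eye(dim) * 1.2, np.array([0.5, -0.3, 0.1])
frames, assign = [], []
for g in range(n_gauss):
    obs = mu[g] @ A_true.T + b_true + 0.05 * rng.normal(size=(50, dim))
    frames.append(obs)
    assign.append(np.full(50, g))
frames, assign = np.vstack(frames), np.concatenate(assign)

# MLLR (one regression class, identity covariances): least-squares fit of
# W in  o_t ~= W @ [1, mu_g]  over all adaptation frames.
ext_mu = np.hstack([np.ones((len(frames), 1)), mu[assign]])   # (T, dim+1)
W, *_ = np.linalg.lstsq(ext_mu, frames, rcond=None)           # (dim+1, dim)
mu_mllr = np.hstack([np.ones((n_gauss, 1)), mu]) @ W

# MAP step: blend each MLLR-adapted mean with the per-Gaussian data mean.
tau = 10.0
mu_map = np.empty_like(mu)
for g in range(n_gauss):
    data = frames[assign == g]
    mu_map[g] = (tau * mu_mllr[g] + data.sum(axis=0)) / (tau + len(data))

print(np.round(mu_map - mu, 2))                 # net shift applied to each mean
```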
{"title":"MLLR/MAP adaptation using pronunciation variation for non-native speech recognition","authors":"Y. Oh, H. Kim","doi":"10.1109/ASRU.2009.5373299","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373299","url":null,"abstract":"In this paper, we propose an acoustic model adaptation method based on a maximum likelihood linear regression (MLLR) and a maximum a posteriori (MAP) adaptation using pronunciation variations for non-native speech recognition. To this end, we first obtain pronunciation variations using an indirect data-driven approach. Next, we generate two sets of regression classes: one composed of regression classes for all pronunciations and the other of classes for pronunciation variations. The former are referred to as overall regression classes and the latter as pronunciation variation regression classes. Next, we sequentially apply the two adaptations to non-native speech using the overall regression classes, while the acoustic models associated with the pronunciation variations are adapted using the pronunciation variation regression classes. In the final step, both sets of adapted acoustic models are merged. Thus, the resultant acoustic models can cover the characteristics of non-native speakers as well as the pronunciation variations of non-native speech. It is shown from non-native automatic speech recognition experiments for Korean spoken English continuous speech that an ASR system employing the proposed adaptation method can relatively reduce the average word error rate by 9.43% when compared to a traditional MLLR/MAP adaptation method.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115062302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ESAT 2008 system for N-Best Dutch speech recognition benchmark
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373311
Kris Demuynck, Antti Puurula, Dirk Van Compernolle, P. Wambacq
This paper describes the ESAT 2008 Broadcast News transcription system for the N-Best 2008 benchmark, developed in part for testing the recent SPRAAK Speech Recognition Toolkit. The ESAT system was developed for the Southern Dutch Broadcast News subtask of N-Best using standard methods of modern speech recognition. A combination of improvements was made in commonly overlooked areas such as text normalization, pronunciation modeling, lexicon selection and morphological modeling, virtually solving the out-of-vocabulary (OOV) problem for Dutch by reducing the OOV rate to 0.06% on the N-Best development data and 0.23% on the evaluation data. Recognition experiments were run with several configurations comparing one-pass vs. two-pass decoding, high-order vs. low-order n-gram models, lexicon sizes, and different types of morphological modeling. The system achieved a 7.23% word error rate (WER) on the broadcast news development data and 20.3% on the much more difficult N-Best evaluation data.
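For reference, the OOV figures quoted above are simply the fraction of running word tokens missing from the recognition lexicon; a tiny, purely illustrative helper (the Dutch sentence and lexicon are made up):

```python
# OOV rate: percentage of corpus tokens absent from the recognition lexicon.
def oov_rate(corpus_tokens, lexicon):
    lexicon = set(lexicon)
    misses = sum(1 for w in corpus_tokens if w not in lexicon)
    return 100.0 * misses / len(corpus_tokens)

tokens = "de nieuwe regering presenteert vandaag haar plannen".split()
lexicon = {"de", "nieuwe", "regering", "vandaag", "haar", "plannen"}
print(f"OOV rate: {oov_rate(tokens, lexicon):.2f}%")   # 'presenteert' is OOV -> 14.29%
```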
{"title":"The ESAT 2008 system for N-Best Dutch speech recognition benchmark","authors":"Kris Demuynck, Antti Puurula, Dirk Van Compernolle, P. Wambacq","doi":"10.1109/ASRU.2009.5373311","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373311","url":null,"abstract":"This paper describes the ESAT 2008 Broadcast News transcription system for the N-Best 2008 benchmark, developed in part for testing the recent SPRAAK Speech Recognition Toolkit. ESAT system was developed for the Southern Dutch Broadcast News subtask of N-Best using standard methods of modern speech recognition. A combination of improvements were made in commonly overlooked areas such as text normalization, pronunciation modeling, lexicon selection and morphological modeling, virtually solving the out-of-vocabulary (OOV) problem for Dutch by reducing OOV-rate to 0.06% on the N-Best development data and 0.23% on the evaluation data. Recognition experiments were run with several configurations comparing one-pass vs. two-pass decoding, high-order vs. low-order n-gram models, lexicon sizes and different types of morphological modeling. The system achieved 7.23% word error rate (WER) on the broadcast news development data and 20.3% on the much more difficult evaluation data of N-Best.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122538037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}