Dynamic language modeling for a daily broadcast news transcription system
Ciro Martins, A. Teixeira, J. Neto
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430103
When transcribing Broadcast News (BN) data in highly inflected languages, vocabulary growth leads to high out-of-vocabulary (OOV) rates. To address this problem, we propose a daily, unsupervised adaptation approach which dynamically adapts the active vocabulary and language model (LM) to the topic of the current news segment during a multi-pass speech recognition process. Based on texts made available daily on the Web, a story-based vocabulary is selected using a morpho-syntactic technique. Using an Information Retrieval engine, relevant documents are extracted from a large corpus to generate a story-based LM. Experiments were carried out for a European Portuguese BN transcription system. Preliminary results yield a relative reduction of 65.2% in OOV rate and 6.6% in WER.
{"title":"Dynamic language modeling for a daily broadcast news transcription system","authors":"Ciro Martins, A. Teixeira, J. Neto","doi":"10.1109/ASRU.2007.4430103","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430103","url":null,"abstract":"When transcribing Broadcast News data in highly inflected languages, the vocabulary growth leads to high out-of-vocabulary rates. To address this problem, we propose a daily and unsupervised adaptation approach which dynamically adapts the active vocabulary and LM to the topic of the current news segment during a multi-pass speech recognition process. Based on texts daily available on the Web, a story-based vocabulary is selected using a morpho-syntatic technique. Using an Information Retrieval engine, relevant documents are extracted from a large corpus to generate a story-based LM. Experiments were carried out for a European Portuguese BN transcription system. Preliminary results yield a relative reduction of 65.2% in OOV and 6.6% in WER.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"386 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129789070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A compact semidefinite programming (SDP) formulation for large margin estimation of HMMs in speech recognition
Yan Yin, Hui Jiang
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430130
In this paper, we study a new semidefinite programming (SDP) formulation to improve optimization efficiency for large margin estimation (LME) of HMMs in speech recognition. We re-formulate the same LME problem as smaller-scale SDP problems to speed up SDP-based LME training, especially for large model sets. In the new formulation, instead of building the SDP problem from a single huge variable matrix, we formulate it from many small independent variable matrices, each built separately from a Gaussian mean vector. Moreover, we propose to further decompose feature vectors and Gaussian mean vectors into static, delta, and acceleration components to build even more compact variable matrices. This method significantly reduces the total number of free variables and results in a much smaller SDP problem for the same model set. The proposed LME/SDP methods have been evaluated on a connected digit string recognition task using the TIDIGITS database. Experimental results show that they significantly improve optimization efficiency (about 30-50 times faster for large model sets) while providing slightly better optimization accuracy and recognition performance than our previous SDP formulation.
{"title":"A compact semidefinite programming (SDP) formulation for large margin estimation of HMMS in speech recognition","authors":"Yan Yin, Hui Jiang","doi":"10.1109/ASRU.2007.4430130","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430130","url":null,"abstract":"In this paper, we study a new semidefinite programming (SDP) formulation to improve optimization efficiency for large margin estimation (LME) of HMMs in speech recognition. We re-formulate the same LME problem as smaller-scale SDP problems to speed up the SDP-based LME training, especially for large model sets. In the new formulation, instead of building the SDP problem from a single huge variable matrix, we consider to formulate the SDP problem based on many small independent variable matrices, each of which is built separately from a Gaussian mean vector. Moreover, we propose to further decompose feature vectors and Gaussian mean vectors according to static, delta and accelerate components to build even more compact variable matrices. This method can significantly reduce the total number of free variables and result in much smaller SDP problem even for the same model set. The proposed new LME/SDP methods have been evaluated on a connected digit string recognition task using the TIDIGITS database. Experimental results show that it can significantly improve optimization efficiency (about 30-50 times faster for large model sets) and meanwhile it can provide slightly better optimization accuracy and recognition performance than our previous SDP formulation.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129890612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extensible speech recognition system using proxy-agent
Teppei Nakano, S. Fujie, Tetsunori Kobayashi
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430181
This paper presents an extension framework for speech recognition systems. The framework is built around a "proxy-agent," a software component located between applications, speech recognition engines, and input devices. By taking advantage of this structural position, the proxy-agent can provide supplementary services for speech recognition systems as well as user extensions. A monitoring capability, a feedback capability, and an extension capability are implemented and presented in this paper. As a first prototype, we developed a data collection application and an application control system using the proxy-agent. Through these developments, we verified the effectiveness of the proxy-agent's data collection capability and the framework's extensibility.
{"title":"Extensible speech recognition system using proxy-agent","authors":"Teppei Nakano, S. Fujie, Tetsunori Kobayashi","doi":"10.1109/ASRU.2007.4430181","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430181","url":null,"abstract":"This paper presents an extension framework for a speech recognition system. This framework is designed to use \"proxy-agent,\" a software component located between applications, speech recognition engines, and input devices. By taking advantage of its structural characteristics, proxy-agent can provide supplementary services for speech recognition systems as well as user extensions. A monitoring capability, a feedback capability, and an extension capability are implemented and presented in this paper. For the first prototype, we developed a data collection application and an application control system using proxy-agent. Through these developments, we verified the effectiveness of the data collection capability of proxy-agent, and the framework extension capability.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128882319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic vocabulary prediction for isolated-word dictation on embedded devices
Jussi Leppänen, Jilei Tian
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430172
Large-vocabulary speech recognition systems have mainly been developed for the fast processors and large amounts of memory available on desktop computers and network servers. Much progress has been made towards running these systems on portable devices, but challenges remain in developing highly efficient algorithms for real-time speech recognition on resource-limited embedded platforms. In this paper, a dynamic vocabulary prediction approach is proposed to decrease the memory footprint of the speech recognition decoder by keeping the decoder vocabulary small. This reduces acoustic confusion and makes very efficient use of computational resources. Experiments on an isolated-word SMS dictation task show that 40% of the vocabulary prediction errors can be eliminated compared to the baseline system.
{"title":"Dynamic vocabulary prediction for isolated-word dictation on embedded devices","authors":"Jussi Leppänen, Jilei Tian","doi":"10.1109/ASRU.2007.4430172","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430172","url":null,"abstract":"Large-vocabulary speech recognition systems have mainly been developed for fast processors and large amounts of memory that are available on desktop computers and network servers. Much progress has been made towards running these systems on portable devices. Challenges still exist, however, when developing highly efficient algorithms for real-time speech recognition on resource-limited embedded platforms. In this paper, a dynamic vocabulary prediction approach is proposed to decrease the memory footprint of the speech recognizer decoder by keeping the decoder vocabulary small. This leads to reduced acoustic confusion as well as achieving very efficient use of computational resources. Experiments on an isolated-word SMS dictation task have shown that 40% of the vocabulary prediction errors can be eliminated compared to the baseline system.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129250215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Random discriminant structure analysis for automatic recognition of connected vowels
Y. Qiao, S. Asakawa, N. Minematsu
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430176
The universal structure of speech [1, 2] proves to be invariant to transformations in feature space, and thus provides a robust representation for speech recognition. One difficulty in using the structure representation is its high dimensionality, which not only increases computational cost but also invites the curse of dimensionality [3, 4]. In this paper, we introduce random discriminant structure analysis (RDSA) to deal with this problem. Based on the observation that structural features are highly correlated and contain large redundancy, RDSA combines random feature selection and discriminant analysis to calculate several low-dimensional and discriminative representations from an input structure. An individual classifier is then trained for each representation, and the outputs of the classifiers are integrated for the final classification decision. Experimental results on connected Japanese vowel utterances show that our approach achieves a recognition rate of 98.3% based on training data from 8 speakers, higher than that (97.4%) of HMMs trained on the utterances of 4,130 speakers.
{"title":"Random discriminant structure analysis for automatic recognition of connected vowels","authors":"Y. Qiao, S. Asakawa, N. Minematsu","doi":"10.1109/ASRU.2007.4430176","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430176","url":null,"abstract":"The universal structure of speech [1, 2], proves to be invariant to transformations in feature space, and thus provides a robust representation for speech recognition. One of the difficulties of using structure representation is due to its high dimensionality. This not only increases computational cost but also easily suffers from the curse of dimensionality [3, 4]. In this paper, we introduce random discriminant structure analysis (RDSA) to deal with this problem. Based on the observation that structural features are highly correlated and include large redundancy, the RDSA combines random feature selection and discriminative analysis to calculate several low dimensional and discriminative representations from an input structure. Then an individual classifier is trained for each representation and the outputs of each classifier are integrated for the final classification decision. Experimental results on connected Japanese vowel utterances show that our approach achieves a recognition rate of 98.3% based on the training data of 8 speakers, which is higher than that (97.4%) of HMMs trained with the utterances of 4,130 speakers.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122163436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical large-margin Gaussian mixture models for phonetic classification
Hung-An Chang, James R. Glass
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430123
In this paper we present a hierarchical large-margin Gaussian mixture modeling framework and evaluate it on the task of phonetic classification. A two-stage hierarchical classifier is trained by alternately updating parameters at different levels in the tree to maximize the joint margin of the overall classification. Since the loss function required in training is convex over the parameter space, the problem of spurious local minima is avoided. The model achieves good performance with fewer parameters than single-level classifiers. On the TIMIT benchmark task of context-independent phonetic classification, the proposed modeling scheme achieves a state-of-the-art phonetic classification error of 16.7% on the core test set. This is an absolute reduction of 1.6% from the best previously reported result on this task, and 4-5% lower than a variety of classifiers recently examined on this task.
{"title":"Hierarchical large-margin Gaussian mixture models for phonetic classification","authors":"Hung-An Chang, James R. Glass","doi":"10.1109/ASRU.2007.4430123","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430123","url":null,"abstract":"In this paper we present a hierarchical large-margin Gaussian mixture modeling framework and evaluate it on the task of phonetic classification. A two-stage hierarchical classifier is trained by alternately updating parameters at different levels in the tree to maximize the joint margin of the overall classification. Since the loss function required in the training is convex to the parameter space the problem of spurious local minima is avoided. The model achieves good performance with fewer parameters than single-level classifiers. In the TIMIT benchmark task of context-independent phonetic classification, the proposed modeling scheme achieves a state-of-the-art phonetic classification error of 16.7% on the core test set. This is an absolute reduction of 1.6% from the best previously reported result on this task, and 4-5% lower than a variety of classifiers that have been recently examined on this task.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115220513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spoken language understanding with kernels for syntactic/semantic structures
Alessandro Moschitti, G. Riccardi, C. Raymond
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430106
Automatic concept segmentation and labeling are fundamental problems of spoken language understanding in dialog systems. Such tasks are usually approached with generative or discriminative models based on n-grams. As the uncertainty or ambiguity of the spoken input to a dialog system increases, we expect to need dependencies beyond n-gram statistics. In this paper, a general-purpose statistical syntactic parser is used to detect syntactic/semantic dependencies between concepts in order to increase the accuracy of sentence segmentation and concept labeling. The main novelty of the approach is the use of new tree kernel functions which encode syntactic/semantic structures in discriminative learning models. We experimented with support vector machines and the above kernels on the standard ATIS dataset. The proposed algorithm automatically parses natural language text with an off-the-shelf statistical parser and labels the syntactic (sub)trees with concept labels. The results show that the proposed model is very accurate and, when combined with n-gram based models, competitive with state-of-the-art models.
{"title":"Spoken language understanding with kernels for syntactic/semantic structures","authors":"Alessandro Moschitti, G. Riccardi, C. Raymond","doi":"10.1109/ASRU.2007.4430106","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430106","url":null,"abstract":"Automatic concept segmentation and labeling are the fundamental problems of spoken language understanding in dialog systems. Such tasks are usually approached by using generative or discriminative models based on n-grams. As the uncertainty or ambiguity of the spoken input to dialog system increase, we expect to need dependencies beyond n-gram statistics. In this paper, a general purpose statistical syntactic parser is used to detect syntactic/semantic dependencies between concepts in order to increase the accuracy of sentence segmentation and concept labeling. The main novelty of the approach is the use of new tree kernel functions which encode syntactic/semantic structures in discriminative learning models. We experimented with support vector machines and the above kernels on the standard ATIS dataset. The proposed algorithm automatically parses natural language text with off-the-shelf statistical parser and labels the syntactic (sub)trees with concept labels. The results show that the proposed model is very accurate and competitive with respect to state-of-the-art models when combined with n-gram based models.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121368758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Regularization, adaptation, and non-independent features improve hidden conditional random fields for phone classification
Yun-Hsuan Sung, Constantinos Boulis, Christopher D. Manning, Dan Jurafsky
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430136
We show a number of improvements in the use of Hidden Conditional Random Fields (HCRFs) for phone classification on the TIMIT and Switchboard corpora. We first show that regularization effectively prevents overfitting, improving over other methods such as early stopping. We then show that HCRFs are able to make use of non-independent features in phone classification, at least with small numbers of mixture components, while HMMs degrade due to their strong independence assumptions. Finally, we successfully apply Maximum a Posteriori adaptation to HCRFs, decreasing the phone classification error rate on the Switchboard corpus by around 1%-5% given only small amounts of adaptation data.
{"title":"Regularization, adaptation, and non-independent features improve hidden conditional random fields for phone classification","authors":"Yun-Hsuan Sung, Constantinos Boulis, Christopher D. Manning, Dan Jurafsky","doi":"10.1109/ASRU.2007.4430136","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430136","url":null,"abstract":"We show a number of improvements in the use of Hidden Conditional Random Fields (HCRFs) for phone classification on the TIMIT and Switchboard corpora. We first show that the use of regularization effectively prevents overfitting, improving over other methods such as early stopping. We then show that HCRFs are able to make use of non-independent features in phone classification, at least with small numbers of mixture components, while HMMs degrade due to their strong independence assumptions. Finally, we successfully apply Maximum a Posteriori adaptation to HCRFs, decreasing the phone classification error rate in the Switchboard corpus by around 1% -5% given only small amounts of adaptation data.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121569977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Crosslingual acoustic model development for automatic speech recognition
Frank Diehl, A. Moreno, E. Monte‐Moreno
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430150
In this work we discuss the development of two cross-lingual acoustic model sets for automatic speech recognition (ASR). The starting point is a set of multilingual Spanish-English-German hidden Markov models (HMMs); the target languages are Slovenian and French. We consider the problem of defining a multilingual phoneme set and the associated dictionary mapping, and describe a method to circumvent the related problems. The impact of the acoustic source models on the performance of the target systems is analyzed in detail. Several cross-lingually defined target systems are built and compared to their monolingual counterparts. It is shown that cross-lingually built acoustic models clearly outperform purely monolingual models when only a limited amount of target data is available.
{"title":"Crosslingual acoustic model development for automatics speech recognition","authors":"Frank Diehl, A. Moreno, E. Monte‐Moreno","doi":"10.1109/ASRU.2007.4430150","DOIUrl":"https://doi.org/10.1109/ASRU.2007.4430150","url":null,"abstract":"In this work we discuss the development of two cross-lingual acoustic model sets for automatic speech recognition (ASR). The starting point is a set of multilingual Spanish-English-German hidden Markov models (HMMs). The target languages are Slovenian and French. During the discussion the problem of defining a multilingual phoneme set and the associated dictionary mapping is considered. A method is described to circumvent related problems. The impact of the acoustic source models on the performance of the target systems is analyzed in detail. Several cross-lingual defined target systems are built and compared to their monolingual counterparts. It is shown that cross-lingual build acoustic models clearly outperform pure monolingual models if only a limited amount of target data is available.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132692773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An algorithm for fast composition of weighted finite-state transducers
J. McDonough, Emilian Stoimenov, D. Klakow
Pub Date: 2007-12-01 | DOI: 10.1109/ASRU.2007.4430156
In automatic speech recognition based on weighted finite-state transducers, a static decoding graph HC ∘ L ∘ G is typically constructed. In this work, we first show how the size of the decoding graph can be reduced, and the need to determinize it eliminated, by removing the ambiguity associated with transitions to the backoff state or states in G. We then show how the static construction can be avoided entirely by performing fast on-the-fly composition of HC and L ∘ G. We demonstrate that speech recognition based on this on-the-fly composition requires approximately 80% more run-time than recognition based on the statically-expanded network R, which makes it competitive with other dynamic expansion algorithms that have appeared in the literature. Moreover, the dynamic algorithm requires approximately seven times less main memory than recognition based on the static decoding graph.