Employing web search query click logs for multi-domain spoken language understanding
Dilek Z. Hakkani-Tür, Gökhan Tür, Larry Heck, Asli Celikyilmaz, Ashley Fidler, D. Hillard, R. Iyer, S. Parthasarathy
Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163968
Logs of user queries from a search engine (such as Bing or Google), together with the links clicked, provide valuable implicit feedback for improving statistical spoken language understanding (SLU) models. In this work, we propose to enrich the existing classification feature set for domain detection with features computed from the click distribution over the URLs clicked for user utterances in search query click logs (QCLs). Since natural language utterances differ stylistically from keyword search queries, we perform a syntax-based transformation of the original utterances, after filtering out domain-independent salient phrases, so that they can be matched with related search queries. This approach yields significant improvements in domain detection, especially for web-related user utterances.
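The click-distribution features described above can be sketched as follows. This is a toy illustration, not the paper's feature extraction: the URLs and click counts are hypothetical, and it simply normalizes clicked-URL counts into a distribution over sites.

```python
from collections import Counter
from urllib.parse import urlparse

def click_distribution(clicked_urls):
    """Normalize clicked-URL counts into a distribution over sites,
    usable as extra features for domain detection."""
    counts = Counter(urlparse(u).netloc for u in clicked_urls)
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}

# Hypothetical clicks for one (transformed) utterance; not from the paper's data.
clicks = [
    "http://www.imdb.com/showtimes",
    "http://www.imdb.com/chart",
    "http://www.fandango.com/movietimes",
]
dist = click_distribution(clicks)
print(dist)
```

A domain classifier could then receive each site's click mass as one feature dimension alongside the usual lexical features.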
Pruning exponential language models
Stanley F. Chen, A. Sethy, B. Ramabhadran
Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163937
Language model pruning is an essential technology for speech applications running on resource-constrained devices, and many pruning algorithms have been developed for conventional word n-gram models. However, while exponential language models can give superior performance, there has been little work on the pruning of these models. In this paper, we propose several pruning algorithms for general exponential language models. We show that our best algorithm applied to an exponential n-gram model outperforms existing n-gram model pruning algorithms by up to 0.4% absolute in speech recognition word-error rate on Wall Street Journal and Broadcast News data sets. In addition, we show that Model M, an exponential class-based language model, retains its performance improvement over conventional word n-gram models when pruned to equal size, with gains of up to 2.5% absolute in word-error rate.
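For context, the conventional word n-gram pruning the paper compares against can be sketched as relative-entropy (Stolcke-style) pruning: drop an n-gram if backing it off costs little weighted log-likelihood. The sketch below uses hypothetical toy probabilities and omits backoff-weight renormalization; it is not the paper's exponential-model algorithm.

```python
import math

def prune_bigrams(bigram_p, unigram_p, context_p, threshold):
    """Relative-entropy pruning sketch for a bigram model: drop p(w|h)
    whose removal (backing off to p(w)) costs little weighted
    log-likelihood.  Backoff-weight renormalization is omitted."""
    kept = {}
    for (h, w), p in bigram_p.items():
        # weighted change in log prob if (h, w) is backed off to the unigram
        cost = context_p[h] * p * (math.log(p) - math.log(unigram_p[w]))
        if cost >= threshold:
            kept[(h, w)] = p
    return kept

# Toy numbers for illustration only
bigram_p = {("the", "cat"): 0.20, ("the", "of"): 0.01}
unigram_p = {"cat": 0.02, "of": 0.15}
context_p = {"the": 0.05}
kept = prune_bigrams(bigram_p, unigram_p, context_p, threshold=1e-4)
print(kept)
```

Here ("the", "of") is pruned because the unigram already models it well, while ("the", "cat") survives.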
Don't multiply lightly: Quantifying problems with the acoustic model assumptions in speech recognition
D. Gillick, L. Gillick, S. Wegmann
Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163908
We describe a series of experiments simulating data from the standard Hidden Markov Model (HMM) framework used for speech recognition. Starting with a set of test transcriptions, we begin by simulating every step of the generative process. In each subsequent experiment, we substitute a real component for a simulated one (for example, real state durations rather than durations sampled from the transition models) and compare the word error rates on the resulting data, thus quantifying the relative cost of each modeling assumption. A novel sampling process allows us to test the independence assumptions of the HMM, which appear to present far more serious problems than the other data/model mismatches.
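The fully simulated starting point of such experiments — sampling every step of the HMM generative process — can be sketched as below. The phone states, self-loop probabilities, and 1-D Gaussian emissions are invented for illustration; real systems use context-dependent states and high-dimensional observations.

```python
import random

random.seed(0)

def sample_hmm(states, self_loop, means):
    """Simulate the HMM generative process for one utterance:
    geometric state durations from self-loop probabilities, then
    one Gaussian observation per frame (frames are conditionally
    independent given the state sequence -- the assumption at issue)."""
    frames, alignment = [], []
    for s in states:
        dur = 1
        while random.random() < self_loop[s]:   # geometric duration
            dur += 1
        for _ in range(dur):
            frames.append(random.gauss(means[s], 1.0))
            alignment.append(s)
    return frames, alignment

# Hypothetical phone sequence for the word "cat"
frames, align = sample_hmm(["k", "ae", "t"],
                           self_loop={"k": 0.5, "ae": 0.7, "t": 0.5},
                           means={"k": -1.0, "ae": 0.0, "t": 1.0})
print(len(frames), align[:5])
```

Replacing, say, the sampled durations with durations from forced alignments of real speech is the kind of component swap the paper measures.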
Towards choosing better primes for spoken dialog systems
José Lopes, M. Eskénazi, I. Trancoso
Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163949
When humans and computers use the same terms (primes, when they entrain to one another), spoken dialogs proceed more smoothly. This paper describes initial steps toward eventually choosing better primes for spoken dialog system prompts automatically. Two different sets of prompts were used to understand what makes one prime more suitable than another, and the impact of the chosen primes on speech recognition was evaluated. The results reveal that users did adopt the new vocabulary introduced in the new system prompts, and that system performance improved as a result, providing clues about the trade-off needed when choosing between adequate primes in prompts and speech recognition performance.
Multi-taper MFCC features for speaker verification using i-vectors
Md. Jahangir Alam, T. Kinnunen, P. Kenny, P. Ouellet, D. O'Shaughnessy
Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163886
This paper studies low-variance multi-taper mel-frequency cepstral coefficient (MFCC) features in state-of-the-art speaker verification. MFCC features are usually computed from a Hamming-windowed DFT spectrum. Windowing reduces the bias of the spectrum estimate, but its variance remains high. Recently, low-variance multi-taper MFCC features were studied in speaker verification, with promising preliminary results on the NIST 2002 SRE data using a simple GMM-UBM recognizer. In this study our goal is to validate those findings using an up-to-date i-vector classifier on the latest NIST 2010 SRE data. Our experiments on telephone (det5) and microphone speech (det1, det2, det3 and det4) indicate that the multi-taper approaches perform better than the conventional Hamming-window technique.
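The core idea — averaging periodograms over several orthogonal tapers instead of using one Hamming window — can be sketched as below. This uses sine tapers, one common taper family; the paper evaluates specific multi-taper variants that may differ, and the frame here is random stand-in data.

```python
import numpy as np

def multitaper_spectrum(frame, n_tapers=6):
    """Low-variance spectrum estimate: average the periodograms
    obtained with K orthogonal sine tapers, rather than computing a
    single Hamming-windowed periodogram."""
    n = len(frame)
    t = np.arange(1, n + 1)
    spec = np.zeros(n)
    for k in range(1, n_tapers + 1):
        taper = np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * k * t / (n + 1))
        spec += np.abs(np.fft.fft(taper * frame)) ** 2
    return spec / n_tapers

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)     # stand-in for one speech frame
spec = multitaper_spectrum(frame)
print(spec.shape)
```

MFCC extraction would then proceed as usual (mel filterbank, log, DCT) on this lower-variance spectrum.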
Adapting n-gram maximum entropy language models with conditional entropy regularization
A. Rastrow, Mark Dredze, S. Khudanpur
Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163934
Accurate estimates of language model parameters are critical for building quality text generation systems, such as automatic speech recognition. However, text training data for a domain of interest is often unavailable. Instead, we use semi-supervised model adaptation: parameters are estimated using both unlabeled in-domain data (raw speech audio) and labeled out-of-domain data (text). In this work, we present a new semi-supervised language model adaptation procedure for Maximum Entropy models with n-gram features. We augment the conventional maximum likelihood training criterion on out-of-domain text data with an additional term that minimizes conditional entropy on in-domain audio. Additionally, we demonstrate how to compute conditional entropy efficiently on speech lattices using first- and second-order expectation semirings. We demonstrate improvements in word error rate over other adaptation techniques when adapting a maximum entropy language model from broadcast news to MIT lectures.
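The regularization term is the entropy of the model's posterior over competing transcriptions: a confident (peaked) posterior has low entropy. The sketch below computes it over a tiny hypothetical n-best list; the paper computes the same quantity over full lattices using expectation semirings, which this sketch does not attempt.

```python
import math

def conditional_entropy(log_scores):
    """Entropy of the posterior over competing hypotheses for one
    utterance (softmax over log scores).  Minimizing this on in-domain
    audio pushes the adapted LM toward confident predictions."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]   # shift for stability
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs)

# Hypothetical n-best log scores for one utterance
h = conditional_entropy([-10.0, -10.5, -12.0])
print(h)
```

The adaptation objective adds this term (summed over in-domain utterances) to the usual out-of-domain log-likelihood.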
Discriminative reranking of ASR hypotheses with morpholexical and n-best-list features
H. Sak, M. Saraçlar, Tunga Güngör
Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163931
This paper explores rich morphological and novel n-best-list features for reranking automatic speech recognition hypotheses. The morpholexical features are defined over morphological features obtained with an n-gram language model over lexical and grammatical morphemes in the first pass. The n-best-list features for each hypothesis are defined using that hypothesis and the alternate hypotheses in an n-best list: each hypothesis is aligned with the others, one by one, using minimum edit distance alignment, and the resulting edit operations (substitution, addition and deletion) constitute indicator features. The reranking model is trained using a word-error-rate-sensitive averaged perceptron algorithm introduced in this paper. The proposed methods are evaluated on a Turkish broadcast news transcription task. The baseline systems are word and statistical sub-word systems that also employ morphological features for reranking. We show that morpholexical and n-best-list features are effective in improving the accuracy of the system (by 0.8%).
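The alignment step behind the n-best-list features can be sketched with standard minimum-edit-distance dynamic programming plus a backtrace that emits the edit operations. The Turkish word pairs below are hypothetical examples, not from the paper's data.

```python
def edit_ops(hyp, other):
    """Align two word sequences with minimum edit distance and return
    the edit operations (sub/ins/del) seen in the alignment; these
    serve as indicator features for the reranker."""
    n, m = len(hyp), len(other)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == other[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match/substitution
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (hyp[i - 1] != other[j - 1]):
            if hyp[i - 1] != other[j - 1]:
                ops.append(("sub", hyp[i - 1], other[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", hyp[i - 1]))
            i -= 1
        else:
            ops.append(("ins", other[j - 1]))
            j -= 1
    return ops[::-1]

print(edit_ops("kedi geldi".split(), "kedi gitti".split()))
```

Each distinct operation tuple becomes a binary feature that fires for the hypothesis being scored.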
Efficient representation and fast look-up of Maximum Entropy language models
Jia Cui, Stanley F. Chen, Bowen Zhou
Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163936
Word class information has long been proven useful in language modeling (LM). However, the improved performance of class-based LMs over word n-gram models generally comes at the cost of increased decoding complexity and model size. In this paper, we propose a modified version of the Maximum Entropy token-based language model of [1] that matches the performance of the best existing class-based models, but which is as fast for decoding as a word n-gram model. In addition, while it is easy to statically combine word n-gram models built on different corpora into a single word n-gram model for fast decoding, it is unknown how to statically combine class-based LMs effectively. Another contribution of this paper is to propose a novel combination method that retains the gain of class-based LMs over word n-gram models. Experimental results on several spoken language translation tasks show that our model performs significantly better than word n-gram models with comparable decoding speed and only a modest increase in model size.
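The "easy" case the abstract contrasts with — statically combining word n-gram models from different corpora — can be sketched as plain linear interpolation of probability tables. The toy probabilities are hypothetical, and this sketch ignores backoff renormalization, which a production tool would handle.

```python
def interpolate(lm_a, lm_b, lam):
    """Statically merge two word n-gram probability tables by linear
    interpolation, producing one table for fast decoding (missing
    entries are treated as 0.0 here; real tools renormalize backoff)."""
    merged = {}
    for ngram in set(lm_a) | set(lm_b):
        merged[ngram] = lam * lm_a.get(ngram, 0.0) + (1 - lam) * lm_b.get(ngram, 0.0)
    return merged

# Hypothetical corpus-specific bigram tables
lm_news = {("the", "cat"): 0.2, ("the", "dog"): 0.8}
lm_travel = {("the", "cat"): 0.1, ("the", "train"): 0.9}
m = interpolate(lm_news, lm_travel, lam=0.5)
print(m[("the", "cat")])
```

No analogous one-table merge exists for class-based LMs, which is the gap the paper's combination method addresses.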
Speaker adaptation based on speaker-dependent eigenphone estimation
Wenlin Zhang, Weiqiang Zhang, Bi-cheng Li
Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163904
A novel speaker adaptation technique based on speaker-dependent eigenphone estimation is proposed in this paper. Unlike conventional speaker adaptation approaches, the proposed method explicitly models the phone variations of each speaker through subspace modeling in the phone space. The phone coordinates, shared by all speakers, contain correlation information between different phones. During speaker adaptation, two schemes for estimating the new speaker-specific phone variation bases (namely, eigenphones) are derived, under the maximum likelihood (ML) and maximum a posteriori (MAP) criteria respectively. Supervised speaker adaptation experiments on a Mandarin Chinese continuous speech recognition task show that the new method outperforms both eigenvoice and maximum likelihood linear regression (MLLR) methods when sufficient adaptation data is available.
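The subspace idea can be illustrated loosely with an SVD: collect per-phone variation vectors, keep a low-rank basis, and represent each phone by its coordinates in that basis. This is only an analogy on random stand-in data; the paper estimates the bases under ML and MAP criteria within the acoustic model, not by a plain SVD.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in matrix of phone mean shifts for one speaker (phones x dims)
shifts = rng.standard_normal((40, 39))        # 40 phones, 39-dim features

# Keep the top-k variation bases (the "eigenphone" analog)
u, s, vt = np.linalg.svd(shifts, full_matrices=False)
k = 5
eigenphones = vt[:k]                          # k x 39 bases
coords = shifts @ eigenphones.T               # phone coordinates in the subspace

# Reconstruction = best rank-k approximation of the phone variations
recon = coords @ eigenphones
rel_err = np.linalg.norm(shifts - recon) / np.linalg.norm(shifts)
print(round(float(rel_err), 3))
```

The phone coordinates play the role of the shared, speaker-independent quantity; only the low-dimensional bases are re-estimated per speaker.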
Bag of n-gram driven decoding for LVCSR system harnessing
Fethi Bougares, Y. Estève, P. Deléglise, G. Linarès
Pub Date: 2011-12-01 | DOI: 10.1109/ASRU.2011.6163944
This paper focuses on the combination of automatic speech recognition systems based on driven decoding paradigms. The driven decoding algorithm (DDA) uses the 1-best hypothesis of an auxiliary system as an additional knowledge source in the search algorithm of a primary system. Previous studies showed that DDA outperforms ROVER when the primary system is guided by a more accurate system. In this paper we propose a new method for managing auxiliary transcriptions, which are represented as a bag of n-grams (BONG) without temporal matching. These modifications make it easier to combine hypotheses from several auxiliary systems. Using BONG combination with hypotheses from two auxiliary systems, each with a WER above 23% on the same data, our experiments show that a CMU Sphinx-based ASR system reduces its WER from 19.85% to 18.66%, better than the results reached with DDA or classical ROVER combination.
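The bag-of-n-grams idea — rewarding a primary-system hypothesis for n-grams shared with auxiliary outputs, with no temporal alignment — can be sketched as below. The sentences and the additive scoring are hypothetical illustrations; the paper integrates such evidence inside the decoder's search, not as a post-hoc score.

```python
from collections import Counter

def bag_of_ngrams(words, n=2):
    """Multiset of n-grams in a word sequence (no positions kept)."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bong_score(hyp, aux_hyps, n=2):
    """Reward a primary-system hypothesis for n-grams shared with the
    auxiliary systems' outputs, without any temporal matching."""
    bag = Counter()
    for aux in aux_hyps:
        bag += bag_of_ngrams(aux.split(), n)
    return sum(bag[g] for g in bag_of_ngrams(hyp.split(), n))

# Hypothetical auxiliary-system transcriptions of one utterance
aux = ["the cat sat down", "a cat sat down"]
s1 = bong_score("the cat sat down", aux)
s2 = bong_score("the cats at down", aux)
print(s1, s2)
```

Because the bag discards word positions, hypotheses from several auxiliary systems pool into one structure, which is what makes multi-system combination straightforward.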