Correlation-based query relaxation for example-based dialog modeling
Cheongjae Lee, Sungjin Lee, Sangkeun Jung, Kyungduk Kim, Donghyeon Lee, G. G. Lee
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373242
Query relaxation refers to the process of reducing the number of constraints on a query if it returns no result when searching a database. This is an important process to enable extraction of an appropriate number of query results because queries that are too strictly constrained may return no result, whereas queries that are too loosely constrained may return too many results. This paper proposes an automated method of correlation-based query relaxation (CBQR) to select an appropriate constraint subset. The example-based dialog modeling framework was used to validate our algorithm. Preliminary results show that the proposed method facilitates the automation of query relaxation. We believe that the CBQR algorithm effectively relaxes constraints on failed queries to return more dialog examples.
{"title":"Correlation-based query relaxation for example-based dialog modeling","authors":"Cheongjae Lee, Sungjin Lee, Sangkeun Jung, Kyungduk Kim, Donghyeon Lee, G. G. Lee","doi":"10.1109/ASRU.2009.5373242","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373242","url":null,"abstract":"Query relaxation refers to the process of reducing the number of constraints on a query if it returns no result when searching a database. This is an important process to enable extraction of an appropriate number of query results because queries that are too strictly constrained may return no result, whereas queries that are too loosely constrained may return too many results. This paper proposes an automated method of correlation-based query relaxation (CBQR) to select an appropriate constraint subset. The example-based dialog modeling framework was used to validate our algorithm. Preliminary results show that the proposed method facilitates the automation of query relaxation. We believe that the CBQR algorithm effectively relaxes constraints on failed queries to return more dialog examples.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114983670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-margin feature adaptation for automatic speech recognition
Chih-Chieh Cheng, Fei Sha, L. Saul
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373320
We consider how to optimize the acoustic features used by hidden Markov models (HMMs) for automatic speech recognition (ASR). We investigate a mistake-driven algorithm that discriminatively reweights the acoustic features in order to separate the log-likelihoods of correct and incorrect transcriptions by a large margin. The algorithm simultaneously optimizes the HMM parameters in the back end by adapting them to the reweighted features computed by the front end. Using an online approach, we incrementally update feature weights and model parameters after the decoding of each training utterance. To mitigate the strongly biased gradients from individual training utterances, we train several different recognizers in parallel while tying the feature transformations in their front ends. We show that this parameter-tying across different recognizers leads to more stable updates and generally fewer recognition errors.
{"title":"Large-margin feature adaptation for automatic speech recognition","authors":"Chih-Chieh Cheng, Fei Sha, L. Saul","doi":"10.1109/ASRU.2009.5373320","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373320","url":null,"abstract":"We consider how to optimize the acoustic features used by hidden Markov models (HMMs) for automatic speech recognition (ASR). We investigate a mistake-driven algorithm that discriminatively reweights the acoustic features in order to separate the log-likelihoods of correct and incorrect transcriptions by a large margin. The algorithm simultaneously optimizes the HMM parameters in the back end by adapting them to the reweighted features computed by the front end. Using an online approach, we incrementally update feature weights and model parameters after the decoding of each training utterance. To mitigate the strongly biased gradients from individual training utterances, we train several different recognizers in parallel while tying the feature transformations in their front ends. We show that this parameter-tying across different recognizers leads to more stable updates and generally fewer recognition errors.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116933556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating prosodic features in extractive meeting summarization
Shasha Xie, Dilek Z. Hakkani-Tür, Benoit Favre, Yang Liu
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373302
Speech contains information beyond the text itself that can be valuable for automatic speech summarization. In this paper, we evaluate how to effectively use acoustic/prosodic features for extractive meeting summarization, and how to integrate prosodic features with lexical and structural information for further improvement. To properly represent prosodic features, we propose different normalization methods based on speaker, topic, or local context information. Our experimental results show that using only the prosodic features, we achieve better performance than using the non-prosodic information on both human transcripts and recognition output. In addition, a decision-level combination of the prosodic and non-prosodic features yields further gains, outperforming the individual models.
{"title":"Integrating prosodic features in extractive meeting summarization","authors":"Shasha Xie, Dilek Z. Hakkani-Tür, Benoit Favre, Yang Liu","doi":"10.1109/ASRU.2009.5373302","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373302","url":null,"abstract":"Speech contains additional information than text that can be valuable for automatic speech summarization. In this paper, we evaluate how to effectively use acoustic/prosodic features for extractive meeting summarization, and how to integrate prosodic features with lexical and structural information for further improvement. To properly represent prosodic features, we propose different normalization methods based on speaker, topic, or local context information. Our experimental results show that using only the prosodic features we achieve better performance than using the non-prosodic information on both the human transcripts and recognition output. In addition, a decision-level combination of the prosodic and non-prosodic features yields further gain, outperforming the individual models.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129787891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multilingual speaker age recognition: Regression analyses on the Lwazi corpus
M. Feld, E. Barnard, C. V. Heerden, Christian A. Müller
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373374
Multilinguality represents an area of significant opportunities for automatic speech-processing systems: whereas multilingual societies are commonplace, the majority of speech-processing systems are developed with a single language in mind. As a step towards improved understanding of multilingual speech processing, the current contribution investigates how an important para-linguistic aspect of speech, namely speaker age, depends on the language spoken. In particular, we study how certain speech features affect the performance of an age recognition system for different South African languages in the Lwazi corpus. By optimizing our feature set and performing language-specific tuning, we are working towards true multilingual classifiers. As they are closely related, ASR and dialog systems are likely to benefit from an improved classification of the speaker. In a comprehensive corpus analysis on long-term features, we have identified features that exhibit characteristic behaviors for particular languages. In a follow-up regression experiment, we confirm the suitability of our feature selection for age recognition and present cross-language error rates. The mean absolute error ranges between 7.7 and 12.8 years for same-language predictors and rises to 14.5 years for cross-language predictors.
{"title":"Multilingual speaker age recognition: Regression analyses on the Lwazi corpus","authors":"M. Feld, E. Barnard, C. V. Heerden, Christian A. Müller","doi":"10.1109/ASRU.2009.5373374","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373374","url":null,"abstract":"Multilinguality represents an area of significant opportunities for automatic speech-processing systems: whereas multilingual societies are commonplace, the majority of speech-processing systems are developed with a single language in mind. As a step towards improved understanding of multilingual speech processing, the current contribution investigates how an important para-linguistic aspect of speech, namely speaker age, depends on the language spoken. In particular, we study how certain speech features affect the performance of an age recognition system for different South African languages in the Lwazi corpus. By optimizing our feature set and performing language-specific tuning, we are working towards true multilingual classifiers. As they are closely related, ASR and dialog systems are likely to benefit from an improved classification of the speaker. In a comprehensive corpus analysis on long-term features, we have identified features that exhibit characteristic behaviors for particular languages. In a follow-up regression experiment, we confirm the suitability of our feature selection for age recognition and present cross-language error rates. The mean absolute error ranges between 7.7 and 12.8 years for same-language predictors and rises to 14.5 years for cross-language predictors.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128285934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging speech production knowledge for improved speech recognition
A. Sangwan, J. Hansen
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373368
This study presents a novel phonological methodology for speech recognition based on phonological features (PFs), which leverages the relationship between speech phonology and phonetics. In particular, the proposed scheme estimates the likelihood of observing the speech phonology given an associative lexicon. In this manner, the scheme is capable of choosing the most likely hypothesis (word candidate) among a group of competing alternative hypotheses. The framework employs the Maximum Entropy (ME) model to learn the relationship between phonetics and phonology. Subsequently, we extend the ME model to an ME-HMM (maximum entropy-hidden Markov model), which captures the speech production and linguistic relationship between phonology and words. The proposed ME-HMM model is applied to the task of re-processing N-best lists, where absolute WRA (word recognition accuracy) increases of 1.7%, 1.9%, and 1.0% are reported for the TIMIT, NTIMIT, and SPINE (speech in noise) corpora, respectively (15.5% and 22.5% relative reductions in word error rate for TIMIT and NTIMIT).
{"title":"Leveraging speech production knowledge for improved speech recognition","authors":"A. Sangwan, J. Hansen","doi":"10.1109/ASRU.2009.5373368","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373368","url":null,"abstract":"This study presents a novel phonological methodology for speech recognition based on phonological features (PFs) which leverages the relationship between speech phonology and phonetics. In particular, the proposed scheme estimates the likelihood of observing speech phonology given an associative lexicon. In this manner, the scheme is capable of choosing the most likely hypothesis (word candidate) among a group of competing alternative hypotheses. The framework employs the Maximum Entropy (ME) model to learn the relationship between phonetics and phonology. Subsequently, we extend the ME model to a ME-HMM (maximum entropy-hidden Markov model) which captures the speech production and linguistic relationship between phonology and words. The proposed ME-HMM model is applied to the task of re-processing N-best lists where an absolute WRA (word recognition rate) increase of 1.7%, 1.9% and 1% are reported for TIMIT, NTIMIT, and the SPINE (speech in noise) corpora (15.5% and 22.5% relative reduction in word error rate for TIMIT and NTIMIT).","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130450010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The exploration/exploitation trade-off in Reinforcement Learning for dialogue management
S. Varges, G. Riccardi, S. Quarteroni, A. Ivanov
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373260
Conversational systems use deterministic rules that trigger actions such as requests for confirmation or clarification. More recently, Reinforcement Learning and (Partially Observable) Markov Decision Processes have been proposed for this task. In this paper, we investigate action selection strategies for dialogue management, in particular the exploration/exploitation trade-off and its impact on final reward (i.e. the session reward after optimization has ended) and lifetime reward (i.e. the overall reward accumulated over the learner's lifetime). We propose to use interleaved exploitation sessions as a learning methodology to assess the reward obtained from the current policy. The experiments show a statistically significant difference in final reward of exploitation-only sessions between a system that optimizes lifetime reward and one that maximizes the reward of the final policy.
{"title":"The exploration/exploitation trade-off in Reinforcement Learning for dialogue management","authors":"S. Varges, G. Riccardi, S. Quarteroni, A. Ivanov","doi":"10.1109/ASRU.2009.5373260","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373260","url":null,"abstract":"Conversational systems use deterministic rules that trigger actions such as requests for confirmation or clarification. More recently, Reinforcement Learning and (Partially Observable) Markov Decision Processes have been proposed for this task. In this paper, we investigate action selection strategies for dialogue management, in particular the exploration/exploitation trade-off and its impact on final reward (i.e. the session reward after optimization has ended) and lifetime reward (i.e. the overall reward accumulated over the learner's lifetime). We propose to use interleaved exploitation sessions as a learning methodology to assess the reward obtained from the current policy. The experiments show a statistically significant difference in final reward of exploitation-only sessions between a system that optimizes lifetime reward and one that maximizes the reward of the final policy.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125354298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From speech to letters - using a novel neural network architecture for grapheme based ASR
F. Eyben, M. Wöllmer, Björn Schuller, Alex Graves
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373257
Mainstream automatic speech recognition systems are based on modelling acoustic sub-word units such as phonemes. Phonemisation dictionaries and language-model-based decoding techniques are applied to transform the phoneme hypotheses into orthographic transcriptions. Direct modelling of graphemes as sub-word units using HMMs has not been successful. We investigate a novel ASR approach using Bidirectional Long Short-Term Memory Recurrent Neural Networks and Connectionist Temporal Classification, which is capable of transcribing graphemes directly and yields results highly competitive with phoneme transcription. In the design of such a grapheme-based speech recognition system, phonemisation dictionaries are no longer required; all that is needed is text transcribed at the sentence level, which greatly simplifies the training procedure. The novel approach is evaluated extensively on the Wall Street Journal 1 corpus.
{"title":"From speech to letters - using a novel neural network architecture for grapheme based ASR","authors":"F. Eyben, M. Wöllmer, Björn Schuller, Alex Graves","doi":"10.1109/ASRU.2009.5373257","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373257","url":null,"abstract":"Main-stream automatic speech recognition systems are based on modelling acoustic sub-word units such as phonemes. Phonemisation dictionaries and language model based decoding techniques are applied to transform the phoneme hypothesis into orthographic transcriptions. Direct modelling of graphemes as sub-word units using HMM has not been successful. We investigate a novel ASR approach using Bidirectional Long Short-Term Memory Recurrent Neural Networks and Connectionist Temporal Classification, which is capable of transcribing graphemes directly and yields results highly competitive with phoneme transcription. In design of such a grapheme based speech recognition system phonemisation dictionaries are no longer required. All that is needed is text transcribed on the sentence level, which greatly simplifies the training procedure. The novel approach is evaluated extensively on the Wall Street Journal 1 corpus.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121068028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ontology-based grounding of Spoken Language Understanding
S. Quarteroni, Marco Dinarelli, G. Riccardi
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5373500
Current Spoken Language Understanding models rely on either hand-written semantic grammars or flat attribute-value sequence labeling. In most cases, no relations between concepts are modeled, and both concepts and relations are domain-specific, making it difficult to expand or port the domain model. In contrast, we expand our previous work on a domain model based on an ontology where concepts follow the predicate-argument semantics and domain-independent classical relations are defined on such concepts. We conduct a thorough study on a spoken dialog corpus collected within a customer care problem-solving domain, and we evaluate the coverage and impact of the ontology for the interpretation, grounding and re-ranking of spoken language understanding interpretations.
{"title":"Ontology-based grounding of Spoken Language Understanding","authors":"S. Quarteroni, Marco Dinarelli, G. Riccardi","doi":"10.1109/ASRU.2009.5373500","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5373500","url":null,"abstract":"Current Spoken Language Understanding models rely on either hand-written semantic grammars or flat attribute-value sequence labeling. In most cases, no relations between concepts are modeled, and both concepts and relations are domain-specific, making it difficult to expand or port the domain model. In contrast, we expand our previous work on a domain model based on an ontology where concepts follow the predicate-argument semantics and domain-independent classical relations are defined on such concepts. We conduct a thorough study on a spoken dialog corpus collected within a customer care problem-solving domain, and we evaluate the coverage and impact of the ontology for the interpretation, grounding and re-ranking of spoken language understanding interpretations.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121052310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards integrated machine translation using structural alignment from syntax-augmented synchronous parsing
Bing Xiang, Bowen Zhou, Martin Cmejrek
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5372892
In current statistical machine translation, IBM-model-based word alignment is widely used as a starting point to build phrase-based machine translation systems. However, such an alignment model is separate from the rest of the machine translation pipeline and is optimized independently. Furthermore, structural information is not taken into account in the alignment model, which sometimes leads to incorrect alignments. In this paper, we present a novel method to connect a re-alignment model with a translation model in an integrated framework. We conduct bilingual chart parsing based on a syntax-augmented synchronous context-free grammar. A Viterbi derivation tree is generated for each sentence pair, with multiple features employed in a log-linear model. A new word alignment is created under the structural constraint from the Viterbi tree. Extensive experiments are conducted on a Farsi-to-English translation task in the conversational speech domain and a German-to-English translation task in the text domain. Systems trained on the new alignment provide significantly higher BLEU scores than a state-of-the-art baseline.
{"title":"Towards integrated machine translation using structural alignment from syntax-augmented synchronous parsing","authors":"Bing Xiang, Bowen Zhou, Martin Cmejrek","doi":"10.1109/ASRU.2009.5372892","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5372892","url":null,"abstract":"In current statistical machine translation, IBM model based word alignment is widely used as a starting point to build phrase-based machine translation systems. However, such alignment model is separated from the rest of machine translation pipeline and optimized independently. Furthermore, structural information is not taken into account in the alignment model, which sometimes leads to incorrect alignments. In this paper, we present a novel method to connect a re-alignment model with a translation model in an integrated framework. We conduct bilingual chart parsing based on syntax-augmented synchronous context-free grammar. A Viterbi derivation tree is generated for each sentence pair with multiple features employed in a log-linear model. A new word alignment is created under the structural constraint from the Viterbi tree. Extensive experiments are conducted in a Farsi-to-English translation task in conversational speech domain and also a German-to-English translation task in text domain. Systems trained on the new alignment provide significant higher BLEU scores compared to a state-of-the-art baseline.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132745338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Response timing generation and response type selection for a spontaneous spoken dialog system
Ryota Nishimura, S. Nakagawa
Pub Date: 2009-12-01 | DOI: 10.1109/ASRU.2009.5372898
If a dialog system can respond to a user as naturally as a human, the interaction will appear smoother. In this research, we aim to develop a dialog system that emulates human behavior in a chat-like dialog. The proposed system makes use of a decision tree to generate chat-like responses at the appropriate times. These responses include "aizuchi" (back-channel), "repetition", "collaborative completion", etc. The system also reacts robustly to the user's overlapping utterances (barge-in) and disfluencies. A subjective evaluation shows a high degree of naturalness in the timing of ordinary responses, overlap, and aizuchi, and that the dialog system exhibits user-friendly behavior. The version of the system using recorded voices was preferred; almost all subjects felt a sense of familiarity from the aizuchi, and the barge-in handling was also judged useful.
{"title":"Response timing generation and response type selection for a spontaneous spoken dialog system","authors":"Ryota Nishimura, S. Nakagawa","doi":"10.1109/ASRU.2009.5372898","DOIUrl":"https://doi.org/10.1109/ASRU.2009.5372898","url":null,"abstract":"If a dialog system can respond to a user as naturally as a human, the interaction will appear smoother. In this research, we aim to develop a dialog system that emulates human behavior in a chat-like dialog. The proposed system makes use of a decision tree to generate chat-like responses at the appropriate times. These responses include “aizuchi” (back-channel), “repetition”, “collaborative completion”, etc. The system also reacts robustly to the user's overlapping utterances (barge-in) and disfluencies. The subjective evaluation shows that there is a high degree of naturalness in the timing of ordinary responses, overlap, and aizuchi, and that the dialog system exhibits user-friendly behavior. The recorded voices system was preferred, and almost all subjects felt familiarity with aizuchi, and the barge-in was also useful.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114281868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}