Title: A Framework for Model-based Evaluation of Spoken Dialog Systems
Authors: S. Möller, Nigel G. Ward
Venue: SIGDIAL Workshop, 2008-06-19
DOI: https://doi.org/10.3115/1622064.1622099
Abstract: Improvements in the quality, usability and acceptability of spoken dialog systems can be facilitated by better evaluation methods. To support early and efficient evaluation of dialog systems and their components, this paper presents a tripartite framework describing the evaluation problem. One part models the behavior of user and system during the interaction, the second the perception and judgment processes taking place inside the user, and the third what matters to system designers and service providers. The paper reviews available approaches for some of the model parts, and indicates how anticipated improvements may serve not only developers and users but also researchers working on advanced dialog functions and features.
Title: Agreement and Disputes in Dialogue
Authors: A. Lascarides, Nicholas Asher
Venue: SIGDIAL Workshop, 2008-06-19
DOI: https://doi.org/10.3115/1622064.1622070
Abstract: In this paper we define agreement in terms of shared public commitments, and implicit agreement is conditioned on the semantics of the relational speech acts (e.g., Narration, Explanation) that each agent performs. We provide a consistent interpretation of disputes, and updating a logical form with the current utterance always involves extending it and not revising it, even if the current utterance denies earlier content.
Title: Reactive Redundancy and Listener Comprehension in Direction-Giving
Authors: R. Baker, Alastair J. Gill, Justine Cassell
Venue: SIGDIAL Workshop, 2008-06-19
DOI: https://doi.org/10.3115/1622064.1622071
Abstract: We explore the role of redundancy, both in anticipation of and in response to listener confusion, in task-oriented dialogue. We find that direction-givers provide redundant utterances in response to both verbal and non-verbal signals of listener confusion. We also examine the effects of prior acquaintance and visibility upon redundancy. As expected, givers use more redundant utterances overall, and more redundant utterances in response to listener questions, when communicating with strangers. We discuss our findings in relation to theories of redundancy, the balance of speaker and listener effort, and potential applications.
Title: Rapidly Deploying Grammar-Based Speech Applications with Active Learning and Back-off Grammars
Authors: Tim Paek, Sudeep Gandhe, D. M. Chickering
Venue: SIGDIAL Workshop, 2008-06-19
DOI: https://doi.org/10.3115/1622064.1622075
Abstract: Grammar-based approaches to spoken language understanding are widely used in industry, particularly when developers are confronted with data sparsity. To ensure wide grammar coverage, developers typically modify their grammars in an iterative process of deploying the application, collecting and transcribing user utterances, and adjusting the grammar. In this paper, we explore enhancing this iterative process by leveraging active learning with back-off grammars. Because the back-off grammars expand coverage of user utterances, developers have a safety net for deploying applications earlier. Furthermore, the statistics related to the back-off can be used for active learning, reducing the effort and cost of data transcription. In experiments conducted on a commercially deployed application, the approach achieved levels of semantic accuracy comparable to transcribing all failed utterances, with 87% fewer transcriptions.
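The abstract above does not spell out how back-off statistics drive the active-learning step. As a minimal sketch of the general idea (the function name, data layout and frequency heuristic are illustrative assumptions, not the authors' method), utterances that only the back-off grammar could parse might be prioritized for transcription by frequency:

```python
from collections import Counter

def select_for_transcription(backoff_only_hypotheses, budget):
    """Pick the most frequent back-off-only recognition hypotheses
    for human transcription (a simple active-learning heuristic).

    backoff_only_hypotheses: recognizer outputs that failed the main
        grammar but were covered by the back-off grammar.
    budget: how many transcriptions we can afford this iteration.
    """
    counts = Counter(backoff_only_hypotheses)
    # Frequent back-off parses point at systematic coverage gaps in the
    # main grammar, so transcribing them first should pay off fastest.
    return [hyp for hyp, _ in counts.most_common(budget)]
```

Each iteration, the selected utterances would be transcribed and the main grammar extended to cover them, after which the cycle repeats.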
Title: Persistent Information State in a Data-Centric Architecture
Authors: S. Varges, G. Riccardi, S. Quarteroni
Venue: SIGDIAL Workshop, 2008-06-19
DOI: https://doi.org/10.3115/1622064.1622076
Abstract: We present the ADAMACH data-centric dialog system, which supports on- and off-line mining of dialog context, speech recognition results and other system-generated representations, both within and across dialogs. The architecture implements a "fat pipeline" for speech and language processing. We detail how the approach integrates domain knowledge and evolving empirical data, based on a user study in the University Helpdesk domain.
Title: Optimizing Endpointing Thresholds using Dialogue Features in a Spoken Dialogue System
Authors: Antoine Raux, M. Eskénazi
Venue: SIGDIAL Workshop, 2008-06-19
DOI: https://doi.org/10.3115/1622064.1622066
Abstract: This paper describes a novel algorithm that dynamically sets endpointing thresholds, based on a rich set of dialogue features, to detect the end of user utterances in a dialogue system. We analyze the relationship between silences in users' speech to a spoken dialogue system and a wide range of automatically extracted features from discourse, semantics, prosody, timing and speaker characteristics. We find that all features correlate with pause duration and with whether a silence indicates the end of the turn, with semantics and timing being the most informative. Based on these features, the proposed method reduces latency by up to 24% over a fixed-threshold baseline. Offline evaluation results were confirmed by implementing the proposed algorithm in the Let's Go system.
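The abstract does not give the threshold model itself. One hedged sketch of the general idea (the feature names, weights and functional form here are assumptions for illustration, not the paper's trained model) maps weighted end-of-turn evidence from dialogue features to a per-silence threshold:

```python
def endpoint_threshold_ms(features, weights, base_ms=700.0, floor_ms=150.0):
    """Turn dialogue-feature evidence that the user has finished
    speaking into a silence threshold in milliseconds.

    features, weights: dicts keyed by feature name; positive weighted
        evidence (e.g. a semantically complete utterance) shortens the
        threshold, so the system responds with lower latency.
    """
    evidence = sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    # No evidence -> conservative base threshold; strong evidence ->
    # endpoint sooner, but never below a safety floor.
    return max(floor_ms, base_ms / (1.0 + max(0.0, evidence)))
```

The key design point is that the threshold is recomputed at each silence, so the same pause length can end the turn in one dialogue state but not in another.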
Title: A Frame-Based Probabilistic Framework for Spoken Dialog Management Using Dialog Examples
Authors: Kyungduk Kim, Cheongjae Lee, Sangkeun Jung, G. G. Lee
Venue: SIGDIAL Workshop, 2008-06-19
DOI: https://doi.org/10.3115/1622064.1622088
Abstract: This paper proposes a probabilistic framework for spoken dialog management using dialog examples. To overcome the complexity problems of the classic partially observable Markov decision process (POMDP)-based dialog manager, we use a frame-based belief state representation that reduces the complexity of the belief update. We also use dialog examples to maintain a reasonable number of system actions, reducing the complexity of policy optimization. We developed weather-information and car-navigation dialog systems that employ the frame-based probabilistic framework. The framework enables developers to build a spoken dialog system with a probabilistic approach while avoiding the complexity problems of POMDPs.
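As a hedged illustration of why a frame-based belief state is cheaper than a full POMDP state: each slot's belief can be maintained as its own small distribution and updated independently from ASR confidences. The slot values, the confidence-as-likelihood treatment and the smoothing floor below are assumptions, not the paper's exact formulation:

```python
def update_slot_belief(prior, confidences, floor=0.01):
    """Bayes-style update of one frame slot's belief distribution.

    prior: dict mapping slot value -> probability.
    confidences: dict mapping slot value -> ASR confidence for the
        current turn, treated here as a likelihood.
    """
    posterior = {value: p * confidences.get(value, floor)
                 for value, p in prior.items()}
    total = sum(posterior.values())
    # Renormalize so the slot's belief stays a probability distribution.
    return {value: p / total for value, p in posterior.items()}
```

Updating per-slot distributions like this scales with the number of slot values, rather than with the exponentially larger joint state space a flat POMDP belief would track.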
Title: What Are Meeting Summaries? An Analysis of Human Extractive Summaries in Meeting Corpus
Authors: Fei Liu, Yang Liu
Venue: SIGDIAL Workshop, 2008-06-19
DOI: https://doi.org/10.3115/1622064.1622079
Abstract: Significant research efforts have been devoted to speech summarization, including automatic approaches and evaluation metrics. However, fundamental questions remain open about what constitutes a summary of speech data and whether humans agree with each other. This paper analyzes human-annotated extractive summaries of the ICSI meeting corpus with the aim of examining their consistency and the factors impacting human agreement. In addition to using Kappa statistics and ROUGE scores, we also propose a sentence distance score and a divergence distance as quantitative measures. This study is expected to help better define the speech summarization problem.
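For reference, the Kappa agreement measure mentioned above can be computed for two annotators' binary in-summary labels as follows. This is Cohen's two-annotator formulation; the paper's multi-annotator setting may use a different variant:

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators
    who each labeled the same sentences as in-summary (1) or not (0)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    p_chance = sum((labels_a.count(lab) / n) * (labels_b.count(lab) / n)
                   for lab in set(labels_a) | set(labels_b))
    return (p_obs - p_chance) / (1.0 - p_chance)
```

Kappa is 1.0 for perfect agreement and 0.0 when agreement is no better than chance, which is why it is preferred over raw percent agreement for extractive-summary labels, where most sentences are left out by everyone.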
Title: Learning N-Best Correction Models from Implicit User Feedback in a Multi-Modal Local Search Application
Authors: D. Bohus, Xiao Li, Patrick Nguyen, G. Zweig
Venue: SIGDIAL Workshop, 2008-06-19
DOI: https://doi.org/10.3115/1622064.1622068
Abstract: We describe a novel n-best correction model that can leverage implicit user feedback (in the form of clicks) to improve performance in a multi-modal speech-search application. The proposed model works in two stages. First, the n-best list generated by the speech recognizer is expanded with additional candidates, based on confusability information captured via user click statistics. In the second stage, this expanded list is rescored and pruned to produce a more accurate and compact n-best list. Results indicate that the proposed n-best correction model leads to significant improvements over the existing baseline, as well as other traditional n-best rescoring approaches.
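The two stages described above can be sketched as follows. The data layout, the click-probability weighting and the max-combination of scores are assumptions for illustration, not the paper's trained model:

```python
def correct_nbest(nbest, click_confusions, top_k=5):
    """Two-stage n-best correction.

    nbest: list of (hypothesis, score) pairs from the recognizer.
    click_confusions: dict mapping a hypothesis to (alternative,
        click_probability) pairs mined from which result users
        actually clicked after that hypothesis was recognized.
    """
    # Stage 1: expand the list with click-derived alternative candidates.
    expanded = dict(nbest)
    for hyp, score in nbest:
        for alt, p_click in click_confusions.get(hyp, []):
            expanded[alt] = max(expanded.get(alt, 0.0), score * p_click)
    # Stage 2: rescore and prune to a compact, more accurate list.
    return sorted(expanded.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

The appeal of the approach is that clicks are free supervision: every search session in which the user selects a result other than the top hypothesis adds evidence to the confusability statistics.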
Title: Discourse Level Opinion Relations: An Annotation Study
Authors: Swapna Somasundaran, Josef Ruppenhofer, J. Wiebe
Venue: SIGDIAL Workshop, 2008-06-19
DOI: https://doi.org/10.3115/1622064.1622092
Abstract: This work proposes opinion frames as a representation of discourse-level associations that arise from related opinion targets and which are common in task-oriented meeting dialogs. We define the opinion frames and explain their interpretation. Additionally, we present an annotation scheme that realizes the opinion frames, and via human annotation studies we show that these can be reliably identified.