User Simulation as Testing for Spoken Dialog Systems

SIGDIAL Workshop, published 2008-06-19. DOI: 10.3115/1622064.1622097

H. Ai, F. Weng


Citations: 37

Abstract

We propose using user simulation for testing during the development of a sophisticated dialog system. Although the limited behaviors of state-of-the-art user simulations may not cover important aspects of dialog system testing, our approach extends the functionality of the simulation so that it can be used at least for early-stage testing, before the system reaches performance stable enough for evaluation with human users. The approach includes a set of evaluation measures that can be computed automatically from the interaction logs between the user simulator and the dialog system. We first validate these measures on human user dialogs using user satisfaction scores. We also build a regression model that estimates user satisfaction scores from these evaluation measures. We then apply the evaluation measures to a simulated dialog corpus generated by a user simulator trained on the real user corpus. We show that the user satisfaction scores estimated from the simulated corpus are not statistically different from the real users' satisfaction scores.
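The pipeline described in the abstract (measures computed from logs, a regression model for satisfaction, then a comparison of real and simulated corpora) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the measures, the regression form, or the statistical test. The sketch uses two hypothetical measures (turn count and task completion), ordinary least squares, and a permutation test on the difference of means. All numbers are invented toy values, not data from the paper.

```python
import random
from statistics import mean


def fit_linear(features, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in features]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(k):
            if r != c:
                factor = a[r][c]
                a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    return [a[i][k] for i in range(k)]


def predict(coefs, measures):
    """Estimated satisfaction for one dialog's measures."""
    return coefs[0] + sum(c * m for c, m in zip(coefs[1:], measures))


def permutation_p_value(a, b, trials=2000, seed=0):
    """Two-sided permutation test on the absolute difference of means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / trials


# Hypothetical per-dialog measures: (number of turns, task completed 0/1).
human_measures = [(10, 1), (14, 0), (8, 1), (12, 1), (16, 0), (9, 1)]
# Invented satisfaction ratings for the human dialogs (toy values).
human_scores = [4.0, 2.2, 4.4, 3.6, 1.8, 4.2]
# Hypothetical measures computed from simulated dialogs.
sim_measures = [(11, 1), (13, 0), (9, 1), (12, 1), (15, 0), (10, 1)]

coefs = fit_linear(human_measures, human_scores)
human_estimates = [predict(coefs, m) for m in human_measures]
sim_estimates = [predict(coefs, m) for m in sim_measures]
p_value = permutation_p_value(human_estimates, sim_estimates)
print("coefficients:", coefs)
print("permutation p-value:", p_value)
```

A large p-value here only means the test failed to detect a difference between the two sets of estimated scores; it does not by itself establish that the simulated and real corpora are equivalent.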
