DialTest: automated testing for recurrent-neural-network-driven dialogue systems

Zixi Liu, Yang Feng, Zhenyu Chen
{"title":"DialTest:自动测试循环神经网络驱动的对话系统","authors":"Zixi Liu, Yang Feng, Zhenyu Chen","doi":"10.1145/3460319.3464829","DOIUrl":null,"url":null,"abstract":"With the tremendous advancement of recurrent neural networks(RNN), dialogue systems have achieved significant development. Many RNN-driven dialogue systems, such as Siri, Google Home, and Alexa, have been deployed to assist various tasks. However, accompanying this outstanding performance, RNN-driven dialogue systems, which are essentially a kind of software, could also produce erroneous behaviors and result in massive losses. Meanwhile, the complexity and intractability of RNN models that power the dialogue systems make their testing challenging. In this paper, we design and implement DialTest, the first RNN-driven dialogue system testing tool. DialTest employs a series of transformation operators to make realistic changes on seed data while preserving their oracle information properly. To improve the efficiency of detecting faults, DialTest further adopts Gini impurity to guide the test generation process. We conduct extensive experiments to validate DialTest. We first experiment it on two fundamental tasks, i.e., intent detection and slot filling, of natural language understanding. The experiment results show that DialTest can effectively detect hundreds of erroneous behaviors for different RNN-driven natural language understanding (NLU) modules of dialogue systems and improve their accuracy via retraining with the generated data. Further, we conduct a case study on an industrial dialogue system to investigate the performance of DialTest under the real usage scenario. The study shows DialTest can detect errors and improve the robustness of RNN-driven dialogue systems effectively.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"DialTest: automated testing for recurrent-neural-network-driven dialogue systems\",\"authors\":\"Zixi Liu, Yang Feng, Zhenyu Chen\",\"doi\":\"10.1145/3460319.3464829\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the tremendous advancement of recurrent neural networks(RNN), dialogue systems have achieved significant development. Many RNN-driven dialogue systems, such as Siri, Google Home, and Alexa, have been deployed to assist various tasks. However, accompanying this outstanding performance, RNN-driven dialogue systems, which are essentially a kind of software, could also produce erroneous behaviors and result in massive losses. Meanwhile, the complexity and intractability of RNN models that power the dialogue systems make their testing challenging. In this paper, we design and implement DialTest, the first RNN-driven dialogue system testing tool. DialTest employs a series of transformation operators to make realistic changes on seed data while preserving their oracle information properly. To improve the efficiency of detecting faults, DialTest further adopts Gini impurity to guide the test generation process. We conduct extensive experiments to validate DialTest. We first experiment it on two fundamental tasks, i.e., intent detection and slot filling, of natural language understanding. 
The experiment results show that DialTest can effectively detect hundreds of erroneous behaviors for different RNN-driven natural language understanding (NLU) modules of dialogue systems and improve their accuracy via retraining with the generated data. Further, we conduct a case study on an industrial dialogue system to investigate the performance of DialTest under the real usage scenario. The study shows DialTest can detect errors and improve the robustness of RNN-driven dialogue systems effectively.\",\"PeriodicalId\":188008,\"journal\":{\"name\":\"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3460319.3464829\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3460319.3464829","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15

Abstract

With the tremendous advancement of recurrent neural networks (RNNs), dialogue systems have achieved significant development. Many RNN-driven dialogue systems, such as Siri, Google Home, and Alexa, have been deployed to assist with various tasks. However, alongside this outstanding performance, RNN-driven dialogue systems, which are essentially a kind of software, can also produce erroneous behaviors and result in massive losses. Meanwhile, the complexity and intractability of the RNN models that power dialogue systems make testing them challenging. In this paper, we design and implement DialTest, the first RNN-driven dialogue system testing tool. DialTest employs a series of transformation operators to make realistic changes to seed data while properly preserving their oracle information. To improve the efficiency of fault detection, DialTest further adopts Gini impurity to guide the test generation process. We conduct extensive experiments to validate DialTest. We first experiment with it on two fundamental natural language understanding tasks, i.e., intent detection and slot filling. The experimental results show that DialTest can effectively detect hundreds of erroneous behaviors for different RNN-driven natural language understanding (NLU) modules of dialogue systems and improve their accuracy via retraining with the generated data. Further, we conduct a case study on an industrial dialogue system to investigate the performance of DialTest under a real usage scenario. The study shows that DialTest can effectively detect errors and improve the robustness of RNN-driven dialogue systems.
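The abstract describes transformation operators that make realistic changes to seed data while preserving the oracle information. As a rough, hypothetical illustration of what an oracle-preserving operator can look like (not one of DialTest's actual operators), the sketch below substitutes a single word with a synonym from a fixed table, so that the sentence's intent label and per-token slot labels remain valid test oracles; the seed format and synonym table are assumptions for the example.

```python
# Hypothetical sketch of an oracle-preserving transformation on seed data.
# The operator (synonym substitution from a fixed table) is an illustrative
# stand-in, not one of DialTest's actual operators.
import random
from typing import Dict, List, Tuple

# A labeled seed: (utterance, intent label, one slot label per token).
Seed = Tuple[str, str, List[str]]

SYNONYMS: Dict[str, List[str]] = {
    "book": ["reserve"],
    "cheap": ["inexpensive", "affordable"],
}

def substitute_synonym(seed: Seed, rng: random.Random) -> Seed:
    """Replace one word with a synonym of the same meaning. Because the
    substitution keeps the token count, the intent, and the per-token slot
    structure, the original labels still serve as the test oracle."""
    text, intent, slots = seed
    tokens = text.split()
    candidates = [i for i, t in enumerate(tokens) if t.lower() in SYNONYMS]
    if not candidates:
        return seed
    i = rng.choice(candidates)
    tokens[i] = rng.choice(SYNONYMS[tokens[i].lower()])
    return (" ".join(tokens), intent, slots)

if __name__ == "__main__":
    seed = ("book a cheap flight to Boston", "BookFlight",
            ["O", "O", "B-price", "O", "O", "B-dest"])
    print(substitute_synonym(seed, random.Random(0)))
```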
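The abstract also notes that DialTest adopts Gini impurity to guide test generation. The following sketch, again a hypothetical illustration rather than DialTest's implementation, scores candidate inputs by the Gini impurity of a classifier's predicted intent distribution and ranks the most uncertain inputs first; `predict_proba` and the example sentences are assumed stand-ins.

```python
# Hypothetical sketch: prioritizing generated test sentences by Gini impurity.
# `predict_proba` and the toy probabilities are illustrative placeholders,
# not DialTest's actual API.
from typing import Callable, List, Sequence

def gini_impurity(probs: Sequence[float]) -> float:
    """Gini impurity of a predicted class distribution: 1 - sum(p_i^2).
    Values near 0 mean the model is confident; larger values mean it is
    more uncertain over the classes."""
    return 1.0 - sum(p * p for p in probs)

def rank_tests_by_gini(
    candidates: List[str],
    predict_proba: Callable[[str], Sequence[float]],
) -> List[str]:
    """Order candidate test sentences so the most uncertain (highest Gini)
    inputs come first, on the assumption that they are more likely to
    expose erroneous behaviors."""
    return sorted(candidates,
                  key=lambda s: gini_impurity(predict_proba(s)),
                  reverse=True)

if __name__ == "__main__":
    # Toy stand-in for an intent-detection model's softmax output.
    fake_proba = {
        "play some jazz music": [0.97, 0.02, 0.01],
        "play er um something": [0.40, 0.35, 0.25],
    }
    ranked = rank_tests_by_gini(list(fake_proba), fake_proba.get)
    print(ranked)  # the more uncertain sentence is ranked first
```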