{"title":"DialTest:自动测试循环神经网络驱动的对话系统","authors":"Zixi Liu, Yang Feng, Zhenyu Chen","doi":"10.1145/3460319.3464829","DOIUrl":null,"url":null,"abstract":"With the tremendous advancement of recurrent neural networks(RNN), dialogue systems have achieved significant development. Many RNN-driven dialogue systems, such as Siri, Google Home, and Alexa, have been deployed to assist various tasks. However, accompanying this outstanding performance, RNN-driven dialogue systems, which are essentially a kind of software, could also produce erroneous behaviors and result in massive losses. Meanwhile, the complexity and intractability of RNN models that power the dialogue systems make their testing challenging. In this paper, we design and implement DialTest, the first RNN-driven dialogue system testing tool. DialTest employs a series of transformation operators to make realistic changes on seed data while preserving their oracle information properly. To improve the efficiency of detecting faults, DialTest further adopts Gini impurity to guide the test generation process. We conduct extensive experiments to validate DialTest. We first experiment it on two fundamental tasks, i.e., intent detection and slot filling, of natural language understanding. The experiment results show that DialTest can effectively detect hundreds of erroneous behaviors for different RNN-driven natural language understanding (NLU) modules of dialogue systems and improve their accuracy via retraining with the generated data. Further, we conduct a case study on an industrial dialogue system to investigate the performance of DialTest under the real usage scenario. The study shows DialTest can detect errors and improve the robustness of RNN-driven dialogue systems effectively.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"DialTest: automated testing for recurrent-neural-network-driven dialogue systems\",\"authors\":\"Zixi Liu, Yang Feng, Zhenyu Chen\",\"doi\":\"10.1145/3460319.3464829\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the tremendous advancement of recurrent neural networks(RNN), dialogue systems have achieved significant development. Many RNN-driven dialogue systems, such as Siri, Google Home, and Alexa, have been deployed to assist various tasks. However, accompanying this outstanding performance, RNN-driven dialogue systems, which are essentially a kind of software, could also produce erroneous behaviors and result in massive losses. Meanwhile, the complexity and intractability of RNN models that power the dialogue systems make their testing challenging. In this paper, we design and implement DialTest, the first RNN-driven dialogue system testing tool. DialTest employs a series of transformation operators to make realistic changes on seed data while preserving their oracle information properly. To improve the efficiency of detecting faults, DialTest further adopts Gini impurity to guide the test generation process. We conduct extensive experiments to validate DialTest. We first experiment it on two fundamental tasks, i.e., intent detection and slot filling, of natural language understanding. The experiment results show that DialTest can effectively detect hundreds of erroneous behaviors for different RNN-driven natural language understanding (NLU) modules of dialogue systems and improve their accuracy via retraining with the generated data. Further, we conduct a case study on an industrial dialogue system to investigate the performance of DialTest under the real usage scenario. The study shows DialTest can detect errors and improve the robustness of RNN-driven dialogue systems effectively.\",\"PeriodicalId\":188008,\"journal\":{\"name\":\"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3460319.3464829\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3460319.3464829","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DialTest: automated testing for recurrent-neural-network-driven dialogue systems
With the tremendous advancement of recurrent neural networks(RNN), dialogue systems have achieved significant development. Many RNN-driven dialogue systems, such as Siri, Google Home, and Alexa, have been deployed to assist various tasks. However, accompanying this outstanding performance, RNN-driven dialogue systems, which are essentially a kind of software, could also produce erroneous behaviors and result in massive losses. Meanwhile, the complexity and intractability of RNN models that power the dialogue systems make their testing challenging. In this paper, we design and implement DialTest, the first RNN-driven dialogue system testing tool. DialTest employs a series of transformation operators to make realistic changes on seed data while preserving their oracle information properly. To improve the efficiency of detecting faults, DialTest further adopts Gini impurity to guide the test generation process. We conduct extensive experiments to validate DialTest. We first experiment it on two fundamental tasks, i.e., intent detection and slot filling, of natural language understanding. The experiment results show that DialTest can effectively detect hundreds of erroneous behaviors for different RNN-driven natural language understanding (NLU) modules of dialogue systems and improve their accuracy via retraining with the generated data. Further, we conduct a case study on an industrial dialogue system to investigate the performance of DialTest under the real usage scenario. The study shows DialTest can detect errors and improve the robustness of RNN-driven dialogue systems effectively.