Generating Predictable and Adaptive Dialog Policies in Single- and Multi-domain Goal-oriented Dialog Systems

Int. J. Semantic Comput. Pub Date : 2021-12-01 DOI:10.1142/s1793351x21400109

Nhat X. T. Le, A.B. Siddique, Fuad Jamour, Samet Oymak, Vagelis Hristidis


Citations: 3

Abstract

Most existing commercial goal-oriented chatbots are diagram-based; that is, they follow a rigid dialog flow to fill the slot values needed to achieve a user’s goal. Diagram-based chatbots are predictable, which explains their adoption in commercial settings; however, their lack of flexibility may cause many users to leave the conversation before achieving their goal. State-of-the-art research chatbots, on the other hand, use Reinforcement Learning (RL) to generate flexible dialog policies. However, such chatbots can be unpredictable, may violate the intended business constraints, and require large training datasets to produce a mature policy. We propose a framework that achieves a middle ground between diagram-based and RL-based chatbots: we constrain the space of possible chatbot responses using a novel structure, the chatbot dependency graph, and use RL to dynamically select the best valid response. Dependency graphs are directed graphs that conveniently express a chatbot’s logic by defining the dependencies among slots: all valid dialog flows are encapsulated in one dependency graph. Our experiments in both single-domain and multi-domain settings show that our framework quickly adapts to user characteristics and achieves a success rate up to 23.77% higher than that of a state-of-the-art RL model.
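To make the idea of constraining responses with a dependency graph concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a hypothetical restaurant-booking bot whose slot names and dependencies are invented for illustration, and it substitutes simple tabular Q-learning with epsilon-greedy selection for whatever RL model the paper actually uses. The graph decides which slots may be requested next; the learner only ranks those valid options.

```python
import random

# Hypothetical slot dependencies (illustrative, not from the paper):
# each slot maps to the slots that must be filled before it may be requested.
DEPENDENCIES = {
    "cuisine": set(),
    "city": set(),
    "restaurant": {"cuisine", "city"},
    "time": {"restaurant"},
    "party_size": {"restaurant"},
}


def valid_slots(filled):
    """Return the unfilled slots whose prerequisites are all filled."""
    return sorted(
        slot for slot, prereqs in DEPENDENCIES.items()
        if slot not in filled and prereqs <= filled
    )


def choose_slot(q_values, filled, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to the slots the graph allows."""
    candidates = valid_slots(filled)
    if not candidates:
        return None  # every slot is filled; the dialog is complete
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore among valid slots only
    state = frozenset(filled)
    return max(candidates, key=lambda s: q_values.get((state, s), 0.0))


def update(q_values, filled, slot, reward, next_filled, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step; future value considers valid slots only."""
    state, next_state = frozenset(filled), frozenset(next_filled)
    future = max(
        (q_values.get((next_state, s), 0.0) for s in valid_slots(next_filled)),
        default=0.0,
    )
    old = q_values.get((state, slot), 0.0)
    q_values[(state, slot)] = old + alpha * (reward + gamma * future - old)
```

In this sketch the policy can never ask for `time` before `restaurant` is known, no matter what the learned values are, which is the sense in which the graph keeps behavior predictable while the learner adapts the order of the remaining questions.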



