无状态空间压缩的基于pomdp的对话系统策略优化

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI:10.1109/SLT.2012.6424165

Milica Gasic, Matthew Henderson, Blaise Thomson, P. Tsiakoulis, S. Young

{"title":"无状态空间压缩的基于pomdp的对话系统策略优化","authors":"Milica Gasic, Matthew Henderson, Blaise Thomson, P. Tsiakoulis, S. Young","doi":"10.1109/SLT.2012.6424165","DOIUrl":null,"url":null,"abstract":"The partially observable Markov decision process (POMDP) has been proposed as a dialogue model that enables automatic improvement of the dialogue policy and robustness to speech understanding errors. It requires, however, a large number of dialogues to train the dialogue policy. Gaussian processes (GP) have recently been applied to POMDP dialogue management optimisation showing an ability to substantially increase the speed of learning. Here, we investigate this further using the Bayesian Update of Dialogue State dialogue manager. We show that it is possible to apply Gaussian processes directly to the belief state, removing the need for a parametric policy representation. In addition, the resulting policy learns significantly faster while maintaining operational performance.","PeriodicalId":375378,"journal":{"name":"2012 IEEE Spoken Language Technology Workshop (SLT)","volume":"212 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Policy optimisation of POMDP-based dialogue systems without state space compression\",\"authors\":\"Milica Gasic, Matthew Henderson, Blaise Thomson, P. Tsiakoulis, S. Young\",\"doi\":\"10.1109/SLT.2012.6424165\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The partially observable Markov decision process (POMDP) has been proposed as a dialogue model that enables automatic improvement of the dialogue policy and robustness to speech understanding errors. It requires, however, a large number of dialogues to train the dialogue policy. Gaussian processes (GP) have recently been applied to POMDP dialogue management optimisation showing an ability to substantially increase the speed of learning. Here, we investigate this further using the Bayesian Update of Dialogue State dialogue manager. We show that it is possible to apply Gaussian processes directly to the belief state, removing the need for a parametric policy representation. In addition, the resulting policy learns significantly faster while maintaining operational performance.\",\"PeriodicalId\":375378,\"journal\":{\"name\":\"2012 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"212 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT.2012.6424165\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2012.6424165","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 21

摘要

部分可观察马尔可夫决策过程(POMDP)作为一种对话模型被提出，它能够自动改进对话策略和对语音理解错误的鲁棒性。然而，这需要大量的对话来训练对话政策。高斯过程(GP)最近被应用于POMDP对话管理优化，显示出大大提高学习速度的能力。在这里，我们使用对话状态对话管理器的贝叶斯更新来进一步研究这个问题。我们证明了直接将高斯过程应用于信念状态是可能的，从而消除了对参数策略表示的需要。此外，生成的策略在保持操作性能的同时学习速度显著加快。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Policy optimisation of POMDP-based dialogue systems without state space compression

The partially observable Markov decision process (POMDP) has been proposed as a dialogue model that enables automatic improvement of the dialogue policy and robustness to speech understanding errors. It requires, however, a large number of dialogues to train the dialogue policy. Gaussian processes (GP) have recently been applied to POMDP dialogue management optimisation showing an ability to substantially increase the speed of learning. Here, we investigate this further using the Bayesian Update of Dialogue State dialogue manager. We show that it is possible to apply Gaussian processes directly to the belief state, removing the need for a parametric policy representation. In addition, the resulting policy learns significantly faster while maintaining operational performance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2012 IEEE Spoken Language Technology Workshop (SLT)

自引率

0.00%

发文量