On-line policy optimisation of spoken dialogue systems via live interaction with human subjects

2011 IEEE Workshop on Automatic Speech Recognition & Understanding · Pub Date: 2011-12-01 · DOI: 10.1109/ASRU.2011.6163950

Milica Gasic, Filip Jurčíček, Blaise Thomson, Kai Yu, S. Young

Citations: 83

Abstract

Statistical dialogue models have required a large number of dialogues to optimise the dialogue policy, relying on the use of a simulated user. This results in a mismatch between training and live conditions, and in significant development costs for the simulator, thereby negating many of the claimed benefits of such models. Recent work on Gaussian process reinforcement learning has shown that learning can be substantially accelerated. This paper reports on an experiment to learn a policy for a real-world task directly from human interaction, using rewards provided by users. It shows that a usable policy can be learnt in just a few hundred dialogues without needing a user simulator, using a learning strategy that reduces the risk of taking bad actions. The paper also investigates adaptation behaviour when the system continues learning for several thousand dialogues and highlights the need for robustness to noisy rewards.
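The abstract credits Gaussian process reinforcement learning with the speed-up but does not spell out the algorithm. As a rough illustration only, and not the authors' method, the sketch below fits a Gaussian process to returns observed for belief-action pairs, giving a mean and a variance for each candidate action, and then chooses actions pessimistically by a lower confidence bound (mean minus `beta` standard deviations). That rule is one generic way to "reduce the risk of taking bad actions"; it is a swapped-in technique, not necessarily the strategy used in the paper. The kernel, the one-hot action encoding, `noise_var`, `beta`, and all function names are hypothetical.

```python
import math

def rbf(x, y, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between two feature vectors (hypothetical choice).
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-0.5 * sq / length_scale ** 2)

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting (pure Python).
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def gp_posterior(X, y, x_star, noise_var=0.1):
    # GP posterior mean and variance at x_star, given inputs X and observed returns y.
    K = [[rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x_star) for a in X]
    alpha = solve(K, y)                      # K^{-1} y
    mean = sum(k * w for k, w in zip(k_star, alpha))
    v = solve(K, k_star)                     # K^{-1} k_star
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)               # clamp tiny negative round-off

def cautious_action(X, y, belief_features, n_actions, beta=1.0, noise_var=0.1):
    # Hypothetical risk-averse rule: maximise mean - beta * std over actions.
    best_action, best_score = None, -math.inf
    for a in range(n_actions):
        one_hot = [1.0 if i == a else 0.0 for i in range(n_actions)]
        x_star = list(belief_features) + one_hot
        mean, var = gp_posterior(X, y, x_star, noise_var)
        score = mean - beta * math.sqrt(var)
        if score > best_score:
            best_action, best_score = a, score
    return best_action
```

With four made-up logged turns, each encoded as one belief feature plus a one-hot action, `cautious_action(X, y, [0.85], n_actions=2)` prefers action 0, whose nearby returns were high and whose estimate is also the less uncertain of the two. Raising `beta` makes the rule avoid poorly explored actions more strongly, at the cost of slower exploration.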


