使用基于作者-主题的表示跟踪对话框状态

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI:10.1109/SLT.2016.7846316

Richard Dufour, Mohamed Morchid, Titouan Parcollet

{"title":"使用基于作者-主题的表示跟踪对话框状态","authors":"Richard Dufour, Mohamed Morchid, Titouan Parcollet","doi":"10.1109/SLT.2016.7846316","DOIUrl":null,"url":null,"abstract":"Automatically translating textual documents from one language to another inevitably results in translation errors. In addition to language specificities, this automatic translation appears more difficult in the context of spoken dialogues since, for example, the language register is far from “clean speech”. Speech analytics suffer from these translation errors. To tackle this difficulty, a solution consists in mapping translations into a space of hidden topics. In the classical topic-based representation obtained from a Latent Dirichlet Allocation (LDA), distribution of words into each topic is estimated automatically. Nonetheless, the targeted classes are ignored in the particular context of a classification task. In the DSTC5 main task, this targeted class information is crucial, the main objective being to track dialog states for sub-dialog segments. For this challenge, we propose to apply an original topic-based representation for each sub-dialogue based not only on the sub-dialogue content itself (words), but also on the dialogue state related to the sub-dialogue. This original representation is based on the Author-Topic (AT) model, previously successfully applied on a different classification task. Promising results confirmed the interest of such a method, the AT model reaching performance slightly better in terms of F-measure than baseline ones given by the task's organizers.","PeriodicalId":281635,"journal":{"name":"2016 IEEE Spoken Language Technology Workshop (SLT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Tracking dialog states using an Author-Topic based representation\",\"authors\":\"Richard Dufour, Mohamed Morchid, Titouan Parcollet\",\"doi\":\"10.1109/SLT.2016.7846316\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatically translating textual documents from one language to another inevitably results in translation errors. In addition to language specificities, this automatic translation appears more difficult in the context of spoken dialogues since, for example, the language register is far from “clean speech”. Speech analytics suffer from these translation errors. To tackle this difficulty, a solution consists in mapping translations into a space of hidden topics. In the classical topic-based representation obtained from a Latent Dirichlet Allocation (LDA), distribution of words into each topic is estimated automatically. Nonetheless, the targeted classes are ignored in the particular context of a classification task. In the DSTC5 main task, this targeted class information is crucial, the main objective being to track dialog states for sub-dialog segments. For this challenge, we propose to apply an original topic-based representation for each sub-dialogue based not only on the sub-dialogue content itself (words), but also on the dialogue state related to the sub-dialogue. This original representation is based on the Author-Topic (AT) model, previously successfully applied on a different classification task. Promising results confirmed the interest of such a method, the AT model reaching performance slightly better in terms of F-measure than baseline ones given by the task's organizers.\",\"PeriodicalId\":281635,\"journal\":{\"name\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT.2016.7846316\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2016.7846316","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

自动将文本文档从一种语言翻译成另一种语言不可避免地会导致翻译错误。除了语言的特殊性之外，这种自动翻译在口语对话的语境中显得更加困难，例如，语言域远非“干净的语言”。语音分析深受这些翻译错误之苦。要解决这个困难，一个解决方案是将翻译映射到隐藏主题的空间。在经典的基于主题的表示中，潜在狄利克雷分配(Latent Dirichlet Allocation, LDA)自动估计单词在每个主题中的分布。尽管如此，目标类在分类任务的特定上下文中被忽略。在DSTC5的主要任务中，目标类信息是至关重要的，主要目标是跟踪子对话段的对话状态。对于这一挑战，我们建议对每个子对话应用原始的基于主题的表示，不仅基于子对话内容本身(单词)，而且基于与子对话相关的对话状态。这种原始表示基于作者-主题(AT)模型，该模型以前成功地应用于不同的分类任务。有希望的结果证实了这种方法的兴趣，AT模型在f测量方面达到的性能略好于任务组织者给出的基线。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Tracking dialog states using an Author-Topic based representation

Automatically translating textual documents from one language to another inevitably results in translation errors. In addition to language specificities, this automatic translation appears more difficult in the context of spoken dialogues since, for example, the language register is far from “clean speech”. Speech analytics suffer from these translation errors. To tackle this difficulty, a solution consists in mapping translations into a space of hidden topics. In the classical topic-based representation obtained from a Latent Dirichlet Allocation (LDA), distribution of words into each topic is estimated automatically. Nonetheless, the targeted classes are ignored in the particular context of a classification task. In the DSTC5 main task, this targeted class information is crucial, the main objective being to track dialog states for sub-dialog segments. For this challenge, we propose to apply an original topic-based representation for each sub-dialogue based not only on the sub-dialogue content itself (words), but also on the dialogue state related to the sub-dialogue. This original representation is based on the Author-Topic (AT) model, previously successfully applied on a different classification task. Promising results confirmed the interest of such a method, the AT model reaching performance slightly better in terms of F-measure than baseline ones given by the task's organizers.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 IEEE Spoken Language Technology Workshop (SLT)

自引率

0.00%

发文量