Tracking dialog states using an Author-Topic based representation
Richard Dufour, Mohamed Morchid, Titouan Parcollet
2016 IEEE Spoken Language Technology Workshop (SLT), published 2016-12-01
DOI: 10.1109/SLT.2016.7846316 (https://doi.org/10.1109/SLT.2016.7846316)
Citations: 3
Abstract
Automatically translating text from one language to another inevitably introduces translation errors. Beyond language-specific difficulties, automatic translation is harder for spoken dialogs because, for example, the language register is far from “clean speech”, and speech analytics suffer from the resulting errors. One way to tackle this difficulty is to map the translations into a space of hidden topics. In the classical topic-based representation obtained from Latent Dirichlet Allocation (LDA), the distribution of words within each topic is estimated automatically, but the target classes of a classification task are ignored. In the DSTC5 main task, this class information is crucial, since the main objective is to track dialog states for sub-dialog segments. For this challenge, we propose an original topic-based representation of each sub-dialog that relies not only on the content of the sub-dialog (its words) but also on the dialog state associated with it. This representation is based on the Author-Topic (AT) model, which was previously applied successfully to a different classification task. The results are promising: the AT model achieves a slightly better F-measure than the baselines provided by the task organizers.
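The idea of treating each dialog state as an "author" can be illustrated with a minimal sketch of collapsed Gibbs sampling for an Author-Topic model. This is not the authors' implementation: the function name `train_author_topic`, the state labels, the toy sub-dialogs, and all hyperparameter values are assumptions chosen for illustration only. Each token is assigned a (state, topic) pair, where the state is restricted to the states attached to its sub-dialog.

```python
import random
from collections import defaultdict


def train_author_topic(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=30, seed=0):
    """Collapsed Gibbs sampling for a tiny Author-Topic model.

    docs is a list of (tokens, states) pairs; each dialog state acts as an "author".
    Returns the per-state topic distributions and the raw state-topic counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for tokens, _ in docs for w in tokens})
    states = sorted({s for _, labels in docs for s in labels})

    state_topic = {s: [0] * n_topics for s in states}         # counts per (state, topic)
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # counts per (topic, word)
    topic_total = [0] * n_topics                              # tokens per topic

    def update(s, t, w, delta):
        state_topic[s][t] += delta
        topic_word[t][w] += delta
        topic_total[t] += delta

    # Random initial (state, topic) assignment for every token.
    assign = []
    for tokens, labels in docs:
        row = []
        for w in tokens:
            s, t = rng.choice(labels), rng.randrange(n_topics)
            update(s, t, w, +1)
            row.append((s, t))
        assign.append(row)

    for _ in range(n_iter):
        for d, (tokens, labels) in enumerate(docs):
            for i, w in enumerate(tokens):
                s, t = assign[d][i]
                update(s, t, w, -1)  # remove the token before resampling it
                pairs, weights = [], []
                for cs in labels:    # only states attached to this sub-dialog
                    s_total = sum(state_topic[cs])
                    for ct in range(n_topics):
                        p_topic = (state_topic[cs][ct] + alpha) / (s_total + n_topics * alpha)
                        p_word = (topic_word[ct][w] + beta) / (topic_total[ct] + vocab_size * beta)
                        pairs.append((cs, ct))
                        weights.append(p_topic * p_word)
                s, t = rng.choices(pairs, weights=weights)[0]
                update(s, t, w, +1)
                assign[d][i] = (s, t)

    # Smoothed topic distribution of each state (the state-level representation).
    theta = {
        s: [(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
        for s, row in state_topic.items()
    }
    return theta, state_topic


# Hypothetical toy sub-dialogs with hypothetical state labels.
DOCS = [
    ("where can i eat chicken rice".split(), ["FOOD"]),
    ("any good noodle stall nearby".split(), ["FOOD"]),
    ("is the hotel near the airport".split(), ["ACCOMMODATION"]),
    ("a hotel with a good restaurant".split(), ["ACCOMMODATION", "FOOD"]),
]

theta, state_topic = train_author_topic(DOCS)
for state, dist in theta.items():
    print(state, [round(p, 3) for p in dist])
```

Each row of `theta` is a probability distribution over hidden topics for one state, so the states themselves, and not only the words, shape the topic space. How such vectors are then used to predict the state of an unseen sub-dialog is not specified in the abstract and is left out of this sketch.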