{"title":"基于语句重写的无监督对话主题分割模型","authors":"Xia Hou, Qifeng Li, Tongliang Li","doi":"arxiv-2409.07672","DOIUrl":null,"url":null,"abstract":"Dialogue topic segmentation plays a crucial role in various types of dialogue\nmodeling tasks. The state-of-the-art unsupervised DTS methods learn topic-aware\ndiscourse representations from conversation data through adjacent discourse\nmatching and pseudo segmentation to further mine useful clues in unlabeled\nconversational relations. However, in multi-round dialogs, discourses often\nhave co-references or omissions, leading to the fact that direct use of these\ndiscourses for representation learning may negatively affect the semantic\nsimilarity computation in the neighboring discourse matching task. In order to\nfully utilize the useful cues in conversational relations, this study proposes\na novel unsupervised dialog topic segmentation method that combines the\nUtterance Rewriting (UR) technique with an unsupervised learning algorithm to\nefficiently utilize the useful cues in unlabeled dialogs by rewriting the\ndialogs in order to recover the co-referents and omitted words. Compared with\nexisting unsupervised models, the proposed Discourse Rewriting Topic\nSegmentation Model (UR-DTS) significantly improves the accuracy of topic\nsegmentation. The main finding is that the performance on DialSeg711 improves\nby about 6% in terms of absolute error score and WD, achieving 11.42% in terms\nof absolute error score and 12.97% in terms of WD. on Doc2Dial the absolute\nerror score and WD improves by about 3% and 2%, respectively, resulting in SOTA\nreaching 35.17% in terms of absolute error score and 38.49% in terms of WD.\nThis shows that the model is very effective in capturing the nuances of\nconversational topics, as well as the usefulness and challenges of utilizing\nunlabeled conversations.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"38 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Unsupervised Dialogue Topic Segmentation Model Based on Utterance Rewriting\",\"authors\":\"Xia Hou, Qifeng Li, Tongliang Li\",\"doi\":\"arxiv-2409.07672\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Dialogue topic segmentation plays a crucial role in various types of dialogue\\nmodeling tasks. The state-of-the-art unsupervised DTS methods learn topic-aware\\ndiscourse representations from conversation data through adjacent discourse\\nmatching and pseudo segmentation to further mine useful clues in unlabeled\\nconversational relations. However, in multi-round dialogs, discourses often\\nhave co-references or omissions, leading to the fact that direct use of these\\ndiscourses for representation learning may negatively affect the semantic\\nsimilarity computation in the neighboring discourse matching task. In order to\\nfully utilize the useful cues in conversational relations, this study proposes\\na novel unsupervised dialog topic segmentation method that combines the\\nUtterance Rewriting (UR) technique with an unsupervised learning algorithm to\\nefficiently utilize the useful cues in unlabeled dialogs by rewriting the\\ndialogs in order to recover the co-referents and omitted words. Compared with\\nexisting unsupervised models, the proposed Discourse Rewriting Topic\\nSegmentation Model (UR-DTS) significantly improves the accuracy of topic\\nsegmentation. 
The main finding is that the performance on DialSeg711 improves\\nby about 6% in terms of absolute error score and WD, achieving 11.42% in terms\\nof absolute error score and 12.97% in terms of WD. on Doc2Dial the absolute\\nerror score and WD improves by about 3% and 2%, respectively, resulting in SOTA\\nreaching 35.17% in terms of absolute error score and 38.49% in terms of WD.\\nThis shows that the model is very effective in capturing the nuances of\\nconversational topics, as well as the usefulness and challenges of utilizing\\nunlabeled conversations.\",\"PeriodicalId\":501030,\"journal\":{\"name\":\"arXiv - CS - Computation and Language\",\"volume\":\"38 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computation and Language\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.07672\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07672","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Unsupervised Dialogue Topic Segmentation Model Based on Utterance Rewriting
Dialogue topic segmentation plays a crucial role in various types of dialogue modeling tasks. State-of-the-art unsupervised dialogue topic segmentation (DTS) methods learn topic-aware discourse representations from conversation data through adjacent discourse matching and pseudo segmentation, mining useful clues from unlabeled conversational relations. However, in multi-turn dialogues, utterances often contain co-references or omissions, so using them directly for representation learning can degrade the semantic similarity computation in the adjacent discourse matching task. To fully exploit the useful cues in conversational relations, this study proposes a novel unsupervised dialogue topic segmentation method that combines the Utterance Rewriting (UR) technique with an unsupervised learning algorithm: the dialogues are rewritten to recover co-referents and omitted words, so that the useful cues in unlabeled dialogues can be used efficiently. Compared with existing unsupervised models, the proposed Utterance Rewriting Dialogue Topic Segmentation model (UR-DTS) significantly improves the accuracy of topic segmentation.
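The abstract does not spell out the implementation, but the pipeline it describes, rewriting each utterance to restore co-referents and omitted words and then scoring adjacent utterances by semantic similarity to obtain segmentation signals, can be sketched roughly as follows. The encoder, the rewriter interface, and the similarity threshold below are illustrative assumptions, not components taken from the paper.

```python
# A minimal sketch of similarity-based topic segmentation over rewritten
# utterances. The rewriter, embedding model, and threshold are illustrative
# assumptions, not the paper's actual components.
from typing import Callable, List

from sentence_transformers import SentenceTransformer  # assumed encoder


def segment_dialogue(
    utterances: List[str],
    rewrite: Callable[[List[str], str], str],
    encoder: SentenceTransformer,
    threshold: float = 0.5,
) -> List[int]:
    """Return indices i where a topic boundary falls between turn i and i+1."""
    # 1) Rewrite each utterance given its history so that pronouns and
    #    omitted words are restored before similarity is computed.
    rewritten = [rewrite(utterances[:i], u) for i, u in enumerate(utterances)]

    # 2) Encode the rewritten utterances (unit-normalized vectors).
    emb = encoder.encode(rewritten, normalize_embeddings=True)

    # 3) Cosine similarity between adjacent turns; a low score suggests
    #    a topic shift.
    sims = (emb[:-1] * emb[1:]).sum(axis=1)
    return [i for i, s in enumerate(sims) if s < threshold]


if __name__ == "__main__":
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model name
    identity_rewrite = lambda history, utt: utt  # placeholder rewriter
    dialogue = [
        "I'd like to book a flight to Paris.",
        "Sure, which dates work for you?",
        "Next Monday, returning Friday.",
        "By the way, can you recommend a hotel there?",
    ]
    print(segment_dialogue(dialogue, identity_rewrite, encoder))
```

In the actual model, the placeholder rewriter would be replaced by an utterance-rewriting component that resolves co-references and ellipses, which is precisely what the paper argues makes the adjacent similarity signal more reliable.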
The main finding is that performance on DialSeg711 improves by about 6% in terms of absolute error score and WD, reaching 11.42% in absolute error score and 12.97% in WD. On Doc2Dial, the absolute error score and WD improve by about 3% and 2%, respectively, yielding new state-of-the-art results of 35.17% in absolute error score and 38.49% in WD. This shows that the model is highly effective at capturing the nuances of conversational topics, and it illustrates both the usefulness and the challenges of exploiting unlabeled conversations.
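The abstract does not define its metrics; in dialogue topic segmentation, "WD" usually denotes WindowDiff and the "absolute error score" is typically a Pk-style penalty, with lower values being better (which is why a drop of several points counts as an improvement). Assuming that reading, a minimal WindowDiff sketch is given below; the window-size choice follows the common convention of half the average reference segment length.

```python
# A minimal WindowDiff implementation, assuming "WD" in the abstract refers
# to the standard WindowDiff segmentation error (Pevzner & Hearst, 2002).
from typing import List, Optional


def window_diff(reference: List[int], hypothesis: List[int],
                k: Optional[int] = None) -> float:
    """reference/hypothesis: 0/1 boundary indicators after each unit.

    Returns the fraction of sliding windows in which the two segmentations
    disagree on the number of boundaries; lower is better.
    """
    assert len(reference) == len(hypothesis)
    n = len(reference)
    if k is None:
        # Conventional choice: half the average reference segment length.
        num_segments = sum(reference) + 1
        k = max(1, round(n / (2 * num_segments)))
    errors = 0
    for i in range(n - k):
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k]):
            errors += 1
    return errors / (n - k)


if __name__ == "__main__":
    ref = [0, 0, 1, 0, 0, 0, 1, 0, 0]  # boundaries after turns 3 and 7
    hyp = [0, 0, 0, 1, 0, 0, 1, 0, 0]  # one boundary placed one turn late
    print(f"WindowDiff = {window_diff(ref, hyp):.3f}")
```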