A Comparative Study of the Performance of Unsupervised Text Segmentation Techniques on Dialogue Transcripts

2020 Systems and Information Engineering Design Symposium (SIEDS), Pub Date: 2020-04-01, DOI: 10.1109/SIEDS49339.2020.9106639

Vidhi Gupta, Guangda Zhu, Andi Yu, Donald E. Brown


Citations: 2

Abstract

Contact centers provide customer interaction support to numerous organizations. In 2017, the contact center industry generated $200 billion in revenue worldwide, contributing to a significant proportion of market share, and yet businesses lost $75 billion due to poor customer satisfaction. Around 48% of consumers prefer the phone as their mode of communication with contact centers. Analyzing these calls can give insight into customer views and help businesses improve customer engagement. To understand the structure and flow of a conversation, its transcript can be segmented into meaningful sections such as “greeting exchange”, “problem description”, and “problem resolution”. In this paper, we present a comparative study of unsupervised methods for dialogue segmentation. We choose three classic unsupervised text segmentation techniques, TextTiling, TopicTiling, and Content Vector Segmentation, and evaluate their performance on 50 manually labeled dialogue transcripts. The transcripts span contact center calls, live chat, interactions with chatbots, and talk show conversations. Additionally, we build on the TextTiling algorithm by incorporating semantic word embeddings for text representation. We show that this modification outperforms the three benchmarked approaches with a mean Pk value of 0.31, indicating that, on average, 69% of the boundaries are identified accurately.
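The abstract reports a mean Pk of 0.31 but does not spell out how Pk is computed. The sketch below follows the commonly used definition of the metric and is not taken from the paper: a window of width k slides over the transcript, and each position counts as an error when the reference and the hypothesis disagree about whether the two ends of the window fall in the same segment. The function names (`segment_ids`, `pk`), the 0/1 boundary encoding, and the default choice of k (half the mean reference segment length) are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def segment_ids(boundaries: Sequence[int]) -> List[int]:
    """Map a 0/1 boundary sequence to a segment id per utterance.

    boundaries[i] == 1 means a segment ends right after utterance i.
    (This encoding is an assumption of the sketch.)
    """
    ids, current = [], 0
    for b in boundaries:
        ids.append(current)
        if b:
            current += 1
    return ids


def pk(reference: Sequence[int], hypothesis: Sequence[int],
       k: Optional[int] = None) -> float:
    """Pk error rate: lower is better, 0.0 means perfect agreement."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    n = len(reference)
    ref_ids = segment_ids(reference)
    hyp_ids = segment_ids(hypothesis)
    if k is None:
        # Conventional default: half the mean reference segment length.
        num_segments = ref_ids[-1] + 1
        k = max(1, round(n / (2 * num_segments)))
    probes = n - k
    if probes <= 0:
        raise ValueError("window k must be smaller than the sequence length")
    errors = 0
    for i in range(probes):
        same_in_ref = ref_ids[i] == ref_ids[i + k]
        same_in_hyp = hyp_ids[i] == hyp_ids[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / probes


if __name__ == "__main__":
    # Toy transcript of 9 utterances with 3 reference segments.
    ref = [0, 0, 1, 0, 0, 1, 0, 0, 0]
    no_boundaries = [0] * 9
    print(pk(ref, ref, k=2))
    print(pk(ref, no_boundaries, k=2))
```

Because Pk is an error probability, a value of 0.31 means that about 31% of the probed window positions are judged inconsistently with the reference; the abstract's 69% figure corresponds to 1 - Pk.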
