Vidhi Gupta, Guangda Zhu, Andi Yu, Donald E. Brown
2020 Systems and Information Engineering Design Symposium (SIEDS). Published 2020-04-01. DOI: 10.1109/SIEDS49339.2020.9106639 (https://doi.org/10.1109/SIEDS49339.2020.9106639)
A Comparative Study of the Performance of Unsupervised Text Segmentation Techniques on Dialogue Transcripts
Contact centers provide customer interaction support to numerous organizations. In 2017, the contact center industry generated $200 billion in revenue worldwide, accounting for a significant share of the market, and yet businesses lost $75 billion due to poor customer satisfaction. Around 48% of consumers prefer the phone as their mode of communication with contact centers. Analyzing these calls can give insight into customer views and help businesses improve customer engagement. To understand the structure and flow of a conversation, its transcript can be segmented into meaningful sections such as “greeting exchange”, “problem description”, and “problem resolution”. In this paper, we present a comparative study of unsupervised methods for dialogue segmentation. We choose three classic unsupervised text segmentation techniques, TextTiling, TopicTiling, and Content Vector Segmentation, and evaluate their performance on 50 manually labeled dialogue transcripts. The transcripts span contact center calls, live chat, interactions with chatbots, and talk show conversations. Additionally, we build on the TextTiling algorithm by incorporating semantic word embeddings for text representation. We show that this modification outperforms the three benchmarked approaches with a mean Pk value of 0.31, indicating that, on average, 69% of the boundaries are identified accurately.
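The Pk value reported above is a windowed error rate, so lower is better. As a minimal sketch of how such a score can be computed, the Python below follows the commonly used definition of Pk: slide a window of width k over the transcript and count how often the reference and the hypothesis disagree on whether the two ends of the window fall in the same segment. The function names, the segment-length input format, and the choice of k as half the mean reference segment length are assumptions for illustration; the abstract does not state how the authors implemented the metric.

```python
from typing import List, Optional


def lengths_to_labels(segment_lengths: List[int]) -> List[int]:
    """Expand segment lengths into one segment id per utterance.

    Example: [2, 3] becomes [0, 0, 1, 1, 1].
    (Hypothetical helper, not taken from the paper.)
    """
    labels: List[int] = []
    for segment_id, length in enumerate(segment_lengths):
        labels.extend([segment_id] * length)
    return labels


def pk(reference: List[int], hypothesis: List[int], k: Optional[int] = None) -> float:
    """Windowed segmentation error between two per-utterance labelings.

    `reference` and `hypothesis` hold one segment id per utterance.
    If `k` is not given, it defaults to half the mean reference segment
    length (an assumed but widely used convention).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    n = len(reference)
    if k is None:
        num_segments = len(set(reference))
        k = max(1, round(n / num_segments / 2))
    if k >= n:
        raise ValueError("window size k must be smaller than the transcript length")

    errors = 0
    for i in range(n - k):
        # Do utterances i and i + k fall in the same segment?
        same_in_reference = reference[i] == reference[i + k]
        same_in_hypothesis = hypothesis[i] == hypothesis[i + k]
        # A window counts as an error when the two segmentations disagree.
        if same_in_reference != same_in_hypothesis:
            errors += 1
    return errors / (n - k)


# Toy transcript of 9 utterances with three reference segments of 3 utterances each.
reference = lengths_to_labels([3, 3, 3])
no_boundaries = lengths_to_labels([9])  # a hypothesis that never splits

print(pk(reference, reference))
print(pk(reference, no_boundaries))
```

Under this definition, 1 - Pk is the fraction of windows on which the two segmentations agree, which appears to be the reading behind the 69% figure quoted for a mean Pk of 0.31.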