Topic Modeling as a Tool for Analyzing Library Chat Transcripts

Information Technology and Libraries (IF 1.3; Zone 4, Management; JCR Q3, Computer Science, Information Systems) Pub Date: 2021-09-20 DOI: 10.6017/ital.v40i3.13333

HyunSeung Koh, M. Fienup


Citations: 4

Abstract

Library chat services are an increasingly important channel for connecting patrons to library resources and services. Analyzing chat transcripts could give librarians insights for improving those services. However, chat transcripts are unstructured text, which makes it impractical for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing tools. As a stepping-stone toward a more sophisticated chat transcript analysis tool, this study applied several types of topic modeling techniques to one academic library's chat reference data, collected from April 10, 2015, to May 31, 2019, with the goal of extracting the most accurate and easily interpretable topics. Topic accuracy and interpretability (the quality of topic outcomes) were measured quantitatively with topic coherence metrics. They were also assessed qualitatively by the librarian author of the paper, based on a subjective judgment of whether the topics aligned with frequently asked questions or easily inferable themes in academic library contexts. In the human qualitative evaluation, Probabilistic Latent Semantic Analysis (pLSA) produced more accurate and interpretable topics, a result not necessarily aligned with the quantitative evaluation across all three types of topic coherence metrics. Interestingly, the commonly used Latent Dirichlet Allocation (LDA) did not necessarily perform better than pLSA. Likewise, the semi-supervised techniques that use human-curated anchor words, Correlation Explanation (CorEx) and guided LDA (GuidedLDA), did not necessarily perform better than the unsupervised Dirichlet Multinomial Mixture (DMM). Finally, across the different techniques, using the entire transcript, including both sides of the interaction between patron and librarian, yielded higher-quality topics than using only the patron's initial question.
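To make the idea of a topic coherence metric concrete, the following is a minimal sketch of a UMass-style coherence score in plain Python. The abstract does not name the three coherence metrics the study used, so the choice of UMass here is an assumption made for illustration only. The function name `umass_coherence`, the toy transcripts, and the two word lists are all hypothetical and are not data from the study.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass-style coherence for one topic (illustrative sketch).

    top_words: topic words ordered from most to least probable.
    documents: list of token lists used as the reference corpus.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every one of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # combinations() keeps list order, so `higher` is the higher-ranked word.
    for higher, lower in combinations(top_words, 2):
        d_higher = doc_freq(higher)
        if d_higher == 0:
            continue  # skip words that never occur in the corpus
        # Smoothed log conditional co-occurrence: log((D(hi, lo) + 1) / D(hi)).
        score += math.log((doc_freq(higher, lower) + 1) / d_higher)
    return score


# Hypothetical, already-tokenized chat transcripts (not the study's data).
docs = [
    ["renew", "book", "due", "date"],
    ["renew", "book", "online"],
    ["find", "article", "database"],
    ["database", "article", "access", "login"],
    ["book", "due", "date", "fine"],
]

coherent_topic = ["book", "renew", "due"]   # words that tend to co-occur
mixed_topic = ["book", "database", "fine"]  # words that rarely co-occur

print("coherent topic:", umass_coherence(coherent_topic, docs))
print("mixed topic:   ", umass_coherence(mixed_topic, docs))
```

On this toy corpus the topic whose words co-occur in the same transcripts scores higher (closer to zero) than the mixed one. A score like this captures only word co-occurrence, which is one reason a quantitative ranking need not match a librarian's qualitative judgment.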




Source journal

Information Technology and Libraries (Management Science, Computer Science: Information Systems)

CiteScore: 2.90

Self-citation rate: 5.60%

Review time: 1 month

About the journal: Information Technology and Libraries publishes original material related to all aspects of information technology in all types of libraries. Topic areas include, but are not limited to, library automation, digital libraries, metadata, identity management, distributed systems and networks, computer security, intellectual property rights, technical standards, geographic information systems, desktop applications, information discovery tools, web-scale library services, cloud computing, digital preservation, data curation, virtualization, search-engine optimization, emerging technologies, social networking, open data, the semantic web, mobile services and applications, usability, universal access to technology, library consortia, vendor relations, and digital humanities.
