主题建模作为分析图书馆聊天记录的工具

IF 1.5 4区 管理学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Information Technology and Libraries Pub Date : 2021-09-20 DOI:10.6017/ital.v40i3.13333
HyunSeung Koh, M. Fienup
{"title":"主题建模作为分析图书馆聊天记录的工具","authors":"HyunSeung Koh, M. Fienup","doi":"10.6017/ital.v40i3.13333","DOIUrl":null,"url":null,"abstract":"Library chat services are an increasingly important communication channel to connect patrons to library resources and services. Analysis of chat transcripts could provide librarians with insights into improving services. Unfortunately, chat transcripts consist of unstructured text data, making it impractical for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing tools. As a stepping-stone toward a more sophisticated chat transcript analysis tool, this study investigated the application of different types of topic modeling techniques to analyze one academic library’s chat reference data collected from April 10, 2015, to May 31, 2019, with the goal of extracting the most accurate and easily interpretable topics. In this study, topic accuracy and interpretability—the quality of topic outcomes—were quantitatively measured with topic coherence metrics. Additionally, qualitative accuracy and interpretability were measured by the librarian author of this paper depending on the subjective judgment on whether topics are aligned with frequently asked questions or easily inferable themes in academic library contexts. This study found that from a human’s qualitative evaluation, Probabilistic Latent Semantic Analysis (pLSA) produced more accurate and interpretable topics, which is not necessarily aligned with the findings of the quantitative evaluation with all three types of topic coherence metrics. Interestingly, the commonly used technique Latent Dirichlet Allocation (LDA) did not necessarily perform better than pLSA. Also, semi-supervised techniques with human-curated anchor words of Correlation Explanation (CorEx) or guided LDA (GuidedLDA) did not necessarily perform better than an unsupervised technique of Dirichlet Multinomial Mixture (DMM). Last, the study found that using the entire transcript, including both sides of the interaction between the library patron and the librarian, performed better than using only the initial question asked by the library patron across different techniques in increasing the quality of topic outcomes.","PeriodicalId":50361,"journal":{"name":"Information Technology and Libraries","volume":" ","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2021-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Topic Modeling as a Tool for Analyzing Library Chat Transcripts\",\"authors\":\"HyunSeung Koh, M. Fienup\",\"doi\":\"10.6017/ital.v40i3.13333\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Library chat services are an increasingly important communication channel to connect patrons to library resources and services. Analysis of chat transcripts could provide librarians with insights into improving services. Unfortunately, chat transcripts consist of unstructured text data, making it impractical for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing tools. As a stepping-stone toward a more sophisticated chat transcript analysis tool, this study investigated the application of different types of topic modeling techniques to analyze one academic library’s chat reference data collected from April 10, 2015, to May 31, 2019, with the goal of extracting the most accurate and easily interpretable topics. In this study, topic accuracy and interpretability—the quality of topic outcomes—were quantitatively measured with topic coherence metrics. Additionally, qualitative accuracy and interpretability were measured by the librarian author of this paper depending on the subjective judgment on whether topics are aligned with frequently asked questions or easily inferable themes in academic library contexts. This study found that from a human’s qualitative evaluation, Probabilistic Latent Semantic Analysis (pLSA) produced more accurate and interpretable topics, which is not necessarily aligned with the findings of the quantitative evaluation with all three types of topic coherence metrics. Interestingly, the commonly used technique Latent Dirichlet Allocation (LDA) did not necessarily perform better than pLSA. Also, semi-supervised techniques with human-curated anchor words of Correlation Explanation (CorEx) or guided LDA (GuidedLDA) did not necessarily perform better than an unsupervised technique of Dirichlet Multinomial Mixture (DMM). Last, the study found that using the entire transcript, including both sides of the interaction between the library patron and the librarian, performed better than using only the initial question asked by the library patron across different techniques in increasing the quality of topic outcomes.\",\"PeriodicalId\":50361,\"journal\":{\"name\":\"Information Technology and Libraries\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2021-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Technology and Libraries\",\"FirstCategoryId\":\"91\",\"ListUrlMain\":\"https://doi.org/10.6017/ital.v40i3.13333\",\"RegionNum\":4,\"RegionCategory\":\"管理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Technology and Libraries","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.6017/ital.v40i3.13333","RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 4

摘要

图书馆聊天服务是连接读者与图书馆资源和服务的一个日益重要的沟通渠道。对聊天记录的分析可以为图书馆员提供改进服务的见解。不幸的是,聊天记录由非结构化文本数据组成,这使得图书管理员无法使用现有工具进行简单的定量分析(例如,聊天持续时间、消息计数、词频)。作为更复杂的聊天记录分析工具的基石,本研究研究了不同类型的话题建模技术的应用,分析了一个学术图书馆从2015年4月10日到2019年5月31日收集的聊天参考数据,目的是提取最准确和最容易解释的话题。在本研究中,主题准确性和可解释性——主题结果的质量——用主题一致性指标进行了定量测量。此外,定性的准确性和可解释性是由本文的图书馆员作者根据主题是否与学术图书馆环境中的常见问题或容易推断的主题相一致的主观判断来衡量的。本研究发现,从人类的定性评价来看,概率潜在语义分析(pLSA)产生了更准确和可解释的主题,这与使用所有三种主题一致性指标的定量评估结果并不一定一致。有趣的是,通常使用的潜狄利克雷分配(LDA)技术并不一定比pLSA表现更好。此外,使用人工策划的关联解释锚词(CorEx)或引导LDA (GuidedLDA)的半监督技术并不一定比Dirichlet多项式混合(DMM)的无监督技术表现更好。最后,研究发现,在提高主题结果的质量方面,使用完整的文字记录,包括图书馆顾客和图书管理员之间互动的双方,比只使用图书馆顾客提出的最初问题表现得更好。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Topic Modeling as a Tool for Analyzing Library Chat Transcripts
Library chat services are an increasingly important communication channel to connect patrons to library resources and services. Analysis of chat transcripts could provide librarians with insights into improving services. Unfortunately, chat transcripts consist of unstructured text data, making it impractical for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing tools. As a stepping-stone toward a more sophisticated chat transcript analysis tool, this study investigated the application of different types of topic modeling techniques to analyze one academic library’s chat reference data collected from April 10, 2015, to May 31, 2019, with the goal of extracting the most accurate and easily interpretable topics. In this study, topic accuracy and interpretability—the quality of topic outcomes—were quantitatively measured with topic coherence metrics. Additionally, qualitative accuracy and interpretability were measured by the librarian author of this paper depending on the subjective judgment on whether topics are aligned with frequently asked questions or easily inferable themes in academic library contexts. This study found that from a human’s qualitative evaluation, Probabilistic Latent Semantic Analysis (pLSA) produced more accurate and interpretable topics, which is not necessarily aligned with the findings of the quantitative evaluation with all three types of topic coherence metrics. Interestingly, the commonly used technique Latent Dirichlet Allocation (LDA) did not necessarily perform better than pLSA. Also, semi-supervised techniques with human-curated anchor words of Correlation Explanation (CorEx) or guided LDA (GuidedLDA) did not necessarily perform better than an unsupervised technique of Dirichlet Multinomial Mixture (DMM). Last, the study found that using the entire transcript, including both sides of the interaction between the library patron and the librarian, performed better than using only the initial question asked by the library patron across different techniques in increasing the quality of topic outcomes.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Information Technology and Libraries
Information Technology and Libraries 管理科学-计算机:信息系统
CiteScore
2.90
自引率
5.60%
发文量
25
审稿时长
1 months
期刊介绍: Information Technology and Libraries publishes original material related to all aspects of information technology in all types of libraries. Topic areas include, but are not limited to, library automation, digital libraries, metadata, identity management, distributed systems and networks, computer security, intellectual property rights, technical standards, geographic information systems, desktop applications, information discovery tools, web-scale library services, cloud computing, digital preservation, data curation, virtualization, search-engine optimization, emerging technologies, social networking, open data, the semantic web, mobile services and applications, usability, universal access to technology, library consortia, vendor relations, and digital humanities.
期刊最新文献
Response to "From ChatGPT to CatGPT" To Thine Own 3D Selfie Be True Towards an Open Source-first Praxis in Libraries Response to "From ChatGPT to CatGPT" Drained-pool Politics Versus Digital Libraries in U.S. Cyberspace
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1