Assessing text mining algorithm outcomes

Journal of Business Analytics, published 2020-06-25. DOI: 10.1080/2573234x.2020.1785342
Impact factor 1.7 (JCR Q3, Computer Science, Information Systems)
Triss Ashton, Nicholas E. Evangelopoulos, A. Paswan, V. Prybutok, R. Pavur
Citations: 3

Abstract

There is a surge in the development of decision-oriented analysis tools intended to extract actionable information from text. These tools integrate various text-mining methods that were performance-tested in a manner often biased toward the new system. Those tests primarily used descriptive measurement criteria and test datasets that are inconsistent with most business corpora. We propose and test a user-oriented judgment approach that allows testing on controlled, customer-oriented corpora and generates effect-size measures. To illustrate the approach, customer relations data were analysed with latent semantic analysis and latent Dirichlet allocation, and the results were evaluated by prospective business analysts. Reporting includes comparisons of results with published literature. While the research centres on context-region text-mining systems, the literature comparisons include word-embedding methods. The analysis concludes that none of the systems reviewed possesses a repeatable statistical advantage over the others; instead, distribution attributes, algorithm configuration, and the evaluation task drive results.
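Latent semantic analysis, one of the two context-region methods evaluated, is commonly computed as a rank-k truncated singular value decomposition of a weighted term-by-document matrix. The sketch below illustrates that general technique only; it is not the authors' configuration. The toy corpus, the log(1 + count) weighting, the choice k = 2, and the helper names `lsa` and `cosine` are all hypothetical choices made for illustration.

```python
import numpy as np


def lsa(term_doc: np.ndarray, k: int):
    """Rank-k latent semantic analysis via truncated SVD (illustrative helper).

    term_doc is a terms x documents matrix. Returns the first k left singular
    vectors (terms), the k largest singular values, and the first k right
    singular vectors (documents, one row per document).
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, with s sorted in descending order
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy corpus of customer comments (illustrative only, not the paper's data)
docs = [
    "late delivery refund request",
    "refund request for late order",
    "friendly staff great service",
    "great service friendly support",
]

# Build a raw term-by-document count matrix
vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[row[w], j] += 1

# Assumed weighting: log(1 + count); tf-idf or log-entropy are common alternatives
X = np.log1p(X)

term_vecs, sing_vals, doc_vecs = lsa(X, k=2)
doc_coords = doc_vecs * sing_vals  # documents in the 2-D latent space

sim_same = cosine(doc_coords[0], doc_coords[1])  # two delivery/refund complaints
sim_diff = cosine(doc_coords[0], doc_coords[2])  # complaint vs. service praise
print(f"{sim_same:.3f} {sim_diff:.3f}")
```

In this deliberately clean toy corpus the two groups of comments share no terms, so they separate perfectly in the latent space. On real customer corpora the outcome depends on the weighting scheme and the chosen k, which is consistent with the abstract's point that algorithm configuration drives results.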
Source journal: Journal of Business Analytics (Business, Management and Accounting: Management Information Systems)
CiteScore: 2.50
Self-citation rate: 0.00%
Articles published: 13