Topic modeling techniques for text mining over large-scale scientific and biomedical text corpus

Journal: International Journal of Ambient Computing and Intelligence (JCR Q3, Computer Science)
Volume: 51 1
Publication date: 2022-01-01
Publication type: Journal Article
DOI: 10.4018/ijaci.293137 (https://doi.org/10.4018/ijaci.293137)
Citation count: 7

Abstract

Topic models extract central themes from large-scale document collections efficiently, and they remain an active research area. This study compares several state-of-the-art techniques: Latent Dirichlet Allocation (LDA), the Correlated Topic Model (CTM), the Hierarchical Dirichlet Process (HDP), Dirichlet Multinomial Regression (DMR), and the Hierarchical Pachinko Allocation (HPA) model. Abstracts of articles from different periods were collected from the PubMed library using the keywords adolescence substance use and depression. This area has been studied extensively, and thousands of related articles are available on PubMed. Because the collection is so large, extracting information from it is very time-consuming. The extracted text was used to fit the topic models, and the fitted models were evaluated using both likelihood and non-likelihood measures. The models are compared on log-likelihood and perplexity, and topic coherence measures are used to evaluate the quality of the topics.
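The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. Perplexity is computed with the standard definition, exp(-log-likelihood / token count). The abstract does not say which coherence measure was used, so UMass coherence is assumed here purely as an example. The function names and the toy documents are hypothetical, and the sketch assumes every top word occurs in at least one document.

```python
import math


def perplexity(total_log_likelihood, total_tokens):
    """Perplexity from a corpus log-likelihood (natural log) and token count."""
    return math.exp(-total_log_likelihood / total_tokens)


def umass_coherence(top_words, documents):
    """UMass coherence of one topic's top words (assumed measure, see lead-in).

    top_words: words ordered from most to least probable in the topic.
    documents: list of tokenized documents (lists of words).
    Assumes each top word appears in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            co_occurrences = doc_freq(top_words[i], top_words[j])
            score += math.log((co_occurrences + 1) / doc_freq(top_words[j]))
    return score


# Toy data invented for illustration only; not taken from the paper.
toy_docs = [
    ["depression", "adolescent"],
    ["depression", "substance"],
    ["adolescent", "substance", "depression"],
]
toy_ppl = perplexity(10 * math.log(0.25), 10)
toy_coherence = umass_coherence(["depression", "adolescent"], toy_docs)
```

Lower perplexity indicates a better fit to held-out text, while a coherence score closer to zero (UMass scores are typically negative) indicates top words that co-occur more often. The two measures can rank models differently, which is one reason to report both likelihood and non-likelihood measures.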
Source journal metrics
CiteScore: 3.50
Self-citation rate: 0.00%
Articles published: 30