A frequent keyword-set based algorithm for topic modeling and clustering of research papers

Kumar Shubankar, A. Singh, Vikram Pudi
2011 3rd Conference on Data Mining and Optimization (DMO), 28 June 2011
DOI: 10.1109/DMO.2011.5976511
Citations: 25

Abstract

In this paper we introduce a novel and efficient approach to detecting topics in a large corpus of research papers. With the rapidly growing size of the academic literature, topic detection has become a very challenging task. Our approach uses closed frequent keyword-sets to form topics. It also provides a natural way to cluster research papers into hierarchical, overlapping clusters, using topics as the similarity measure. To rank the research papers within each topic cluster, we devise a modified PageRank algorithm that assigns an authoritative score to each paper by considering the sub-graph in which the paper appears. We test our algorithms on the DBLP dataset and show experimentally that they are fast, effective and scalable.
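The abstract does not spell out the mining procedure or the exact PageRank modification, so the following Python sketch only illustrates the general pipeline under stated assumptions. Each paper is reduced to a set of keywords. Closed frequent keyword-sets are found by brute-force subset enumeration, which is not a scalable closed-itemset miner and not necessarily the authors' method. Each closed set is treated as a topic whose cluster is every paper containing it. "Considering the sub-graph" is interpreted, as an assumption, as running ordinary PageRank on the citation sub-graph induced by one cluster. The function names `closed_frequent_keyword_sets`, `topic_parents` and `subgraph_pagerank`, and the toy data `PAPERS` and `CITATIONS`, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: paper id -> keyword set.
PAPERS = {
    "p1": {"mining", "frequent", "itemset"},
    "p2": {"mining", "frequent", "itemset", "closed"},
    "p3": {"mining", "frequent", "pagerank"},
    "p4": {"pagerank", "ranking"},
}
# Hypothetical citation lists: citing paper -> papers it cites.
CITATIONS = {"p1": ["p3"], "p2": ["p1", "p3"], "p3": [], "p4": ["p3"]}


def closed_frequent_keyword_sets(papers, min_support):
    """Return {closed frequent keyword-set: ids of papers containing it}.

    Brute-force subset enumeration: exponential in keywords per paper,
    so this is only for illustration on small inputs.
    """
    cover = defaultdict(set)
    for pid, keywords in papers.items():
        kws = sorted(keywords)
        for r in range(1, len(kws) + 1):
            for combo in combinations(kws, r):
                cover[frozenset(combo)].add(pid)
    frequent = {ks: frozenset(ids) for ks, ids in cover.items()
                if len(ids) >= min_support}
    # Closed: no proper superset has the same support (hence the same papers).
    return {ks: ids for ks, ids in frequent.items()
            if not any(ks < other and len(ids) == len(other_ids)
                       for other, other_ids in frequent.items())}


def topic_parents(topics):
    """Map each topic to its most specific proper subsets among the topics."""
    parents = {}
    for t in topics:
        subs = [s for s in topics if s < t]
        parents[t] = [s for s in subs if not any(s < u for u in subs)]
    return parents


def subgraph_pagerank(citations, cluster, damping=0.85, iters=50):
    """Plain PageRank on the citation sub-graph induced by one cluster."""
    nodes = sorted(cluster)
    n = len(nodes)
    # Keep only citation edges whose endpoints both lie in the cluster.
    out = {p: [q for q in citations.get(p, ()) if q in cluster] for p in nodes}
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in nodes}
        for p in nodes:
            targets = out[p] or nodes  # dangling paper: spread rank uniformly
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank


if __name__ == "__main__":
    topics = closed_frequent_keyword_sets(PAPERS, min_support=2)
    for topic, ids in sorted(topics.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(topic), "->", sorted(ids))
    cluster = topics[frozenset({"frequent", "mining"})]
    ranks = subgraph_pagerank(CITATIONS, cluster)
    print(sorted(ranks, key=ranks.get, reverse=True))
```

With a minimum support of 2, the toy data yields three closed topics: {pagerank}, {frequent, mining} and {frequent, itemset, mining}. The last is nested under {frequent, mining}, and papers p1 and p2 belong to both, which shows the hierarchical, overlapping structure. Within the {frequent, mining} cluster the ranking is p3, p1, p2; p4's citation of p3 is ignored because p4 lies outside that cluster's sub-graph.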
Latest articles from this venue
Comparison of various Wiener model identification approach in modelling nonlinear process
Data mining technique for expertise search in a special interest group knowledge portal
A frequent keyword-set based algorithm for topic modeling and clustering of research papers
Optimisation model of selective cutting for Timber Harvest Planning in Peninsular Malaysia
Reducing network intrusion detection association rules using Chi-Squared pruning technique