A multiple k-means cluster ensemble framework for clustering citation trajectories

IF 3.4 | Zone 2 (Management) | JCR Q2 (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS) | Journal of Informetrics | Pub Date: 2024-02-13 | DOI: 10.1016/j.joi.2024.101507
Joyita Chakraborty, Dinesh K. Pradhan, Subrata Nandi
Article URL: https://www.sciencedirect.com/science/article/pii/S1751157724000208
Citations: 0

Abstract

Citation maturity time varies across articles, yet the impact of every article is measured in a fixed window of 2-5 years. Clustering citation trajectories helps explain the knowledge diffusion process and reveals that not all articles gain immediate success after publication. Clustering trajectories is also necessary for paper impact recommendation algorithms. The problem is challenging because citation time series vary widely owing to their non-linear and non-stationary characteristics. Prior works propose sets of arbitrary thresholds and fixed rule-based approaches, and all of these methods are primarily parameter-dependent. This dependence leads to inconsistencies in defining similar trajectories and to ambiguity about how many distinct trajectories exist. Most studies capture only extreme trajectories. A generalized clustering framework is therefore required. This paper proposes a feature-based multiple k-means cluster ensemble framework. Unlike single clustering algorithms, it trains multiple learners to evaluate the credibility of class labels. From the Microsoft Academic Graph data, 195,783 and 41,732 well-cited articles are considered for clustering short-term (10-year) and long-term (30-year) trajectories, respectively. The framework has linear run-time. Four distinct trajectories are obtained: Early Rise-Rapid Decline (ER-RD) (2.2%), Early Rise-Slow Decline (ER-SD) (45%), Delayed Rise-Not yet Declined (DR-ND) (53%), and Delayed Rise-Slow Decline (DR-SD) (0.8%). Differences in individual trajectories across the two time spans are studied. Most papers exhibit ER-SD and DR-ND patterns. The growth and decay times, cumulative citation distribution, and peak characteristics of individual trajectories are redefined empirically. A detailed comparative study shows that the proposed methodology can detect all distinct trajectory classes.
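
The abstract does not spell out how the ensemble is built, so the following is only a minimal sketch of one common way to combine several k-means runs: a co-association (evidence accumulation) consensus. Everything in it is an assumption made for illustration rather than the authors' method: the two features (relative peak position and share of citations after the peak), the function names, the 0.5 threshold, and the toy trajectories. The pairwise co-association matrix used here is quadratic in the number of articles, so this sketch does not reproduce the linear run-time reported in the abstract.

```python
import random

def features(traj):
    """Two hypothetical features: relative peak position and share of citations after the peak."""
    total = sum(traj)
    peak = max(range(len(traj)), key=traj.__getitem__)
    return (peak / (len(traj) - 1), sum(traj[peak + 1:]) / total)

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means with random initial centers; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def co_association(points, k, runs, seed=0):
    """Fraction of k-means runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]

def consensus(co, threshold=0.5):
    """Consensus labels: connected components of pairs co-clustered in more than `threshold` of runs."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy 10-year citation counts (invented for illustration only).
early = [[10, 8, 5, 3, 2, 1, 1, 0, 0, 0],
         [12, 9, 6, 3, 2, 1, 0, 0, 0, 0],
         [9, 7, 5, 4, 2, 1, 1, 1, 0, 0]]
late = [[0, 0, 1, 1, 2, 3, 5, 7, 9, 10],
        [0, 1, 1, 2, 2, 4, 5, 8, 9, 12],
        [0, 0, 0, 1, 2, 3, 4, 6, 9, 11]]

points = [features(t) for t in early + late]
labels = consensus(co_association(points, k=2, runs=10))
print(labels)
```

The co-association value for a pair of articles can be read as a rough credibility score for placing them in the same class: pairs that agree across most runs are grouped together, while pairs that agree only occasionally are kept apart.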

Source journal
Journal of Informetrics
Social Sciences - Library and Information Sciences
CiteScore: 6.40
Self-citation rate: 16.20%
Articles published: 95
About the journal: Journal of Informetrics (JOI) publishes rigorous high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, for instance a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI, unless they contain innovative methodological elements.