Investigating the contribution of author- and publication-specific features to scholars’ h-index prediction

IF 3 · Zone 2 (Computer Science) · Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS · EPJ Data Science · Pub Date: 2023-10-06 · DOI: 10.1140/epjds/s13688-023-00421-6

Fakhri Momeni, Philipp Mayr, Stefan Dietze


Citations: 1

Abstract

Evaluation of researchers’ output is vital for hiring committees and funding bodies, and it is usually measured via scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is more critical because citations, and with them the h-index, take time to accumulate. Hence, predicting the h-index can help to discover researchers’ scientific impact. In addition, identifying the factors that influence the prediction of scientific impact is helpful for researchers and their organizations seeking ways to improve it. This study investigates the effect of author-, paper-, and venue-specific features on the future h-index. For this purpose, we used a machine learning approach to predict the h-index and feature analysis techniques to advance the understanding of feature impact. Using the bibliometric data in Scopus, we defined and extracted two main groups of features. The first group relates to prior scientific impact; we name it ‘prior impact-based features’, and it includes the number of publications, received citations, and h-index. The second group, ‘non-prior impact-based features’, contains features related to author, co-authorship, paper, and venue characteristics. We explored the importance of both groups in predicting researchers’ h-index in three career phases. We also examined the temporal dimension of prediction performance for the different feature categories to find out which features are more reliable for long- and short-term prediction. We used the authors’ gender to examine the role of this author characteristic in the prediction task. Our findings showed that gender has only a very slight effect on predicting the h-index. Although models containing prior impact-based features perform better for all groups of researchers in the near future, we found that non-prior impact-based features are more robust predictors for younger scholars in the long term. Also, prior impact-based features lose more of their predictive power than other features in the long term.
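The abstract treats the h-index as the prediction target without restating its definition. As a reminder, a scholar has h-index h when h of their papers have each received at least h citations. The sketch below computes it from a list of per-paper citation counts; the function name `h_index` and the sample counts are illustrative assumptions and do not come from the paper or from Scopus.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


# Hypothetical citation counts for one author's five papers.
example_citations = [10, 8, 5, 4, 3]
print(h_index(example_citations))  # prints 4
```

Because the value can only rise when enough papers cross the next citation threshold, it grows slowly early in a career, which is consistent with the abstract's remark that citations and h-index increments take a while to accrue.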




Source journal

EPJ Data Science (MATHEMATICS, INTERDISCIPLINARY APPLICATIONS)

CiteScore: 6.10

Self-citation rate: 5.60%

Articles published:

Review time: 13 weeks

Journal description: EPJ Data Science covers a broad range of research areas and applications and particularly encourages contributions from techno-socio-economic systems, where it comprises those research lines that now regard the digital “tracks” of human beings as first-order objects for scientific investigation. Topics include, but are not limited to, human behavior, social interaction (including animal societies), economic and financial systems, management and business networks, socio-technical infrastructure, health and environmental systems, the science of science, as well as general risk and crisis scenario forecasting up to and including policy advice.