Investigating the contribution of author- and publication-specific features to scholars’ h-index prediction

EPJ Data Science | Pub Date: 2023-10-06 | DOI: 10.1140/epjds/s13688-023-00421-6 | IF 3 | Zone 2 (Computer Science) | JCR Q1 (Mathematics, Interdisciplinary Applications)
Fakhri Momeni, Philipp Mayr, Stefan Dietze
Citations: 1

Abstract

Evaluation of researchers’ output is vital for hiring committees and funding bodies, and it is usually measured via scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is more critical because it takes time for citations to accumulate and for the h-index to increase. Hence, predicting the h-index can help to discover researchers’ scientific impact. In addition, identifying the factors that influence scientific impact is helpful for researchers and their organizations seeking ways to improve it. This study investigates the effect of author-, paper-, and venue-specific features on the future h-index. For this purpose, we used a machine learning approach to predict the h-index and feature analysis techniques to advance the understanding of feature impact. Using the bibliometric data in Scopus, we defined and extracted two main groups of features. The first group, ‘prior impact-based features’, relates to prior scientific impact and includes the number of publications, received citations, and h-index. The second group, ‘non-prior impact-based features’, contains features related to author, co-authorship, paper, and venue characteristics. We explored their importance in predicting researchers’ h-index in three career phases. We also examined the temporal dimension of prediction performance for the different feature categories to find out which features are more reliable for long- and short-term prediction. We used the gender of the authors to examine the role of this author characteristic in the prediction task. Our findings showed that gender has a very slight effect in predicting the h-index. Although the results demonstrate better performance for the models containing prior impact-based features for all groups of researchers in the near future, we found that non-prior impact-based features are more robust predictors for younger scholars in the long term. Also, prior impact-based features lose more of their predictive power than other features in the long term.
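For readers unfamiliar with the metric being predicted, the standard definition of the h-index (the largest h such that at least h papers have at least h citations each) can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline; the function name `h_index` and the sample citation counts are assumptions made for the example and do not come from the study.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's papers (not data from the study).
example_citations = [10, 8, 5, 4, 3]
print(h_index(example_citations))  # prints 4
```

Because the value can only rise when enough papers cross the next citation threshold, it grows slowly early in a career, which is consistent with the abstract's point that assessing young researchers is harder.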
Source journal: EPJ Data Science (Mathematics, Interdisciplinary Applications)
CiteScore: 6.10
Self-citation rate: 5.60%
Articles published: 53
Review time: 13 weeks
About the journal: EPJ Data Science covers a broad range of research areas and applications and particularly encourages contributions from techno-socio-economic systems, where it comprises those research lines that now regard the digital “tracks” of human beings as first-order objects for scientific investigation. Topics include, but are not limited to, human behavior, social interaction (including animal societies), economic and financial systems, management and business networks, socio-technical infrastructure, health and environmental systems, the science of science, as well as general risk and crisis scenario forecasting up to and including policy advice.