Textual features of peer review predict top-cited papers: An interpretable machine learning perspective

Journal of Informetrics | IF 3.4 | Zone 2 (Management) | Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-01-25 | DOI: 10.1016/j.joi.2024.101501
Zhuanlan Sun
Full text: https://www.sciencedirect.com/science/article/pii/S1751157724000142
Citations: 0

Abstract

Peer review is crucial in improving the quality and reliability of scientific research. However, the mechanisms through which peer review practices ensure papers become top-cited papers (TCPs) after publication are not well understood. In this study, using a data set of 13,066 papers published in Nature Communications between 2016 and 2020 with open peer review reports, we aim to examine how textual features embedded in the peer review reports, which reflect the reviewers’ emotions, may predict whether a paper becomes a TCP. We compiled a list of 15 textual features and classified them into three categories: peer review features, linguistic features, and sentiment features. We then chose the XGBoost machine learning model, which performed best in predicting TCPs, and used the explainable artificial intelligence technique SHAP to interpret the contribution of each feature to the prediction results. The distribution of the feature importance rankings shows that sentiment features play a crucial role in determining a paper’s potential to be highly cited. This conclusion holds even though the feature importance ranking changes in the subgroup analyses, which divide the samples into four disciplines (biological sciences, health sciences, physical sciences, and earth and environmental sciences) and into two groups based on whether reviewers’ identities were revealed. This research emphasizes that the textual features retrieved from peer review reports, which play a role in improving manuscript quality, can predict post-publication research impact.
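
The interpretation step mentioned in the abstract relies on SHAP, which attributes a model's prediction to its input features using Shapley values. The sketch below is not the author's pipeline: it computes exact Shapley values by brute force for a small hand-written scoring function, only to illustrate the idea. The function `toy_score`, its coefficients, and the three feature names (one per feature category named in the abstract) are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for a handful of features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                present = set(subset)
                without = {f: x[f] if f in present else baseline[f] for f in names}
                with_f = dict(without)
                with_f[name] = x[name]
                # Marginal contribution of `name` when added to this coalition
                total += weight * (predict(with_f) - predict(without))
        phi[name] = total
    return phi


def toy_score(features):
    """Hypothetical stand-in for a trained model's output (not from the paper)."""
    return (
        0.5 * features["sentiment_polarity"]
        + 0.2 * features["review_rounds"]
        + 0.1 * features["report_length"]
        + 0.3 * features["sentiment_polarity"] * features["review_rounds"]
    )


# Hypothetical feature values for one paper, and an all-zero reference point
x = {"sentiment_polarity": 1.0, "review_rounds": 2.0, "report_length": 3.0}
baseline = {"sentiment_polarity": 0.0, "review_rounds": 0.0, "report_length": 0.0}

phi = shapley_values(toy_score, x, baseline)
for feature, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(feature, round(value, 3))
```

The attributions sum to the difference between the score at `x` and the score at `baseline`, and the interaction term is split equally between the two features involved. Ranking features by the magnitude of their attribution is the kind of feature importance ranking the abstract refers to; in practice, the SHAP library uses faster, model-specific algorithms for tree ensembles such as XGBoost rather than this brute-force enumeration.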

Source journal: Journal of Informetrics (Social Sciences, Library and Information Sciences)
CiteScore: 6.40
Self-citation rate: 16.20%
Articles published: 95
About the journal: Journal of Informetrics (JOI) publishes rigorous high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, for instance a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI, unless they contain innovative methodological elements.