Estimating the quality of published medical research with ChatGPT

Information Processing & Management, 62(4), Article 104123 · IF 6.9 · CAS Tier 1 (Management) · JCR Q1 (Computer Science, Information Systems) · Published: 2025-07-01 (Epub: 2025-03-06) · DOI: 10.1016/j.ipm.2025.104123
Mike Thelwall, Xiaorui Jiang, Peter A. Bath
{"title":"Estimating the quality of published medical research with ChatGPT","authors":"Mike Thelwall ,&nbsp;Xiaorui Jiang ,&nbsp;Peter A. Bath","doi":"10.1016/j.ipm.2025.104123","DOIUrl":null,"url":null,"abstract":"<div><div>Estimating the quality of published research is important for evaluations of departments, researchers, and job candidates. Citation-based indicators sometimes support these tasks, but do not work for new articles and have low or moderate accuracy. Previous research has shown that ChatGPT can estimate the quality of research articles, with its scores correlating positively with an expert scores proxy in all fields, and often more strongly than citation-based indicators, except for clinical medicine. ChatGPT scores may therefore replace citation-based indicators for some applications. This article investigates the clinical medicine anomaly with the largest dataset yet and a more detailed analysis. The results showed that ChatGPT 4o-mini scores for articles submitted to the UK's Research Excellence Framework (REF) 2021 Unit of Assessment (UoA) 1 Clinical Medicine correlated positively (<em>r</em> = 0.134, <em>n</em> = 9872) with departmental mean REF scores, against a theoretical maximum correlation of <em>r</em> = 0.226. ChatGPT 4o and 3.5 turbo also gave positive correlations. At the departmental level, mean ChatGPT scores correlated more strongly with departmental mean REF scores (<em>r</em> = 0.395, <em>n</em> = 31). For the 100 journals with the most articles in UoA 1, their mean ChatGPT score correlated strongly with their departmental mean REF score (<em>r</em> = 0.495) but negatively with their citation rate (<em>r</em>=-0.148). Journal and departmental anomalies in these results point to ChatGPT being ineffective at assessing the quality of research in prestigious medical journals or research directly affecting human health, or both. Nevertheless, the results give evidence of ChatGPT's ability to assess research quality overall for Clinical Medicine, where it might replace citation-based indicators for new research.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 4","pages":"Article 104123"},"PeriodicalIF":6.9000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325000652","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/3/6 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Estimating the quality of published research is important for evaluations of departments, researchers, and job candidates. Citation-based indicators sometimes support these tasks, but do not work for new articles and have low or moderate accuracy. Previous research has shown that ChatGPT can estimate the quality of research articles, with its scores correlating positively with a proxy for expert scores in all fields, and often more strongly than citation-based indicators do, except in clinical medicine. ChatGPT scores may therefore replace citation-based indicators for some applications. This article investigates the clinical medicine anomaly with the largest dataset yet and a more detailed analysis. The results showed that ChatGPT 4o-mini scores for articles submitted to the UK's Research Excellence Framework (REF) 2021 Unit of Assessment (UoA) 1 Clinical Medicine correlated positively (r = 0.134, n = 9872) with departmental mean REF scores, against a theoretical maximum correlation of r = 0.226. ChatGPT 4o and 3.5 turbo also gave positive correlations. At the departmental level, mean ChatGPT scores correlated more strongly with departmental mean REF scores (r = 0.395, n = 31). For the 100 journals with the most articles in UoA 1, their mean ChatGPT score correlated strongly with their departmental mean REF score (r = 0.495) but negatively with their citation rate (r = -0.148). Journal and departmental anomalies in these results point to ChatGPT being ineffective at assessing the quality of research in prestigious medical journals or research directly affecting human health, or both. Nevertheless, the results give evidence of ChatGPT's ability to assess research quality overall for Clinical Medicine, where it might replace citation-based indicators for new research.
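The setup summarised above (an LLM assigns a REF-style quality score to each article, and these scores are averaged and correlated with departmental mean REF scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the prompt wording, the score_article helper, and the example data are hypothetical; only the OpenAI chat-completions call and SciPy's pearsonr are standard APIs. Note that the modest article-level ceiling (r = 0.226) reflects that individual REF grades are not published, so each article's score can only be compared against its department's mean rather than its own grade.

```python
# Minimal sketch of LLM-based research quality scoring as described in the
# abstract: score each article on the REF 1*-4* scale, then correlate mean
# scores with departmental mean REF scores. Prompt wording and data below
# are illustrative assumptions, not the paper's exact method.
from openai import OpenAI
from scipy.stats import pearsonr

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_article(title: str, abstract: str, model: str = "gpt-4o-mini") -> float:
    """Ask the model for a REF-style quality score (1-4) for one article."""
    prompt = (
        "You are an expert research assessor. Rate the following medical "
        "research article on the UK REF quality scale from 1 (recognised "
        "nationally) to 4 (world-leading). Reply with a single number.\n\n"
        f"Title: {title}\n\nAbstract: {abstract}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return float(response.choices[0].message.content.strip())

# Correlate departmental mean LLM scores with departmental mean REF scores.
llm_means = [2.7, 3.1, 3.4]   # hypothetical mean ChatGPT scores per department
ref_means = [2.9, 3.0, 3.6]   # hypothetical departmental mean REF scores
r, p = pearsonr(llm_means, ref_means)
print(f"Pearson r = {r:.3f} (p = {p:.3f})")
```

Since LLM outputs vary between runs, a realistic replication would query each article several times and average the scores before correlating.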
Source journal
Information Processing & Management (Engineering & Technology · Computer Science: Information Systems)
CiteScore: 17.00
Self-citation rate: 11.60%
Annual articles: 276
Review time: 39 days
Journal introduction: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.
Latest articles in this journal
Fuzzy neighborhood rough set-based attribute reduction over temporal information systems with application to clinical efficacy evaluation
Empowering open-domain LLMs for legal document correction via legal knowledge integration and decoding constraints
CTJANet: A class-task joint-aware network for enhanced few-shot image classification
ALC-DRKG: an active learning-based framework for dynamic knowledge graph construction for drug repositioning
Measuring stance dynamics in political debate using temporal graph neural networks