Estimating the quality of published medical research with ChatGPT

Information Processing & Management, 62(4), Article 104123 · IF 6.9 · CAS Tier 1 (Management) · JCR Q1 (Computer Science, Information Systems) · Published: 2025-07-01 (Epub: 2025-03-06) · DOI: 10.1016/j.ipm.2025.104123
Mike Thelwall, Xiaorui Jiang, Peter A. Bath
{"title":"Estimating the quality of published medical research with ChatGPT","authors":"Mike Thelwall ,&nbsp;Xiaorui Jiang ,&nbsp;Peter A. Bath","doi":"10.1016/j.ipm.2025.104123","DOIUrl":null,"url":null,"abstract":"<div><div>Estimating the quality of published research is important for evaluations of departments, researchers, and job candidates. Citation-based indicators sometimes support these tasks, but do not work for new articles and have low or moderate accuracy. Previous research has shown that ChatGPT can estimate the quality of research articles, with its scores correlating positively with an expert scores proxy in all fields, and often more strongly than citation-based indicators, except for clinical medicine. ChatGPT scores may therefore replace citation-based indicators for some applications. This article investigates the clinical medicine anomaly with the largest dataset yet and a more detailed analysis. The results showed that ChatGPT 4o-mini scores for articles submitted to the UK's Research Excellence Framework (REF) 2021 Unit of Assessment (UoA) 1 Clinical Medicine correlated positively (<em>r</em> = 0.134, <em>n</em> = 9872) with departmental mean REF scores, against a theoretical maximum correlation of <em>r</em> = 0.226. ChatGPT 4o and 3.5 turbo also gave positive correlations. At the departmental level, mean ChatGPT scores correlated more strongly with departmental mean REF scores (<em>r</em> = 0.395, <em>n</em> = 31). For the 100 journals with the most articles in UoA 1, their mean ChatGPT score correlated strongly with their departmental mean REF score (<em>r</em> = 0.495) but negatively with their citation rate (<em>r</em>=-0.148). Journal and departmental anomalies in these results point to ChatGPT being ineffective at assessing the quality of research in prestigious medical journals or research directly affecting human health, or both. Nevertheless, the results give evidence of ChatGPT's ability to assess research quality overall for Clinical Medicine, where it might replace citation-based indicators for new research.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 4","pages":"Article 104123"},"PeriodicalIF":6.9000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457325000652","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/3/6 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Estimating the quality of published research is important for evaluations of departments, researchers, and job candidates. Citation-based indicators sometimes support these tasks, but do not work for new articles and have low or moderate accuracy. Previous research has shown that ChatGPT can estimate the quality of research articles, with its scores correlating positively with a proxy for expert scores in all fields, and often more strongly than citation-based indicators do, except in clinical medicine. ChatGPT scores may therefore replace citation-based indicators for some applications. This article investigates the clinical medicine anomaly with the largest dataset yet and a more detailed analysis. The results showed that ChatGPT 4o-mini scores for articles submitted to the UK's Research Excellence Framework (REF) 2021 Unit of Assessment (UoA) 1 Clinical Medicine correlated positively (r = 0.134, n = 9872) with departmental mean REF scores, against a theoretical maximum correlation of r = 0.226. ChatGPT 4o and 3.5 turbo also gave positive correlations. At the departmental level, mean ChatGPT scores correlated more strongly with departmental mean REF scores (r = 0.395, n = 31). For the 100 journals with the most articles in UoA 1, their mean ChatGPT score correlated strongly with their departmental mean REF score (r = 0.495) but negatively with their citation rate (r = -0.148). Journal and departmental anomalies in these results point to ChatGPT being ineffective at assessing the quality of research in prestigious medical journals or research directly affecting human health, or both. Nevertheless, the results give evidence of ChatGPT's ability to assess research quality overall for Clinical Medicine, where it might replace citation-based indicators for new research.
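The setup summarised above (an LLM assigns a REF-style quality score to each article, and these scores are averaged and correlated with departmental mean REF scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the prompt wording, the score_article helper, and the example data are hypothetical; only the OpenAI chat-completions call and SciPy's pearsonr are standard APIs. Note that the modest article-level ceiling (r = 0.226) reflects that individual REF grades are not published, so each article's score can only be compared against its department's mean rather than its own grade.

```python
# Minimal sketch of LLM-based research quality scoring as described in the
# abstract: score each article on the REF 1*-4* scale, then correlate mean
# scores with departmental mean REF scores. Prompt wording and data below
# are illustrative assumptions, not the paper's exact method.
from openai import OpenAI
from scipy.stats import pearsonr

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_article(title: str, abstract: str, model: str = "gpt-4o-mini") -> float:
    """Ask the model for a REF-style quality score (1-4) for one article."""
    prompt = (
        "You are an expert research assessor. Rate the following medical "
        "research article on the UK REF quality scale from 1 (recognised "
        "nationally) to 4 (world-leading). Reply with a single number.\n\n"
        f"Title: {title}\n\nAbstract: {abstract}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return float(response.choices[0].message.content.strip())

# Correlate departmental mean LLM scores with departmental mean REF scores.
llm_means = [2.7, 3.1, 3.4]   # hypothetical mean ChatGPT scores per department
ref_means = [2.9, 3.0, 3.6]   # hypothetical departmental mean REF scores
r, p = pearsonr(llm_means, ref_means)
print(f"Pearson r = {r:.3f} (p = {p:.3f})")
```

Since LLM outputs vary between runs, a realistic replication would query each article several times and average the scores before correlating.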
Source journal
Information Processing & Management (Engineering & Technology · Computer Science: Information Systems)
CiteScore: 17.00
Self-citation rate: 11.60%
Annual articles: 276
Review time: 39 days
Journal introduction: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.
Latest articles in this journal
Fuzzy neighborhood rough set-based attribute reduction over temporal information systems with application to clinical efficacy evaluation
Empowering open-domain LLMs for legal document correction via legal knowledge integration and decoding constraints
CTJANet: A class-task joint-aware network for enhanced few-shot image classification
ALC-DRKG: an active learning-based framework for dynamic knowledge graph construction for drug repositioning
Measuring stance dynamics in political debate using temporal graph neural networks