Can large language models fully automate or partially assist paper selection in systematic reviews?

IF 3.7 | Zone 2, Medicine | JCR Q1, Ophthalmology | British Journal of Ophthalmology | Pub Date: 2025-01-15 | DOI: 10.1136/bjo-2024-326254
Haichao Chen, Zehua Jiang, Xinyu Liu, Can Can Xue, Samantha Min Er Yew, Bin Sheng, Ying-Feng Zheng, Xiaofei Wang, You Wu, Sobha Sivaprasad, Tien Yin Wong, Varun Chaudhary, Yih Chung Tham
Citations: 0

Abstract

Background/aims: Large language models (LLMs) have substantial potential to enhance the efficiency of academic research. However, their accuracy and performance in systematic reviews, a core part of evidence building, have yet to be studied in detail.

Methods: We introduced two LLM-based approaches to systematic review: an LLM-enabled fully automated approach (LLM-FA) utilising three different GPT-4 plugins (Consensus GPT, Scholar GPT and GPT web browsing modes) and an LLM-facilitated semi-automated approach (LLM-SA) using GPT-4's Application Programming Interface (API). We benchmarked these approaches using three published systematic reviews that reported the prevalence of diabetic retinopathy across different populations (general population, pregnant women and children).

Results: The three published reviews comprised 98 papers in total. Across these three reviews, in the LLM-FA approach, Consensus GPT correctly identified 32.7% (32 out of 98) of papers, while Scholar GPT and GPT-4's web browsing modes identified only 19.4% (19 out of 98) and 6.1% (6 out of 98), respectively. In contrast, the LLM-SA approach not only successfully included 82.7% (81 out of 98) of these papers but also correctly excluded 92.2% of 4497 irrelevant papers.

Conclusions: Our findings suggest LLMs are not yet capable of autonomously identifying and selecting relevant papers in systematic reviews. However, they hold promise as an assistive tool to improve the efficiency of the paper selection process in systematic reviews.

Data availability: All data and code are available upon reasonable request by emailing thamyc@nus.edu.sg.
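The percentages in the Results can be reproduced from the counts given in the abstract. The following is a minimal sketch, not the authors' code: the variable and function names are illustrative assumptions, and the number of correctly excluded papers is only an estimate, because the abstract reports that figure as a percentage (92.2%) rather than a count.

```python
# Counts reported in the abstract.
TOTAL_INCLUDED = 98      # papers included across the three published reviews
TOTAL_IRRELEVANT = 4497  # irrelevant papers screened in the LLM-SA approach

# Papers correctly identified by each approach (names are illustrative labels).
identified = {
    "Consensus GPT (LLM-FA)": 32,
    "Scholar GPT (LLM-FA)": 19,
    "GPT-4 web browsing (LLM-FA)": 6,
    "GPT-4 API (LLM-SA)": 81,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of the 98 benchmark papers that each approach recovered.
recall = {name: percent(n, TOTAL_INCLUDED) for name, n in identified.items()}
for name, value in recall.items():
    print(f"{name}: {value}% of {TOTAL_INCLUDED} papers identified")

# Relevant papers that LLM-SA failed to include.
missed_by_llm_sa = TOTAL_INCLUDED - identified["GPT-4 API (LLM-SA)"]

# ASSUMPTION: the abstract gives only the 92.2% exclusion rate, so this
# count is an approximation derived from that rounded percentage.
approx_excluded = round(0.922 * TOTAL_IRRELEVANT)

print(f"LLM-SA missed {missed_by_llm_sa} relevant papers")
print(f"LLM-SA excluded roughly {approx_excluded} of {TOTAL_IRRELEVANT} irrelevant papers")
```

Read this way, the semi-automated approach still misses 17 of the 98 benchmark papers, which is consistent with the conclusion that LLMs can assist paper selection but not yet replace human screening.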
Source journal
CiteScore: 10.30
Self-citation rate: 2.40%
Articles published: 213
Review time: 3-6 weeks
About the journal: The British Journal of Ophthalmology (BJO) is an international peer-reviewed journal for ophthalmologists and visual science specialists. BJO publishes clinical investigations, clinical observations, and clinically relevant laboratory investigations related to ophthalmology. It also provides major reviews and publishes manuscripts covering regional issues in a global context.