Toward automating GRADE classification: a proof-of-concept evaluation of an artificial intelligence-based tool for semiautomated evidence quality rating in systematic reviews.

BMJ Evidence-Based Medicine · Impact Factor 7.6 · CAS Tier 3 (Medicine) · JCR Q1 (Medicine, General & Internal) · Publication date: 2025-04-07 · DOI: 10.1136/bmjebm-2024-113123
Alisson Oliveira Dos Santos, Vinícius Silva Belo, Tales Mota Machado, Eduardo Sérgio da Silva
{"title":"Toward automating GRADE classification: a proof-of-concept evaluation of an artificial intelligence-based tool for semiautomated evidence quality rating in systematic reviews.","authors":"Alisson Oliveira Dos Santos, Vinícius Silva Belo, Tales Mota Machado, Eduardo Sérgio da Silva","doi":"10.1136/bmjebm-2024-113123","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Evaluation of the quality of evidence in systematic reviews (SRs) is essential for assertive decision-making. Although Grading of Recommendations Assessment, Development and Evaluation (GRADE) affords a consolidated approach for rating the level of evidence, its application is complex and time-consuming. Artificial intelligence (AI) can be used to overcome these barriers.</p><p><strong>Design: </strong>Analytical experimental study.</p><p><strong>Objective: </strong>The objective is to develop and appraise a proof-of-concept AI-powered tool for the semiautomation of an adaptation of the GRADE classification system to determine levels of evidence in SRs with meta-analyses compiled from randomised clinical trials.</p><p><strong>Methods: </strong>The URSE-automated system was based on an algorithm created to enhance the objectivity of the GRADE classification. It was developed using the Python language and the React library to create user-friendly interfaces. Evaluation of the URSE-automated system was performed by analysing 115 SRs from the Cochrane Library and comparing the predicted levels of evidence with those generated by human evaluators.</p><p><strong>Results: </strong>The open-source URSE code is available on GitHub (http://www.github.com/alisson-mfc/urse). The agreement between the URSE-automated GRADE system and human evaluators regarding the quality of evidence was 63.2% with a Cohen's kappa coefficient of 0.44. The metrics of the GRADE domains evaluated included accuracy and F1-scores, which were 0.97 and 0.94 for imprecision (number of participants), 0.73 and 0.7 for risk of bias, 0.9 and 0.9 for I<sup>2</sup> values (heterogeneity) and 0.98 and 0.99 for quality of methodology (A Measurement Tool to Assess Systematic Reviews), respectively.</p><p><strong>Conclusion: </strong>The results demonstrate the potential use of AI in assessing the quality of evidence. However, in consideration of the emphasis of the GRADE approach on subjectivity and understanding the context of evidence production, full automation of the classification process is not opportune. Nevertheless, the combination of the URSE-automated system with human evaluation or the integration of this tool into other platforms represents interesting directions for the future.</p>","PeriodicalId":9059,"journal":{"name":"BMJ Evidence-Based Medicine","volume":" ","pages":""},"PeriodicalIF":7.6000,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMJ Evidence-Based Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1136/bmjebm-2024-113123","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Citations: 0

Abstract

Background: Evaluating the quality of evidence in systematic reviews (SRs) is essential for well-informed decision-making. Although the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach provides a consolidated framework for rating the level of evidence, its application is complex and time-consuming. Artificial intelligence (AI) can help overcome these barriers.

Design: Analytical experimental study.

Objective: To develop and appraise a proof-of-concept AI-powered tool that semiautomates an adaptation of the GRADE classification system for determining levels of evidence in SRs with meta-analyses of randomised clinical trials.

Methods: The URSE-automated system is based on an algorithm created to make the GRADE classification more objective. It was developed in Python, with the React library used to build user-friendly interfaces. The system was evaluated by analysing 115 SRs from the Cochrane Library and comparing its predicted levels of evidence with those assigned by human evaluators.
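
For illustration only, the Python sketch below shows how a rule-based, GRADE-style rating of a single meta-analysed outcome might start at "high" certainty (randomised trials) and downgrade one level per domain of concern. The thresholds, field names and downgrade rules here are assumptions chosen for demonstration; they are not taken from the URSE source code, which is linked in the Results section.

```python
# Illustrative sketch of a rule-based GRADE-style rating for one outcome.
# All thresholds and field names are assumptions, not the URSE implementation.

from dataclasses import dataclass

GRADE_LEVELS = ["very low", "low", "moderate", "high"]

@dataclass
class Outcome:
    n_participants: int      # total participants across pooled trials
    high_risk_of_bias: bool  # flagged by a risk-of-bias assessment
    i_squared: float         # heterogeneity statistic, in percent

def rate_outcome(outcome: Outcome) -> str:
    """Start at 'high' and downgrade one level per domain of concern."""
    level = len(GRADE_LEVELS) - 1            # index of "high"
    if outcome.n_participants < 400:          # assumed imprecision threshold
        level -= 1
    if outcome.high_risk_of_bias:             # risk-of-bias concern
        level -= 1
    if outcome.i_squared > 50.0:              # assumed inconsistency threshold
        level -= 1
    return GRADE_LEVELS[max(level, 0)]

print(rate_outcome(Outcome(n_participants=350, high_risk_of_bias=False, i_squared=62.0)))
# -> "low": downgraded once for imprecision and once for inconsistency
```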

Results: The open-source URSE code is available on GitHub (http://www.github.com/alisson-mfc/urse). Agreement between the URSE-automated GRADE system and human evaluators on the quality of evidence was 63.2%, with a Cohen's kappa coefficient of 0.44. Accuracy and F1-score for the GRADE domains evaluated were 0.97 and 0.94 for imprecision (number of participants), 0.73 and 0.70 for risk of bias, 0.90 and 0.90 for I² (heterogeneity) and 0.98 and 0.99 for quality of methodology (A Measurement Tool to Assess Systematic Reviews, AMSTAR), respectively.
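
For context, the sketch below shows how agreement metrics of this kind (percent agreement, Cohen's kappa, and per-domain accuracy/F1) can be computed with scikit-learn. The labels are invented for the example; the snippet does not reproduce the paper's data or evaluation pipeline.

```python
# Minimal sketch of agreement metrics between human and automated ratings.
# Labels below are made up and do not reproduce the study's data.

from sklearn.metrics import cohen_kappa_score, accuracy_score, f1_score

# Hypothetical overall evidence-level labels for a handful of reviews
human = ["high", "moderate", "low", "moderate", "very low", "high"]
urse  = ["high", "low",      "low", "moderate", "low",      "high"]

agreement = accuracy_score(human, urse)    # simple percent agreement
kappa = cohen_kappa_score(human, urse)     # chance-corrected agreement

# Hypothetical per-domain binary judgements, e.g. "downgrade for risk of bias"
human_rob = [1, 0, 1, 1, 0, 0]
urse_rob  = [1, 0, 0, 1, 0, 0]
rob_accuracy = accuracy_score(human_rob, urse_rob)
rob_f1 = f1_score(human_rob, urse_rob)

print(f"agreement={agreement:.1%}, kappa={kappa:.2f}, "
      f"risk-of-bias accuracy={rob_accuracy:.2f}, F1={rob_f1:.2f}")
```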

Conclusion: The results demonstrate the potential of AI for assessing the quality of evidence. However, given the GRADE approach's emphasis on subjectivity and on understanding the context in which evidence is produced, full automation of the classification process is not advisable. Nevertheless, combining the URSE-automated system with human evaluation, or integrating the tool into other platforms, are promising directions for future work.

Source journal
BMJ Evidence-Based Medicine
CiteScore: 8.90
Self-citation rate: 3.40%
Articles published: 48
About the journal: BMJ Evidence-Based Medicine (BMJ EBM) publishes original evidence-based research, insights and opinions on what matters for health care. We focus on the tools, methods, and concepts that are basic and central to practising evidence-based medicine and deliver relevant, trustworthy and impactful evidence. BMJ EBM is a Plan S compliant Transformative Journal and adheres to the highest possible industry standards for editorial policies and publication ethics.