Systematic literature review and meta-analysis for real-world versus clinical validation performance of artificial intelligence applications indicated for ICH and LVO detection
Jason Le, Oisín Butler, Ann-Kathrin Frenz, Ankur Sharma
Intelligence-based medicine, vol. 10, Article 100187 (2024). DOI: 10.1016/j.ibmed.2024.100187
https://www.sciencedirect.com/science/article/pii/S2666521224000541
Abstract
Purpose
We sought to compare the performance of AI applications in real-world studies to validation study data used to gain regulatory approval.
Methods
We searched PubMed, EBSCO, and EMBASE for publications from 2018 to 2023. We included articles that evaluated the sensitivity and specificity of intracranial hemorrhage (ICH) and large vessel occlusion (LVO) detection applications in real-world populations. We assessed quality and applicability using QUADAS-2. We used either a bivariate meta-analysis or two univariate meta-analyses, as appropriate, to calculate summary point estimates for sensitivity and specificity.
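As a rough illustration of the univariate case, the sketch below pools a proportion such as sensitivity (TP / (TP + FN)) across studies on the logit scale. The abstract does not state which estimator or software the authors used, so the DerSimonian-Laird random-effects estimator, the 0.5 continuity correction, the function name `pool_proportion`, and the counts in the usage lines are all assumptions made for this example, not the authors' method or data.

```python
import math


def inv_logit(x):
    """Map a logit back to a proportion in (0, 1)."""
    return 1 / (1 + math.exp(-x))


def pool_proportion(events, totals, z=1.96):
    """Pool per-study proportions with a univariate random-effects model.

    events[i] / totals[i] is the proportion in study i, e.g. true
    positives over all reference-standard positives for sensitivity.
    Assumption: DerSimonian-Laird estimate of between-study variance,
    with a 0.5 continuity correction applied to every study.
    Returns (estimate, ci_lower, ci_upper) on the proportion scale.
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        e_c = e + 0.5          # corrected event count
        f_c = (n - e) + 0.5    # corrected non-event count
        ys.append(math.log(e_c / f_c))   # logit of the proportion
        vs.append(1 / e_c + 1 / f_c)     # approximate variance of the logit

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))

    # DerSimonian-Laird between-study variance, truncated at zero.
    k = len(ys)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights and pooled logit.
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - z * se), inv_logit(y_re + z * se)


# Hypothetical counts for three studies (not taken from the paper).
true_positives = [45, 88, 60]
reference_positives = [50, 100, 70]
estimate, lower, upper = pool_proportion(true_positives, reference_positives)
print(f"pooled sensitivity {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Running the same function on true negatives over all reference-standard negatives gives the second univariate analysis, for specificity. A bivariate model would instead estimate sensitivity and specificity jointly and account for their correlation, which this sketch does not attempt.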
Results
Eighteen articles met the criteria of the systematic literature review. The included articles evaluated five applications indicated for ICH or LVO triage. Three of the five applications had enough studies to be included in the meta-analysis. For most applications, we did not observe any systematic differences in sensitivity or specificity between the point estimates from the meta-analysis and the respective 510(k) studies. For VIZ LVO and RAPID LVO, the 95% CI for real-world sensitivity fell within the 95% CI of their respective validation studies. For BriefCase ICH, the 95% CI for real-world sensitivity fell below the 95% CI of the respective validation study. Additionally, the 95% CI for real-world specificity for all three applications fell within the 95% CI of their respective validation studies. Data from the individual real-world studies for RAPID ICH and CINA LVO followed a similar trend.
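The interval comparisons described above ("within", "below") can be made explicit with a small helper. This is only a sketch of one way to classify two confidence intervals; the function name `compare_intervals`, the category labels, and the numbers in the usage lines are hypothetical and are not values reported in the paper.

```python
def compare_intervals(real_world, validation):
    """Classify a real-world CI relative to a validation-study CI.

    Each argument is a (lower, upper) tuple on the same scale.
    Returns "within" if the real-world CI lies entirely inside the
    validation CI, "below" or "above" if the two do not overlap,
    and "overlapping" for any other partial overlap.
    """
    rw_lo, rw_hi = real_world
    val_lo, val_hi = validation
    if rw_lo >= val_lo and rw_hi <= val_hi:
        return "within"
    if rw_hi < val_lo:
        return "below"
    if rw_lo > val_hi:
        return "above"
    return "overlapping"


# Hypothetical intervals for illustration only.
print(compare_intervals((0.86, 0.92), (0.82, 0.95)))
print(compare_intervals((0.70, 0.78), (0.85, 0.96)))
```

With these made-up inputs, the first call describes a real-world interval nested inside the validation interval and the second describes one lying entirely below it.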
Conclusion
The performance of applications in real-world settings was non-inferior to the performance observed in the validation studies used to obtain 510(k) clearance.