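Medical Imaging Applications Developed Using Artificial Intelligence Demonstrate High Internal Validity Yet Are Limited in Scope and Lack External Validation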

Jacob F. Oeding, M.S.; Aaron J. Krych, M.D.; Andrew D. Pearle, M.D.; Bryan T. Kelly, M.D., M.B.A.; Kyle N. Kunze, M.D.
Arthroscopy: The Journal of Arthroscopic and Related Surgery, volume 41, issue 2, pages 455-472. Published 2025-02-01. DOI: 10.1016/j.arthro.2024.01.043. https://www.sciencedirect.com/science/article/pii/S0749806324000999

Abstract


Purpose

To (1) review definitions and concepts necessary to interpret applications of deep learning (DL; a domain of artificial intelligence that leverages neural networks to make predictions on media inputs such as images) and (2) identify knowledge and translational gaps in the literature to provide insight into specific areas for improvement as adoption of this technology continues.

Methods

A comprehensive search of the literature was performed in December 2023 for articles regarding the use of DL in sports medicine. For each study, the following were recorded: the joint of focus, the specific anatomic structure or pathology to which DL was applied, the imaging modality utilized, the source of images used for model training and testing, data set size, model performance, and whether the DL model was externally validated. A numerical scale was used to rate each DL model’s clinical impact, with 1 corresponding to proof-of-concept studies with little to no direct clinical impact and 5 corresponding to practice-changing clinical impact and readiness for clinical deployment.
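
The per-study fields listed above could be captured in a simple record type. The following is a minimal sketch, not the authors' actual extraction tooling: the class name `StudyRecord`, its field names, and the helper `percent_externally_validated` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyRecord:
    """One row of a data-extraction sheet (hypothetical field names)."""
    joint: str                   # joint of focus, e.g. "knee"
    structure: str               # anatomic structure/pathology, e.g. "ACL tear"
    modality: str                # imaging modality, e.g. "MRI"
    image_source: str            # where training/testing images came from
    dataset_size: int            # number of images or examinations
    auroc: Optional[float]       # reported performance, if given as AUROC
    externally_validated: bool   # tested on data from outside the development source
    clinical_impact: int         # 1 (proof of concept) to 5 (practice changing)

    def __post_init__(self) -> None:
        # Enforce the 1-to-5 clinical impact scale described in the Methods.
        if not 1 <= self.clinical_impact <= 5:
            raise ValueError("clinical_impact must be between 1 and 5")


def percent_externally_validated(records: list[StudyRecord]) -> float:
    """Share of studies, in percent, that report external validation."""
    if not records:
        raise ValueError("no records")
    validated = sum(r.externally_validated for r in records)
    return 100.0 * validated / len(records)
```

Keeping external validation as an explicit boolean field makes the headline proportion a one-line aggregate rather than something inferred from free-text notes.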

Results

Fifty-five studies were identified, all of which were published within the past 5 years, while 82% were published within the past 3 years. Of the DL models identified, 84% were developed for classification tasks, 9% for automated measurements, and 7% for segmentation. A total of 62% of studies utilized magnetic resonance imaging as the imaging modality, 25% radiographs, and 7% ultrasound, while 1 study each used computed tomography, arthroscopic images, or arthroscopic video. Sixty-five percent of studies focused on the detection of tears (anterior cruciate ligament [ACL], rotator cuff [RC], and meniscus). Diagnostic performance, as determined by the area under the receiver operating characteristic curve (AUROC), ranged from 0.81 to 0.99 for ACL tears (excellent to near perfect), from 0.83 to 0.94 for RC tears (excellent), and from 0.75 to 0.96 for meniscus tears (acceptable to excellent). In addition, 3 studies focused on detection of cartilage lesions reported AUROCs ranging from 0.90 to 0.92 (excellent performance). However, only 4 (7%) studies externally validated their models, suggesting that the remaining models may not be generalizable or may not perform well when applied to populations other than those used to develop them. Finally, the mean clinical impact score was 2 (range, 1-3) on a scale of 1 to 5, corresponding to limited clinical applicability.
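
To make the AUROC figures and the external validation concern concrete, here is a minimal sketch of how AUROC can be computed from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and all scores below are invented toy values for illustration only; none of them come from the reviewed studies.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth values (1 = tear present).
    scores: iterable of model outputs, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    # Toy scores on held-out data from the development institution.
    internal_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    # Toy scores for the same model on images from a different institution.
    external_scores = [0.9, 0.4, 0.6, 0.5, 0.7, 0.1]
    print(f"internal AUROC: {auroc(labels, internal_scores):.2f}")  # 1.00
    print(f"external AUROC: {auroc(labels, external_scores):.2f}")  # 0.67
```

The gap between the two toy numbers illustrates why a high internal AUROC alone does not establish generalizability: only a test on data from outside the development source reveals whether the ranking holds up.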

Conclusions

DL models in orthopaedic sports medicine show generally excellent performance (high internal validity) but require external validation to facilitate clinical deployment. In addition, current models have low clinical applicability and fail to advance the field due to a focus on routine tasks and a narrow conceptual framework.

Level of Evidence

Level IV, scoping review of Level I to IV studies.