Examination-Level Supervision for Deep Learning-based Intracranial Hemorrhage Detection on Head CT Scans
Jacopo Teneggi, Paul H Yi, Jeremias Sulam
Radiology: Artificial Intelligence, volume 6, issue 1, e230159. Published 2024-01-01.
DOI: https://doi.org/10.1148/ryai.230159
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10831525/pdf/
Abstract
Purpose: To compare the effectiveness of weak supervision (ie, with examination-level labels only) and strong supervision (ie, with image-level labels) in training deep learning models for detection of intracranial hemorrhage (ICH) on head CT scans.

Materials and Methods: In this retrospective study, an attention-based convolutional neural network was trained with either local (ie, image-level) or global (ie, examination-level) binary labels on the Radiological Society of North America (RSNA) 2019 Brain CT Hemorrhage Challenge dataset of 21 736 examinations (8876 [40.8%] ICH) and 752 422 images (107 784 [14.3%] ICH). The CQ500 (436 examinations; 212 [48.6%] ICH) and CT-ICH (75 examinations; 36 [48.0%] ICH) datasets were used for external testing. Performance in detecting ICH was compared between weak (examination-level labels) and strong (image-level labels) learners as a function of the number of labels available during training.

Results: On examination-level binary classification, strong and weak learners did not differ in area under the receiver operating characteristic curve on the internal validation split (0.96 vs 0.96; P = .64) or on the CQ500 dataset (0.90 vs 0.92; P = .15). Weak learners outperformed strong ones on the CT-ICH dataset (0.95 vs 0.92; P = .03). Weak learners had better section-level ICH detection performance when more than 10 000 labels were available for training (average F1 score = 0.73 vs 0.65; P < .001). Weakly supervised models trained on the entire RSNA dataset required 35 times fewer labels than equivalent strong learners.

Conclusion: Strongly supervised models did not achieve better performance than weakly supervised ones, which could reduce radiologist labor requirements for prospective dataset curation.

Keywords: CT, Head/Neck, Brain/Brain Stem, Hemorrhage

Supplemental material is available for this article.

© RSNA, 2023. See also commentary by Wahid and Fuentes in this issue.
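The label savings quoted in the abstract can be checked roughly from the dataset counts it reports. The short Python sketch below is illustrative only: the abstract does not state how the "35 times" figure was derived, so dividing the number of images by the number of examinations is one plausible reading, not the authors' stated calculation. The `exam_label` helper is hypothetical and assumes the common convention that an examination is positive if any of its images is positive.

```python
# Counts quoted in the abstract for the RSNA 2019 Brain CT Hemorrhage
# Challenge dataset.
N_EXAMS = 21_736        # examinations (one weak label each)
N_EXAMS_ICH = 8_876     # examinations with ICH
N_IMAGES = 752_422      # images (one strong label each)
N_IMAGES_ICH = 107_784  # images with ICH


def exam_label(image_labels):
    """Hypothetical aggregation rule (an assumption, not from the abstract):
    an examination is positive if at least one of its images is positive."""
    return int(any(image_labels))


def label_ratio(n_images, n_exams):
    """Image-level labels needed per examination-level label."""
    return n_images / n_exams


ratio = label_ratio(N_IMAGES, N_EXAMS)
exam_prevalence = 100 * N_EXAMS_ICH / N_EXAMS
image_prevalence = 100 * N_IMAGES_ICH / N_IMAGES

print(f"image labels per exam label: {ratio:.1f}")     # 34.6, i.e. about 35
print(f"ICH prevalence, exams:  {exam_prevalence:.1f}%")   # 40.8%
print(f"ICH prevalence, images: {image_prevalence:.1f}%")  # 14.3%
```

Under this reading, labeling every image takes about 35 labels for each examination-level label, which agrees with the reported factor and with the prevalence percentages given in the abstract.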