Accelerating voxelwise annotation of cross-sectional imaging through AI collaborative labeling with quality assurance and bias mitigation.

Frontiers in Radiology, vol. 3, 2023. Epub 2023-07-11. DOI: 10.3389/fradi.2023.1202412. Journal impact factor: 2.3.

David Dreizin, Lei Zhang, Nathan Sarkar, Uttam K Bodanapally, Guang Li, Jiazhen Hu, Haomin Chen, Mustafa Khedr, Udit Khetan, Peter Campbell, Mathias Unberath

Abstract

Background: Precision-medicine quantitative tools for cross-sectional imaging require painstaking labeling of targets that vary considerably in volume. This prevents data annotation efforts and supervised training from scaling to the large datasets needed for robust, generalizable clinical performance. A straightforward time-saving strategy is manual editing of AI-generated labels, which we call AI-collaborative labeling (AICL). The factors affecting the efficacy and utility of such an approach are unknown, and the resulting reduction in time effort is not well documented. Further, edited AI labels may be prone to automation bias.

Purpose: In this pilot study, using a cohort of CT scans with intracavitary hemorrhage, we evaluate both the time savings and the quality of AICL labels. We also propose criteria that must be met before AICL annotations can be used as high-throughput, high-quality ground truth.

Methods: Fifty-seven CT scans of patients with traumatic intracavitary hemorrhage were included. No participant recruited for this study had previously interpreted the scans. nnU-Net models trained on small existing datasets for each feature (hemothorax, hemoperitoneum, and pelvic hematoma; n = 77-253) were used for inference. Two common scenarios served as baseline comparisons: de novo expert manual labeling and expert editing of trained staff labels. Outcome parameters were time effort and label quality, graded by a blinded independent expert on a 9-point scale. The observer also attempted to discriminate between AICL and expert labels in a random subset (n = 18). Data were compared using ANOVA and post hoc paired signed-rank tests with Bonferroni correction.
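The post hoc step described in Methods can be sketched in a few lines. What follows is a minimal pure-Python illustration, not the authors' analysis code. It assumes every scan was labeled under all three conditions, so that per-scan measurements (time or grade) can be paired. It also uses the normal approximation to the Wilcoxon signed-rank statistic with a tie correction rather than an exact test. The function names `signed_rank_test` and `bonferroni_pairwise` are hypothetical, and the omnibus ANOVA is not shown.

```python
import math
from itertools import combinations


def signed_rank_test(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Hypothetical helper for illustration. Zero differences are dropped,
    tied |differences| receive average ranks, and the variance is
    tie-corrected. Returns (W+, p_value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 0.0, 1.0  # no non-zero differences: no evidence of a shift

    # Rank |d| from smallest to largest, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    # Sum of ranks for positive differences, standardized under H0.
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if var <= 0:
        return w_plus, 1.0
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w_plus, min(p, 1.0)


def bonferroni_pairwise(groups):
    """Run every pairwise paired test and Bonferroni-correct the p-values.

    `groups` maps a condition name to per-scan values in the same scan order.
    """
    pairs = list(combinations(groups, 2))
    m = len(pairs)  # number of comparisons
    return {
        (a, b): min(1.0, signed_rank_test(groups[a], groups[b])[1] * m)
        for a, b in pairs
    }
```

Hypothetical usage: `bonferroni_pairwise({"expert": t_expert, "staff_edit": t_staff, "aicl": t_aicl})`, where each list holds per-scan times in the same scan order. With three conditions there are three comparisons, so each raw p-value is multiplied by 3 and capped at 1. For a real analysis, `scipy.stats.wilcoxon` offers a tested implementation, including exact p-values for small samples.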

Results: AICL reduced time effort 2.8-fold compared with staff label editing and 8.7-fold compared with expert labeling (corrected p < 0.0006). The mean Likert grade for AICL (8.4, SD: 0.6) was significantly higher than for expert labels (7.8, SD: 0.9) and edited staff labels (7.7, SD: 0.8) (corrected p < 0.0006). The independent observer failed to correctly discriminate between AI and human labels.
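The abstract does not state how the discrimination task was analyzed, and it does not report how many of the 18 calls were correct. One common way to judge whether an observer beats chance is an exact one-sided binomial test against a guessing rate of 0.5. That test is assumed here rather than taken from the paper, and the helper names below are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct calls by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_for_significance(n, alpha=0.05, p=0.5):
    """Smallest number of correct calls whose one-sided p-value is <= alpha."""
    for k in range(n + 1):
        if binomial_tail(k, n, p) <= alpha:
            return k
    return None  # even n/n correct would not reach alpha


# With n = 18 forced-choice calls and 50% guessing, an observer would need
# at least 13 correct calls before chance could be rejected at alpha = 0.05.
print(min_correct_for_significance(18))  # 13
print(round(binomial_tail(13, 18), 4))   # 0.0481
```

A two-sided test would also flag an observer who is systematically wrong. In either case, failing to reach the threshold is consistent with the labels being indistinguishable but does not prove it, particularly with a subset as small as 18.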

Conclusion: For our use case and annotators, AICL facilitates rapid large-scale curation of high-quality ground truth. The proposed quality control regime can be employed by other investigators prior to embarking on AICL for segmentation tasks in large datasets.
