Children Are Not Small Adults: Addressing Limited Generalizability of an Adult Deep Learning CT Organ Segmentation Model to the Pediatric Population.

Journal of Imaging Informatics in Medicine. Pub Date: 2024-09-19. DOI: 10.1007/s10278-024-01273-w

Devina Chatterjee, Adway Kanhere, Florence X Doo, Jerry Zhao, Andrew Chan, Alexander Welsh, Pranav Kulkarni, Annie Trang, Vishwa S Parekh, Paul H Yi



Abstract

Deep learning (DL) tools developed on adult datasets may not generalize well to pediatric patients, posing potential safety risks. We evaluated the performance of TotalSegmentator, a state-of-the-art adult-trained CT organ segmentation model, on a subset of organs in a pediatric CT dataset and explored optimization strategies to improve pediatric segmentation performance. TotalSegmentator was retrospectively evaluated on abdominal CT scans from an external adult dataset (n = 300) and an external pediatric dataset (n = 359). Generalizability was quantified by comparing Dice scores between the adult and pediatric external datasets using Mann-Whitney U tests. Two DL optimization approaches were then evaluated: (1) a 3D nnU-Net model trained only on pediatric data, and (2) an adult nnU-Net model fine-tuned on the pediatric cases. TotalSegmentator had a significantly lower overall mean Dice score on pediatric than on adult CT scans (0.73 vs. 0.81, P < .001), demonstrating limited generalizability to pediatric CT scans. Stratified by organ, mean pediatric Dice scores were lower than adult scores for four organs (P < .001 for all): the right and left adrenal glands (right adrenal, 0.41 [0.39-0.43] vs. 0.69 [0.66-0.71]; left adrenal, 0.35 [0.32-0.37] vs. 0.68 [0.65-0.71]); the duodenum (0.47 [0.45-0.49] vs. 0.67 [0.64-0.69]); and the pancreas (0.73 [0.72-0.74] vs. 0.79 [0.77-0.81]). Both optimization approaches improved performance on pediatric CT scans: the pediatric-specific model and the adult-trained model fine-tuned on pediatric images each significantly improved segmentation accuracy over TotalSegmentator for all organs, especially for smaller anatomical structures (e.g., > 0.2 higher mean Dice for adrenal glands; P < .001).
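The evaluation described in the abstract rests on two quantities: a per-scan Dice score and a Mann-Whitney U comparison of score distributions between cohorts. The following is a minimal sketch of both, not the authors' code. It uses only the Python standard library; the function names `dice_score` and `mann_whitney_u`, the convention for two empty masks, the normal-approximation P value, and all toy values are assumptions made for illustration and are not taken from the paper.

```python
import math


def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * overlap / total


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided P value).

    The P value uses the normal approximation with a tie correction and
    no continuity correction, which is only reasonable for larger samples.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at 0-based positions i..j-1 share the average of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u_x, 1.0
    z = (u_x - n1 * n2 / 2.0) / math.sqrt(variance)
    return u_x, math.erfc(abs(z) / math.sqrt(2.0))


# Toy example with made-up values (not data from the study).
predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 0, 0]
print("Dice:", dice_score(predicted_mask, reference_mask))

pediatric_dice = [0.35, 0.41, 0.47]
adult_dice = [0.67, 0.68, 0.69]
u_stat, p_value = mann_whitney_u(pediatric_dice, adult_dice)
print("U:", u_stat, "P:", p_value)
```

In practice, masks would typically be NumPy arrays loaded from segmentation volumes, and `scipy.stats.mannwhitneyu` would be the usual choice for the test; the hand-written version here is only meant to make the ranking step explicit.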


