Comparing the performance of a deep learning-based lung gross tumour volume segmentation algorithm before and after transfer learning in a new hospital.

BJR open. Pub Date: 2023-12-12. eCollection Date: 2024-01-01. DOI: 10.1093/bjro/tzad008

Chaitanya Kulkarni, Umesh Sherkhane, Vinay Jaiswar, Sneha Mithun, Dinesh Mysore Siddu, Venkatesh Rangarajan, Andre Dekker, Alberto Traverso, Ashish Jha, Leonard Wee



Abstract

Objectives: Radiation therapy for lung cancer requires a gross tumour volume (GTV) to be carefully outlined by a skilled radiation oncologist (RO) so that a high radiation dose can be accurately targeted at a malignant mass while minimizing radiation damage to adjacent normal tissues. This is manually intensive and tedious; however, it is feasible to train a deep learning (DL) neural network that could assist ROs in delineating the GTV. DL models trained on large openly accessible data sets might nevertheless not perform well when applied to a superficially similar task in a different clinical setting. In this work, we tested the performance of a DL automatic lung GTV segmentation model trained on open-access Dutch data when used on Indian patients from a large public tertiary hospital, and hypothesized that generic DL performance could be improved for a specific local clinical context by means of modest transfer learning on a small representative local subset.

Methods: X-ray computed tomography (CT) series in a public data set called "NSCLC-Radiomics" from The Cancer Imaging Archive were first used to train a DL-based lung GTV segmentation model (Model 1). Its performance was assessed using a different open-access data set of Dutch subjects (Interobserver1) plus a private Indian data set from a local tertiary hospital (Test Set 2). Another Indian data set (Retrain Set 1) was used to fine-tune Model 1 using a transfer learning method. The Indian data sets were taken from CT acquired on a hybrid scanner based in nuclear medicine, but the GTV was drawn by skilled Indian ROs. The fine-tuned model (Model 2) was then re-evaluated on "Interobserver1" and "Test Set 2". Dice similarity coefficient (DSC), precision, and recall were used as geometric segmentation performance metrics.
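The three geometric metrics named in the Methods can be illustrated with a short sketch. It assumes the usual voxel-wise definitions (DSC = 2TP / (2TP + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN)) applied to flattened binary masks; the abstract does not state how the authors implemented them. The function name `overlap_metrics`, the `OverlapMetrics` type, the toy masks, and the convention of returning 0.0 for an empty denominator are all assumptions made for illustration.

```python
from typing import Iterable, NamedTuple


class OverlapMetrics(NamedTuple):
    dsc: float
    precision: float
    recall: float


def overlap_metrics(predicted: Iterable[int], reference: Iterable[int]) -> OverlapMetrics:
    """Voxel-wise DSC, precision and recall for two flattened binary masks."""
    pred = [bool(v) for v in predicted]
    ref = [bool(v) for v in reference]
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")

    # Count voxels by agreement between the predicted and reference GTV.
    tp = sum(p and r for p, r in zip(pred, ref))      # inside both contours
    fp = sum(p and not r for p, r in zip(pred, ref))  # predicted only
    fn = sum(r and not p for p, r in zip(pred, ref))  # reference only

    # Assumed convention: an empty denominator yields 0.0 rather than an error.
    dsc = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return OverlapMetrics(dsc, precision, recall)


# Toy masks (hypothetical data): 1 marks a voxel inside the GTV.
reference = [0, 1, 1, 1, 0, 0]
predicted = [0, 0, 1, 1, 1, 0]
metrics = overlap_metrics(predicted, reference)
print(metrics)
```

In this toy case two voxels overlap, one is predicted only, and one is reference only, so all three metrics equal 2/3. For real CT volumes the same counts would normally be computed on 3D arrays, which can be flattened before being passed to a function like this one.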

Results: Model 1, trained exclusively on Dutch scans, showed a significant fall in performance when tested on "Test Set 2". However, the DSC of Model 2 recovered by 14 percentage points when evaluated on the same test set. Precision and recall showed a similar rebound in performance after transfer learning, despite the comparatively small sample size. Segmentation performance on "Interobserver1" did not change significantly between the models before and after fine-tuning.

Conclusions: A large public open-access data set was used to train a generic DL model for lung GTV segmentation, but this did not perform well initially in the Indian clinical context. Using transfer learning methods, it was feasible to efficiently and easily fine-tune the generic model using only a small number of local examples from the Indian hospital. This led to a recovery of some of the geometric segmentation performance, but the tuning did not appear to affect the performance of the model in another open-access data set.

Advances in knowledge: Caution is needed when using models trained on large volumes of international data in a local clinical setting, even when that training data set is of good quality. Minor differences in scan acquisition and clinician delineation preferences may result in an apparent drop in performance. However, DL models have the advantage of being efficiently "adapted" from a generic to a locally specific context, with only a small amount of fine-tuning by means of transfer learning on a small local institutional data set.

