Data augmented lung cancer prediction framework using the nested case control NLST cohort.

IF 3.5 3区 医学 Q2 ONCOLOGY Frontiers in Oncology Pub Date : 2025-02-25 eCollection Date: 2025-01-01 DOI:10.3389/fonc.2025.1492758
Yifan Jiang, Venkata S K Manem
{"title":"Data augmented lung cancer prediction framework using the nested case control NLST cohort.","authors":"Yifan Jiang, Venkata S K Manem","doi":"10.3389/fonc.2025.1492758","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>In the context of lung cancer screening, the scarcity of well-labeled medical images poses a significant challenge to implement supervised learning-based deep learning methods. While data augmentation is an effective technique for countering the difficulties caused by insufficient data, it has not been fully explored in the context of lung cancer screening. In this research study, we analyzed the state-of-the-art (SOTA) data augmentation techniques for lung cancer binary prediction.</p><p><strong>Methods: </strong>To comprehensively evaluate the efficiency of data augmentation approaches, we considered the nested case control National Lung Screening Trial (NLST) cohort comprising of 253 individuals who had the commonly used CT scans without contrast. The CT scans were pre-processed into three-dimensional volumes based on the lung nodule annotations. Subsequently, we evaluated five basic (online) and two generative model-based offline data augmentation methods with ten state-of-the-art (SOTA) 3D deep learning-based lung cancer prediction models.</p><p><strong>Results: </strong>Our results demonstrated that the performance improvement by data augmentation was highly dependent on approach used. The Cutmix method resulted in the highest average performance improvement across all three metrics: 1.07%, 3.29%, 1.19% for accuracy, F1 score and AUC, respectively. MobileNetV2 with a simple data augmentation approach achieved the best AUC of 0.8719 among all lung cancer predictors, demonstrating a 7.62% improvement compared to baseline. Furthermore, the MED-DDPM data augmentation approach was able to improve prediction performance by rebalancing the training set and adding moderately synthetic data.</p><p><strong>Conclusions: </strong>The effectiveness of online and offline data augmentation methods were highly sensitive to the prediction model, highlighting the importance of carefully selecting the optimal data augmentation method. Our findings suggest that certain traditional methods can provide more stable and higher performance compared to SOTA online data augmentation approaches. Overall, these results offer meaningful insights for the development and clinical integration of data augmented deep learning tools for lung cancer screening.</p>","PeriodicalId":12482,"journal":{"name":"Frontiers in Oncology","volume":"15 ","pages":"1492758"},"PeriodicalIF":3.5000,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11893409/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Oncology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3389/fonc.2025.1492758","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"ONCOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: In the context of lung cancer screening, the scarcity of well-labeled medical images poses a significant challenge to implement supervised learning-based deep learning methods. While data augmentation is an effective technique for countering the difficulties caused by insufficient data, it has not been fully explored in the context of lung cancer screening. In this research study, we analyzed the state-of-the-art (SOTA) data augmentation techniques for lung cancer binary prediction.

Methods: To comprehensively evaluate the efficiency of data augmentation approaches, we considered the nested case control National Lung Screening Trial (NLST) cohort comprising of 253 individuals who had the commonly used CT scans without contrast. The CT scans were pre-processed into three-dimensional volumes based on the lung nodule annotations. Subsequently, we evaluated five basic (online) and two generative model-based offline data augmentation methods with ten state-of-the-art (SOTA) 3D deep learning-based lung cancer prediction models.

Results: Our results demonstrated that the performance improvement by data augmentation was highly dependent on approach used. The Cutmix method resulted in the highest average performance improvement across all three metrics: 1.07%, 3.29%, 1.19% for accuracy, F1 score and AUC, respectively. MobileNetV2 with a simple data augmentation approach achieved the best AUC of 0.8719 among all lung cancer predictors, demonstrating a 7.62% improvement compared to baseline. Furthermore, the MED-DDPM data augmentation approach was able to improve prediction performance by rebalancing the training set and adding moderately synthetic data.

Conclusions: The effectiveness of online and offline data augmentation methods were highly sensitive to the prediction model, highlighting the importance of carefully selecting the optimal data augmentation method. Our findings suggest that certain traditional methods can provide more stable and higher performance compared to SOTA online data augmentation approaches. Overall, these results offer meaningful insights for the development and clinical integration of data augmented deep learning tools for lung cancer screening.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
求助全文
约1分钟内获得全文 去求助
来源期刊
Frontiers in Oncology
Frontiers in Oncology Biochemistry, Genetics and Molecular Biology-Cancer Research
CiteScore
6.20
自引率
10.60%
发文量
6641
审稿时长
14 weeks
期刊介绍: Cancer Imaging and Diagnosis is dedicated to the publication of results from clinical and research studies applied to cancer diagnosis and treatment. The section aims to publish studies from the entire field of cancer imaging: results from routine use of clinical imaging in both radiology and nuclear medicine, results from clinical trials, experimental molecular imaging in humans and small animals, research on new contrast agents in CT, MRI, ultrasound, publication of new technical applications and processing algorithms to improve the standardization of quantitative imaging and image guided interventions for the diagnosis and treatment of cancer.
期刊最新文献
Case Report: Prepubertal-type testicular teratoma with local metastasis in a postpubertal patient. Case Report: Thymic neuroendocrine tumor with metastasis to the breast causing ectopic Cushing's syndrome. Compound cancer with small cell carcinoma and mucinous adenocarcinoma of the ovary: a case report and literature review. CENP-H as a new prognostic biomarker for tumors: a real-world literature review. Corrigendum: Exploring the impact of HDL and LMNA gene expression on immunotherapy outcomes in NSCLC: a comprehensive analysis using clinical & gene data.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1