Deep learning using histological images for gene mutation prediction in lung cancer: a multicentre retrospective study

Yu Zhao, Shan Xiong, Qin Ren, Jun Wang, Min Li, Lin Yang, Di Wu, Kejing Tang, Xiaojie Pan, Fengxia Chen, Wenxiang Wang, Shi Jin, Xianling Liu, Gen Lin, Wenxiu Yao, Linbo Cai, Yi Yang, Jixian Liu, Jingxun Wu, Wenfan Fu, Wenhua Liang
The Lancet Oncology. Published 2024-12-06. DOI: 10.1016/s1470-2045(24)00599-0 (https://doi.org/10.1016/s1470-2045(24)00599-0)

Abstract

Background

Accurate detection of driver gene mutations is crucial for treatment planning and predicting prognosis for patients with lung cancer. Conventional genomic testing requires high-quality tissue samples and is time-consuming and resource-consuming, and as a result, is not available for most patients, especially those in low-resource settings. We aimed to develop an annotation-free Deep learning-enabled artificial intelligence method to predict GEne Mutations (DeepGEM) from routinely acquired histological slides.

Methods

In this multicentre retrospective study, we collected data for patients with lung cancer who had a biopsy and multigene next-generation sequencing done at 16 hospitals in China (with no restrictions on age, sex, or histology type), to form a large multicentre dataset comprising paired pathological image and multiple gene mutation information. We also included patients from The Cancer Genome Atlas (TCGA) publicly available dataset. Our developed model is an instance-level and bag-level co-supervised multiple instance learning method with label disambiguation design. We trained and initially tested the DeepGEM model on the internal dataset (patients from the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China), and further evaluated it on the external dataset (patients from the remaining 15 centres) and the public TCGA dataset. Additionally, a dataset of patients from the same medical centre as the internal dataset, but without overlap, was used to evaluate the model's generalisation ability to biopsy samples from lymph node metastases. The primary objective was the performance of the DeepGEM model in predicting gene mutations (area under the curve [AUC] and accuracy) in the four prespecified groups (ie, the hold-out internal test set, multicentre external test set, TCGA set, and lymph node metastases set).
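The idea of co-supervising a multiple instance learning model at both the bag level (whole slide) and the instance level (image tile) can be sketched as follows. This is a minimal illustration, not the DeepGEM architecture: the attention pooling, the shared linear classifier, the `alpha` weighting, and the tile `pseudo_labels` are all assumptions introduced here, and the abstract does not describe how label disambiguation produces instance-level targets.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mil_forward(bag, attn_w, clf_w, clf_b):
    """Forward pass for one bag (slide) of instance (tile) feature vectors.

    Returns (bag_prob, instance_probs, attention_weights).
    """
    # Instance-level head: the same linear classifier scores every tile.
    instance_probs = [sigmoid(dot(x, clf_w) + clf_b) for x in bag]
    # Attention weights decide how much each tile contributes to the slide.
    attention = softmax([dot(x, attn_w) for x in bag])
    # Bag-level head: classify the attention-weighted mean feature vector.
    dim = len(bag[0])
    pooled = [sum(a * x[d] for a, x in zip(attention, bag)) for d in range(dim)]
    bag_prob = sigmoid(dot(pooled, clf_w) + clf_b)
    return bag_prob, instance_probs, attention


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction, clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def co_supervised_loss(bag_prob, instance_probs, bag_label, pseudo_labels, alpha=0.5):
    """Bag-level loss plus an alpha-weighted mean instance-level loss.

    pseudo_labels are hypothetical per-tile targets; how they are obtained
    is outside the scope of this sketch.
    """
    bag_term = bce(bag_prob, bag_label)
    inst_term = sum(bce(p, y) for p, y in zip(instance_probs, pseudo_labels))
    return bag_term + alpha * inst_term / len(instance_probs)


# Toy slide with three tiles and two features per tile.
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bag_prob, instance_probs, attention = mil_forward(bag, [2.0, 0.0], [0.5, -0.5], 0.0)
loss = co_supervised_loss(bag_prob, instance_probs, 1, [1, 0, 1])
```

Only the slide-level label comes from sequencing; the instance term gives each tile its own training signal, which is what makes tile-level outputs usable later on.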

Findings

Assessable pathological images and multigene testing information were available for 3697 patients who had biopsy and multigene next-generation sequencing done between Jan 1, 2018, and March 31, 2022, at the 16 centres. We excluded 60 patients with low-quality images. We included 3767 images from 3637 consecutive patients (1978 [54·4%] men, 1514 [41·6%] women, 145 [4·0%] unknown; median age 60 years [IQR 52–67]), with 1716 patients in the internal dataset, 1718 patients in the external dataset, and 203 patients in the lymph node metastases dataset. The DeepGEM model showed robust performance in the internal dataset: for excisional biopsy samples, AUC values for gene mutation prediction ranged from 0·90 (95% CI 0·77–1·00) to 0·97 (0·93–1·00) and accuracy values ranged from 0·91 (0·85–0·98) to 0·97 (0·93–1·00); for aspiration biopsy samples, AUC values ranged from 0·85 (0·80–0·91) to 0·95 (0·86–1·00) and accuracy values ranged from 0·79 (0·74–0·85) to 0·99 (0·98–1·00). In the multicentre external dataset, for excisional biopsy samples, AUC values ranged from 0·80 (95% CI 0·75–0·85) to 0·91 (0·88–1·00) and accuracy values ranged from 0·79 (0·76–0·82) to 0·95 (0·93–0·96); for aspiration biopsy samples, AUC values ranged from 0·76 (0·70–0·83) to 0·87 (0·80–0·94) and accuracy values ranged from 0·76 (0·74–0·79) to 0·97 (0·96–0·98). The model also showed strong performance on the TCGA dataset (473 patients; 535 slides; AUC values ranged from 0·82 [95% CI 0·71–0·93] to 0·96 [0·91–1·00], accuracy values ranged from 0·79 [0·70–0·88] to 0·95 [0·90–1·00]). The DeepGEM model, trained on primary region biopsy samples, could be generalised to biopsy samples from lymph node metastases, with AUC values of 0·91 (95% CI 0·88–0·94) for EGFR and 0·88 (0·82–0·93) for KRAS and accuracy values of 0·85 (0·80–0·88) for EGFR and 0·95 (0·92–0·96) for KRAS and showed potential for prognostic prediction of targeted therapy. 
The model generated spatial gene mutation maps, indicating gene mutation spatial distribution.
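For readers who want to see how figures of the form "AUC 0·91 (95% CI 0·88–0·94)" can be produced, the sketch below computes AUC and accuracy for binary mutation labels and attaches a percentile bootstrap interval. The abstract does not state how the study computed its confidence intervals, so the bootstrap, the 0.5 decision threshold, and the function names are assumptions made for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases where the thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def bootstrap_ci(labels, scores, metric, n_boot=1000, level=0.95, seed=0):
    """Point estimate plus a percentile bootstrap interval for a metric."""
    point = metric(labels, scores)  # also fails early on degenerate input
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        ss = [scores[i] for i in idx]
        try:
            stats.append(metric(ys, ss))
        except ValueError:
            continue  # this resample lacked one class; draw again
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 + level) / 2 * n_boot))]
    return point, lo, hi


# Toy example: 1 = mutated, 0 = wild type, scores are predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_point, auc_lo, auc_hi = bootstrap_ci(labels, scores, auc)
acc_point, acc_lo, acc_hi = bootstrap_ci(labels, scores, accuracy)
```

With small test groups the interval is wide and can reach 1·00, which is consistent with the upper bounds reported for some genes above.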

Interpretation

We developed an AI-based method that can provide an accurate, timely, and economical prediction of gene mutation and mutation spatial distribution. The method showed substantial potential as an assistive tool for guiding the clinical treatment of patients with lung cancer.
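A spatial mutation map of the kind described can be pictured as tile-level probabilities placed back at each tile's position on the slide. The sketch below is a toy illustration under stated assumptions: the `(row, col, probability)` tile format, the grid size, and the text rendering are hypothetical and are not taken from the study.

```python
def mutation_map(tiles, n_rows, n_cols, background=None):
    """Place per-tile mutation probabilities on a slide-shaped grid.

    tiles: iterable of (row, col, probability) for tissue tiles.
    Cells without a tissue tile keep the background value.
    """
    grid = [[background for _ in range(n_cols)] for _ in range(n_rows)]
    for row, col, prob in tiles:
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise ValueError(f"tile ({row}, {col}) lies outside the grid")
        grid[row][col] = prob
    return grid


def render(grid, levels=".:-*#"):
    """Render the grid as text; later glyphs mean higher probability."""
    lines = []
    for row in grid:
        chars = []
        for p in row:
            if p is None:
                chars.append(" ")  # no tissue at this position
            else:
                chars.append(levels[min(int(p * len(levels)), len(levels) - 1)])
        lines.append("".join(chars))
    return "\n".join(lines)


# Toy 2 x 2 slide with three tissue tiles and one empty cell.
tiles = [(0, 0, 0.1), (0, 1, 0.9), (1, 1, 0.55)]
grid = mutation_map(tiles, n_rows=2, n_cols=2)
text = render(grid)
```

In practice the grid would be drawn as a colour overlay on the slide image; the text rendering here only keeps the example free of plotting dependencies.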

Funding

National Natural Science Foundation of China, the Science and Technology Planning Project of Guangzhou, and the National Key Research and Development Program of China.

Translation

For the Chinese translation of the abstract see Supplementary Materials section.