Deep learning using histological images for gene mutation prediction in lung cancer: a multicentre retrospective study

Yu Zhao, Shan Xiong, Qin Ren, Jun Wang, Min Li, Lin Yang, Di Wu, Kejing Tang, Xiaojie Pan, Fengxia Chen, Wenxiang Wang, Shi Jin, Xianling Liu, Gen Lin, Wenxiu Yao, Linbo Cai, Yi Yang, Jixian Liu, Jingxun Wu, Wenfan Fu, Wenhua Liang
The Lancet Oncology, published 2024-12-06. DOI: 10.1016/s1470-2045(24)00599-0 (https://doi.org/10.1016/s1470-2045(24)00599-0)

Abstract

Background

Accurate detection of driver gene mutations is crucial for treatment planning and predicting prognosis for patients with lung cancer. Conventional genomic testing requires high-quality tissue samples and is time-consuming and resource-consuming, and as a result, is not available for most patients, especially those in low-resource settings. We aimed to develop an annotation-free Deep learning-enabled artificial intelligence method to predict GEne Mutations (DeepGEM) from routinely acquired histological slides.

Methods

In this multicentre retrospective study, we collected data for patients with lung cancer who had a biopsy and multigene next-generation sequencing done at 16 hospitals in China (with no restrictions on age, sex, or histology type), to form a large multicentre dataset comprising paired pathological image and multiple gene mutation information. We also included patients from The Cancer Genome Atlas (TCGA) publicly available dataset. Our developed model is an instance-level and bag-level co-supervised multiple instance learning method with label disambiguation design. We trained and initially tested the DeepGEM model on the internal dataset (patients from the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China), and further evaluated it on the external dataset (patients from the remaining 15 centres) and the public TCGA dataset. Additionally, a dataset of patients from the same medical centre as the internal dataset, but without overlap, was used to evaluate the model's generalisation ability to biopsy samples from lymph node metastases. The primary objective was the performance of the DeepGEM model in predicting gene mutations (area under the curve [AUC] and accuracy) in the four prespecified groups (ie, the hold-out internal test set, multicentre external test set, TCGA set, and lymph node metastases set).
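The idea of co-supervising at the instance (patch) level and the bag (slide) level can be illustrated with a small sketch. This is not the DeepGEM architecture, which the abstract does not specify. It assumes attention-weighted pooling of per-patch probabilities, takes per-patch pseudo-labels as given (the label disambiguation step is not modelled), and uses hypothetical names throughout (`bag_probability`, `co_supervised_loss`, `alpha`).

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and one 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def bag_probability(instance_probs, attention_logits):
    """Slide-level probability as an attention-weighted mean of patch probabilities.

    Assumption: attention pooling. The abstract does not say how patches are aggregated.
    """
    if not instance_probs or len(instance_probs) != len(attention_logits):
        raise ValueError("need at least one patch and one attention logit per patch")
    weights = softmax(attention_logits)
    return sum(w * p for w, p in zip(weights, instance_probs))


def co_supervised_loss(instance_probs, attention_logits, slide_label,
                       pseudo_labels, alpha=0.5):
    """Weighted sum of a bag-level loss and a mean instance-level loss.

    slide_label: 0/1 mutation status of the whole slide (from sequencing).
    pseudo_labels: hypothetical 0/1 labels per patch, supplied by the caller.
    alpha: hypothetical weight balancing the two terms.
    """
    if len(pseudo_labels) != len(instance_probs):
        raise ValueError("need one pseudo-label per patch")
    bag_term = bce(bag_probability(instance_probs, attention_logits), slide_label)
    inst_term = sum(bce(p, y) for p, y in zip(instance_probs, pseudo_labels))
    inst_term /= len(instance_probs)
    return alpha * bag_term + (1.0 - alpha) * inst_term


# Toy slide with three patches (made-up numbers, for illustration only).
probs = [0.9, 0.2, 0.7]
logits = [2.0, -1.0, 0.5]
print(bag_probability(probs, logits))
print(co_supervised_loss(probs, logits, slide_label=1, pseudo_labels=[1, 0, 1]))
```

In this sketch, only the slide-level label comes from sequencing. The patch-level pseudo-labels stand in for whatever a label disambiguation step would produce, so no manual annotation of tumour regions is assumed.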

Findings

Assessable pathological images and multigene testing information were available for 3697 patients who had biopsy and multigene next-generation sequencing done between Jan 1, 2018, and March 31, 2022, at the 16 centres. We excluded 60 patients with low-quality images. We included 3767 images from 3637 consecutive patients (1978 [54·4%] men, 1514 [41·6%] women, 145 [4·0%] unknown; median age 60 years [IQR 52–67]), with 1716 patients in the internal dataset, 1718 patients in the external dataset, and 203 patients in the lymph node metastases dataset. The DeepGEM model showed robust performance in the internal dataset: for excisional biopsy samples, AUC values for gene mutation prediction ranged from 0·90 (95% CI 0·77–1·00) to 0·97 (0·93–1·00) and accuracy values ranged from 0·91 (0·85–0·98) to 0·97 (0·93–1·00); for aspiration biopsy samples, AUC values ranged from 0·85 (0·80–0·91) to 0·95 (0·86–1·00) and accuracy values ranged from 0·79 (0·74–0·85) to 0·99 (0·98–1·00). In the multicentre external dataset, for excisional biopsy samples, AUC values ranged from 0·80 (95% CI 0·75–0·85) to 0·91 (0·88–1·00) and accuracy values ranged from 0·79 (0·76–0·82) to 0·95 (0·93–0·96); for aspiration biopsy samples, AUC values ranged from 0·76 (0·70–0·83) to 0·87 (0·80–0·94) and accuracy values ranged from 0·76 (0·74–0·79) to 0·97 (0·96–0·98). The model also showed strong performance on the TCGA dataset (473 patients; 535 slides; AUC values ranged from 0·82 [95% CI 0·71–0·93] to 0·96 [0·91–1·00], accuracy values ranged from 0·79 [0·70–0·88] to 0·95 [0·90–1·00]). The DeepGEM model, trained on primary region biopsy samples, could be generalised to biopsy samples from lymph node metastases, with AUC values of 0·91 (95% CI 0·88–0·94) for EGFR and 0·88 (0·82–0·93) for KRAS and accuracy values of 0·85 (0·80–0·88) for EGFR and 0·95 (0·92–0·96) for KRAS. It also showed potential for prognostic prediction of targeted therapy. The model generated spatial gene mutation maps, indicating the spatial distribution of gene mutations.
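For readers less familiar with the reported metrics, the sketch below shows how AUC, accuracy, and a 95% CI could be computed for one gene from per-slide scores. It rests on assumptions: the abstract does not state how the CIs were derived, so the percentile bootstrap here is one common choice and not necessarily the authors' method. The 0·5 decision threshold and all data are made up for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of slides classified correctly at a fixed (assumed) threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def bootstrap_ci(metric, labels, scores, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample slides with replacement, recompute the metric."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples with a single class (AUC undefined)
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((1.0 - level) / 2.0 * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1.0 + level) / 2.0 * n_resamples))]
    return lo, hi


# Toy data: 1 = mutated, 0 = wild type; scores are hypothetical model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(auc(labels, scores), bootstrap_ci(auc, labels, scores))
print(accuracy(labels, scores), bootstrap_ci(accuracy, labels, scores))
```

With small groups, a percentile interval can reach the upper bound of 1·00, which is consistent with several of the intervals reported above.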

Interpretation

We developed an AI-based method that can provide an accurate, timely, and economical prediction of gene mutation and mutation spatial distribution. The method showed substantial potential as an assistive tool for guiding the clinical treatment of patients with lung cancer.

Funding

National Natural Science Foundation of China, the Science and Technology Planning Project of Guangzhou, and the National Key Research and Development Program of China.

Translation

For the Chinese translation of the abstract see Supplementary Materials section.