Development of an AI model for predicting hypoxia status and prognosis in non-small cell lung cancer using multi-modal data
Lina Zhou, Chenkai Mao, Tingting Fu, Xiao Ding, Luca Bertolaccini, Ao Liu, Junjun Zhang, Shicheng Li
Translational Lung Cancer Research, vol. 13, no. 12, pp. 3642-3656. Published 2024-12-31 (Epub 2024-12-27).
DOI: https://doi.org/10.21037/tlcr-24-982
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11736583/pdf/
Citations: 0
Abstract
Background: Prognosis prediction is crucial for non-small cell lung cancer (NSCLC) treatment planning. While tumor hypoxia significantly impacts patient outcomes, identifying hypoxic genomic markers remains challenging. This study sought to identify hypoxic computed tomography (CT) radiomic features and create an artificial intelligence (AI) model for NSCLC through the integration of multi-modal data.
Methods: In total, 452 NSCLC patients were enrolled in this study, including patients from The Second Affiliated Hospital of Soochow University (SC, n=112), The Cancer Genome Atlas (TCGA)-NSCLC dataset (n=74), the radiogenomics dataset (n=130), and the Gene Expression Omnibus (GEO) datasets (GSE19188: n=82, and GSE87340: n=54). Hypoxia status was classified using optimized cut-off values of hypoxia enrichment scores, which were calculated through single-sample gene set enrichment analysis (ssGSEA) of hypoxic genes. Radiomic features were extracted using three-dimensional (3D)-Slicer software. The least absolute shrinkage and selection operator (LASSO) algorithm was used to identify hypoxic CT radiomic features. A model named ssuBERT (semantic structured unit embedded in Bidirectional Encoder Representations from Transformers) was developed to analyze electronic health records (EHRs). An AI model for overall survival prediction was constructed by integrating CT radiomic features, ssuBERT features, and clinical data, and evaluated using five-fold cross-validation.
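The LASSO step named in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which LASSO variant or software was used, so this example assumes a plain linear-regression LASSO solved by coordinate descent, and the feature names, toy data, and penalty `alpha` are all made up. It shows only the mechanism that makes LASSO a feature selector: the L1 penalty shrinks weak coefficients to exactly zero.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    Assumes the columns of X and the target y are already centred (no intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Hypothetical toy data: three centred, mutually orthogonal "radiomic" features.
feature_names = ["feature_a", "feature_b", "feature_c"]  # made-up names
f_a = [1, 1, 1, 1, -1, -1, -1, -1]
f_b = [1, -1, 1, -1, 1, -1, 1, -1]
f_c = [1, 1, -1, -1, 1, 1, -1, -1]
X = [[f_a[i], f_b[i], f_c[i]] for i in range(8)]
# The target depends strongly on feature_a, weakly on feature_b, not at all on feature_c.
y = [2.0 * f_a[i] + 0.1 * f_b[i] for i in range(8)]

weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [name for name, w in zip(feature_names, weights) if w != 0.0]
print(selected)
```

With this penalty only the strongly associated feature keeps a non-zero weight. In a real study the penalty would normally be tuned by cross-validation, and a logistic or Cox formulation may be more appropriate for binary or survival outcomes.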
Results: Higher hypoxia levels were correlated with worse survival outcomes. Twenty-eight radiomic features showed significant discriminatory power in detecting hypoxia status with an area under the curve (AUC) of 0.8295. The ssuBERT model achieved a weighted accuracy of 0.945 in recognizing semantic structured units in EHRs. The EHR model exhibited superior predictive performance among the single-modal models with an AUC of 0.7662. However, the multi-modal AI model had the highest average AUC of 0.8449 and an F1 score of 0.7557.
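For readers unfamiliar with the reported metrics, the sketch below shows how an average AUC across five folds and an F1 score are commonly computed. The labels and scores are invented toy values, not study data, and the 0.5 decision threshold is an assumption; the abstract does not state how the authors thresholded predictions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def f1_score(labels, predictions):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up held-out predictions for five folds: (true labels, predicted scores).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    ([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]),
    ([0, 1, 0, 1], [0.3, 0.7, 0.6, 0.5]),
    ([1, 0, 0, 1], [0.9, 0.1, 0.2, 0.8]),
    ([0, 0, 1, 1], [0.4, 0.6, 0.6, 0.9]),
]

fold_aucs = [roc_auc(labels, scores) for labels, scores in folds]
average_auc = sum(fold_aucs) / len(fold_aucs)

# F1 needs hard predictions; 0.5 is an assumed threshold for illustration.
labels, scores = folds[1]
fold_f1 = f1_score(labels, [1 if s >= 0.5 else 0 for s in scores])

print(fold_aucs, average_auc, fold_f1)
```

AUC is threshold-free and measures ranking quality, whereas F1 depends on the chosen cut-off, which is one reason the two figures reported for the same model can differ noticeably.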
Conclusions: The AI model demonstrated potential in predicting NSCLC patient prognosis through multi-modal data integration, warranting further validation.
About the journal:
Translational Lung Cancer Research (TLCR, Transl Lung Cancer Res; Print ISSN 2218-6751; Online ISSN 2226-4477) is an international, peer-reviewed, open-access journal founded in March 2012. TLCR is indexed by PubMed/PubMed Central and the Chemical Abstracts Service (CAS) Databases. It was published quarterly in its first year and has been published bimonthly since February 2013. It provides practical, up-to-date information on the prevention, early detection, diagnosis, and treatment of lung cancer. Specific areas of interest include, but are not limited to, multimodality therapy, markers, imaging, tumor biology, pathology, chemoprevention, and technical advances related to lung cancer.