Construction of a risk prediction model for lung infection after chemotherapy in lung cancer patients based on the machine learning algorithm

Frontiers in Oncology. Pub Date: 2024-08-09. DOI: 10.3389/fonc.2024.1403392

Tao Sun, Jun Liu, Houqin Yuan, Xin Li, Hui Yan



Abstract

The objective of this study was to create and validate a machine learning (ML)-based model for predicting the likelihood of lung infection following chemotherapy in patients with lung cancer.

A retrospective study was conducted on a cohort of 502 lung cancer patients undergoing chemotherapy. Data on age, body mass index (BMI), underlying disease, chemotherapy cycle, number of hospitalizations, and various blood test results were collected from medical records. The Synthetic Minority Oversampling Technique (SMOTE) was used to handle class imbalance. Feature screening was performed using the Boruta algorithm and the least absolute shrinkage and selection operator (LASSO). Six ML algorithms, namely Logistic Regression (LR), Random Forest (RF), Gaussian Naive Bayes (GNB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), were then trained and compared using 10-fold cross-validation. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, F1 score, calibration curves, decision curves, clinical impact curves, and the confusion matrix. Model interpretation was performed with Shapley Additive Explanations (SHAP) analysis to clarify the importance of each feature and the basis of the model's decisions. Finally, nomograms were constructed to make the model's predictions easier to read.

The combination of Boruta and LASSO identified gender, smoking, drinking, chemotherapy cycles, pleural effusion (PE), neutrophil-lymphocyte count ratio (NLR), neutrophil-monocyte count ratio (NMR), lymphocytes (LYM), and neutrophils (NEUT) as significant predictors. The LR model outperformed the other ML algorithms, achieving an accuracy of 81.8%, a sensitivity of 81.1%, a specificity of 82.5%, an F1 score of 81.6%, and an AUC of 0.888 (95% CI: 0.863-0.911). The SHAP analysis identified chemotherapy cycles and smoking as the primary factors driving the model's predictions. Interactive and dynamic nomograms were also constructed.

The ML model, combining demographic and clinical factors, accurately predicted post-chemotherapy lung infection in lung cancer patients. The LR model performed well and may improve early detection and treatment in clinical practice.
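As a reading aid for the reported metrics, the sketch below shows how sensitivity, specificity, accuracy, and F1 score are derived from a binary confusion matrix. It is a minimal illustration and not the authors' code: the class name `ConfusionMatrix` and the counts used are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (positive class = lung infection)."""
    tp: int  # infected patients predicted as infected
    fn: int  # infected patients predicted as not infected
    fp: int  # non-infected patients predicted as infected
    tn: int  # non-infected patients predicted as not infected

    def sensitivity(self) -> float:
        # Fraction of true infections that the model detects (recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of non-infected patients correctly ruled out.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Fraction of all predictions that are correct.
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total

    def f1(self) -> float:
        # Harmonic mean of precision and recall, in closed form.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Hypothetical counts for illustration only; NOT the study's results.
cm = ConfusionMatrix(tp=80, fn=20, fp=15, tn=85)

print(f"sensitivity = {cm.sensitivity():.3f}")
print(f"specificity = {cm.specificity():.3f}")
print(f"accuracy    = {cm.accuracy():.3f}")
print(f"F1 score    = {cm.f1():.3f}")
```

When the two classes are the same size, as in the hypothetical counts above and as is typical after oversampling, accuracy equals the mean of sensitivity and specificity.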


