基于机器学习算法构建肺癌患者化疗后肺部感染风险预测模型

Tao Sun, Jun Liu, Houqin Yuan, Xin Li, Hui Yan
{"title":"基于机器学习算法构建肺癌患者化疗后肺部感染风险预测模型","authors":"Tao Sun, Jun Liu, Houqin Yuan, Xin Li, Hui Yan","doi":"10.3389/fonc.2024.1403392","DOIUrl":null,"url":null,"abstract":"The objective of this study was to create and validate a machine learning (ML)-based model for predicting the likelihood of lung infections following chemotherapy in patients with lung cancer.A retrospective study was conducted on a cohort of 502 lung cancer patients undergoing chemotherapy. Data on age, Body Mass Index (BMI), underlying disease, chemotherapy cycle, number of hospitalizations, and various blood test results were collected from medical records. We used the Synthetic Minority Oversampling Technique (SMOTE) to handle unbalanced data. Feature screening was performed using the Boruta algorithm and The Least Absolute Shrinkage and Selection Operator (LASSO). Subsequently, six ML algorithms, namely Logistic Regression (LR), Random Forest (RF), Gaussian Naive Bayes (GNB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN) were employed to train and develop an ML model using a 10-fold cross-validation methodology. The model’s performance was evaluated through various metrics, including the area under the receiver operating characteristic curve (ROC), accuracy, sensitivity, specificity, F1 score, calibration curve, decision curves, clinical impact curve, and confusion matrix. In addition, model interpretation was performed by the Shapley Additive Explanations (SHAP) analysis to clarify the importance of each feature of the model and its decision basis. Finally, we constructed nomograms to make the predictive model results more readable.The integration of Boruta and LASSO methodologies identified Gender, Smoke, Drink, Chemotherapy cycles, pleural effusion (PE), Neutrophil-lymphocyte count ratio (NLR), Neutrophil-monocyte count ratio (NMR), Lymphocytes (LYM) and Neutrophil (NEUT) as significant predictors. The LR model demonstrated superior performance compared to alternative ML algorithms, achieving an accuracy of 81.80%, a sensitivity of 81.1%, a specificity of 82.5%, an F1 score of 81.6%, and an AUC of 0.888(95%CI(0.863-0.911)). Furthermore, the SHAP method identified Chemotherapy cycles and Smoke as the primary decision factors influencing the ML model’s predictions. Finally, this study successfully constructed interactive nomograms and dynamic nomograms.The ML algorithm, combining demographic and clinical factors, accurately predicted post-chemotherapy lung infections in cancer patients. The LR model performed well, potentially improving early detection and treatment in clinical practice.","PeriodicalId":507440,"journal":{"name":"Frontiers in Oncology","volume":"64 6","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Construction of a risk prediction model for lung infection after chemotherapy in lung cancer patients based on the machine learning algorithm\",\"authors\":\"Tao Sun, Jun Liu, Houqin Yuan, Xin Li, Hui Yan\",\"doi\":\"10.3389/fonc.2024.1403392\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The objective of this study was to create and validate a machine learning (ML)-based model for predicting the likelihood of lung infections following chemotherapy in patients with lung cancer.A retrospective study was conducted on a cohort of 502 lung cancer patients undergoing chemotherapy. Data on age, Body Mass Index (BMI), underlying disease, chemotherapy cycle, number of hospitalizations, and various blood test results were collected from medical records. We used the Synthetic Minority Oversampling Technique (SMOTE) to handle unbalanced data. Feature screening was performed using the Boruta algorithm and The Least Absolute Shrinkage and Selection Operator (LASSO). Subsequently, six ML algorithms, namely Logistic Regression (LR), Random Forest (RF), Gaussian Naive Bayes (GNB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN) were employed to train and develop an ML model using a 10-fold cross-validation methodology. The model’s performance was evaluated through various metrics, including the area under the receiver operating characteristic curve (ROC), accuracy, sensitivity, specificity, F1 score, calibration curve, decision curves, clinical impact curve, and confusion matrix. In addition, model interpretation was performed by the Shapley Additive Explanations (SHAP) analysis to clarify the importance of each feature of the model and its decision basis. Finally, we constructed nomograms to make the predictive model results more readable.The integration of Boruta and LASSO methodologies identified Gender, Smoke, Drink, Chemotherapy cycles, pleural effusion (PE), Neutrophil-lymphocyte count ratio (NLR), Neutrophil-monocyte count ratio (NMR), Lymphocytes (LYM) and Neutrophil (NEUT) as significant predictors. The LR model demonstrated superior performance compared to alternative ML algorithms, achieving an accuracy of 81.80%, a sensitivity of 81.1%, a specificity of 82.5%, an F1 score of 81.6%, and an AUC of 0.888(95%CI(0.863-0.911)). Furthermore, the SHAP method identified Chemotherapy cycles and Smoke as the primary decision factors influencing the ML model’s predictions. Finally, this study successfully constructed interactive nomograms and dynamic nomograms.The ML algorithm, combining demographic and clinical factors, accurately predicted post-chemotherapy lung infections in cancer patients. The LR model performed well, potentially improving early detection and treatment in clinical practice.\",\"PeriodicalId\":507440,\"journal\":{\"name\":\"Frontiers in Oncology\",\"volume\":\"64 6\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Oncology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fonc.2024.1403392\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Oncology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fonc.2024.1403392","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

本研究的目的是创建并验证一个基于机器学习(ML)的模型,用于预测肺癌患者化疗后肺部感染的可能性。我们从病历中收集了有关年龄、体重指数(BMI)、基础疾病、化疗周期、住院次数和各种血液检测结果的数据。我们使用合成少数群体过度取样技术(SMOTE)来处理不平衡数据。我们使用 Boruta 算法和最小绝对收缩和选择操作器(LASSO)进行了特征筛选。随后,采用六种 ML 算法,即逻辑回归 (LR)、随机森林 (RF)、高斯直觉贝叶斯 (GNB)、多层感知器 (MLP)、支持向量机 (SVM) 和 K-Nearest Neighbors (KNN),使用 10 倍交叉验证方法来训练和开发 ML 模型。模型的性能通过各种指标进行评估,包括接收者操作特征曲线下面积(ROC)、准确性、灵敏度、特异性、F1 分数、校准曲线、决策曲线、临床影响曲线和混淆矩阵。此外,我们还通过夏普利相加解释(SHAP)分析法对模型进行了解释,以明确模型每个特征的重要性及其决策依据。通过整合 Boruta 和 LASSO 方法,我们发现性别、吸烟、饮酒、化疗周期、胸腔积液(PE)、中性粒细胞-淋巴细胞计数比(NLR)、中性粒细胞-单核细胞计数比(NMR)、淋巴细胞(LYM)和中性粒细胞(NEUT)是重要的预测因子。与其他 ML 算法相比,LR 模型表现出更优越的性能,准确率为 81.80%,灵敏度为 81.1%,特异性为 82.5%,F1 得分为 81.6%,AUC 为 0.888(95%CI(0.863-0.911))。此外,SHAP 方法还发现化疗周期和烟雾是影响 ML 模型预测结果的主要决策因素。最后,本研究成功构建了交互式提名图和动态提名图。ML算法结合人口统计学和临床因素,准确预测了癌症患者化疗后的肺部感染。LR模型表现良好,有望改善临床实践中的早期检测和治疗。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Construction of a risk prediction model for lung infection after chemotherapy in lung cancer patients based on the machine learning algorithm
The objective of this study was to create and validate a machine learning (ML)-based model for predicting the likelihood of lung infections following chemotherapy in patients with lung cancer.A retrospective study was conducted on a cohort of 502 lung cancer patients undergoing chemotherapy. Data on age, Body Mass Index (BMI), underlying disease, chemotherapy cycle, number of hospitalizations, and various blood test results were collected from medical records. We used the Synthetic Minority Oversampling Technique (SMOTE) to handle unbalanced data. Feature screening was performed using the Boruta algorithm and The Least Absolute Shrinkage and Selection Operator (LASSO). Subsequently, six ML algorithms, namely Logistic Regression (LR), Random Forest (RF), Gaussian Naive Bayes (GNB), Multi-layer Perceptron (MLP), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN) were employed to train and develop an ML model using a 10-fold cross-validation methodology. The model’s performance was evaluated through various metrics, including the area under the receiver operating characteristic curve (ROC), accuracy, sensitivity, specificity, F1 score, calibration curve, decision curves, clinical impact curve, and confusion matrix. In addition, model interpretation was performed by the Shapley Additive Explanations (SHAP) analysis to clarify the importance of each feature of the model and its decision basis. Finally, we constructed nomograms to make the predictive model results more readable.The integration of Boruta and LASSO methodologies identified Gender, Smoke, Drink, Chemotherapy cycles, pleural effusion (PE), Neutrophil-lymphocyte count ratio (NLR), Neutrophil-monocyte count ratio (NMR), Lymphocytes (LYM) and Neutrophil (NEUT) as significant predictors. The LR model demonstrated superior performance compared to alternative ML algorithms, achieving an accuracy of 81.80%, a sensitivity of 81.1%, a specificity of 82.5%, an F1 score of 81.6%, and an AUC of 0.888(95%CI(0.863-0.911)). Furthermore, the SHAP method identified Chemotherapy cycles and Smoke as the primary decision factors influencing the ML model’s predictions. Finally, this study successfully constructed interactive nomograms and dynamic nomograms.The ML algorithm, combining demographic and clinical factors, accurately predicted post-chemotherapy lung infections in cancer patients. The LR model performed well, potentially improving early detection and treatment in clinical practice.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Clinical experience with adaptive MRI-guided pancreatic SBRT and the use of abdominal compression to reduce treatment volume Frequency of pathogenic germline variants in pediatric medulloblastoma survivors Long segment ureterectomy with tapered demucosalized ileum replacement of ureter for ureteral cancer: a case report and literature review Application of λ esophagojejunostomy in total gastrectomy under laparoscopy: a modified technique for post-gastrectomy reconstruction Treatment accuracy of standard linear accelerator-based prostate SBRT: the delivered dose assessment of patients treated within two major clinical trials using an in-house position monitoring system
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1