V. Makarov, D. Kaidarova, S. Yessentayeva, J. Kalmatayeva, M. Mansurova, N. Kadyrbek, R. Kadyrbayeva, S. Olzhayev, I. Novikov
USING MACHINE LEARNING ALGORITHMS TO DEVELOP A MODEL FOR PREDICTING THE SURVIVAL OF LUNG CANCER PATIENTS IN THE REPUBLIC OF KAZAKHSTAN
Journal article, Oncologia i radiologia Kazakhstana, published 2022-09-30
DOI: 10.52532/2663-4864-2022-3-65-4-11 (https://doi.org/10.52532/2663-4864-2022-3-65-4-11)
Abstract
Relevance: The 5-year overall survival rate in pathological stage IA non-small cell lung cancer (NSCLC) is 73%, and the recurrence rate
in radically treated patients is almost 10%.
The study aimed to evaluate the prognostic significance of several clinical and morphological factors and to apply machine learning
algorithms to predict the overall survival of patients with lung cancer.
Methods: Form 030-6/y records for C34 (lung cancer; n=19,379) from the EROB database for 2014-2018 were analyzed, and the impact of
risk factors on overall survival was assessed using the Kaplan-Meier method. The training data set for constructing the predictive
models included 19,379 observations and 15 factors. Five machine learning algorithms (Random Forest Classifier, Gradient Boosting
Classifier, Logistic Regression, Decision Tree Classifier, and K-Nearest Neighbors (KNN) Classifier) were implemented in the Python
programming language. The results were evaluated by constructing a confusion matrix and calculating classification metrics: accuracy
(the proportion of correctly classified objects) on the training and validation sets, precision, recall, and Cohen's kappa.
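The evaluation metrics named above all follow from the four cells of a binary confusion matrix. The sketch below is an illustration only, not the authors' code: the function names `confusion_counts` and `classification_metrics` are hypothetical, the positive class is assumed to be a fatal outcome (coded 1), and the labels are invented toy values rather than EROB data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and Cohen's kappa from the four cells."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    # Kappa rescales accuracy so that 0 means chance-level agreement.
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
    }


# Toy labels (assumption): 1 = fatal outcome, 0 = alive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

cells = confusion_counts(y_true, y_pred)   # (tp, fp, fn, tn) = (3, 2, 1, 4)
metrics = classification_metrics(*cells)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

On these toy labels accuracy is 0.70 while kappa is only 0.40, which shows why kappa is a useful complement when the classes are imbalanced: it discounts the agreement a classifier would reach by chance.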
Results: We analyzed 19,379 patients, including 15,494 men (79.95%) and 3,885 women (20.05%). At the time of the
study, 6,171 men (39.8%) and 1,962 women (49.5%) were alive. Median survival was 8.3 months (SE 0.154 months; 95% CI 7.96-8.56)
in men and 15.43 months (SE 1.0 months; 95% CI 13.497-17.363) in women. At diagnosis, 1,037 patients (5.35%) had stage I disease,
and 4,145 (21.38%) had stage II. Most patients (71.4%) had advanced-stage NSCLC: 9,189 (47.4%) were diagnosed with stage III
and 4,655 (24%) with stage IV. Median survival differed significantly by stage (χ² = 3991.6, p < 0.001), indicating that the stage
of the tumor process is prognostically significant and influences patient survival. Median survival also differed significantly
among the morphological forms of lung cancer (χ² = 623.4, p < 0.001), suggesting that the morphological factor is likewise
prognostically significant.
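The Kaplan-Meier medians and χ² comparisons reported above can be illustrated with a minimal pure-Python sketch. It is not the authors' code. The abstract does not name the test behind the χ² values, so a two-group log-rank test (a common companion to Kaplan-Meier analysis) is assumed here. The function names `kaplan_meier`, `median_survival`, and `logrank_chi2` are hypothetical, and the follow-up times are invented toy values, not EROB data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as [(time, S(time))] at each distinct death time.

    times  -- follow-up in months
    events -- 1 if death was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which S(t) drops to 0.5 or below (None if it never does)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    pooled = list(zip(list(times_a) + list(times_b),
                      list(events_a) + list(events_b)))
    death_times = sorted({t for t, e in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance else 0.0


# Toy cohort (assumption): months of follow-up and death indicators.
toy_times = [2, 3, 3, 5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 1, 0, 1, 0]
print(median_survival(kaplan_meier(toy_times, toy_events)))  # 8

# Two toy groups with clearly separated survival times.
chi2 = logrank_chi2([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(round(chi2, 2))  # 5.05
```

In the toy comparison the statistic exceeds 3.84, the 5% critical value of the χ² distribution with one degree of freedom, so the two groups would be judged to differ. Comparing several stages or morphological forms at once would need the multi-group form of the test, which this sketch does not cover.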
Conclusion: Machine learning models can predict the risk of a fatal outcome for patients after surgical treatment and registration in
the EROB database. Patient-oriented systems that support medical decision-making make it possible to choose optimal
strategies for adjuvant therapy, dispensary observation, and the frequency of diagnostic studies.