A retrospective study differentiating nontuberculous mycobacterial pulmonary disease from pulmonary tuberculosis on computed tomography using radiomics and machine learning algorithms.
Authors: Lihong Zhou, Yiwen Wang, Wenchao Zhu, Yafang Zhao, Yihang Yu, Qin Hu, Wenke Yu
Journal: Annals of Medicine (JCR Q1, Medicine, General & Internal)
Published: 2024-09-16
DOI: 10.1080/07853890.2024.2401613 (https://doi.org/10.1080/07853890.2024.2401613)
Citations: 0
Abstract
OBJECTIVE
To evaluate the effectiveness of a machine learning approach based on computed tomography (CT) radiomics in distinguishing nontuberculous mycobacterial pulmonary disease (NTM-PD) from pulmonary tuberculosis (PTB).
METHODS
In this retrospective analysis, the medical records of 99 individuals with NTM-PD and 285 individuals with PTB at Zhejiang Chinese and Western Medicine Integrated Hospital were examined. Computer-generated random numbers were used to split the study cohort, with 80% assigned to the training cohort and 20% to the validation cohort. A total of 2153 radiomics features were extracted using Python (Pyradiomics package) to analyse the CT characteristics of the large disease areas. Significant features were identified using least absolute shrinkage and selection operator (LASSO) regression. Four supervised learning classifiers were developed: random forest (RF), support vector machine (SVM), logistic regression (LR), and extreme gradient boosting (XGBoost). Receiver-operating characteristic (ROC) curves and the areas under the ROC curves (AUCs) were used to assess and compare the predictive performance of these models.
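The workflow described in the Methods can be sketched roughly as follows. This is only an illustrative outline under stated assumptions, not the authors' code: the data are synthetic (generated with scikit-learn's `make_classification` to mimic the 99 vs 285 class ratio, with 200 features instead of 2153), the t-test and Levene screening steps are omitted, all hyperparameters are arbitrary, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the radiomics table (assumption): 384 cases with
# roughly the 285:99 class ratio, and 200 features rather than 2153.
X, y = make_classification(
    n_samples=384, n_features=200, n_informative=15,
    weights=[285 / 384], random_state=0,
)

# 80/20 split; stratifying on the label keeps the class ratio in both cohorts.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardise using statistics from the training cohort only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_val_s = scaler.transform(X_train), scaler.transform(X_val)

# LASSO feature selection: keep features whose coefficient is non-zero.
lasso = LassoCV(cv=5, random_state=0).fit(X_train_s, y_train)
selected = np.flatnonzero(lasso.coef_)
if selected.size == 0:  # fall back to all features if LASSO zeroes everything
    selected = np.arange(X.shape[1])

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(kernel="rbf", probability=True, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    # Stand-in for XGBoost (assumption), not the library used in the study.
    "GB (XGBoost stand-in)": GradientBoostingClassifier(random_state=0),
}

# Fit each classifier on the selected features and record both AUCs.
aucs = {}
for name, model in models.items():
    model.fit(X_train_s[:, selected], y_train)
    train_scores = model.predict_proba(X_train_s[:, selected])[:, 1]
    val_scores = model.predict_proba(X_val_s[:, selected])[:, 1]
    aucs[name] = {
        "train": roc_auc_score(y_train, train_scores),
        "val": roc_auc_score(y_val, val_scores),
    }
    print(f"{name}: train AUC={aucs[name]['train']:.4f}, "
          f"val AUC={aucs[name]['val']:.4f}")
```

Fitting the scaler and the LASSO selector on the training cohort alone is a design choice in this sketch to avoid leaking validation information into feature selection; the abstract does not say how the authors handled this.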
RESULTS
The Student's t-test, Levene test, and LASSO algorithm together selected 23 optimal features. In the ROC analysis, the AUC values of the XGBoost, LR, SVM, and RF models were 1, 0.9044, 0.8868, and 0.7982, respectively, in the training cohort, and 0.8358, 0.8085, 0.87739, and 0.7759, respectively, in the validation cohort. The DeLong test showed no significant differences among the models.
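One informal way to read these numbers is to compare each model's training AUC with its validation AUC. The short sketch below does that arithmetic using only the values reported above. Treating a small training-to-validation gap as a sign of stability is an assumption made here for illustration, not a criterion stated by the authors.

```python
# AUC values exactly as reported in the abstract.
reported_auc = {
    "XGBoost": {"train": 1.0, "val": 0.8358},
    "LR": {"train": 0.9044, "val": 0.8085},
    "SVM": {"train": 0.8868, "val": 0.87739},
    "RF": {"train": 0.7982, "val": 0.7759},
}

# Drop in AUC from the training cohort to the validation cohort.
gaps = {name: auc["train"] - auc["val"] for name, auc in reported_auc.items()}

# Print models from smallest to largest gap.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:8s} train-val AUC gap = {gap:.4f}")

smallest_gap = min(gaps, key=gaps.get)
highest_val = max(reported_auc, key=lambda name: reported_auc[name]["val"])
print(f"Smallest gap: {smallest_gap}; highest validation AUC: {highest_val}")
```

On these figures, XGBoost shows the largest drop (a training AUC of 1 falling to 0.8358), a pattern often associated with overfitting, while SVM has both the smallest gap and the highest validation AUC.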
CONCLUSION
CT radiomics features can help distinguish between NTM-PD and PTB. Among the four classifiers, SVM showed stable performance in identifying these two diseases.
About the journal:
Annals of Medicine is one of the world’s leading general medical review journals, boasting an impact factor of 5.435. It presents high-quality topical review articles, commissioned by the Editors and Editorial Committee, as well as original articles. The journal provides the current opinion on recent developments across the major medical specialties, with a particular focus on internal medicine. The peer-reviewed content of the journal keeps readers updated on the latest advances in the understanding of the pathogenesis of diseases, and in how molecular medicine and genetics can be applied in daily clinical practice.