MultiThal-classifier, a machine learning-based multi-class model for thalassemia diagnosis and classification

Clinica Chimica Acta (IF 3.2; Zone 3, Medicine; JCR Q2, Medical Laboratory Technology). Pub Date: 2024-11-07. DOI: 10.1016/j.cca.2024.120025
WenQiang Wang, RenQing Ye, BaoJia Tang, YuYing Qi
Volume 567, Article 120025.

Abstract

Background

Differentiating iron deficiency anemia (IDA) from thalassemia trait (TT) remains a significant clinical challenge. This study aimed to develop a machine learning-based multi-class model to differentiate among Microcytic-TT (TT with low mean corpuscular volume), Normocytic-TT (TT with normal mean corpuscular volume), IDA, and healthy individuals.

Methods

A dataset of 1,819 individuals was analyzed using six machine learning algorithms. The eXtreme Gradient Boosting (XGBoost) algorithm was ultimately selected to construct the MultiThal-Classifier (M-THAL) model. SMOTENC (Synthetic Minority Over-sampling Technique for Nominal and Continuous features) was employed to address class imbalance. Model performance was evaluated using several metrics, and SHAP values were applied to interpret the model’s predictions. Additionally, external validation was conducted to assess the model’s robustness and generalizability.
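To make the oversampling step concrete, here is a minimal sketch of how a SMOTENC-style synthetic sample can be generated, using only the Python standard library. It is not the authors' pipeline: the function name `smotenc_sample`, the toy rows, and the choice of sex as the nominal feature are all assumptions for illustration. It also simplifies the distance by charging a fixed penalty of 1 per nominal mismatch, whereas the original SMOTE-NC formulation uses the median standard deviation of the continuous features.

```python
import random
from collections import Counter

def smotenc_sample(minority, nominal_idx, k=3, rng=None):
    """Create one synthetic minority-class row in the spirit of SMOTE-NC.

    minority    : list of rows (lists) from the minority class, at least 2 rows
    nominal_idx : set of column indices holding categorical values
    Continuous columns are interpolated between a row and one of its
    neighbours; nominal columns take the most frequent neighbour value.
    """
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    cont_idx = [j for j in range(len(base)) if j not in nominal_idx]

    def dist(a, b):
        # Simplified distance: squared differences on continuous columns
        # plus a unit penalty for every nominal mismatch (assumption).
        d = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        d += sum(1.0 for j in nominal_idx if a[j] != b[j])
        return d ** 0.5

    # k nearest neighbours of the base row within the minority class.
    others = [r for r in minority if r is not base]
    neighbours = sorted(others, key=lambda r: dist(base, r))[:k]
    mate = rng.choice(neighbours)

    gap = rng.random()  # interpolation factor in [0, 1)
    new = list(base)
    for j in cont_idx:
        new[j] = base[j] + gap * (mate[j] - base[j])
    for j in nominal_idx:
        new[j] = Counter(r[j] for r in neighbours).most_common(1)[0][0]
    return new

# Made-up toy rows: [MCV (fL), MCH (pg), sex]. Not data from the study.
minority = [
    [62.1, 19.8, "F"],
    [64.5, 20.4, "F"],
    [66.0, 21.1, "M"],
    [63.2, 20.0, "F"],
]
synthetic = smotenc_sample(minority, nominal_idx={2}, k=3, rng=random.Random(42))
print(synthetic)
```

Because each continuous value is interpolated between two real rows, synthetic samples stay inside the range already spanned by the minority class. In practice, oversampling is usually applied to the training split only, so that test and external-validation data contain real samples.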

Results

After 1000 bootstrap resamples of the test set, the mean performance metrics of M-THAL, with 95% confidence intervals (CI), were as follows: sensitivity 90.27% (95% CI: 84.88–95.26), specificity 97.87% (95% CI: 97.10–98.55), PPV 93.42% (95% CI: 89.34–96.48), NPV 97.82% (95% CI: 97.00–98.53), F1-score 91.50% (95% CI: 87.29–95.34), Youden’s index 88.15% (95% CI: 82.33–93.70), accuracy 97.06% (95% CI: 96.06–97.99), and AUC 94.07% (95% CI: 91.17–96.84). Feature importance analysis identified mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), red cell distribution width-standard deviation (RDW-SD), and hemoglobin (HGB) as the most important features. External validation confirmed the model’s robustness and generalizability.
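The bootstrap procedure behind figures like these can be sketched as follows. The abstract does not say how per-class metrics were aggregated or which interval method was used, so this sketch assumes macro-averaged one-vs-rest metrics and a percentile interval. The names `ovr_metrics` and `bootstrap_ci` and the toy labels are hypothetical. Youden's index is computed as sensitivity + specificity − 1, which agrees with the reported values up to rounding (90.27 + 97.87 − 100 = 88.14, against a reported 88.15).

```python
import random

def ovr_metrics(y_true, y_pred, classes):
    """Macro-averaged one-vs-rest sensitivity, specificity, Youden's index."""
    sens, spec = [], []
    pairs = list(zip(y_true, y_pred))
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        # A class can be missing from a resample; skip undefined ratios.
        if tp + fn:
            sens.append(tp / (tp + fn))
        if tn + fp:
            spec.append(tn / (tn + fp))
    se = sum(sens) / len(sens)
    sp = sum(spec) / len(spec)
    return {"sensitivity": se, "specificity": sp, "youden": se + sp - 1}

def bootstrap_ci(y_true, y_pred, classes, n_boot=1000, alpha=0.05, seed=0):
    """Return {metric: (mean, lower, upper)} from a percentile bootstrap."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = {}
    for _ in range(n_boot):
        # Resample test-set indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        m = ovr_metrics([y_true[i] for i in idx],
                        [y_pred[i] for i in idx], classes)
        for name, value in m.items():
            samples.setdefault(name, []).append(value)
    result = {}
    for name, values in samples.items():
        values.sort()
        lower = values[int((alpha / 2) * n_boot)]
        upper = values[int((1 - alpha / 2) * n_boot) - 1]
        result[name] = (sum(values) / n_boot, lower, upper)
    return result

# Made-up toy labels for the four classes; not data from the study.
classes = ["healthy", "IDA", "micro_TT", "normo_TT"]
y_true = ["healthy"] * 10 + ["IDA"] * 6 + ["micro_TT"] * 6 + ["normo_TT"] * 6
y_pred = list(y_true)
y_pred[0], y_pred[12], y_pred[25] = "normo_TT", "micro_TT", "healthy"

for name, (mean, lower, upper) in bootstrap_ci(y_true, y_pred, classes).items():
    print(f"{name}: {mean:.4f} (95% CI: {lower:.4f}-{upper:.4f})")
```

Resampling only the held-out test set, as the abstract describes, quantifies uncertainty in the evaluation without retraining the model.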

Conclusion

M-THAL effectively distinguishes Normocytic-TT, Microcytic-TT, IDA, and healthy individuals using hematological parameters, offering a rapid and cost-effective screening tool that can be readily implemented in diverse healthcare settings.