Improving the second-tier classification of methylmalonic acidemia patients using a machine learning ensemble method.

World Journal of Pediatrics (IF 6.1; CAS zone 2, Medicine; JCR Q1, Pediatrics). Pub Date: 2024-10-01; Epub Date: 2024-02-24. DOI: 10.1007/s12519-023-00788-6
Zhi-Xing Zhu, Georgi Z Genchev, Yan-Min Wang, Wei Ji, Yong-Yong Ren, Guo-Li Tian, Sira Sriswasdi, Hui Lu
Pages: 1090-1101. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502559/pdf/

Abstract

Introduction: Methylmalonic acidemia (MMA) is an autosomal recessive disorder with an estimated prevalence of 1:50,000. First-tier clinical diagnostic tests often return many false positives, at a ratio of about five false positives (FP) to one true positive (TP). In this work, our goal was to refine a classification model that minimizes the number of false positives, which is currently an unmet need in the upstream diagnostics of MMA.
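The 5:1 ratio implies that only about one in six first-tier positives is a true case. The Python sketch below makes that arithmetic explicit with a hypothetical helper, `positive_predictive_value`, computing PPV = TP / (TP + FP). The per-million projection is illustrative only and assumes, beyond what the abstract states, that the first-tier test flags every true case and that the 5:1 ratio holds at the quoted 1:50,000 prevalence.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): share of screen-positive calls that are real cases."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when nothing is flagged")
    return true_positives / flagged


# Ratio quoted in the abstract: five false positives per true positive.
print(f"First-tier PPV: {positive_predictive_value(1, 5):.3f}")  # 0.167 (= 1/6)

# Illustrative projection per 1,000,000 newborns at a prevalence of 1:50,000,
# ASSUMING every true case is flagged and the 5:1 FP:TP ratio holds.
screened = 1_000_000
true_cases = screened // 50_000      # 20 expected cases
false_positives = 5 * true_cases     # 100 expected false alarms
print(f"{true_cases} true cases, {false_positives} false positives referred")
```

Under these assumptions, a second-tier model that removes most of the 100 false alarms while keeping all 20 cases is exactly the gap the study targets.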

Methods: We developed multivariable machine learning screening models for MMA for use as a second-tier tool to reduce false positives. As input, we used mass spectrometry-based features comprising 11 amino acids and 31 carnitines measured in dried blood samples from neonatal patients, supplemented with constructed ratio features. Three feature selection strategies (selection by filter, recursive feature elimination, and learning vector quantization) were used to determine the input set on which we evaluated the performance of 14 classification models, in order to identify a candidate set of models for developing an ensemble model.
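The abstract does not specify which ratios were constructed or which criterion the filter used, so the following Python sketch is only one plausible reading under stated assumptions: every unordered pair of analytes yields a ratio feature, and the filter ranks features by an absolute Welch t statistic between cases and controls. The helper names, the analyte labels (C3 = propionylcarnitine, C2 = acetylcarnitine, Met = methionine) and the concentrations are hypothetical toy values, not data from the study; recursive feature elimination and learning vector quantization are not shown.

```python
from itertools import combinations
from statistics import mean, stdev


def add_ratio_features(sample: dict[str, float]) -> dict[str, float]:
    """Keep the raw analytes and append one ratio a/b for every unordered pair."""
    features = dict(sample)
    for a, b in combinations(sample, 2):
        if sample[b] <= 0:
            raise ValueError(f"non-positive denominator for {a}/{b}")
        features[f"{a}/{b}"] = sample[a] / sample[b]
    return features


def welch_t(cases: list[float], controls: list[float]) -> float:
    """Absolute Welch t statistic between cases and controls (needs >= 2 per group)."""
    diff = abs(mean(cases) - mean(controls))
    se = (stdev(cases) ** 2 / len(cases) + stdev(controls) ** 2 / len(controls)) ** 0.5
    if se == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / se


def select_by_filter(samples: list[dict[str, float]], labels: list[int], k: int) -> list[str]:
    """Univariate filter: rank every feature by its Welch t score and keep the top k."""
    rows = [add_ratio_features(s) for s in samples]
    scores = {}
    for name in rows[0]:
        cases = [r[name] for r, y in zip(rows, labels) if y == 1]
        controls = [r[name] for r, y in zip(rows, labels) if y == 0]
        scores[name] = welch_t(cases, controls)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy, made-up concentrations (not data from the study); label 1 = MMA case.
samples = [
    {"C3": 1.0, "C2": 20.0, "Met": 20.0},
    {"C3": 1.5, "C2": 25.0, "Met": 22.0},
    {"C3": 0.8, "C2": 18.0, "Met": 19.0},
    {"C3": 8.0, "C2": 16.0, "Met": 25.0},
    {"C3": 10.0, "C2": 19.0, "Met": 21.0},
    {"C3": 7.5, "C2": 15.0, "Met": 23.0},
]
labels = [0, 0, 0, 1, 1, 1]
print(select_by_filter(samples, labels, k=2))  # ['C3/C2', 'C3']
```

On this toy data the ratio C3/C2 ranks above either raw analyte, illustrating why ratio construction can sharpen case-control separation. In a real pipeline these scores would be computed on training folds only, so that feature selection does not leak information into the evaluation of the 14 models.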

Results: We identified computational models that use metabolic analytes to reduce the number of false positives without compromising sensitivity. The best results [area under the receiver operating characteristic curve (AUROC) of 97%, sensitivity of 92%, and specificity of 95%] were obtained with an ensemble of random forest, C5.0, sparse linear discriminant analysis, and an autoencoder deep neural network, stacked with stochastic gradient boosting as the supervisor (meta-learner). For a screening application, the model achieved a good performance trade-off: a false-positive rate (FPR) of 6% at 95% sensitivity, 35% at 99% sensitivity, and 39% at 100% sensitivity.
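Operating points such as "6% FPR at 95% sensitivity" are read off a classifier's score distribution by fixing the sensitivity and measuring the resulting FPR. The Python sketch below shows one way to do that, assuming the ensemble outputs a score where higher means more likely MMA; the function `fpr_at_sensitivity` and the scores are hypothetical toy values, not the study's model or data.

```python
import math


def fpr_at_sensitivity(scores: list[float], labels: list[int], target: float) -> tuple[float, float]:
    """Return (threshold, FPR) for the highest threshold whose sensitivity is >= target.

    A sample is called positive when its score is >= threshold; label 1 = MMA case.
    """
    if not 0 < target <= 1:
        raise ValueError("target sensitivity must be in (0, 1]")
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case and one control")
    # Number of true cases that must be flagged; the tolerance guards against
    # floating-point products (e.g. 0.07 * 100) landing just above an integer.
    needed = math.ceil(target * len(positives) - 1e-9)
    threshold = positives[needed - 1]  # the needed-th highest case score
    false_positives = sum(s >= threshold for s in negatives)
    return threshold, false_positives / len(negatives)


# Toy scores (not from the study): four cases followed by eight controls.
scores = [0.95, 0.90, 0.80, 0.60, 0.85, 0.70, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

for target in (0.50, 0.75, 1.00):
    threshold, fpr = fpr_at_sensitivity(scores, labels, target)
    print(f"sensitivity >= {target:.0%}: threshold {threshold:.2f}, FPR {fpr:.1%}")
```

Choosing the highest threshold that still flags the required fraction of true cases gives the lowest FPR for that sensitivity target. On the toy scores, moving from 75% to 100% sensitivity doubles the FPR (12.5% to 25%), the same kind of steep trade-off the abstract reports between 95% and 100% sensitivity.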

Conclusions: The classification results and approach of this research can be used by clinicians worldwide to improve the overall detection of MMA in pediatric patients. When adjusted to 100% precision, the improved method can further inform the diagnostic journey for MMA and help reduce the burden on patients and their families.

Source journal: World Journal of Pediatrics (Medicine, Pediatrics)
CiteScore: 10.50
Self-citation rate: 1.10%
Articles published: 592
Review time: 2.5 months
About the journal: The World Journal of Pediatrics, a monthly publication, is dedicated to disseminating peer-reviewed original papers, reviews, and special reports focusing on clinical practice and research in pediatrics. We welcome contributions from pediatricians worldwide on new developments across all areas of pediatrics, including pediatric surgery, preventive healthcare, pharmacology, stomatology, and biomedicine. The journal also covers basic sciences and experimental work, serving as a comprehensive academic platform for the international exchange of medical findings.
Latest articles in this journal:
Fecal microbiota transplants in pediatric autism: opportunities and challenges.
Impact of serial clinical swallow evaluations and feeding interventions on growth and feeding outcomes in children with long-gap esophageal atresia after anastomosis: a retrospective cohort study.
Expert consensus for pertussis in children: new concepts in diagnosis and treatment.
Associations between the prevalence of asthma and dietary exposure to food contaminants in children: CHASER study.
Correction: Longer duration of initial invasive mechanical ventilation is still a crucial risk factor for moderate-to-severe bronchopulmonary dysplasia in very preterm infants: a multicentrer prospective study.