Decision tree-based learning and laboratory data mining: an efficient approach to amebiasis testing.

IF 3.5 | Zone 2 (Medicine) | JCR Q1 Parasitology | Parasites & Vectors | Pub Date: 2025-01-29 | DOI: 10.1186/s13071-024-06618-6
Enas Al-Khlifeh, Ahmad S Tarawneh, Khalid Almohammadi, Malek Alrashidi, Ramadan Hassanat, Ahmad B Hassanat
Parasites & Vectors, 18(1):33. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11780931/pdf/
Citations: 0

Abstract

Background: Amebiasis is a significant global health concern, especially in developing countries, where infections are more common. The primary laboratory diagnostic method is microscopy of stool samples. However, this approach can sometimes lead to amebiasis being misinterpreted as another gastroenteritis (GE) condition. This work aims to build a machine learning (ML) model that uses laboratory findings and demographic information to predict amebiasis automatically.

Method: Data extracted from Jordanian electronic medical records (EMR) between 2020 and 2022 comprised 763 amebic cases and 314 nonamebic cases. Patient demographics, clinical signs, microscopic diagnoses, and leukocyte counts were used to train eight decision tree algorithms and compare their predictive accuracy. Feature ranking and correlation methods were applied to improve the accuracy of distinguishing amebiasis from other conditions.
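The tree-based learners named in the Method all build on one basic step: choosing a threshold on a feature that best separates the two classes. The sketch below illustrates that step with Gini impurity on a single numeric feature. It is a minimal illustration, not the authors' pipeline: the function names `gini` and `best_threshold` and the eight toy records are invented here, and the toy values imply nothing about the real direction or size of any effect in the study data.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels (1 = amebic, 0 = nonamebic)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) of the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), gini(labels))  # start from the unsplit impurity
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)
    return best


# Hypothetical toy data: neutrophil percentage and diagnosis for eight patients.
neutrophil_pct = [35.0, 40.0, 42.0, 48.0, 61.0, 66.0, 70.0, 75.0]
is_amebic = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_threshold(neutrophil_pct, is_amebic)
print(threshold, impurity)
```

A full decision tree repeats this search over every feature and recurses into each side of the chosen split; ensembles such as random forest and gradient boosting combine many such trees.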

Results: The main predictors distinguishing amebiasis were the percentage of neutrophils, the presence of mucus, and the counts of red blood cells (RBCs) and white blood cells (WBCs) in stool samples. Prediction accuracy and precision ranged from 92% to 94.6% across decision tree classifiers including decision tree (DT), random forest (RF), XGBoost, AdaBoost, and gradient boosting (GB). The optimized RF model reached an area under the curve (AUC) of 98% for detecting amebiasis from laboratory data, using 300 estimators with a maximum depth of 20. Amebiasis is a significant health concern in Jordan, accounting for 17.22% of all gastroenteritis episodes in this study. Male sex and age were associated with a higher incidence of amebiasis (P = 0.014), with over 25% of cases occurring in infants and toddlers.
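The Results report accuracy, precision, and AUC side by side, and these metrics measure different things: accuracy and precision depend on a decision cutoff, while AUC summarizes how well the predicted scores rank amebic above nonamebic cases. The sketch below computes all three from scratch. It is illustrative only: the helper names, the 0.5 cutoff, and the six toy scores are assumptions made here, not values from the study.

```python
from typing import List, Tuple


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win
    (the Mann-Whitney U formulation of the ROC AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_and_precision(
    scores: List[float], labels: List[int], cutoff: float = 0.5
) -> Tuple[float, float]:
    """Accuracy and precision after thresholding scores at `cutoff`."""
    preds = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return correct / len(labels), precision


# Hypothetical predicted probabilities for six patients (1 = amebic).
y_true = [1, 1, 1, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7]

auc = roc_auc(y_score, y_true)
acc, prec = accuracy_and_precision(y_score, y_true)
print(auc, acc, prec)
```

For context when reading the reported accuracies: with 763 of the 1,077 records amebic, a classifier that always predicts amebiasis would already be about 70.8% accurate. The abstract does not name the software used; if scikit-learn were used (an assumption), the reported RF setting would correspond to `RandomForestClassifier(n_estimators=300, max_depth=20)`.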

Conclusions: Applying ML to EMR data can accurately predict amebiasis. This finding contributes to the emerging use of ML as a decision support system in parasitic disease diagnosis.

Source journal
Parasites & Vectors (Medicine: Parasitology)
CiteScore: 6.30
Self-citation rate: 9.40%
Articles published: 433
Review time: 1.4 months
About the journal: Parasites & Vectors is an open access, peer-reviewed online journal dealing with the biology of parasites, parasitic diseases, intermediate hosts, vectors and vector-borne pathogens. Manuscripts published in this journal will be available to all worldwide, with no barriers to access, immediately following acceptance. However, authors retain the copyright of their material and may use it, or distribute it, as they wish. Manuscripts on all aspects of the basic and applied biology of parasites, intermediate hosts, vectors and vector-borne pathogens will be considered. In addition to the traditional and well-established areas of science in these fields, we also aim to provide a vehicle for publication of the rapidly developing resources and technology in parasite, intermediate host and vector genomics and their impacts on biological research. We are able to publish large datasets and extensive results, frequently associated with genomic and post-genomic technologies, which are not readily accommodated in traditional journals. Manuscripts addressing broader issues, for example economics, social sciences and global climate change in relation to parasites, vectors and disease control, are also welcomed.