Enas Al-Khlifeh, Ahmad S Tarawneh, Khalid Almohammadi, Malek Alrashidi, Ramadan Hassanat, Ahmad B Hassanat
Title: Decision tree-based learning and laboratory data mining: an efficient approach to amebiasis testing.
Journal: Parasites & Vectors, 18(1):33, published 2025-01-29 (impact factor 3.0; JCR Q1, Parasitology)
DOI: 10.1186/s13071-024-06618-6 (https://doi.org/10.1186/s13071-024-06618-6)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11780931/pdf/
Citations: 0
Abstract
Background: Amebiasis is a significant global health concern, particularly in developing countries, where infections are more common. The primary laboratory diagnostic method is microscopy of stool samples. However, this approach can lead to amebiasis being mistaken for other gastroenteritis (GE) conditions. The goal of this work is to produce a machine learning (ML) model that uses laboratory findings and demographic information to automatically predict amebiasis.
Method: Data extracted from Jordanian electronic medical records (EMR) between 2020 and 2022 comprised 763 amebic cases and 314 nonamebic cases. Patient demographics, clinical signs, microscopic diagnoses, and leukocyte counts were used to train eight decision tree algorithms and compare their predictive accuracy. Feature ranking and correlation methods were applied to improve the accuracy of distinguishing amebiasis from other conditions.
Results: The main features distinguishing amebiasis were the percentage of neutrophils, the presence of mucus, and the counts of red blood cells (RBCs) and white blood cells (WBCs) in stool samples. Prediction accuracy and precision ranged from 92% to 94.6% across tree-based classifiers including decision tree (DT), random forest (RF), XGBoost, AdaBoost, and gradient boosting (GB). The optimized RF model achieved an area under the curve (AUC) of 98% for detecting amebiasis from laboratory data, using 300 estimators with a maximum depth of 20. Amebiasis accounted for 17.22% of all gastroenteritis episodes in this study, marking it as a significant health concern in Jordan. Male sex and age were associated with a higher incidence of amebiasis (P = 0.014), with over 25% of cases occurring in infants and toddlers.
Conclusions: The application of ML to EMR can accurately predict amebiasis. This finding significantly contributes to the emerging use of ML as a decision support system in parasitic disease diagnosis.
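As a rough illustration of the kind of pipeline the abstract describes, the sketch below fits a random forest with the reported settings (300 estimators, maximum depth 20), scores it by AUC, and ranks features by importance. It is not the authors' code. The data are synthetic, generated to mimic only the reported class sizes (763 amebic, 314 nonamebic), and the feature names are hypothetical placeholders. The use of scikit-learn, the 80/20 stratified split, and the impurity-based ranking are likewise assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical feature names; the synthetic columns carry no clinical meaning.
FEATURE_NAMES = [
    "neutrophil_pct", "mucus_present", "stool_rbc", "stool_wbc",
    "age", "sex", "placeholder_a", "placeholder_b",
]

# Class sizes reported in the abstract; everything else here is synthetic.
N_AMEBIC, N_NONAMEBIC = 763, 314
n_total = N_AMEBIC + N_NONAMEBIC

# Label 1 stands in for "amebic", label 0 for "nonamebic".
X, y = make_classification(
    n_samples=n_total,
    n_features=len(FEATURE_NAMES),
    n_informative=4,
    n_redundant=0,
    weights=[N_NONAMEBIC / n_total],  # approximate share of class 0
    random_state=0,
)

# Assumed evaluation protocol: a single stratified 80/20 hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameters taken from the abstract's "optimized RF" description.
model = RandomForestClassifier(n_estimators=300, max_depth=20, random_state=0)
model.fit(X_train, y_train)

# AUC is computed from the predicted probability of the positive class.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Impurity-based importances, sorted from most to least important.
ranking = sorted(
    zip(FEATURE_NAMES, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"Hold-out AUC on synthetic data: {auc:.3f}")
for name, importance in ranking:
    print(f"{name:>15s}  {importance:.3f}")
```

The AUC printed here describes the synthetic data only and says nothing about the study's 98% figure. With real EMR data, cross-validation and permutation importance would be reasonable alternatives to the single split and impurity-based ranking used above.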
About the journal:
Parasites & Vectors is an open access, peer-reviewed online journal dealing with the biology of parasites, parasitic diseases, intermediate hosts, vectors and vector-borne pathogens. Manuscripts published in this journal will be available to all worldwide, with no barriers to access, immediately following acceptance. However, authors retain the copyright of their material and may use it, or distribute it, as they wish.
Manuscripts on all aspects of the basic and applied biology of parasites, intermediate hosts, vectors and vector-borne pathogens will be considered. In addition to the traditional and well-established areas of science in these fields, we also aim to provide a vehicle for publication of the rapidly developing resources and technology in parasite, intermediate host and vector genomics and their impacts on biological research. We are able to publish large datasets and extensive results, frequently associated with genomic and post-genomic technologies, which are not readily accommodated in traditional journals. Manuscripts addressing broader issues, for example economics, social sciences and global climate change in relation to parasites, vectors and disease control, are also welcomed.