Using Fuzzy C-Means clustering and PCA in public health: A machine learning approach to combat CVD and obesity
Pub Date: 2025-01-01. DOI: 10.1016/j.imu.2025.101666
Gamal Saad Mohamed Khamis, Nasser S. Alqahtani, Sultan Munadi Alanazi, Mohammed Muharrab Alruwaili, Mariam Shabram Alenazi, Maneaf A. Alrawaili
This study introduces a novel framework that integrates principal component analysis (PCA) with fuzzy c-means (FCM) clustering to enhance the analysis of high-dimensional health data, specifically targeting the identification of at-risk groups for cardiovascular disease (CVD) and obesity. This unique approach, which has not been previously explored in public health, promises to provide new insights and solutions to these pressing health issues.
The proposed PCA-FCM model was applied to a dataset comprising more than 20 health variables from a population sample aged 18–75 years. The analysis identified four distinct clusters, each showing unique risk patterns. For instance, Cluster One (mean age, 29) showed elevated body mass index (BMI) (mean, 33.7 kg/m²), high waist circumference (113 cm), and signs of insulin resistance (FBS, 133 mg/dL; HOMA-IR, 7.12). In contrast, Cluster Two (mean age, 61) exhibited the highest systolic blood pressure (SBP, 143 mmHg), elevated LDL cholesterol (4.27 mmol/L), and triglycerides (2.59 mmol/L), indicating advanced metabolic syndrome. Cluster Three (mean age, 51) presented a healthier metabolic profile with lower HOMA-IR (3.74), normal SBP (127 mmHg), and balanced lipid levels (HDL, 1.36 mmol/L). Cluster Four (mean age, 43) showed elevated SBP (134 mmHg), BMI (32.1 kg/m²), and HOMA-IR (6.05), suggesting a latent risk group.
PCA identified waist circumference, visceral fat, LDL/HDL ratio, non-HDL cholesterol, and waist-to-height ratio as the most influential variables contributing to cluster separation, with loadings above 0.70 on the first two principal components. Meanwhile, exercise, height, family history, and HDL had loadings below 0.30, indicating minimal influence on cluster formation.
The model evaluation supported the selection of the four-cluster solution, with a Silhouette Score of 0.62 and Between-Cluster Variation accounting for 64 % of the total variance, signifying well-defined and cohesive clusters.
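The pipeline described above maps onto standard tooling. The sketch below is a minimal illustration, not the authors' code: it assumes standardized inputs, two retained principal components, four clusters, and the textbook fuzzy c-means update with fuzzifier m = 2; the synthetic matrix stands in for the real health variables.

```python
# Minimal PCA + fuzzy c-means sketch (illustrative only, not the authors' code).
# Assumptions: standardized inputs, 2 retained components, 4 clusters, m = 2.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))            # stand-in for the >20 health variables

Z = StandardScaler().fit_transform(X)     # z-score so PCA loadings are comparable
pca = PCA(n_components=2).fit(Z)
scores = pca.transform(Z)                 # samples projected onto PC1/PC2
loadings = pca.components_.T              # variables x components; |loading| > 0.70 = influential

def fuzzy_cmeans(data, c=4, m=2.0, iters=200, tol=1e-6, seed=0):
    """Plain-numpy fuzzy c-means; returns cluster centers and membership matrix U."""
    U = np.random.default_rng(seed).dirichlet(np.ones(c), size=len(data))
    for _ in range(iters):
        Um = U ** m
        centers = Um.T @ data / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(data[:, None, :] - centers[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)   # standard FCM membership update
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

centers, U = fuzzy_cmeans(scores)
hard = U.argmax(axis=1)                    # defuzzified labels for evaluation
print("silhouette:", round(silhouette_score(scores, hard), 2))
```

Defuzzifying the membership matrix with argmax, as above, is one common way to obtain the hard labels needed for silhouette scoring; the paper does not state which convention it used.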
Although this framework enhances clustering precision and uncovers clinically actionable patterns, challenges such as health data privacy concerns, clinicians’ difficulty in interpreting PCA results, validating model generalizability across diverse populations, and technical resource limitations in low-resource settings must be addressed for successful implementation in real-world healthcare systems. Future research should incorporate longitudinal data and explore integration with advanced models, such as deep learning, to improve predictive accuracy and adaptability in real-time clinical environments.
{"title":"Using Fuzzy C-Means clustering and PCA in public health: A machine learning approach to combat CVD and obesity","authors":"Gamal Saad Mohamed Khamis , Nasser S. Alqahtani , Sultan Munadi Alanazi , Mohammed Muharrab Alruwaili , Mariam Shabram Alenazi , Maneaf A. Alrawaili","doi":"10.1016/j.imu.2025.101666","DOIUrl":"10.1016/j.imu.2025.101666","url":null,"abstract":"<div><div>This study introduces a novel framework that integrates principal component analysis (PCA) with fuzzy c-means (FCM) clustering to enhance the analysis of high-dimensional health data, specifically targeting the identification of at-risk groups for cardiovascular disease (CVD) and obesity. This unique approach, which has not been previously explored in public health, promises to provide new insights and solutions to these pressing health issues.</div><div>The proposed PCA-FCM model was applied to a dataset comprising more than 20 health variables from a population sample aged 18–75 years. The analysis identified four distinct clusters, each showing unique risk patterns. For instance, Cluster One (mean age, 29) showed elevated body mass index (BMI) (mean, 33.7 kg/m<sup>2</sup>), high waist circumference (113 cm), and signs of insulin resistance (FBS, 133 mg/dL; HOMA-IR, 7.12). In contrast, Cluster Two (mean age, 61) exhibited the highest systolic blood pressure (SBP, 143 mmHg), elevated LDL cholesterol (4.27 mmol/L), and triglycerides (2.59 mmol/L), indicating advanced metabolic syndrome. Cluster Three (mean age, 51) presented a healthier metabolic profile with lower HOMA-IR (3.74), normal SBP (127 mmHg), and balanced lipid levels (HDL, 1.36 mmol/L). Cluster Four (mean age, 43) showed elevated SBP (134 mmHg), BMI (32.1 kg/m<sup>2</sup>), and HOMA-IR (6.05), suggesting a latent risk group.</div><div>PCA identified waist circumference, visceral fat, LDL/HDL ratio, non-HDL cholesterol, and waist-to-height ratio as the most influential variables contributing to cluster separation, with loadings above 0.70 on the first two principal components. Meanwhile, exercise, height, family history, and HDL had loadings below 0.30, indicating minimal influence on cluster formation.</div><div>The model evaluation supported the selection of the four-cluster solution, with a Silhouette Score of 0.62 and Between-Cluster Variation accounting for 64 % of the total variance, signifying well-defined and cohesive clusters.</div><div>Although this framework enhances clustering precision and uncovers clinically actionable patterns, challenges such as health data privacy concerns, clinicians’ difficulty in interpreting PCA results, validating model generalizability across diverse populations, and technical resource limitations in low-resource settings must be addressed for successful implementation in real-world healthcare systems. 
Future research should incorporate longitudinal data and explore integration with advanced models, such as deep learning, to improve predictive accuracy and adaptability in real-time clinical environments.</div></div>","PeriodicalId":13953,"journal":{"name":"Informatics in Medicine Unlocked","volume":"57 ","pages":"Article 101666"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144597124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heart attack prediction using machine learning is crucial for preemptive action and personalized healthcare. This research aims to predict heart attacks by applying machine learning to a diverse range of patient data, including demographic, lifestyle, and physiological factors, which helps to create robust and generalizable predictions. In addition, various models that balance accuracy with interpretability are presented, emphasizing early detection and proactive intervention. This cross-disciplinary approach is expected to underline the role of machine learning in mitigating the heart disease burden and optimizing the resources spent on healthcare.
Methods:
This study explores the application of machine learning techniques for predicting heart attack risk using structured clinical data. A range of classification models — Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT) — were selected based on their proven effectiveness in prior healthcare prediction studies and their balance between accuracy and interpretability. The methodology involved comprehensive data preprocessing, class imbalance handling, and hyperparameter tuning to optimize model performance. Performance metrics included Accuracy, Precision, Recall, F1-score, and AUC-ROC. Exploratory Data Analysis (EDA) was conducted to assess the role of variables such as BMI, age, and glucose levels in predicting stroke, a proxy used for heart attack due to dataset limitations.
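As a reference point, the workflow described here corresponds to standard scikit-learn idioms. The sketch below is an illustrative reconstruction under stated assumptions (class weights for imbalance handling, a grid over the regularization strength, synthetic data), not the study's actual configuration:

```python
# Sketch of the described workflow: preprocessing, imbalance handling via class
# weights, grid-searched hyperparameters, and threshold-free evaluation.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.datasets import make_classification

# Synthetic stand-in: ~5% positives, mimicking the minority-class problem noted below.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]}, scoring="f1", cv=5)
grid.fit(X_tr, y_tr)

proba = grid.predict_proba(X_te)[:, 1]
print(classification_report(y_te, grid.predict(X_te)))   # precision/recall/F1
print("AUC-ROC:", roc_auc_score(y_te, proba))
```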
Results:
The SVM and LR models achieved the highest accuracy (95.08%), followed by RF (94.86%) and DT (91.46%). Despite high accuracy, key challenges were observed:
Class Imbalance: Only 249 cases in the dataset represented positive stroke outcomes, resulting in poor recall for minority class predictions. This reduced the model’s sensitivity to actual stroke cases, a significant limitation in clinical scenarios where false negatives can be life-threatening.
Data-Label Inconsistency: Although the study is framed as predicting heart attacks, the dataset pertains to stroke prediction. This misalignment creates confusion in the clinical relevance of the findings and weakens the generalizability of the models for heart attack risk assessment.
Lack of Model Interpretability in Practice: Though LIME and SHAP were cited as tools to ensure model transparency, they were not implemented or evaluated. This limits clinicians’ trust in the model’s predictions—an essential factor for real-world adoption.
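Since the abstract notes that LIME and SHAP were cited but never applied, the following hypothetical snippet shows the usual pattern for attaching SHAP explanations to a fitted tree ensemble; it is illustrative context only, not part of the reviewed study:

```python
# Illustration only: attaching SHAP explanations to a fitted tree ensemble.
# The study cites SHAP but did not implement it; this shows the usual pattern.
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)         # fast path for tree models
shap_values = explainer.shap_values(X[:100])  # per-feature contributions
shap.summary_plot(shap_values, X[:100])       # global feature-importance overview
```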
Conclusion:
This research shows how machine learning can play a meaningful role in improving how we predict heart attacks and ultimately help improve patient care. The results demonstrated that even well-known models like Support Vector Machine and Logistic Regression can perform very well when applied to s
{"title":"Revolutionizing heart attack prognosis: Introducing an innovative regression model for prediction","authors":"Hanaa Albanna , Madhav Raj Theeng Tamang , Chandan Patel , Mhd Saeed Sharif","doi":"10.1016/j.imu.2025.101664","DOIUrl":"10.1016/j.imu.2025.101664","url":null,"abstract":"<div><h3>Objective:</h3><div>Heart attack prediction using machine learning is crucial for preemptive action and personalized healthcare. This research aims to predict heart attacks by employing machine learning in healthcare using a diverse range of patient data-including demographic, lifestyle, and physiological factors, which helps to create robust and generalizable predictions. Besides this, various models that balance accuracy with interpretability have been presented, emphasizing early detection and proactive intervention. It is expected that this cross-disciplinary approach will underline the role of machine learning in the mitigation of the heart disease burden and optimization of resources spent on healthcare.</div></div><div><h3>Methods:</h3><div>This study explores the application of machine learning techniques for predicting heart attack risk using structured clinical data. A range of classification models — Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT) — were selected based on their proven effectiveness in prior healthcare prediction studies and their balance between accuracy and interpretability. The methodology involved comprehensive data preprocessing, class imbalance handling, and hyperparameter tuning to optimize model performance. Performance metrics included Accuracy, Precision, Recall, F1-score, and AUC-ROC. Exploratory Data Analysis (EDA) was conducted to assess the role of variables such as BMI, age, and glucose levels in predicting stroke, a proxy used for heart attack due to dataset limitations.</div></div><div><h3>Results:</h3><div>The SVM and LR models achieved the highest accuracy (95.08%), followed by RF (94.86%) and DT (91.46%). Despite high accuracy, key challenges were observed:</div><div>Class Imbalance: Only 249 cases in the dataset represented positive stroke outcomes, resulting in poor recall for minority class predictions. This reduced the model’s sensitivity to actual stroke cases, a significant limitation in clinical scenarios where false negatives can be life-threatening.</div><div>Data-Label Inconsistency: Although the study is framed as predicting heart attacks, the dataset pertains to stroke prediction. This misalignment creates confusion in the clinical relevance of the findings and weakens the generalizability of the models for heart attack risk assessment.</div><div>Lack of Model Interpretability in Practice: Though LIME and SHAP were cited as tools to ensure model transparency, they were not implemented or evaluated. This limits clinicians’ trust in the model’s predictions—an essential factor for real-world adoption.</div></div><div><h3>Conclusion:</h3><div>This research shows how machine learning can play a meaningful role in improving how we predict heart attacks and ultimately help improve patient care. 
The results demonstrated that even well-known models like Support Vector Machine and Logistic Regression can perform very well when applied to s","PeriodicalId":13953,"journal":{"name":"Informatics in Medicine Unlocked","volume":"57 ","pages":"Article 101664"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144633341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LMVT: A hybrid vision transformer with attention mechanisms for efficient and explainable lung cancer diagnosis
Pub Date: 2025-01-01. DOI: 10.1016/j.imu.2025.101669
Jesika Debnath, Al Shahriar Uddin Khondakar Pranta, Amira Hossain, Anamul Sakib, Hamdadur Rahman, Rezaul Haque, Md. Redwan Ahmed, Ahmed Wasif Reza, S M Masfequier Rahman Swapno, Abhishek Appaji
Lung cancer continues to be a leading cause of cancer-related deaths worldwide due to its high mortality rate and the complexities involved in diagnosis. Traditional diagnostic approaches often face issues such as subjectivity, class imbalance, and limited applicability across different imaging modalities. To tackle these problems, we introduce Lung MobileVIT (LMVT), a lightweight hybrid model that combines a Convolutional Neural Network (CNN) and a Transformer for multiclass lung cancer classification. LMVT utilizes depthwise separable convolutions for local texture extraction while employing multi-head self-attention (MHSA) to capture long-range global dependencies. Furthermore, we integrate attention mechanisms based on the Convolutional Block Attention Module (CBAM) and feature selection techniques derived from the Simple Gray Level Difference Method (SGLDM) to improve discriminative focus and minimize redundancy. LMVT utilizes attention recalibration to enhance the saliency of the minority class, while also incorporating curriculum augmentation strategies that balance representation across underrepresented classes. The model has been trained and validated using two public datasets (IQ-OTH/NCCD and LC25000) and evaluated for both 3-class and 5-class classification tasks. LMVT achieved an impressive 99.61 % accuracy and 99.22 % F1-score for the 3-class classification, along with 99.75 % accuracy and 99.44 % specificity for the 5-class classification. This performance surpasses that of several recent Vision Transformer (ViT) architectures. Statistical significance tests and confidence intervals confirm the reliability of these performance metrics, while an analysis of model complexity supports its capability for potential deployment. To enhance clinical interpretability, the model is integrated with explainable AI (XAI) and is implemented within a web-based diagnostic application for analyzing CT and histopathology images. This study highlights the potential of hybrid ViT architectures in creating scalable and interpretable data-driven tools for practical use in lung cancer diagnostics.
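For readers unfamiliar with the named components, the sketch below gives minimal PyTorch forms of a depthwise separable convolution, a CBAM attention block, and an MHSA block over flattened feature-map tokens. The wiring and sizes are illustrative assumptions; the published LMVT architecture is not reproduced here.

```python
# Minimal PyTorch versions of the building blocks named above; the wiring and
# sizes are illustrative assumptions, not the published LMVT architecture.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)  # per-channel texture
        self.pointwise = nn.Conv2d(c_in, c_out, 1)                         # channel mixing
    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class CBAM(nn.Module):
    """Channel then spatial attention, as in the Convolutional Block Attention Module."""
    def __init__(self, c, r=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(), nn.Linear(c // r, c))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)
    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)                   # channel attention
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))                          # spatial attention

class MHSABlock(nn.Module):
    """Flatten the feature map into tokens and apply multi-head self-attention."""
    def __init__(self, c, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(c, heads, batch_first=True)
        self.norm = nn.LayerNorm(c)
    def forward(self, x):
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)          # (B, H*W, C) token sequence
        t = t + self.attn(self.norm(t), self.norm(t), self.norm(t))[0]     # residual MHSA
        return t.transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(2, 3, 64, 64)
feat = CBAM(32)(DepthwiseSeparableConv(3, 32)(x))
out = MHSABlock(32)(feat)
print(out.shape)  # torch.Size([2, 32, 64, 64])
```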
{"title":"LMVT: A hybrid vision transformer with attention mechanisms for efficient and explainable lung cancer diagnosis","authors":"Jesika Debnath , Al Shahriar Uddin Khondakar Pranta , Amira Hossain , Anamul Sakib , Hamdadur Rahman , Rezaul Haque , Md.Redwan Ahmed , Ahmed Wasif Reza , S M Masfequier.Rahman Swapno , Abhishek Appaji","doi":"10.1016/j.imu.2025.101669","DOIUrl":"10.1016/j.imu.2025.101669","url":null,"abstract":"<div><div>Lung cancer continues to be a leading cause of cancer-related deaths worldwide due to its high mortality rate and the complexities involved in diagnosis. Traditional diagnostic approaches often face issues such as subjectivity, class imbalance, and limited applicability across different imaging modalities. To tackle these problems, we introduce Lung MobileVIT (LMVT), a lightweight hybrid model that combines a Convolutional Neural Network (CNN) and a Transformer for multiclass lung cancer classification. LMVT utilizes depthwise separable convolutions for local texture extraction while employing multi-head self-attention (MHSA) to capture long-range global dependencies. Furthermore, we integrate attention mechanisms based on the Convolutional Block Attention Module (CBAM) and feature selection techniques derived from the Simple Gray Level Difference Method (SGLDM) to improve discriminative focus and minimize redundancy. LMVT utilizes attention recalibration to enhance the saliency of the minority class, while also incorporating curriculum augmentation strategies that balance representation across underrepresented classes. The model has been trained and validated using two public datasets (IQ-OTH/NCCD and LC25000) and evaluated for both 3-class and 5-class classification tasks. LMVT achieved an impressive 99.61 % accuracy and 99.22 % F1-score for the 3-class classification, along with 99.75 % accuracy and 99.44 % specificity for the 5-class classification. This performance surpasses that of several recent Vision Transformer (ViT) architectures. Statistical significance tests and confidence intervals confirm the reliability of these performance metrics, while an analysis of model complexity supports its capability for potential deployment. To enhance clinical interpretability, the model is integrated with explainable AI (XAI) and is implemented within a web-based diagnostic application for analyzing CT and histopathology images. This study highlights the potential of hybrid ViT architectures in creating scalable and interpretable data-driven tools for practical use in lung cancer diagnostics.</div></div>","PeriodicalId":13953,"journal":{"name":"Informatics in Medicine Unlocked","volume":"57 ","pages":"Article 101669"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144604468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-01DOI: 10.1016/j.imu.2025.101693
Jorge Miguel Silva, José Luis Oliveira
Background:
Large-scale genomic research requires robust, consistent phenotypic datasets for meaningful genotype–phenotype correlations. However, diverse collection protocols, incomplete entries, and heterogeneous terminologies frequently compromise data quality and slow downstream analysis.
Methodology:
To address these issues, we present PhenoQC, a high-throughput, configuration-driven toolkit that unifies schema validation, ontology-based semantic alignment, and missing-data imputation in a single workflow. Its modular architecture leverages chunk-based parallelism to handle large datasets, while customizable schemas enforce structural and type constraints. PhenoQC applies user-defined and state-of-the-art machine learning-based imputation and performs multi-ontology mapping with fuzzy matching to harmonize phenotype text. It also quantifies potential imputation-induced distributional shifts by reporting standardized mean difference, variance ratio, and Kolmogorov–Smirnov statistics for numeric variables, and population stability index and Cramér’s V for categorical variables, with user-configurable thresholds. The toolkit provides command-line and graphical interfaces for seamless integration into automated pipelines and interactive curation environments.
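The drift diagnostics listed above are all standard statistics; a generic sketch of how each can be computed follows (not PhenoQC's own implementation, and the before/after arrays are synthetic):

```python
# Generic implementations of the drift metrics listed above (not PhenoQC's code).
import numpy as np
from scipy.stats import ks_2samp, chi2_contingency

def smd(a, b):
    """Standardized mean difference between two numeric samples."""
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

def psi(expected, actual, bins=10):
    """Population stability index over quantile bins of the reference sample."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e, _ = np.histogram(expected, edges)
    a, _ = np.histogram(actual, edges)
    e = e / e.sum() + 1e-6
    a = a / a.sum() + 1e-6
    return np.sum((a - e) * np.log(a / e))

def cramers_v(x, y):
    """Cramér's V for two paired categorical arrays, from the chi-squared statistic."""
    table = np.array([[np.sum((x == i) & (y == j)) for j in np.unique(y)]
                      for i in np.unique(x)])
    chi2 = chi2_contingency(table)[0]
    return np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1)))

before = np.random.default_rng(0).normal(size=1000)
after = np.concatenate([before[:900], np.random.default_rng(1).normal(0.3, 1, 100)])
print("SMD:", smd(before, after))
print("variance ratio:", after.var(ddof=1) / before.var(ddof=1))
print("KS:", ks_2samp(before, after).statistic)
print("PSI:", psi(before, after))
```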
Results:
We benchmarked PhenoQC on synthetic datasets with up to 100,000 records, where it demonstrated near-linear scalability and full recovery of artificially missing numeric values. Moreover, PhenoQC’s ontology alignment achieved over 97% accuracy under textual corruption. Finally, using two real clinical datasets, PhenoQC successfully imputed missing values, enforced schema compliance, and flagged data anomalies without significant overhead.
Conclusions:
PhenoQC saves manual curation time and ensures consistent, analysis-ready phenotypic data through its streamlined system. Its adaptable design adjusts to evolving ontologies and domain-specific rules, empowering researchers to conduct more reliable studies.
{"title":"PhenoQC: An integrated toolkit for quality control of phenotypic data in genomic research","authors":"Jorge Miguel Silva, José Luis Oliveira","doi":"10.1016/j.imu.2025.101693","DOIUrl":"10.1016/j.imu.2025.101693","url":null,"abstract":"<div><h3>Background:</h3><div>Large-scale genomic research requires robust, consistent phenotypic datasets for meaningful genotype–phenotype correlations. However, diverse collection protocols, incomplete entries, and heterogeneous terminologies frequently compromise data quality and slows downstream analysis.</div></div><div><h3>Methodology:</h3><div>To address these issues, we present PhenoQC, a high-throughput, configuration-driven toolkit that unifies schema validation, ontology-based semantic alignment, and missing-data imputation in a single workflow. Its modular architecture leverages chunk-based parallelism to handle large datasets, while customizable schemas enforce structural and type constraints. PhenoQC applies user-defined and state-of-the-art machine learning-based imputation and performs multi-ontology mapping with fuzzy matching to harmonize phenotype text. It also quantifies potential imputation-induced distributional shifts by reporting standardized mean difference, variance ratio, and Kolmogorov–Smirnov statistics for numeric variables, and population stability index and Cramér’s <span><math><mi>V</mi></math></span> for categorical variables, with user-configurable thresholds. The toolkit provides command-line and graphical interfaces for seamless integration into automated pipelines and interactive curation environments.</div></div><div><h3>Results:</h3><div>We benchmarked PhenoQC on synthetic datasets with up to 100,000 records and it demonstrated near-linear scalability and full recovery of artificially missing numeric values.Moreover, PhenoQC’s ontology alignment achieved over 97% accuracy under textual corruption. Finally, using two real clinical datasets, PhenoQC successfully imputed missing values, enforced schema compliance, and flagged data anomalies without significant overhead.</div></div><div><h3>Conclusions:</h3><div>PhenoQC saves manual curation time and ensures consistent, analysis-ready phenotypic data through its streamlined system. Its adaptable design adjusts to evolving ontologies and domain-specific rules, empowering researchers to conduct more reliable studies.</div></div>","PeriodicalId":13953,"journal":{"name":"Informatics in Medicine Unlocked","volume":"58 ","pages":"Article 101693"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145117699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-01DOI: 10.1016/j.imu.2025.101690
Heba Sharif, Denise E. Jackson, Genia Burchall
Background
The aim of this study was to evaluate the blood film assessment of CellaVision DC-1 compared to conventional microscopy in stained peripheral blood (PB) films from paediatric samples.
Methods
Blood films (n = 50), including clinically normal samples as well as common pathological conditions, were collected and examined by conventional microscopy and CellaVision DC-1. Manual microscopy counts and automated WBC differentiation and RBC grading via CellaVision (including manual re-classification) were compared to expert morphologist reporting. Statistical analysis measured sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
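For reference, the four reported metrics derive directly from a 2x2 confusion table; the counts in this sketch are placeholders, not values from this study:

```python
# Sensitivity, specificity, PPV and NPV from a 2x2 confusion table; the counts
# below are placeholders, not data from this study.
def diagnostic_metrics(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),   # true positives among all diseased
        "specificity": tn / (tn + fp),   # true negatives among all healthy
        "ppv": tp / (tp + fp),           # probability of disease given a positive call
        "npv": tn / (tn + fn),           # probability of health given a negative call
    }

print(diagnostic_metrics(tp=28, fp=9, fn=4, tn=11))
```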
Results
The reliability of RBC grading ranged from 60 % to 100 % sensitivity and 55–74 % specificity for the CellaVision method, compared with 78–93 % sensitivity for manual microscopy, demonstrating the latter as the superior method. Additionally, the DC-1 misclassified blasts as lymphocytes, with 67 % specificity compared to 100 % for the gold-standard microscopy. Both pre- and post-classification (re-classification) counts and manual microscopy showed strong correlations of WBC differential counts with expert/known readings, mainly for neutrophils and lymphocytes (R²: 0.60–0.85). In terms of time, CellaVision took 1 min longer to scan and assess each slide than light microscopy did, which could affect timely diagnosis and treatment decisions.
Conclusion
The use of CellaVision DC-1 may be beneficial to diagnostic laboratories in the adult setting; however, further research should focus on enhancing automated analysis of paediatric samples, which demand human intellect and critical thinking. Training for medical scientists and further software development are recommended. Manual microscopy remains faster and more accurate, and slide signing and DC-1 classification of unclassified WBCs still require scientist intervention.
{"title":"Novel comparison of CellaVision DC-1 and microscopic assessment of blood film morphology in paediatrics","authors":"Heba Sharif , Denise E. Jackson , Genia Burchall","doi":"10.1016/j.imu.2025.101690","DOIUrl":"10.1016/j.imu.2025.101690","url":null,"abstract":"<div><h3>Background</h3><div>The aim of this study was to evaluate the blood film assessment of CellaVision DC-1 compared to conventional microscopy in stained peripheral blood (PB) films from paediatric samples.</div></div><div><h3>Methods</h3><div>Blood films (n = 50) including clinically normal samples as well as common pathological conditions, were collected and examined by conventional microscopy and CellaVision DC-1. Manual microscopy counts vs. automated WBC differentiation and RBC grading via Cellavision, including manual re-classification, were compared to expert morphologist reporting. Using statistical analysis, the following metrics were measured including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).</div></div><div><h3>Results</h3><div>The reliability of RBC grading ranged between 60 and 100 % sensitivity and 55–74 % specificity for CellaVision method compared to 78–93 % sensitivity with manual microscopy, demonstrating the latter as the superior method. Additionally, DC-1 misclassified the presence of blasts for lymphocytes, with 67 % compared to 100 % specificity with the gold standard microscopy. Both pre- and post-classification, re-classifications, and manual microscopy showed strong correlations of WBC differential counts with expert/known readings, mainly for neutrophils and lymphocytes (<span><math><mrow><msup><mi>R</mi><mrow><mn>2</mn><mo>:</mo></mrow></msup></mrow></math></span> 0.60–0.85). In terms of time, CellaVision took 1 min longer to scan and assess each slide than did light microscopy, which could affect timely diagnosis and treatment decisions.</div></div><div><h3>Conclusion</h3><div>The use of CellaVision DC-1 may be beneficial to diagnostic laboratories in the adult setting; however, further research should focus on enhancing automated analysis when assessing paediatric samples that demand human intellect and critical thinking. Medical Scientist training and software development are recommended. Manual microscopy is faster and more accurate. Slide signing and DC-1 classifications of unclassified WBCs need scientist intervention.</div></div>","PeriodicalId":13953,"journal":{"name":"Informatics in Medicine Unlocked","volume":"58 ","pages":"Article 101690"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145099557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-01DOI: 10.1016/j.imu.2025.101618
Khadija Pervez, Syed Irfan Sohail, Faiza Parwez, Muhammad Abdullah Zia
Accurate detection and classification of microscopic cells from acute lymphoblastic leukemia remain challenging due to the difficulty of differentiating between cancerous and healthy cells. This paper proposes a novel approach to identify and categorize acute lymphoblastic leukemia that uses explainable artificial intelligence and federated learning to train models across multiple institutions while keeping patient information decentralized and encrypted. The framework trains EfficientNetB3 for the classification of leukemia cells and incorporates explainability techniques to make decisions of the underlying model transparent and interpretable. The framework employs a hierarchical federated learning approach that allows distributed learning across clinical centers, ensuring that sensitive patient data remain localized. Explainability techniques such as saliency maps, occlusion sensitivity, and randomized input sampling for explanation with relevant evaluation scores are integrated in the framework to provide visual and textual explanations of model’s predictions to enhance interpretability. The experiments were carried out on a publicly available dataset consisting of 15,135 microscopic images. The performance of the proposed model was benchmarked against traditional centralized models and classical federated learning techniques. The proposed model demonstrated a 2.5% improvement in accuracy (96.5%) and a 5.4% increase in F1-score (94.4%) compared to baseline models. Hierarchical federated learning reduced communication costs by 15% while maintaining data privacy. The integration of explainable artificial intelligence improved the transparency of model decisions, with a high area under the ROC curve (AUC) of 0.98 for the classification of leukemia cells. These results suggest that the proposed framework offers a robust solution for intelligent systems for medical diagnostics and can also be extended to other medical imaging tasks.
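The hierarchical aggregation step can be pictured as FedAvg applied twice: once per clinical center, then across centers. The toy sketch below illustrates only that weighted-averaging rule, under the assumption of FedAvg-style aggregation; it is not the paper's training pipeline.

```python
# Minimal FedAvg-style aggregation sketch for a hierarchical setup: client
# weights are averaged per clinical center, then across centers. Illustrates
# the aggregation rule only, not the paper's full pipeline.
import numpy as np

def fedavg(models, sizes):
    """Average per-layer parameter arrays, weighted by each participant's sample count."""
    w = np.asarray(sizes, dtype=float)
    w /= w.sum()
    return [sum(wi * layer for wi, layer in zip(w, layers))
            for layers in zip(*models)]

# Two centers; each "model" is a list of layers (toy one-layer arrays here).
center_a = fedavg([[np.ones(3)], [3 * np.ones(3)]], sizes=[100, 300])   # -> 2.5
center_b = fedavg([[5 * np.ones(3)]], sizes=[200])                      # -> 5.0
global_model = fedavg([center_a, center_b], sizes=[400, 200])           # -> ~3.33
print(global_model)
```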
{"title":"Towards trustworthy AI-driven leukemia diagnosis: A hybrid Hierarchical Federated Learning and explainable AI framework","authors":"Khadija Pervez , Syed Irfan Sohail , Faiza Parwez , Muhammad Abdullah Zia","doi":"10.1016/j.imu.2025.101618","DOIUrl":"10.1016/j.imu.2025.101618","url":null,"abstract":"<div><div>Accurate detection and classification of microscopic cells from acute lymphoblastic leukemia remain challenging due to the difficulty of differentiating between cancerous and healthy cells. This paper proposes a novel approach to identify and categorize acute lymphoblastic leukemia that uses explainable artificial intelligence and federated learning to train models across multiple institutions while keeping patient information decentralized and encrypted. The framework trains EfficientNetB3 for the classification of leukemia cells and incorporates explainability techniques to make decisions of the underlying model transparent and interpretable. The framework employs a hierarchical federated learning approach that allows distributed learning across clinical centers, ensuring that sensitive patient data remain localized. Explainability techniques such as saliency maps, occlusion sensitivity, and randomized input sampling for explanation with relevant evaluation scores are integrated in the framework to provide visual and textual explanations of model’s predictions to enhance interpretability. The experiments were carried out on a publicly available dataset consisting of 15,135 microscopic images. The performance of the proposed model was benchmarked against traditional centralized models and classical federated learning techniques. The proposed model demonstrated a 2.5% improvement in accuracy (96.5%) and a 5.4% increase in F1-score (94.4%) compared to baseline models. Hierarchical federated learning reduced communication costs by 15% while maintaining data privacy. The integration of explainable artificial intelligence improved the transparency of model decisions, with a high area under the ROC curve (AUC) of 0.98 for the classification of leukemia cells. These results suggest that the proposed framework offers a robust solution for intelligent systems for medical diagnostics and can also be extended to other medical imaging tasks.</div></div>","PeriodicalId":13953,"journal":{"name":"Informatics in Medicine Unlocked","volume":"53 ","pages":"Article 101618"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143103459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-01DOI: 10.1016/j.imu.2025.101628
Jonel Bation, Mary Ann Jaro, Lheyniel Jane Nery, Mudjahidin Mudjahidin, Andre Parvian Aristio, Eddie Bouy Palad, Jason Chavez, Lemuel Clark Velasco
The implementation of Customer Relationship Management (CRM) systems in clinical laboratories is crucial for improving customer relationships, service quality, and operational efficiency in line with a patient-centric care model. This study follows the PRISMA guidelines in reviewing and synthesizing 26 journal articles using the People, Process, Technology (PPT) framework to analyze the roles of people involved in clinical settings, the processes by which laboratory services are delivered, and the technological considerations that enhance patient care. Results revealed that successful implementation of CRM systems in clinical laboratories depends on the aligned efforts of both developers and end-users. Marketing processes and customer service were found to be crucial for the successful utilization of CRM systems in clinical laboratories, while CRM system features and integration techniques proved vital for efficient operations, enhanced data analysis, and extended accessibility. The research gap analysis shows that the effectiveness of CRM systems for patients, the scarcity of qualitative methods, and the development of corrective actions to increase patient satisfaction remain open research concerns for optimizing the implementation of different CRM systems.
{"title":"Customer relationship management systems in clinical laboratories: A systematic review","authors":"Jonel Bation , Mary Ann Jaro , Lheyniel Jane Nery , Mudjahidin Mudjahidin , Andre Parvian Aristio , Eddie Bouy Palad , Jason Chavez , Lemuel Clark Velasco","doi":"10.1016/j.imu.2025.101628","DOIUrl":"10.1016/j.imu.2025.101628","url":null,"abstract":"<div><div>The implementation of Customer Relationship Management (CRM) Systems in clinical laboratories is crucial in improving customer relationships, service quality, and operational efficiency that aligns with a patient-centric care model. This study utilizes the PRISMA guidelines in reviewing and synthesizing 26 journal articles using the People, Process, Technology (PPT) framework to analyze the roles of people involved in clinical settings, the processes by which laboratory services were delivered, and the technological considerations enhancing patient care. Results revealed that the successful implementation of CRM systems in clinical laboratories depends on the aligned efforts of both developers and end-users. Subsequently, marketing processes and customer service were then found out to be crucial for the successful utilization of CRM systems in clinical laboratories. The features and the system integration techniques of CRM systems were found out to be vital in developing efficient operations, enhancing data analysis, and extending accessibility. The research gap analysis, on the one hand, shows that the effectiveness of CRM systems on the patients, the lack of qualitative methods, and the development of corrective actions to increase patient satisfaction are relevant areas of research concerns to optimize the effectiveness of implementing different CRM systems.</div></div>","PeriodicalId":13953,"journal":{"name":"Informatics in Medicine Unlocked","volume":"53 ","pages":"Article 101628"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143487851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-01DOI: 10.1016/j.imu.2025.101622
Junko Ami, Yanbo Pang, Hiroshi Masui, Takashi Okumura, Yoshihide Sekimoto
During the COVID-19 pandemic, many countries adopted Digital Contact Tracing (DCT) technology to control infections. However, the widely used Bluetooth Low Energy (BLE)-based DCT requires both the infected individual and the contact to have the application active for an exposure to be detected, and forcing citizens to install the DCT application could compromise their privacy. Therefore, to make DCT a truly usable tool, it is crucial to develop a DCT system with high sensitivity that does not depend on the application usage rate.
The Computation of Infection Risk via Confidential Locational Entries (CIRCLE) is a DCT method that utilizes connection logs from mobile phone base stations, theoretically offering much higher sensitivity than BLE-based DCT. However, its real-world performance has not been established, so this paper estimates the sensitivity and specificity of both BLE-based DCT and CIRCLE in a comparative setting. The estimation combines simulated movement patterns of residents with real-world data on app usage in Japan, using both simulation and numerical modeling, with missing data supplemented through sensitivity analysis.
The sensitivity of BLE-based DCT is severely limited by the application’s usage rate, with an estimated baseline of just 10.9%, and even under highly optimistic assumptions, it only reaches 27.0%. In contrast, CIRCLE demonstrated a significantly higher sensitivity of 85.6%, greatly surpassing BLE-based DCT. The specificity of CIRCLE, though, decreased as the number of infected individuals increased, dropping to less than half of BLE-based DCT’s specificity during widespread infection. The BLE-based DCT used during the pandemic suffers from low sensitivity. While CIRCLE has specificity challenges, it provides exceptionally high sensitivity. Integrating these methods could redefine the design of digital contact tracing, leading to better utility for future infection control.
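A useful intuition for this sensitivity gap: a BLE exposure is only logged when both parties run the app, so sensitivity scales roughly with the square of the adoption rate. The sketch below applies that quadratic model; the model and the adoption rates are our illustrative assumptions, not the paper's estimator, although the outputs line up with the reported 10.9% and 27.0% figures.

```python
# Back-of-the-envelope: BLE detection needs the app on both sides of a contact,
# so sensitivity ~ adoption_rate ** 2. The quadratic model and the adoption
# rates below are illustrative assumptions, not taken from the paper.
def ble_sensitivity(adoption_rate: float) -> float:
    return adoption_rate ** 2

for rate in (0.33, 0.52):
    print(f"adoption {rate:.0%} -> sensitivity {ble_sensitivity(rate):.1%}")
# adoption 33% -> sensitivity 10.9%   (consistent with the reported baseline)
# adoption 52% -> sensitivity 27.0%   (consistent with the optimistic scenario)
```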
{"title":"Advancing the Sensitivity Frontier in digital contact tracing: Comparative analysis of proposed methods toward maximized utility","authors":"Junko Ami , Yanbo Pang , Hiroshi Masui , Takashi Okumura , Yoshihide Sekimoto","doi":"10.1016/j.imu.2025.101622","DOIUrl":"10.1016/j.imu.2025.101622","url":null,"abstract":"<div><div>During the COVID-19 pandemic, many countries adopted Digital Contact Tracing (DCT) technology to control infections. However, the widely-used Bluetooth Low Energy (BLE)-based DCT requires both the infected individual and the contact to have the application activated to detect exposure. Forcing citizens to install the DCT application could compromise their privacy. Therefore, to make DCT a truly usable tool, it is crucial to develop a DCT system that possesses high sensitivity, without depending on the application usage rate.</div><div>The Computation of Infection Risk via Confidential Locational Entries (CIRCLE) is a DCT method that utilizes connection logs from mobile phone base stations, theoretically offering much higher sensitivity than BLE-based DCT. However, its real performance has not been proven, and thus, this paper estimates the sensitivity and specificity of both BLE-based DCT and CIRCLE in a comparative setting. The estimation combines simulated movement patterns of residents with real-world data from app usage in Japan, utilizing both simulation and numerical modeling, with missing data supplemented through sensitivity analysis.</div><div>The sensitivity of BLE-based DCT is severely limited by the application’s usage rate, with an estimated baseline of just 10.9%, and even under highly optimistic assumptions, it only reaches 27.0%. In contrast, CIRCLE demonstrated a significantly higher sensitivity of 85.6%, greatly surpassing BLE-based DCT. The specificity of CIRCLE, though, decreased as the number of infected individuals increased, dropping to less than half of BLE-based DCT’s specificity during widespread infection. The BLE-based DCT used during the pandemic suffers from low sensitivity. While CIRCLE has specificity challenges, it provides exceptionally high sensitivity. Integrating these methods could redefine the design of digital contact tracing, leading to better utility for future infection control.</div></div>","PeriodicalId":13953,"journal":{"name":"Informatics in Medicine Unlocked","volume":"53 ","pages":"Article 101622"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143508494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-01DOI: 10.1016/j.imu.2025.101707
Nattawipa Thawinwisan, Chang Liu, Goshiro Yamamoto, Kazumasa Kishimoto, Yukiko Mori, Tomohiro Kuroda
Background:
Patient perception is crucial for medical decision-making. However, discrepancies often arise between patients’ reports and physician documentation. This study aims to examine differences in subjective information between patient questionnaires and physician records in a tertiary hospital setting to provide insights for the development of clinical documentation tools.
Methods:
We retrospectively analyzed 500 paired patient questionnaires and corresponding physician records from five departments at Kyoto University Hospital. Subjective information from the History of Present Illness (HPI) was manually extracted. AI-assisted comparison identified discrepancies, with outputs reviewed by trained researchers. Discrepancies were graded by severity and examined for symptom characteristics and documentation patterns. Logistic regression assessed associations between discrepancy and patient demographics, department, and additional content indicators, with results expressed as odds ratios (OR) and 95% confidence intervals (CI).
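The reporting convention described (odds ratios with 95% CIs from logistic regression) follows a standard statsmodels pattern; the sketch below uses synthetic variables and is not the study's analysis code:

```python
# Sketch of the described analysis: logistic regression with results expressed
# as odds ratios and 95% CIs (statsmodels; the variables here are synthetic).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                  # e.g. patient age, department dummy
y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.3))))

Xc = sm.add_constant(X)
res = sm.Logit(y, Xc).fit(disp=0)
or_ = np.exp(res.params)                       # odds ratios
ci = np.exp(res.conf_int())                    # 95% CIs on the OR scale
for name, o, (lo, hi) in zip(["const", "x1", "x2"], or_, ci):
    print(f"{name}: OR={o:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```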
Results:
HPI-related subjective information appeared in 72.8% of patient questionnaires and 80.6% of physician records. However, 48.6% of physician records missed portions of patient-reported information, while 70.8% included additional information absent from the questionnaires. Physicians frequently omitted details, particularly regarding onset, quality, severity, and modifying factors, and vague symptoms were more likely to be omitted. Documentation practices varied across departments. Record alignment improved when patients themselves referenced investigations or referrals.
Conclusion:
Discrepancies between patient-reported and physician-documented subjective information are common and may affect diagnostic accuracy and care continuity. Enhancing patient questionnaires, supporting interactive history taking, and preserving original patient expressions may help bridge documentation gaps. These findings support the development of tools and strategies that better integrate patient narratives into clinical documentation.
{"title":"Comparing patient questionnaires and physician documentation in a tertiary hospital setting: A retrospective analysis of subjective information","authors":"Nattawipa Thawinwisan , Chang Liu , Goshiro Yamamoto , Kazumasa Kishimoto , Yukiko Mori , Tomohiro Kuroda","doi":"10.1016/j.imu.2025.101707","DOIUrl":"10.1016/j.imu.2025.101707","url":null,"abstract":"<div><h3>Background:</h3><div>Patient perception is crucial for medical decision-making. However, discrepancies often arise between patients’ reports and physician documentation. This study aims to examine differences in subjective information between patient questionnaires and physician records in a tertiary hospital setting to provide insights for the development of clinical documentation tools.</div></div><div><h3>Methods:</h3><div>We retrospectively analyzed 500 paired patient questionnaires and corresponding physician records from five departments at Kyoto University Hospital. Subjective information from the History of Present Illness (HPI) was manually extracted. AI-assisted comparison identified discrepancies, with outputs reviewed by trained researchers. Discrepancies were graded by severity and examined for symptom characteristics and documentation patterns. Logistic regression assessed associations between discrepancy and patient demographics, department, and additional content indicators, with results expressed as odds ratios (OR) and 95% confidence intervals (CI).</div></div><div><h3>Results:</h3><div>HPI-related subjective information appeared in 72.8% of patient questionnaires and 80.6% of physician records. However, 48.6% of physician records missed portions of patient-reported information, while 70.8% included additional information absent from questionnaires. Physicians frequently omitted details particularly in onset, quality, severity, and modifying factors. Vague symptoms were more likely to be omitted. Documentation practices varied across departments. Record alignment improved when the patients themselves referenced investigations or referrals.</div></div><div><h3>Conclusion:</h3><div>Discrepancies between patient-reported and physician-documented subjective information are common and may affect diagnostic accuracy and care continuity. Enhancing patient questionnaires, supporting interactive history taking, and preserving original patient expressions may help bridge documentation gaps. These findings support the development of tools and strategies that better integrate patient narratives into clinical documentation.</div></div>","PeriodicalId":13953,"journal":{"name":"Informatics in Medicine Unlocked","volume":"59 ","pages":"Article 101707"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145419095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
During the COVID-19 pandemic, Digital Contact Tracing (DCT) tools were deployed worldwide as critical Non-Pharmaceutical Interventions aimed at controlling virus transmission. While simulation studies and some real-world evaluations suggested their potential effectiveness, many were discontinued before the pandemic ended. The reasons behind these decisions and the full lifecycle of these applications remain poorly documented. To address this gap, we conducted a cross-national multi-lingual survey on the status and discontinuation of contact tracing apps.
We developed a registry of countries and their DCT apps by combining existing study results with new online surveys. For each app, we manually collected data on operational status, reasons for termination, and contextual factors. A qualitative analysis was then conducted to identify common patterns and their potential association with national pandemic trajectories.
The registry includes 184 DCT apps across 158 countries and regions. Among these, 45.7% had been terminated by the time of analysis. Termination reasons were categorized into five primary areas: pandemic stage, government policy shifts, privacy concerns, technical challenges, and user acceptance issues. Notably, apps that did not use the Google/Apple Exposure Notification framework were more likely to face privacy and technical barriers, contributing to early shutdowns. We also observed cases where app termination was followed by infection surges.
This study showed that the effectiveness and continuity of DCT apps depend not only on technical performance but also on strategic alignment with infection control measures and adequate supporting resources. Based on these findings, future DCT systems should be designed to remain viable throughout pandemics.
{"title":"Cross-national survey on the termination of Digital Contact Tracing apps: Have we killed the goose that lays the golden eggs?","authors":"Yuki Kamei , Wataru Tanabe , Manabu Ichikawa , Takashi Okumura","doi":"10.1016/j.imu.2025.101694","DOIUrl":"10.1016/j.imu.2025.101694","url":null,"abstract":"<div><div>During the COVID-19 pandemic, Digital Contact Tracing (DCT) tools were deployed worldwide as critical Non-Pharmaceutical Interventions aimed at controlling virus transmission. While simulation studies and some real-world evaluations suggested their potential effectiveness, many were discontinued before the pandemic ended. The reasons behind these decisions and the full lifecycle of these applications remain poorly documented. To address this gap, we conducted a cross-national multi-lingual survey on the status and discontinuation of contact tracing apps.</div><div>We developed a registry of countries and their DCT apps by combining existing study results with new online surveys. For each app, we manually collected data on operational status, reasons for termination, and contextual factors. A qualitative analysis was then conducted to identify common patterns and their potential association with national pandemic trajectories.</div><div>The registry includes 184 DCT apps across 158 countries and regions. Among these, 45.7% had been terminated by the time of analysis. Termination reasons were categorized into five primary areas: pandemic stage, government policy shifts, privacy concerns, technical challenges, and user acceptance issues. Notably, apps that did not use the Google/Apple Exposure Notification framework were more likely to face privacy and technical barriers, contributing to early shutdowns. We also observed cases where app termination was followed by infection surges.</div><div>This study showed that effectiveness and continuity of DCT apps depend not only on technical performance but also on strategic alignment with infection control measures and adequate supporting resources. Based on the findings, Future DCT systems should be designed to remain viable throughout pandemics.</div></div>","PeriodicalId":13953,"journal":{"name":"Informatics in Medicine Unlocked","volume":"59 ","pages":"Article 101694"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145463706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}