Pub Date: 2025-11-06 | DOI: 10.1016/j.health.2025.100431
Chi-Ken Lu, David Alonge, Nicole Richardson, Bruno Richard
Healthcare cost models that use a large number of detailed ICD-10 diagnostic codes produce unstable results, yet the underlying causes of this instability remain poorly understood. This study provides a mathematical framework linking the variability of model coefficients to the uneven, power-law distribution of diagnostic codes and the structure of the regression model. We propose a transparent approach that improves coefficient stability by merging similar codes through hierarchical truncation. Using Medicare data, we demonstrate how this method clarifies the trade-off between code detail and model reliability, offering analysts and policymakers a practical and interpretable tool for diagnosis-based cost modeling.
Title: A log-linear analytics approach to cost model regularization for inpatient stays through diagnostic code merging. Healthcare analytics (New York, N.Y.), Vol. 8, Article 100431.
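The code-merging step described in the abstract can be illustrated with a minimal sketch. The paper's exact truncation depth is not stated here, so collapsing codes to the 3-character ICD-10 category level (and the toy claim lines) are assumptions for illustration only:

```python
from collections import Counter

def truncate_icd10(code: str, depth: int = 3) -> str:
    """Merge an ICD-10 code into an ancestor category by keeping the
    first `depth` characters (dot removed), e.g. E11.9 -> E11."""
    return code.replace(".", "")[:depth]

# Hypothetical claim lines: (diagnosis code, observed cost)
claims = [("E11.9", 1200.0), ("E11.65", 1500.0), ("I10", 800.0),
          ("I10", 950.0), ("E11.21", 2100.0)]

# Rare sibling codes collapse into one better-populated regressor
merged = Counter(truncate_icd10(code) for code, _ in claims)
print(merged)
```

Merging in this way trades diagnostic detail for more observations per coefficient, which is the stability trade-off the paper quantifies.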
Dengue fever remains a major global health concern that demands rapid and accurate diagnosis to prevent severe complications and support timely patient care. Traditional approaches relying on environmental variables often lack patient-level precision, limiting their clinical applicability. This study focuses on hematological parameters as more reliable indicators for early dengue detection. A novel machine learning framework, DengueStackX-19, was developed using 1,523 clinically verified patient records from Jamalpur 250-Bedded General Hospital, Jamalpur, Bangladesh. The dataset underwent rigorous preprocessing, normalization, and imbalance handling using various resampling techniques. Comparative evaluation across five balancing methods demonstrated that DengueStackX-19 consistently achieved the highest accuracy and robustness, performing effectively both before and after outlier removal. The model achieved 93.65 % accuracy and 89.63 % F1 during 10-fold cross-validation under SMOTEENN, and further attained 96.38 % accuracy and 94.20 % F1 in dengue classification, demonstrating robust generalization and consistent high performance across evaluation phases. Sensitivity analysis further verified its stability under feature perturbations. To ensure interpretability, SHAP and LIME were applied to identify the hematological factors most influential to the model's predictions, and the resulting patterns aligned with established clinical understanding. The model was deployed as an accessible web-based diagnostic tool, allowing healthcare professionals to perform real-time dengue detection without specialized laboratory infrastructure. This study demonstrates that hematology-driven AI models can significantly enhance diagnostic accuracy, reduce decision-making time, and improve patient outcomes, particularly in resource-limited settings.
Title: An interpretable machine learning model for dengue detection with clinical hematological data. Authors: Izaz Ahmmed Tuhin, A.K.M. Fazlul Kobir Siam, Md Mahfuzur Rahman Shanto, Md Rajib Mia, Imran Mahmud, Apurba Ghosh. Pub Date: 2025-11-03 | DOI: 10.1016/j.health.2025.100430. Healthcare analytics (New York, N.Y.), Vol. 8, Article 100430.
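The study balances classes with SMOTEENN (typically via the imbalanced-learn library). As a hedged illustration of the oversampling half of that idea only, here is a minimal SMOTE-style interpolation sketch; the neighbour count, seed, and toy points are invented for illustration:

```python
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create synthetic minority samples by linear interpolation between
    a sample and one of its k nearest neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sqdist(base, p))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + lam * (y - x) for x, y in zip(base, nb)))
    return synthetic

# Toy minority-class points (e.g. two hematological features)
minority = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8)]
new_pts = smote_like_oversample(minority, n_new=4)
print(new_pts)
```

The ENN half of SMOTEENN would then remove synthetic or original samples whose neighbours disagree with their label, cleaning the class boundary.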
Pub Date: 2025-10-30 | DOI: 10.1016/j.health.2025.100427
D. Cenitta, N. Arul, T. Praveen Pai, R. Vijaya Arjunan, Tanuja Shailesh
Ischemic Heart Disease (IHD) is one of the leading causes of death worldwide, making precise and efficient predictive models essential. Standard machine learning techniques face hurdles including high feature dimensionality, imbalanced data distributions, and poor feature-subset selection, all of which degrade model effectiveness. This research introduces an optimized feature selection method employing an Improved Squirrel Search Algorithm (ISSA) to raise the predictive capacity of IHD classification. The ISSA uses adaptive search to optimize feature selection automatically, retaining important attributes while eliminating redundant information. The selected features are evaluated using a Random Forest classifier, known for its robustness and interpretability in medical prediction tasks. Experimental results on the University of California Irvine (UCI) Heart Disease dataset show that the Improved Squirrel Search Algorithm–Random Forest (ISSA-RF) model achieves a classification accuracy of 98.12 %, outperforming existing feature selection techniques while reducing computational overhead. These findings add to recent evidence that bio-inspired optimization is effective in medical diagnostics, yielding more efficient and interpretable predictive healthcare models.
Title: A bio-inspired approach to feature optimization for ischemic heart disease detection. Healthcare analytics (New York, N.Y.), Vol. 8, Article 100427.
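The ISSA itself is not reproduced here. As a rough sketch of the wrapper-style stochastic search it belongs to, the following toy accept-or-revert selector over a binary feature mask (with an invented fitness function standing in for classifier accuracy) shows the loop such bio-inspired algorithms share:

```python
import random

def stochastic_feature_search(n_features, fitness, iters=200, seed=1):
    """Wrapper-style selection: start from a random mask and keep
    single-bit flips that strictly improve the fitness score."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    best = fitness(mask)
    for _ in range(iters):
        j = rng.randrange(n_features)
        mask[j] = not mask[j]          # propose a flip
        score = fitness(mask)
        if score > best:
            best = score               # keep the improvement
        else:
            mask[j] = not mask[j]      # revert the flip
    return mask, best

# Invented fitness: features 0 and 2 are informative; every selected
# feature pays a small cost, mimicking a parsimony penalty.
def toy_fitness(mask):
    gain = (0.6 if mask[0] else 0.0) + (0.4 if mask[2] else 0.0)
    return gain - 0.05 * sum(mask)

mask, score = stochastic_feature_search(5, toy_fitness)
print(mask, score)
```

A real ISSA replaces the single-bit proposal with squirrel-inspired position updates, but the evaluate-and-keep-the-best structure is the same.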
Pub Date: 2025-10-28 | DOI: 10.1016/j.health.2025.100428
John Wang, Shubin Xu, Yawei Wang, Houda EL Bouhissi
The United States healthcare system relies heavily on Medicaid, which serves nearly 80 million people and accounts for a substantial share of both state and federal budgets. This study employs a range of forecasting methods, including ARIMA, Holt's linear trend, polynomial regressions (degrees 2 and 4), Prophet, and piecewise linear regression, as well as machine learning models such as random forest, gradient boosting, and support vector regression, to analyze the growth of Medicaid expenditures. Using data from 1966 to 2024, the analysis identifies historical patterns and evaluates model performance with Root Mean Squared Error (RMSE) and related metrics to project costs through 2035. The results show that ARIMA and Prophet generate the most accurate baseline forecasts, suggesting that Medicaid expenditures are likely to exceed one trillion dollars within the next 15 years. Although the machine learning models produced somewhat lower estimates, they revealed complex relationships between policy variables and expenditure behavior, making them useful for building alternative forecasting scenarios. The discussion emphasizes the policy relevance of these findings, particularly in relation to budget sustainability and healthcare equity, and highlights the importance of employing multiple forecasting approaches. Overall, the study demonstrates the value of decision analytics in healthcare forecasting by highlighting the need for accurate predictions, flexible models, and interpretable outcomes. It provides evidence-based tools to anticipate Medicaid's financial challenges and support the development of sustainable healthcare strategies for the years ahead.
Title: An analytics framework for healthcare expenditure forecasting with machine learning. Healthcare analytics (New York, N.Y.), Vol. 8, Article 100428.
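Of the methods listed, Holt's linear trend is simple enough to sketch in a few lines. The level and trend update equations below are the standard ones; the smoothing parameters and toy expenditure series are illustrative assumptions, not the paper's settings:

```python
def holt_linear(series, alpha=0.5, beta=0.3, horizon=5):
    """Holt's linear-trend exponential smoothing; returns h-step forecasts.
    level_t = a*y_t + (1-a)*(level_{t-1} + trend_{t-1})
    trend_t = b*(level_t - level_{t-1}) + (1-b)*trend_{t-1}
    """
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# Toy expenditure series with a steady upward trend (illustrative units)
spend = [100, 110, 121, 133, 146, 161]
fc = holt_linear(spend, horizon=3)
print(fc)
```

Because the trend component is carried forward linearly, a persistently growing series yields forecasts that keep rising past the last observation, which is why trend-following models drive the trillion-dollar projection in the abstract.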
This systematic review explores the advances, technologies, and applications of deep learning in spinal cord magnetic resonance imaging (MRI). The current state of deep-learning techniques used for injury detection, disease diagnosis, and treatment planning in spinal cord imaging is thoroughly examined. This review includes a systematic analysis of over 100 studies from 2018 to 2025, selected based on clinical relevance, model performance, and innovation. Through a comprehensive analysis of recent literature, this review highlights the evolution and effectiveness of various deep-learning models in enhancing the accuracy and reliability of spinal cord MRI interpretations. Significant contributions of this review include identifying the most effective and innovative deep-learning approaches, such as Convolutional Neural Networks (CNNs) for precise lesion segmentation and Generative Adversarial Networks (GANs) for data augmentation. Additionally, it synthesizes current applications, such as improved injury detection and multiple sclerosis diagnosis, and explores deep-learning’s role in treatment planning. The review also addresses the challenges and limitations faced in this domain, including data scarcity, model interpretability, and computational demands, and proposes potential solutions and directions for future research. By offering these insights, this review provides a unique perspective on integrating deep-learning models into clinical workflows and their impact on clinical outcomes and patient care.
Title: An in-depth review and analysis of deep learning methods and applications in spinal cord imaging. Authors: Md Sabbir Hossain, Mostafijur Rahman, Mumtahina Ahmed, Ashifur Rahman, Md Mohsin Kabir, M.F. Mridha, Jungpil Shin. Pub Date: 2025-10-28 | DOI: 10.1016/j.health.2025.100429. Healthcare analytics (New York, N.Y.), Vol. 8, Article 100429.
Pub Date: 2025-10-19 | DOI: 10.1016/j.health.2025.100426
Félicien Hêche, Philipp Schiller, Oussama Barakat, Thibaut Desmettre, Stephan Robert-Nicoud
This study investigates the impact of 19 external factors, related to weather, road traffic conditions, air quality, and time, on the hourly occurrence of emergencies. The analysis relies on six years of dispatch records (2015–2021) from the Centre Hospitalier Universitaire Vaudois (CHUV), which oversees 18 ambulance stations across the French-speaking region of Switzerland. First, classical statistical methods, including the Chi-squared test, Student's t-test, and information value, are employed to identify dependencies between the occurrence of emergencies and the considered parameters. Additionally, SHapley Additive exPlanations (SHAP) values and permutation importance are computed using eXtreme Gradient Boosting (XGBoost) and Multilayer Perceptron (MLP) models. Training and hyperparameter optimization were performed on data from 2015–2020, while the 2021 data were held out for evaluation and for computing model interpretation metrics. Results indicate that temporal features – particularly the hour of the day – are the dominant drivers of emergency occurrences, whereas other external factors contribute minimally once temporal effects are accounted for. Subsequently, performance comparisons with a simplified model that considers only the hour of the day suggest that more complex machine learning approaches offer limited added value in this context. Operationally, this result supports the use of simple time-dependent demand curves for EMS planning. Such models can effectively guide staffing schedules and relocations without the overhead of integrating external data or maintaining complex pipelines.
Title: An analytical study of external factors influencing emergency occurrences in healthcare. Healthcare analytics (New York, N.Y.), Vol. 8, Article 100426.
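Permutation importance, one of the interpretation tools the study uses, can be sketched in a model-agnostic way: shuffle one feature and measure how much a metric drops. The toy "model" keyed on the hour of day echoes the paper's headline finding but is invented for illustration:

```python
import random

def permutation_importance(predict, X, y, metric, col, n_repeats=10, seed=0):
    """Importance of feature `col`: mean drop in the metric after
    shuffling that column while leaving the others untouched."""
    rng = random.Random(seed)
    base = metric(y, [predict(row) for row in X])
    values = [row[col] for row in X]
    drops = []
    for _ in range(n_repeats):
        shuffled = values[:]
        rng.shuffle(shuffled)
        Xp = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(base - metric(y, [predict(row) for row in Xp]))
    return sum(drops) / n_repeats

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

# Toy setting: demand spikes in the evening; feature 1 is pure noise
# that the (hypothetical) model ignores entirely.
predict = lambda row: 1 if row[0] >= 18 else 0
X = [[h, random.Random(h).random()] for h in range(24)]
y = [1 if h >= 18 else 0 for h in range(24)]

imp_hour = permutation_importance(predict, X, y, accuracy, col=0)
imp_noise = permutation_importance(predict, X, y, accuracy, col=1)
print(imp_hour, imp_noise)
```

Shuffling the hour destroys the model's accuracy while shuffling the noise feature changes nothing, mirroring the paper's conclusion that temporal features dominate.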
Pub Date: 2025-10-17 | DOI: 10.1016/j.health.2025.100425
Marzieh Amiri Shahbazi, Mohammad Abdullah Al-Mamun, Todd Brothers, Imtiaz Ahmed
Identifying meaningful patient phenotypes is a cornerstone of data-driven healthcare, enabling risk stratification, resource allocation, and the design of personalized care strategies. Achieving this requires robust analytical methods that can uncover hidden structure in high-dimensional clinical data while ensuring stability and interpretability of results. In this study, we present a machine learning framework for phenotypic clustering that combines partition-based (k-means) and probabilistic (latent class analysis, LCA) approaches. Rather than relying on a single method, the framework validates subgroup assignments through cross-method agreement, strengthening confidence in the robustness of the identified phenotypes and their utility for decision support. We apply the proposed framework to patients with chronic kidney disease (CKD) stratified by prior history of acute kidney injury (AKI), illustrating its value in uncovering population-level heterogeneity. While the mechanisms linking AKI to CKD phenotypic patterns have historically been poorly understood, this study investigates CKD trajectories in patients with and without prior AKI and identifies key phenotypic patterns. The analysis revealed consistent phenotypic structures, with over 80% agreement between the two clustering approaches. Distinct phenotypic patterns emerged between the AKI and non-AKI cohorts, with cardiovascular conditions consistently dominating in both groups. These findings demonstrate how stratified clustering can uncover risk signatures that traditional CKD staging systems may overlook. By combining complementary clustering algorithms, the framework strengthens the analytic foundation of phenotyping studies.
Moreover, it enables the design of phenotype-specific care pathways such as cluster-aware monitoring panels and tailored coordination strategies, underscoring the broader potential of data-driven analytics to advance personalized medicine and healthcare decision support.
Title: A machine learning framework for identifying phenotypes in chronic kidney disease. Healthcare analytics (New York, N.Y.), Vol. 8, Article 100425.
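Cross-method agreement between two clusterings can be scored in several ways; the paper's exact metric is not specified here, so the following best-match scheme (map each k-means cluster to its majority LCA class and count matches) is an assumption for illustration:

```python
from collections import Counter

def cross_method_agreement(labels_a, labels_b):
    """Fraction of samples whose method-B label equals the majority
    B-label of their method-A cluster (a simple best-match score)."""
    best_match = {}
    for a in set(labels_a):
        partners = [b for x, b in zip(labels_a, labels_b) if x == a]
        best_match[a] = Counter(partners).most_common(1)[0][0]
    agree = sum(best_match[a] == b for a, b in zip(labels_a, labels_b))
    return agree / len(labels_a)

# Toy assignments for 10 patients: k-means-style vs LCA-style labels
kmeans_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
lca_labels    = ["A", "A", "B", "C", "C", "C", "B", "B", "B", "A"]

agreement = cross_method_agreement(kmeans_labels, lca_labels)
print(agreement)
```

A score like the 0.8 this toy example produces corresponds to the "over 80% agreement" the abstract reports; label permutations do not matter because each cluster is matched to its majority partner.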
Pub Date: 2025-10-10 | DOI: 10.1016/j.health.2025.100422
Gazi Mohammad Imdadul Alam, Tapu Biswas, Sharia Arfin Tanim, M.F. Mridha
Diabetes is a chronic metabolic disorder that heightens the risk of complications for women and presents diagnostic challenges owing to imbalanced datasets and the need for interpretable predictive models. In this study, we propose a 1D Convolutional Neural Network (1D CNN) model that achieves an accuracy of 98.61% on the German Patient Dataset, comprising 2,000 samples, and 99.35% on the Bangladeshi Patient Dataset, which includes 465 samples. Our model effectively addresses class imbalance by integrating the Synthetic Minority Over-sampling Technique and Edited Nearest Neighbor (SMOTE-ENN), which significantly enhances performance. Additionally, we conducted a statistical comparison with Multi-Layer Perceptron (MLP), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM) models, demonstrating our CNN's superior accuracy while maintaining reduced complexity and enhanced transparency through the integration of SHapley Additive exPlanations (SHAP). Our SHAP analysis revealed significant variations in feature importance between the two populations, offering culturally relevant insights into the risk factors for diabetes. The SHAP analysis not only facilitates interpretability by allowing healthcare professionals to understand the influence of individual features but also emphasizes the cultural context of diabetes risk.
{"title":"An explainable analytics framework for predicting diabetes in women using Convolutional Neural Networks","authors":"Gazi Mohammad Imdadul Alam , Tapu Biswas , Sharia Arfin Tanim , M.F. Mridha","doi":"10.1016/j.health.2025.100422","DOIUrl":"10.1016/j.health.2025.100422","url":null,"abstract":"<div><div>Diabetes is a chronic metabolic disorder that heightens the risk of complications for women and presents diagnostic challenges owing to imbalanced datasets and the need for interpretable predictive models. In this study, we propose a 1D Convolutional Neural Network (1D CNN) model that achieves an accuracy of 98.61% on German Patient Dataset, comprising 2,000 samples, and 99.35% on the Bangladeshi Patient Dataset, which includes 465 samples. Our model effectively addresses class imbalance by integrating the Synthetic Minority Over-sampling Technique and Edited Nearest Neighbor (SMOTE-ENN), which significantly enhances performance. Additionally, we conducted a statistical comparison with Multi-Layer Perceptron (MLP), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM) models, demonstrating our CNN’s superior accuracy while maintaining reduced complexity and enhanced transparency through the integration of SHapley Additive exPlanations (SHAP). Our SHAP analysis revealed significant variations in feature importance between the two populations, offering culturally relevant insights into the risk factors for diabetes. The SHAP analysis not only facilitates interpretability by allowing healthcare professionals to understand the influence of individual features but also emphasizes the cultural context of diabetes risk. 
Overall, our findings surpass existing methodologies in terms of accuracy and complexity while underscoring the critical need for demographic diversity in predictive healthcare models, paving the way for more effective diabetes prediction strategies.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"8 ","pages":"Article 100422"},"PeriodicalIF":0.0,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145320222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
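The SMOTE-ENN step credited above for handling class imbalance can be illustrated with a minimal numpy sketch (a generic illustration, not the authors' implementation; in practice one would typically use `SMOTEENN` from the imbalanced-learn package). SMOTE interpolates synthetic minority samples between nearby minority neighbors; ENN then discards samples whose k nearest neighbors disagree with their label:

```python
import numpy as np

def smote_enn(X, y, k=3, rng=None):
    """Toy SMOTE-ENN: oversample the minority class by interpolating
    between nearest minority neighbors (SMOTE), then drop samples whose
    k nearest neighbors vote against their label (ENN cleaning)."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    # SMOTE step: synthetic points on segments between minority neighbors
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        j = rng.choice(np.argsort(d)[1:k + 1])   # one of the k nearest
        synth.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    X_all = np.vstack([X, np.array(synth)]) if n_new else X
    y_all = np.concatenate([y, np.full(n_new, minority)])
    # ENN step: remove points misclassified by their k nearest neighbors
    keep = []
    for i in range(len(X_all)):
        d = np.linalg.norm(X_all - X_all[i], axis=1)
        nn = np.argsort(d)[1:k + 1]              # exclude the point itself
        vals, c = np.unique(y_all[nn], return_counts=True)
        keep.append(vals[np.argmax(c)] == y_all[i])
    keep = np.array(keep)
    return X_all[keep], y_all[keep]
```

The ENN pass is what distinguishes SMOTE-ENN from plain SMOTE: it cleans both original and synthetic points that sit in the wrong class's neighborhood, which tends to sharpen the decision boundary on noisy clinical data.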
Liver disease poses a significant global health challenge requiring accurate and timely diagnosis. This research develops a novel deep learning model, named AFLID-Liver, to improve the classification of liver diseases from medical data. The AFLID-Liver model integrates three key techniques: an Attention Mechanism to focus on the most relevant data features, Long Short-Term Memory (LSTM) networks to process potential sequential information, and Focal Loss to effectively handle imbalances between disease classes in the dataset. This combination enhances the model's ability to learn complex patterns and make robust predictions. We evaluated AFLID-Liver using a dataset of patient records, including biomarkers and demographics. Our proposed model achieved superior performance, with 99.9 % accuracy, 99.9 % precision, and a 99.9 % F-score, significantly outperforming a baseline Gated Recurrent Unit (GRU) model (99.7 % accuracy, 97.9 % F-score) and existing state-of-the-art approaches. These results demonstrate AFLID-Liver's potential for highly accurate liver disease detection. To validate the generalizability of the proposed model, we performed cross-validation on an external dataset, which also yielded strong performance, further demonstrating the model's potential. The novelty lies in the synergistic integration of these techniques, offering a robust approach for clinical decision support and improved patient outcomes. Future research will aim to enhance computational efficiency, paving the way for adoption in real-time clinical applications.
{"title":"A focal loss and sequential analytics approach for liver disease classification and detection","authors":"Musa Mustapha , Oluwadamilare Harazeem Abdulganiyu , Isah Ndakara Abubakar , Kaloma Usman Majikumna , Garba Suleiman , Mehdi Ech-chariy , Mekila Mbayam Olivier","doi":"10.1016/j.health.2025.100424","DOIUrl":"10.1016/j.health.2025.100424","url":null,"abstract":"<div><div>Liver disease poses a significant global health challenge requiring accurate and timely diagnosis. This research develops a novel deep learning model, named AFLID-Liver, to improve the classification of liver diseases from medical data. The AFLID-Liver model integrates three key techniques: an Attention Mechanism to focus on the most relevant data features, Long Short-Term Memory (LSTM) networks to process potential sequential information, and Focal Loss to effectively handle imbalances between different disease classes in the dataset. This combination enhances the model's ability to learn complex patterns and make robust predictions. We evaluated AFLID-Liver using a dataset of various patient records, including biomarkers and demographics. Our proposed model achieved superior performance, with 99.9 % accuracy, 99.9 % precision, and a 99.9 % F-score, significantly outperforming a baseline Gated Recurrent Unit (GRU) model (99.7 % accuracy, 97.9 % F-score) and existing state-of-the-art approaches. These results demonstrate AFLID-Liver's potential for highly accurate liver disease detection. To validate the generalizability of the proposed model, we performed cross validation using an external dataset which also yielded a good performance depicting the potential of the proposed model. The novelty lies in the synergistic integration of these techniques, offering a robust approach for clinical decision support and improved patient outcomes. 
Future research will aim to enhance the computational efficiency, paving the way for its adoption in real-time clinical applications.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"8 ","pages":"Article 100424"},"PeriodicalIF":0.0,"publicationDate":"2025-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145264752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
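Focal loss, which the abstract credits for handling class imbalance, down-weights well-classified examples so that hard, rare-class cases dominate the loss. A minimal numpy sketch of the standard binary form (a generic illustration of the technique, not the AFLID-Liver code):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: scales cross-entropy by (1 - p_t)^gamma so
    easy, confident predictions contribute almost nothing, while alpha
    re-weights the two classes."""
    p = np.clip(p, 1e-7, 1 - 1e-7)          # avoid log(0)
    pt = np.where(y == 1, p, 1 - p)         # probability of the true class
    a = np.where(y == 1, alpha, 1 - alpha)  # per-class weighting
    return -a * (1 - pt) ** gamma * np.log(pt)
```

With gamma = 0 and alpha = 0.5 this reduces to (half of) ordinary cross-entropy; raising gamma progressively suppresses the contribution of well-classified samples, which is what makes it effective when one disease class dominates the records.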
Pub Date : 2025-09-27DOI: 10.1016/j.health.2025.100423
MD Jahin Alam, Md. Kamrul Hasan
Ultrasound shear wave elastography (SWE) is a noninvasive tissue stiffness measurement technique for medical diagnosis. In SWE, an acoustic radiation force creates shear waves (SW) throughout a medium, where the shear wave speed (SWS) is related to the medium stiffness. Traditional SWS estimation techniques are not resilient to jitter and reflection artifacts. This paper proposes new techniques to estimate SWS in both the time and frequency domains. These methods utilize loss functions that are: (1) optimized by the lateral signal shift between known locations, and (2) constrained by the neighborhood displacement group shift determined from the time-lateral plane-denoised SW propagation. The proposed constrained optimization couples neighboring particles’ losses with a Gaussian kernel, giving an optimum arrival time for the center particle that enforces local stiffness homogeneity and enables noise resilience. The explicit denoising scheme isolates SW profiles from time-lateral planes, creating parameterized masks. Additionally, lateral interpolation is performed to enhance reconstruction resolution and thereby improve the reliability of the optimization. The proposed scheme is evaluated on a simulation dataset (US-SWS-Digital-Phantoms) and three experimental phantom datasets: (i) the Mayo Clinic CIRS049 model, (ii) RSNA-QIBA-US-SWS, and (iii) a private dataset. The constrained optimization is compared with three time-of-flight (ToF) and two frequency-domain methods. The evaluations produced visually and quantitatively superior, noise-robust reconstructions compared with classical methods. Given the quality and minimal error of the resulting SWS maps, the proposed technique can find application in tissue health inspection and cancer diagnosis.
{"title":"A constrained optimization approach for ultrasound shear wave speed estimation with time-lateral plane cleaning in medical imaging","authors":"MD Jahin Alam, Md. Kamrul Hasan","doi":"10.1016/j.health.2025.100423","DOIUrl":"10.1016/j.health.2025.100423","url":null,"abstract":"<div><div>Ultrasound shear wave elastography (SWE) is a noninvasive tissue stiffness measurement technique for medical diagnosis. In SWE, an acoustic radiation force creates shear waves (SW) throughout a medium where the shear wave speed (SWS) is related to the medium stiffness. Traditional SWS estimation techniques are not noise-resilient in handling jitter and reflection artifacts. This paper proposes new techniques to estimate SWS in both time and frequency domains. These new methods utilize loss functions which are: (1) optimized by lateral signal shift between known locations, and (2) constrained by neighborhood displacement group shift determined from the time-lateral plane-denoised SW propagation. The proposed constrained optimization is formed by coupling neighboring particles’ losses with a Gaussian kernel, giving an optimum arrival time for the center particle to enforce local stiffness homogeneity and enable noise resilience. The explicit denoising scheme involves isolating SW profiles from time-lateral planes, creating parameterized masks. Additionally, lateral interpolation is performed to enhance reconstruction resolution and thereby improve the reliability of optimization. The proposed scheme is evaluated on a simulation (US-SWS-Digital-Phantoms) and three experimental phantom datasets: (i) Mayo Clinic CIRS049 model, (ii) RSNA-QIBA-US-SWS, (iii) Private data. The constrained optimization performance is compared with three time-of-flight (ToF) and two frequency-domain methods. The evaluations produced visually and quantitatively superior and noise-robust reconstructions compared to classical methods. 
Due to the quality and minimal error of SWS map formation, the proposed technique can find its application in tissue health inspection and cancer diagnosis.</div></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"8 ","pages":"Article 100423"},"PeriodicalIF":0.0,"publicationDate":"2025-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145219004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
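As context for the time-of-flight (ToF) baselines the paper compares against, a classical ToF estimator recovers SWS from the cross-correlation lag between displacement traces at two lateral positions. A minimal numpy sketch (the function name and parameters are ours, for illustration only; the paper's constrained optimization replaces this per-pair estimate with a neighborhood-coupled one):

```python
import numpy as np

def tof_sws(trace_a, trace_b, dx, fs):
    """Classical time-of-flight SWS estimate: the lag that maximizes the
    cross-correlation between displacement traces recorded dx metres
    apart gives the shear wave travel time dt, and SWS = dx / dt."""
    corr = np.correlate(trace_b, trace_a, mode="full")
    lag = int(np.argmax(corr)) - (len(trace_a) - 1)  # samples b trails a
    if lag == 0:
        raise ValueError("no measurable delay between traces")
    return dx / (lag / fs)
```

This per-pair estimate is exactly where jitter and reflection artifacts bite: a single corrupted trace skews the correlation peak, which motivates the paper's Gaussian-kernel coupling of neighboring arrival times.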