Pub Date: 2025-09-15 | DOI: 10.1016/j.health.2025.100419
Aarthi Kannan, Daniel West, Dinesh Kumbhare, Wei-Ting Ting, Md. Younus Ali, Hameem I. Kawsar, Gurmit Singh, Harsha Shanthanna, Eleni Hapidou, Matiar M.R. Howlader
Current clinical methods for chronic pain assessment lack objective, quantitative measures, creating a critical gap in diagnostic accuracy. This review investigates the relationship between chronic pain and key biomarkers detectable in body fluids, such as glutamate, interleukin-6, nitric oxide, and quinolinic acid. We first discuss the biological mechanisms underlying chronic pain and evaluate the relevance of these biomarkers. The review then focuses on recent advancements in non-enzymatic electrochemical biosensors used to monitor these biomarkers. For each sensor, we summarize performance metrics including sensitivity, detection limits, and linear range, while highlighting the analytical methodologies used to establish correlations between biomarker levels and pain intensity. Our findings demonstrate that quantitative analysis of biomarker fluctuations can enhance chronic pain monitoring. The integration of sensor-based biomarker analytics with clinical workflows may offer a path toward personalized treatment plans and improved decision-making in healthcare supply chains. This review emphasizes the need for continued development of high-precision biosensors as analytical tools for translating physiological signals into clinically actionable pain metrics.
Title: An analytical review of biosensor-based chronic pain quantification in healthcare
Journal: Healthcare analytics (New York, N.Y.), vol. 8, Article 100419
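Sensitivity, detection limit, and linear range, the metrics summarized in this review, are usually derived from a calibration curve. The sketch below is a minimal illustration under stated assumptions, not the review's method. It assumes a linear response over the tested concentrations, takes sensitivity as the least-squares slope, and uses the common 3-sigma convention LOD = 3 * SD(blank) / slope. The function name and all data values are hypothetical.

```python
import statistics


def calibration_metrics(concentrations, currents, blank_currents):
    """Fit a least-squares calibration line and derive common sensor metrics.

    Assumes a linear response; sensitivity = fitted slope;
    LOD = 3 * sample SD of repeated blank readings / slope (3-sigma convention).
    """
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(currents) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, currents))
    slope = sxy / sxx  # sensitivity: signal change per unit concentration
    intercept = mean_y - slope * mean_x
    sd_blank = statistics.stdev(blank_currents)
    lod = 3 * sd_blank / slope
    # R^2 as a quick check that the response is linear over the tested range.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(concentrations, currents))
    ss_tot = sum((y - mean_y) ** 2 for y in currents)
    return {
        "sensitivity": slope,
        "intercept": intercept,
        "lod": lod,
        "r_squared": 1 - ss_res / ss_tot,
        "linear_range": (min(concentrations), max(concentrations)),
    }
```

With hypothetical standards of 1 to 16 µM giving currents of 0.5·c + 0.1 µA, and five blank readings with a sample SD of about 0.0158 µA, this gives a sensitivity of 0.5 µA/µM and an LOD of about 0.095 µM. Real sensor data would also need checks for nonlinearity at the ends of the range.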
Psoriasis is a chronic inflammatory skin disease that significantly affects patients’ quality of life (QoL), as measured by the Dermatology Life Quality Index (DLQI). This study employs penalized regression and machine learning (ML) techniques to develop predictive models for DLQI in psoriasis patients. Using a dataset of 149 Thai patients, 16 models were trained: multiple linear regression (MLR), five penalized regression models, five Random Forest (RF) models, and five Support Vector Regression (SVR) models. Feature selection was performed using ridge, LASSO, adaptive LASSO, elastic net, and adaptive elastic net to optimize predictive accuracy and interpretability. RF-L1L2, a Random Forest model trained on elastic net-selected features, achieved the best performance, with the lowest Root Mean Square Error (RMSE) of 5.6344 and the lowest Mean Absolute Percentage Error (MAPE) of 35.5404, outperforming traditional regression models. Bland–Altman analysis further confirmed that RF models reduced systematic bias and improved predictive agreement. However, our findings should be interpreted with caution because of the limitations of small-sample modeling. Key features included four psychological stress factors, age, Psoriasis Area and Severity Index (PASI), comorbidities, and gender, reinforcing the interplay between physical and mental health. SHapley Additive exPlanations (SHAP) were used for model explainability. Integrating ML models into clinical decision-making can enhance patient stratification and personalized treatment strategies, with potential applications in AI-driven healthcare solutions.
Title: A penalized regression and machine learning approach for quality-of-life prediction in psoriasis patients
Authors: Teerawat Simmachan, Napatsawan Lerdpraserdpakorn, Jarupa Deesrisuk, Chanadda Sriwipat, Subij Shakya, Pichit Boonkrong
Pub Date: 2025-09-13 | DOI: 10.1016/j.health.2025.100417
Journal: Healthcare analytics (New York, N.Y.), vol. 8, Article 100417
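The evaluation measures reported for the DLQI models can be written in a few lines. The code below implements the standard definitions and is not the authors' code: RMSE, MAPE as a percentage, and Bland–Altman bias with 95 % limits of agreement at ±1.96 SD of the differences. The sign convention (predicted minus observed) is an assumption.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error, in the units of the score (DLQI points)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mape(observed, predicted):
    """Mean absolute percentage error (%); undefined when an observed value is 0."""
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined for observed values of 0")
    return 100 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


def bland_altman(observed, predicted):
    """Bias (mean difference) and 95 % limits of agreement; difference = predicted - observed."""
    diffs = [p - o for o, p in zip(observed, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

DLQI scores range from 0 to 30, so any record with a score of 0 makes MAPE undefined. This sketch raises an error in that case instead of silently dropping the record.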
Pub Date: 2025-09-03 | DOI: 10.1016/j.health.2025.100413
Jiaqi Suo, Claudio Martani, Timothy B. Lescun, Cherri A. Krug
Hospitals face challenges in efficiently adapting treatment delivery to growing and changing demands. The main challenge is accommodating diverse patients who require specific surgical resources and attention. Traditional scheduling methods often fail to address the dynamic nature of these environments, which are characterized by numerous uncertainties and by stakeholders’ complex and changing needs. This study presents a novel methodology designed to enhance hospital operational efficiency while considering the interests of all stakeholders, including hospital administrators, medical staff (doctors, nurses, technicians), and patients. Doing so requires a nuanced approach that can handle unpredictable treatment demands, resource availability, and patient requirements. The methodology progresses systematically from defining constraints and resources, to modeling uncertainties, to generating and evaluating optimal schedules through iterative processes. This study develops and applies a 12-step method to optimize surgery scheduling for the farm animal section of the Purdue Veterinary Hospital over a defined period. The application demonstrates the practical benefits of the proposed approach by modeling dynamic surgical demands and exploring alternative schedules within resource constraints. The results show that the proposed method accommodates increased operational demands while managing delays, accidents, and illness costs.
Title: A scalable methodology for optimizing hospital surgical schedules considering efficiency, flexibility, and improved patient outcomes
Journal: Healthcare analytics (New York, N.Y.), vol. 8, Article 100413
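The 12-step method itself is not reproduced here. The sketch below illustrates two of its ideas in a simplified form: fitting cases within resource constraints, then stress-testing the schedule against uncertain surgery durations. It swaps in a first-fit-decreasing assignment heuristic and a Monte Carlo overtime estimate. Several things are assumptions and are not from the study: day capacity as the only constraint, normally distributed durations with a 20 % coefficient of variation, and all function names.

```python
import random


def first_fit_decreasing(durations, day_capacity, n_days):
    """Assign surgeries (planned minutes) to days without exceeding capacity.

    Returns one list of surgery indices per day; raises if a case cannot fit.
    """
    days = [[] for _ in range(n_days)]
    load = [0.0] * n_days
    # Placing the longest cases first tends to leave less unusable slack.
    for idx in sorted(range(len(durations)), key=lambda i: durations[i], reverse=True):
        for d in range(n_days):
            if load[d] + durations[idx] <= day_capacity:
                days[d].append(idx)
                load[d] += durations[idx]
                break
        else:
            raise ValueError(f"surgery {idx} does not fit in any day")
    return days


def expected_overtime(schedule, durations, day_capacity, cv=0.2, n_sims=2000, seed=0):
    """Monte Carlo mean of total overtime (minutes) across all days.

    Assumes actual duration ~ Normal(planned, cv * planned), truncated at 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        for day in schedule:
            actual = sum(max(0.0, rng.gauss(durations[i], cv * durations[i])) for i in day)
            total += max(0.0, actual - day_capacity)
    return total / n_sims
```

For hypothetical cases of 120, 90, 240, 60, 180, and 150 minutes over two 480-minute days, the heuristic fills the first day exactly. The simulation then shows how much overtime that tight plan risks when durations vary. That trade-off between utilization and delay is the one the study's method is designed to manage.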
Pub Date: 2025-09-01 | DOI: 10.1016/j.health.2025.100415
Yeneneh Tamirat Negash, Faradilah Hanum
Digital healthcare relies on accurate, connected data to deliver safe and efficient patient care. Yet fragmented management systems create data silos, limit interoperability, and delay clinical and administrative decisions. These conditions undermine the promise of personalized, coordinated, and efficient care. Smart Product Service Systems (Smart PSS) integrate intelligent products, digital platforms, and value-added services, providing a pathway to better data management and improved patient care. Prior studies seldom identify or link the specific Smart PSS attributes that shape healthcare data management and organizational performance, particularly from a causal perspective. This study fills that gap by developing an analytical framework for improving healthcare data management and organizational performance. A literature review produced 47 candidate attributes. Thirty-three healthcare experts validated 27 attributes through the Fuzzy Delphi Method. The fuzzy Decision-Making Trial and Evaluation Laboratory (DEMATEL) method then mapped the causal structure among the validated attributes and their associated aspects. Intelligent products, stakeholder collaboration, and service realization emerged as the core causal aspects influencing data management and organizational performance. Smart repair, monitoring and early warning, synchronized transactions, information integration, data quality, and organizational readiness ranked as the most influential criteria for practice. By prioritizing these criteria, healthcare managers can reduce data fragmentation and improve service outcomes. The study provides a hierarchical Smart PSS framework and managerial guidance for institutions advancing digital healthcare.
Title: An analytical framework for improving healthcare data management and organizational performance
Journal: Healthcare analytics (New York, N.Y.), vol. 8, Article 100415
Pub Date: 2025-08-30 | DOI: 10.1016/j.health.2025.100414
Ahed Abugabah
Breast cancer is the most commonly diagnosed cancer among women worldwide, accounting for a significant proportion of new cases. Deep learning (DL) has emerged as a powerful tool for detecting and diagnosing breast cancer, particularly through the analysis of histological images, a critical component of automated diagnostic systems that directly affect patient management. The BreakHis dataset and the Wisconsin Breast Cancer Database (WBCD) are widely used, publicly available resources for deep learning–based breast cancer analysis in cross-disciplinary healthcare research. The proposed computer-assisted approach applies colour normalisation to reduce the effect of colour-distribution differences among breast histopathology images. Breast tumour regions of interest are then segmented using an Attention-Guided Deep Atrous-Residual U-Net (AGDATUNet). Next, patches are passed through VGG19 and ResNet50 to extract deep feature vectors. The breast cancer datasets are used to fine-tune these models further, and Levy Flight-based Red Fox Optimisation (LFRFO) is used to extract features from the pre-trained models without further training. An Efficient Capsule Network (ECN) then improves feature representation and classification. On the WBCD dataset, the proposed AGDATUNet-LFRFO-ECN outperformed other models, with a sensitivity of 99.17 %, specificity of 99.08 %, and accuracy of 99.23 %. On BreakHis, it likewise outperformed existing models, achieving state-of-the-art sensitivity of 99.81 %, specificity of 99.79 %, and accuracy of 99.82 %.
Title: A deep learning framework for automated breast cancer diagnosis using intelligent segmentation and classification
Journal: Healthcare analytics (New York, N.Y.), vol. 8, Article 100414
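The "Levy Flight" in LFRFO refers to heavy-tailed random steps. These let a metaheuristic mix mostly small local moves with occasional long jumps that help it escape local optima. A common way to draw such steps is Mantegna's algorithm, sketched below. The position update in `levy_move` is a hypothetical generic rule and not the Red Fox update used in the paper. The default exponent beta = 1.5 and the step scale of 0.01 are assumptions.

```python
import math
import random


def mantegna_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's algorithm (0 < beta < 2)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta=1.5, rng=random):
    """One heavy-tailed step: u / |v|^(1/beta), u ~ N(0, sigma^2), v ~ N(0, 1)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = 0.0
    while v == 0.0:  # avoid division by zero (vanishingly rare)
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_move(position, best, scale=0.01, beta=1.5, rng=random):
    """Hypothetical update: perturb each coordinate in proportion to its distance from the best."""
    return [x + scale * levy_step(beta, rng) * (x - b) for x, b in zip(position, best)]
```

For beta = 1.5, the numerator scale is about 0.697. Most draws are small, but the distribution's tails are much heavier than a Gaussian's, which produces the occasional long jump. In a feature-selection setting, the continuous position would typically be thresholded to decide which features to keep. That mapping is also an assumption here.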
Obesity is a growing global health crisis, and traditional regression models often fail to capture the complex relationships between risk factors, limiting predictive accuracy and hindering effective public health interventions. Conventional methods overlook non-linear associations and interaction effects across demographic, socioeconomic, and behavioral predictors, which are particularly important in diverse populations with varying obesity determinants. To address these limitations, we applied Generalized Additive Models for Location, Scale, and Shape (GAMLSS) to analyze obesity predictors in a nationally representative adolescent sample (N = 671). Our framework included comprehensive variable selection across demographic, socioeconomic, behavioral, and clinical domains, comparison with three alternative regression models, and validation using the Generalized Akaike Information Criterion (GAIC). The binomial stepwise GAMLSS model demonstrated superior performance (GAIC = 624.98). Key findings included strong geographic variation, a significant gender disparity, a socioeconomic gradient, and important behavioral predictors such as weight gain attempts. The GAMLSS framework improves obesity risk prediction by modeling complex relationships often missed by traditional methods, informing targeted intervention strategies based on geographic, gender, and socioeconomic factors, and challenging assumptions about dietary influences.
Title: A comparative analysis of generalized additive models for obesity risk prediction
Authors: Olushina Olawale Awe, Olawale Abiodun Olaniyan, Ayorinde Emmanuel Olatunde, Ronel SewPaul, Natisha Dukhi
Pub Date: 2025-08-27 | DOI: 10.1016/j.health.2025.100410
Journal: Healthcare analytics (New York, N.Y.), vol. 8, Article 100410
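GAIC, used above to compare the candidate models, adds a penalty of k per effective degree of freedom to the fitted deviance: GAIC = -2 log L + k * df. With k = 2 it reduces to the ordinary AIC, and with k = log(n) it becomes the Schwarz Bayesian criterion. Lower values indicate the preferred model. The sketch below computes it for a binary (obese or not obese) outcome from fitted probabilities. The data and df are hypothetical. In practice, the log-likelihood and effective degrees of freedom would come from the fitted GAMLSS model.

```python
import math


def binomial_log_likelihood(y, p, eps=1e-12):
    """Log-likelihood of 0/1 outcomes y under fitted probabilities p (Bernoulli trials)."""
    total = 0.0
    for y_i, p_i in zip(y, p):
        p_i = min(max(p_i, eps), 1 - eps)  # clamp to avoid log(0)
        total += y_i * math.log(p_i) + (1 - y_i) * math.log(1 - p_i)
    return total


def gaic(log_likelihood, df, k=2.0):
    """Generalized AIC: -2 log L + k * df (k=2 gives AIC, k=log(n) gives SBC/BIC)."""
    return -2.0 * log_likelihood + k * df
```

Because only the penalty k changes between AIC and SBC, the same fitted models can be ranked under several penalties. This makes it easy to check whether the preferred model is robust to that choice.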
Pub Date: 2025-08-20 | DOI: 10.1016/j.health.2025.100412
Behnaz Motamedi, Balázs Villányi
This study posits that a structured preprocessing and feature selection methodology might substantially improve the classification accuracy and generalizability of machine learning (ML) models that predict hepatitis C virus (HCV) stages from clinical and demographic data. HCV infection is a chronic liver disease that progresses through multiple stages, so precise and timely staging is needed for optimal therapy. Although ML offers opportunities for stage prediction, class imbalance, missing data, and feature redundancy limit model efficacy and generalizability. To test this hypothesis, we built a four-phase preprocessing pipeline:

- Baseline imputes missing values using class-specific means.
- Refine mitigates outliers through class-specific medians and normalization.
- Balanced addresses class imbalance across five stages using localized random affine shadow-sampling.
- Augmented adds a clustering-based feature derived from an ensemble of K-means and Gaussian mixture models, combined with principal component analysis.

The prediction model was developed by optimizing feature selection with the ReliefF approach and a random forest classifier tuned by random search. On the training set, the resulting model achieved an accuracy of 0.9983, precision of 0.9984, recall of 0.9983, F1-score of 0.9984, and Matthews correlation coefficient (MCC) of 0.9979. On the independent test set, it achieved an accuracy of 0.9977, precision of 0.9976, recall of 0.9981, F1-score of 0.9978, and MCC of 0.9973. The ensemble clustering component showed reasonable validity, with an adjusted Rand index of 1.0, a moderate silhouette coefficient of 0.4702, and a Davies–Bouldin score of 1.1745, modestly outperforming individual clustering methods.
These findings support the hypothesis, indicating that thorough preprocessing, stringent feature selection, and model optimization can yield a highly accurate and generalizable tool for predicting HCV stages, with the potential to improve clinical diagnosis and treatment strategies.
Title: A comprehensive diagnostic framework for hepatitis C using structured data and predictive analytics
Journal: Healthcare analytics (New York, N.Y.), vol. 8, Article 100412
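The Baseline phase fills missing values with class-specific means. The sketch below is a minimal plain-Python version with hypothetical names. As an assumption, it falls back to the overall feature mean when a class has no observed value for a feature.

```python
from collections import defaultdict


def class_mean_impute(rows, labels):
    """Replace None entries with the mean of that feature within the row's class.

    rows: equal-length lists of floats or None; labels: the class of each row.
    Returns new lists and leaves the input unchanged.
    """
    n_features = len(rows[0])
    sums = defaultdict(lambda: [0.0] * n_features)
    counts = defaultdict(lambda: [0] * n_features)
    overall_sum = [0.0] * n_features
    overall_cnt = [0] * n_features
    # First pass: accumulate per-class and overall sums of observed values.
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            if v is not None:
                sums[y][j] += v
                counts[y][j] += 1
                overall_sum[j] += v
                overall_cnt[j] += 1
    # Second pass: fill gaps with the class mean, or the overall mean as a fallback.
    imputed = []
    for row, y in zip(rows, labels):
        new_row = []
        for j, v in enumerate(row):
            if v is None:
                if counts[y][j]:
                    v = sums[y][j] / counts[y][j]
                else:
                    v = overall_sum[j] / overall_cnt[j]
            new_row.append(v)
        imputed.append(new_row)
    return imputed
```

Class-specific imputation reads the stage labels, so it can only be fitted on training data. Test-time records have no known stage. A leak-free pipeline would therefore need a label-free rule for them, such as training-set overall means. Otherwise, the labels can leak into the features and inflate test scores.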
Pub Date: 2025-08-19 | DOI: 10.1016/j.health.2025.100411
Yead Rahman, Prerna Dua
Medicaid data, with its vast scale and heterogeneity, presents significant challenges for predictive modeling and healthcare analytics. This study analyzes over 6.3 million records from the Louisiana Department of Health (LDH) to identify the most effective machine learning models for predicting clinical service utilization, COVID-19 infections, and tobacco use. A rigorous preprocessing pipeline ensured data integrity, and exploratory data analysis (EDA) guided feature selection, which ultimately retained 20 key variables to capture complex interactions. Seven supervised models, namely logistic regression, extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), random forest, decision tree, artificial neural networks (ANN), and naïve Bayes, were evaluated on predictive performance, computational efficiency, and feature importance. Ensemble methods such as XGBoost and random forest achieved superior accuracy, but their high computational demands highlight the trade-off between performance and efficiency in large-scale healthcare analytics. Simpler models such as naïve Bayes and decision trees were computationally efficient but less accurate. Key predictors included hospital stay duration for healthcare service utilization, tobacco use for COVID-19 risk, and chronic obstructive pulmonary disease (COPD) for tobacco use. These findings emphasize the impact of comorbidities and demographics on healthcare utilization, offering data-driven insights that healthcare practitioners and policymakers can use to enhance patient care, optimize costs, and refine policy decisions.
Title: A machine learning framework for predicting healthcare utilization and risk factors · Journal: Healthcare analytics (New York, N.Y.), vol. 8, Article 100411
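The Medicaid study above reports naïve Bayes as computationally efficient but less accurate than the boosted and bagged ensembles. The sketch below is a minimal Gaussian naïve Bayes, assuming continuous features with Gaussian class-conditional likelihoods; the abstract does not say which naïve Bayes variant was used, and the function names and the constant `var_smoothing` term are illustrative simplifications. It shows why training is cheap: a single pass computes class priors and per-feature means and variances.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Single pass over the data: log class priors plus per-feature means/variances."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        cols = list(zip(*rows))  # transpose to per-feature columns
        means = [sum(c) / len(c) for c in cols]
        # Small constant added so a zero-variance feature cannot divide by zero.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + var_smoothing
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior + summed Gaussian log-likelihoods."""
    def log_posterior(params):
        log_prior, means, variances = params
        # Features are treated as independent given the class (the "naive" assumption).
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))


X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.1], [5.0, 6.0], [5.2, 5.9], [4.9, 6.2]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))  # → 0
```

The same independence assumption that keeps training to one pass also ignores interactions between features, which is one plausible reason such models trail ensembles on accuracy.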
Pub Date: 2025-08-11 · DOI: 10.1016/j.health.2025.100409
Deblina Mazumder Setu
The efficient and early detection of Autism Spectrum Disorder (ASD) is critical to improving diagnosis and intervention outcomes. Various methods based on functional magnetic resonance imaging (fMRI) and questionnaires have been explored, and among them eye tracking is a promising approach. However, existing eye-tracking methods often require controlled environments, which makes them complicated and expensive. This study eliminates the need for specific parameters by relying solely on eye-movement data for ASD detection, thereby introducing a novel and user-friendly technique. Feature engineering, comprising preprocessing and the extraction of relevant gaze-movement features, is applied first. These features are then used to train machine learning and deep learning models, with hyperparameter tuning for optimization. Using the Saliency4ASD dataset and looking beyond its usual gaze focus, the study built a model that identifies ASD from eye movement alone with about 81% accuracy. This safe, low-cost approach could enable simple technologies for early ASD detection, making such screening accessible to everyone.
Title: An analytics-driven model for identifying autism spectrum disorder using eye tracking · Journal: Healthcare analytics (New York, N.Y.), vol. 8, Article 100409
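The eye-tracking abstract above describes extracting gaze-movement features from raw eye-movement data but does not list them. The sketch below is a hypothetical illustration using velocity-threshold identification (I-VT), a common way to separate saccades from fixations. The feature names, the 100 units-per-second default threshold, and the `(t_seconds, x, y)` sample format are all assumptions, not the study's pipeline.

```python
import math


def gaze_features(samples, velocity_threshold=100.0):
    """Summarize a gaze trace of (t_seconds, x, y) samples using I-VT labelling.

    Inter-sample velocity above `velocity_threshold` (screen units per second)
    is labelled a saccade; everything slower is treated as fixation.
    """
    if len(samples) < 2:
        raise ValueError("need at least two gaze samples")
    velocities = []
    path_length = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        step = math.hypot(x1 - x0, y1 - y0)  # Euclidean distance moved
        path_length += step
        velocities.append(step / dt)
    saccades = [v > velocity_threshold for v in velocities]
    return {
        "path_length": path_length,
        "mean_velocity": sum(velocities) / len(velocities),
        "peak_velocity": max(velocities),
        "saccade_ratio": sum(saccades) / len(saccades),
    }


trace = [(0.0, 0, 0), (0.5, 1, 0), (1.0, 2, 0), (1.5, 50, 0), (2.0, 51, 0)]
print(gaze_features(trace, velocity_threshold=50.0))
```

A per-recording feature dictionary like this could then be fed to the machine learning models the abstract mentions; the choice of threshold depends heavily on the tracker's sampling rate and coordinate units.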
Pub Date: 2025-08-04 · DOI: 10.1016/j.health.2025.100408
Shagufta Henna , Juan Miguel Lopez Alcaraz , Upaka Rathnayake , Mohamed Amjath
Convolutional Neural Networks (CNNs) are widely used for their robust feature extraction capabilities, particularly in medical classification tasks. However, their opaque decision-making process presents challenges in clinical settings, where interpretability and trust are paramount. This study investigates the explainability of a custom CNN model developed to classify Covid-19 and non-Covid-19 cases from dry cough spectrograms, with a focus on interpreting filter-level representations and decision pathways. To improve model transparency, we apply a suite of explainable artificial intelligence (XAI) techniques, including feature visualizations, SmoothGrad, Grad-CAM, and LIME, to explain the relevance of spectro-temporal features in the classification process. Furthermore, we conduct a comparative analysis with a pre-trained MobileNetV2 model using Guided Grad-CAM and Integrated Gradients. The results indicate that while MobileNetV2 yields some degree of visual attribution, its explanations, particularly for Covid-19 predictions, are diffuse and inconsistent, limiting their interpretability. In contrast, the custom CNN model exhibits more coherent and class-specific activation patterns, offering improved localization of diagnostically relevant features.
Title: An interpretable deep learning framework for medical diagnosis using spectrogram analysis · Journal: Healthcare analytics (New York, N.Y.), vol. 8, Article 100408
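The spectrogram abstract above lists Grad-CAM among its XAI techniques. The sketch below shows only the Grad-CAM combination step in its usual formulation: each channel weight is the spatial average of that channel's gradients, the weighted feature maps are summed, and a ReLU keeps positive evidence. It assumes the activations and gradients of one convolutional layer have already been extracted by a deep learning framework; the function name and nested-list shapes are illustrative, and this is not the study's implementation.

```python
def grad_cam(activations, gradients):
    """Combine one conv layer's feature maps into a Grad-CAM heatmap.

    activations, gradients: nested lists shaped [channels][height][width];
    gradients hold d(target class score)/d(activation).
    Returns a [height][width] map normalised to [0, 1].
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight alpha_k = global average of that channel's gradients.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    # Weighted sum of feature maps; ReLU keeps only positive evidence for the class.
    cam = [[max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)] for i in range(height)]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # → [[1.0, 0.0], [0.0, 0.0]]
```

For a spectrogram input, the two heatmap axes correspond to (downsampled) frequency and time bins, which is how such maps can indicate which spectro-temporal regions drove a prediction; the heatmap is normally upsampled to the input size before overlaying.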