Antimicrobial resistance (AMR) is a major public health challenge that significantly complicates infection prevention and treatment. This study employs machine learning and neural network techniques to classify multidrug-resistant Gram-negative bacterial (MDR-GNB) infections using electronic health records from 624 patients at Thatphanom Crown Prince Hospital in Thailand. We compared several algorithms, including Logistic Regression, Random Forest, Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), and Light Gradient Boosting Machine (LightGBM); the MLP model exhibited the highest accuracy and specificity. Performance was further enhanced by combining feature selection methods, such as Sequential Forward Selection (SFS), Recursive Feature Elimination with Cross-Validation (RFE-CV), and SelectKBest, with data augmentation techniques, including ADASYN and SMOTE variants. SHapley Additive exPlanations (SHAP) provided insight into the most influential predictors of MDR-GNB. Notably, the MLP model achieved an AUC of 0.70, surpassing prior studies and highlighting its potential to advance clinical decision-making in managing MDR-GNB infections.
{"title":"A machine learning and neural network approach for classifying multidrug-resistant bacterial infections","authors":"Preeda Mengsiri , Ratchadaporn Ungcharoen , Sethavidh Gertphol","doi":"10.1016/j.health.2025.100388","DOIUrl":"10.1016/j.health.2025.100388","url":null,"PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100388"},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143510183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
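The SMOTE-style augmentation mentioned in the abstract above rests on one interpolation step: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The following is a minimal pure-Python sketch of that step only, under stated assumptions. It is not the study's pipeline, and the helper names (`nearest_neighbours`, `smote_sample`), the choice of `k`, and the toy data are all hypothetical.

```python
import math
import random


def nearest_neighbours(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean), excluding the point itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic sample by interpolating between a minority point and a neighbour."""
    rng = rng or random.Random()
    base = rng.choice(minority)
    neighbour = rng.choice(nearest_neighbours(base, minority, k))
    lam = rng.random()  # interpolation weight in [0, 1)
    return [b + lam * (n - b) for b, n in zip(base, neighbour)]


# Hypothetical toy minority class with two features per record (not study data).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
rng = random.Random(0)
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(5)]
```

Because every synthetic record is a convex combination of two real minority records, it always stays inside the range spanned by the minority class. Variants such as ADASYN change how many synthetic points each base record receives, not this interpolation.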
Pub Date: 2025-06-01; Epub Date: 2025-05-29; DOI: 10.1016/j.health.2025.100399
Santanu Roy, Reshma Rachel Cherish, Gifty Roy
Diabetes is a chronic disease characterized by elevated blood glucose levels. This study proposes a novel attention-based loss function and a lightweight artificial neural network (ANN) called Diabetic Lite (DB-Lite) for diabetes prediction on the Pima Indian Diabetes Dataset (PIDD). We show that the Pima dataset poses several challenges: it is small and imbalanced, and many of its features are non-linearly correlated. The novelties of this work are as follows: (i) A novel attention-based binary cross-entropy (ABCE) loss function is proposed for the first time to alleviate the statistical imbalance present in the Pima dataset. This ABCE loss function is incorporated in the DB-Lite model, which is trained from scratch. (ii) A Swish activation function is used in the hidden layer of DB-Lite instead of the Rectified Linear Unit (ReLU) to handle the non-linear dependency between the features and the final outcome. (iii) The synthetic minority oversampling technique (SMOTE) is used as a pre-processing step to mitigate the class imbalance in the Pima dataset. (iv) An adaptive learning rate is used during training to speed up the convergence of the DB-Lite model. Our final proposed framework achieved 99.7% accuracy, 99.4% precision, 99.8% recall, and a 99.6% F1 score in testing, which is the best result on this Pima dataset. Welch's t-test (a statistical hypothesis test) and 10-fold cross-validation are used to validate the proposed loss function.
{"title":"An attention-based loss function and synthetic minority oversampling technique for alleviating class imbalance in predicting diabetes","authors":"Santanu Roy , Reshma Rachel Cherish , Gifty Roy","doi":"10.1016/j.health.2025.100399","DOIUrl":"10.1016/j.health.2025.100399","url":null,"PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100399"},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144185724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
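Item (ii) of the abstract above replaces ReLU with Swish in the hidden layer. As a rough illustration of the difference, the sketch below implements both activations in plain Python. It assumes the common form swish(x) = x · sigmoid(βx) with β = 1; the abstract does not state which β DB-Lite uses, so that parameter is an assumption.

```python
import math


def relu(x: float) -> float:
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation: x * sigmoid(beta * x). beta = 1 is an assumed default."""
    return x / (1.0 + math.exp(-beta * x))


# Compare the two activations on a few sample inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):.4f}")
```

Unlike ReLU, Swish is smooth and lets small negative inputs pass through with a small negative output instead of clamping them to zero.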
Pub Date: 2025-06-01; Epub Date: 2025-05-03; DOI: 10.1016/j.health.2025.100397
Junwu Dong, Minyi Chu, Yirou Xu
Ensuring effective medication adherence is vital for managing chronic diseases, yet global patient adherence remains suboptimal. This study aims to develop a predictive model for medication adherence behaviour (MAB) employing machine learning techniques, addressing the limitations of traditional correlation-based approaches. Based on the Meta-Theoretic Model of Motivation and Personality (3M Model), the data from 428 chronic disease patients included dark triad traits (narcissism, Machiavellianism, psychopathy), general self-efficacy, doctor-patient trust, and demographic variables. Five machine learning algorithms – multiple logistic regression, decision tree, adaptive boosting, random forest and support vector machine (SVM) – were used to identify MAB levels and assess feature importance. Among these, the random forest model achieved the highest performance, with an accuracy of 0.637, recall of 0.538, precision of 0.556, and an F1 score of 0.544. Feature ranking revealed that narcissism, Machiavellianism, doctor-patient trust, psychopathy, and general self-efficacy were the most influential predictors. These findings demonstrate that integrating psychological and demographic factors into machine learning models can enhance the prediction of medication adherence. This study presents a novel interdisciplinary framework that integrates behavioural health analytics and data science to inform clinical decision-making. It provides valuable insights into the severity and temporal progression of medication adherence behaviours, offering clinicians a practical reference for developing more effective intervention strategies.
{"title":"A predictive healthcare model using machine learning and psychological factors for medication adherence","authors":"Junwu Dong , Minyi Chu , Yirou Xu","doi":"10.1016/j.health.2025.100397","DOIUrl":"10.1016/j.health.2025.100397","url":null,"PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100397"},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143922291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-01; Epub Date: 2025-01-03; DOI: 10.1016/j.health.2025.100382
Viljami Männikkö, Juha Turunen, Heidi Åhman, Esa Harju
Streptococcus pneumoniae, or pneumococcus, poses a significant health risk, particularly to infants, the elderly, and individuals with underlying medical conditions. In Finland, pneumococcal vaccination is part of the national immunization program, which covers young children and only selected at-risk adult populations. This study aims to leverage the Finnish national electronic health record system, Kanta, to analyze treatment histories and identify individuals at increased risk for disease to improve vaccination strategies. Kanta provides a comprehensive, nationwide database of patient treatment histories, which can be used to track individual risk factors and disease episodes. We analyzed health data from 96,200 Finnish residents with risk factors for pneumococcal disease following guidelines from the Finnish Institute for Health and Welfare and the World Health Organization. We prioritize vaccination for those at the greatest risk by categorizing individuals based on their identified risk factors. This study demonstrates the potential for using national health record data to conduct large-scale risk analyses, allowing for more targeted and efficient vaccination strategies. The novelty of our approach lies in the automatic identification of high-risk individuals, which can inform public health initiatives and enhance the monitoring of pneumococcal disease risk at a population level.
{"title":"A large-scale risk assessment and classification model for pneumococcus using Finnish national health data","authors":"Viljami Männikkö , Juha Turunen , Heidi Åhman , Esa Harju","doi":"10.1016/j.health.2025.100382","DOIUrl":"10.1016/j.health.2025.100382","url":null,"PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100382"},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-01; Epub Date: 2025-05-14; DOI: 10.1016/j.health.2025.100396
Sajal Chakroborty
Infectious diseases pose significant global threats to public health and economic stability by causing pandemics. Early detection of infectious diseases is crucial to prevent global outbreaks. Mpox, a contagious viral disease first detected in humans in 1970, has caused multiple epidemics in recent decades, underscoring the need for tools for its early detection. In this paper, we propose a hybrid deep learning framework for Mpox detection. This framework allows us to construct hybrid models that combine deep learning architectures, used as feature extractors, with machine learning classifiers, and to perform a comprehensive analysis of Mpox detection from image data. Our best-performing model combines MobileNetV2 with a LightGBM classifier and achieves an accuracy of 91.49%, precision of 86.96%, weighted precision of 91.87%, recall of 95.24%, weighted recall of 91.49%, F1 score of 90.91%, weighted F1-score of 91.51%, and a Matthews Correlation Coefficient of 0.83.
{"title":"A hybrid deep learning framework for early detection of Mpox using image data","authors":"Sajal Chakroborty","doi":"10.1016/j.health.2025.100396","DOIUrl":"10.1016/j.health.2025.100396","url":null,"PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100396"},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144069189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
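The Mpox abstract above reports accuracy, precision, recall, F1, and the Matthews Correlation Coefficient, all of which derive from the four cells of a binary confusion matrix. The sketch below shows those formulas. The counts passed in are hypothetical: they were chosen only because they are arithmetically consistent with the reported percentages, and they are not taken from the paper, whose actual test-set confusion matrix is not given here.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Hypothetical counts (an assumption, NOT reported in the source).
metrics = binary_metrics(tp=20, fp=3, fn=1, tn=23)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four cells, so it stays informative when the classes are imbalanced, which is one reason it is often reported alongside accuracy.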
This study proposes an optimal control model for COVID-19 spread, incorporating a logistic recruitment rate. The analysis shows that the disease-free equilibrium exists when the population-existing threshold exceeds 1. The stability of this equilibrium is determined by the basic reproduction number R₀: the equilibrium is stable when R₀ is less than or equal to 1 and unstable when R₀ is greater than 1. Furthermore, an endemic equilibrium and its stability are established when R₀ exceeds 1. To identify influential factors in COVID-19 spread, sensitivity index and sensitivity analyses of R₀ are conducted. The model integrates both prevention and treatment controls. Numerical simulations show that the prevention control is more effective than the treatment control in reducing COVID-19 spread. Moreover, the simultaneous implementation of prevention and treatment controls outperforms either control alone in mitigating COVID-19 spread. Finally, sensitivity analysis conducted with constant controls shows the contributions of the controls to disease dynamics.
{"title":"An optimal control model with sensitivity analysis for COVID-19 transmission using logistic recruitment rate","authors":"Jonner Nainggolan , Moch. Fandi Ansori , Hengki Tasman","doi":"10.1016/j.health.2024.100375","DOIUrl":"10.1016/j.health.2024.100375","url":null,"PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100375"},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143172048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
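The sensitivity index of the basic reproduction number mentioned in the abstract above is commonly defined as the normalized forward sensitivity index, (∂R₀/∂p) · (p/R₀), for a parameter p. The sketch below estimates it with central finite differences. The abstract does not give the paper's expression for the basic reproduction number, so the formula β/(γ + μ) used here is a stand-in from a simple SIR-type model, and the parameter names and values are purely illustrative assumptions.

```python
def r0(params: dict) -> float:
    """Hypothetical basic reproduction number beta / (gamma + mu); NOT the paper's formula."""
    return params["beta"] / (params["gamma"] + params["mu"])


def sensitivity_index(func, params: dict, name: str, rel_step: float = 1e-6) -> float:
    """Normalized forward sensitivity index (dR0/dp) * (p / R0) via central differences."""
    p = params[name]
    h = abs(p) * rel_step
    up = dict(params, **{name: p + h})
    down = dict(params, **{name: p - h})
    derivative = (func(up) - func(down)) / (2 * h)
    return derivative * p / func(params)


# Illustrative parameter values only (assumptions, not fitted to any data).
params = {"beta": 0.4, "gamma": 0.1, "mu": 0.02}
indices = {name: sensitivity_index(r0, params, name) for name in params}
for name, value in indices.items():
    print(f"{name}: {value:+.4f}")
```

A positive index means that increasing the parameter raises the reproduction number; an index of +1 means a 1% increase in the parameter produces roughly a 1% increase in the reproduction number.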
Pub Date: 2025-06-01; Epub Date: 2025-01-06; DOI: 10.1016/j.health.2024.100379
Nawshin Haque, Tania Islam, Md Erfan
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition impacting an individual’s repetitive behaviours, social skills, verbal and nonverbal communication abilities, and capacity for acquiring new knowledge. The symptoms typically manifest in early childhood, specifically between 6 months and 5 years, and progress over time. This study explores the application of Logistic Regression, Support Vector Classifier, K-Nearest Neighbour, Decision Tree, and Random Forest for predicting autism in children and toddlers by leveraging advancements in machine learning. The efficacy of these techniques is evaluated using publicly accessible datasets specific to both age groups. The findings indicate remarkable performance, with the toddler dataset achieving a mean Intersection over Union (mIoU) of 100% for Support Vector Classifier and 99.80% for Logistic Regression. Similarly, the children dataset demonstrates outstanding results, achieving an mIoU of 100% for Support Vector Classifier and 99.96% for Logistic Regression. Furthermore, all algorithms achieved 100% accuracy on the children (age 4–11) dataset collected from real-world sources. Logistic Regression, Random Forest, Support Vector Classifier, and Decision Tree attained 100% accuracy and mIoU with the real-world dataset. These results underscore the potential of machine learning in aiding the early detection of ASD in children and toddlers, offering promising avenues for future research and clinical applications.
{"title":"An exploration of machine learning approaches for early Autism Spectrum Disorder detection","authors":"Nawshin Haque, Tania Islam, Md Erfan","doi":"10.1016/j.health.2024.100379","DOIUrl":"10.1016/j.health.2024.100379","url":null,"PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100379"},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143169862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
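Mean Intersection over Union (mIoU), reported in the abstract above, is more familiar from image segmentation, but it can also be computed for a classifier by treating each class's predicted and true label sets as the regions being compared: per class, IoU = TP / (TP + FP + FN), and mIoU is the average over classes. The sketch below assumes that definition. How the paper computed its mIoU is not stated in the abstract, and the labels here are hypothetical toy data.

```python
def mean_iou(y_true, y_pred):
    """Average over classes of TP / (TP + FP + FN), i.e. the per-class Jaccard index."""
    classes = sorted(set(y_true) | set(y_pred))
    ious = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


# Hypothetical labels (1 = ASD traits, 0 = no ASD traits); not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
score = mean_iou(y_true, y_pred)
print(f"mIoU = {score:.2f}")
```

Under this definition mIoU penalizes errors more heavily than accuracy does: in the toy example six of eight predictions are correct (75% accuracy), yet each class has an IoU of 3/5, and only a classifier with no errors at all reaches an mIoU of 100%.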
Pub Date: 2025-06-01; Epub Date: 2025-04-09; DOI: 10.1016/j.health.2025.100391
Raquel Ochoa-Ornelas, Alberto Gudiño-Ochoa, Julio Alberto García-Rodríguez, Sofia Uribe-Toscano
Lung and colon cancers are among the deadliest diseases worldwide, necessitating early and accurate detection to improve patient outcomes. This study utilizes the EfficientNetB3 model, a state-of-the-art transfer learning approach, to enhance the detection of colon and lung cancers from histopathological images. The research leverages the LC25000 dataset, comprising 25,000 histopathological images evenly distributed across five classes: colon adenocarcinoma, benign colon tissue, lung adenocarcinoma, lung squamous cell carcinoma, and benign lung tissue. The EfficientNetB3 model initially achieved an impressive accuracy of 99.39% across all classes. To further validate and enhance the model’s robustness and generalizability, we augmented the dataset by replacing 1,000 cancerous class images with new images from the Genomic Data Commons (GDC) Data Portal of the National Cancer Institute, simulating more diverse clinical scenarios. This modification resulted in an accuracy of 99.39%, with equally high performance across other metrics, including precision, recall, and F1-Score, all reaching 99.39%, and a Matthews Correlation Coefficient (MCC) of 99.24%. The Gradient-weighted Class Activation Mapping (Grad-CAM) technique was used to visually interpret the model’s decisions, enhancing its transparency and reliability. These findings demonstrate that EfficientNetB3 is an effective and generalizable end-to-end framework for histopathological image analysis with minimal preprocessing. The promising results underscore the potential of EfficientNetB3 to advance automated cancer detection, thereby contributing to earlier diagnosis and more effective treatment strategies.
{"title":"A robust transfer learning approach with histopathological images for lung and colon cancer detection using EfficientNetB3","authors":"Raquel Ochoa-Ornelas , Alberto Gudiño-Ochoa , Julio Alberto García-Rodríguez , Sofia Uribe-Toscano","doi":"10.1016/j.health.2025.100391","DOIUrl":"10.1016/j.health.2025.100391","url":null,"PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7 ","pages":"Article 100391"},"PeriodicalIF":0.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143806834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-01 Epub Date: 2025-02-17 DOI: 10.1016/j.health.2025.100385
Aydin Teymourifar , Onur Kaya , Gurkan Ozturk
This study focuses on a real-world healthcare system in which public and private hospitals with distinct characteristics coexist. Public hospitals have lower costs, but they suffer from long waiting times, which diminish patients’ perceived quality of care. Private hospitals, despite their higher fees, offer shorter waiting times, leading to a more favorable perception of quality. A balanced healthcare system could provide societal benefits. Pricing strategies greatly influence a patient’s hospital selection. For instance, reduced fees in private hospitals attract more patients, which reduces overcrowding in public facilities and raises the overall quality of the services provided. This study aims to develop pricing models that foster a balanced and socially advantageous healthcare system. In this system, private hospital pricing is determined through contract mechanisms with the government. We therefore examine how various contract models between the government and private hospitals affect social utility. Our findings underscore the societal advantages of contract mechanisms. Furthermore, we generalize the proposed models so that they apply to similar systems.
{"title":"A data-driven approach to pricing models for balanced public–private healthcare systems","authors":"Aydin Teymourifar , Onur Kaya , Gurkan Ozturk","doi":"10.1016/j.health.2025.100385","journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7","pages":"Article 100385"},"publicationDate":"2025-06-01","publicationTypes":"Journal Article"}
Pub Date: 2025-06-01 Epub Date: 2025-01-10 DOI: 10.1016/j.health.2024.100378
Robert M. Siepmann , Giulia Baldini , Cynthia S. Schmidt , Daniel Truhn , Gustav Anton Müller-Franzes , Amin Dada , Jens Kleesiek , Felix Nensa , René Hosch
The administrative burden of manually extracting clinical information from discharge letters is a common challenge in healthcare. This study explores the use of Large Language Models (LLMs), specifically Generative Pretrained Transformer 4 (GPT-4) by OpenAI, for automated extraction of diagnoses, medications, and allergies from discharge letters. Data were sourced from two healthcare institutions in Germany and comprised discharge letters for ten patients from each institution. The first experiment used a standardized prompt for information extraction. Because this prompt ran into challenges, it was fine-tuned in a second experiment to improve the results. We further tested whether open-source LLMs can achieve similar results. In the first experiment, primary diagnoses were identified with 85% accuracy and secondary diagnoses with 55.8% accuracy. Medications and allergies were extracted with 85.9% and 100% accuracy, respectively. International Classification of Diseases, 10th revision (ICD-10) codes for the identified diagnoses were assigned with 85% accuracy for primary diagnoses and 60.7% for secondary diagnoses. Anatomical Therapeutic Chemical (ATC) codes were identified with an accuracy of 78.8%. Open-source LLMs, by contrast, did not reach similar levels of accuracy and could not consistently fill the template. With prompt fine-tuning in the second experiment, primary diagnoses, secondary diagnoses, and medications were predicted with 95%, 88.9%, and 92.2% accuracy, respectively. GPT-4 shows excellent potential for automated extraction of crucial diagnostic and medication information from discharge letters, which would presumably lower the administrative burden for healthcare professionals and improve patient outcomes.
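The abstract reports per-category accuracies but does not say how they were computed. As a rough illustration only, the sketch below scores one letter under an assumed definition: the fraction of reference items that appear in the extracted list after simple normalization. The function names, field names, and the toy letter are all hypothetical and are not taken from the study.

```python
from typing import Dict, List


def normalize(item: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(item.lower().split())


def field_accuracy(reference: List[str], extracted: List[str]) -> float:
    """Fraction of reference items found in the extracted list (assumed definition)."""
    if not reference:
        # Nothing to find: count as correct only if nothing was extracted.
        return 1.0 if not extracted else 0.0
    found = {normalize(x) for x in extracted}
    hits = sum(1 for item in reference if normalize(item) in found)
    return hits / len(reference)


def score_letter(
    reference: Dict[str, List[str]], extracted: Dict[str, List[str]]
) -> Dict[str, float]:
    """Score every category present in the reference annotation."""
    return {
        field: field_accuracy(items, extracted.get(field, []))
        for field, items in reference.items()
    }


# Toy, made-up letter used purely for illustration (not study data).
reference = {
    "primary_diagnoses": ["Pneumonia"],
    "secondary_diagnoses": ["Type 2 diabetes", "Hypertension"],
    "medications": ["Metformin", "Ramipril", "Amoxicillin"],
    "allergies": [],
}
extracted = {
    "primary_diagnoses": ["pneumonia"],
    "secondary_diagnoses": ["Hypertension"],
    "medications": ["metformin", "ramipril", "amoxicillin"],
    "allergies": [],
}

scores = score_letter(reference, extracted)
print(scores)
```

Exact matching after normalization is deliberately strict. A real evaluation of free-text diagnoses would likely need clinician review or code-level comparison (for example on ICD-10 or ATC codes), which this sketch does not attempt.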
{"title":"An automated information extraction model for unstructured discharge letters using large language models and GPT-4","authors":"Robert M. Siepmann , Giulia Baldini , Cynthia S. Schmidt , Daniel Truhn , Gustav Anton Müller-Franzes , Amin Dada , Jens Kleesiek , Felix Nensa , René Hosch","doi":"10.1016/j.health.2024.100378","journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"7","pages":"Article 100378"},"publicationDate":"2025-06-01","publicationTypes":"Journal Article"}