Healthcare analytics (New York, N.Y.): latest articles, page 10

An ensemble convolutional neural network model for brain stroke prediction using brain computed tomography images

Healthcare analytics (New York, N.Y.)

Pub Date : 2024-12-01 Epub Date: 2024-10-29 DOI: 10.1016/j.health.2024.100368

Most. Jannatul Ferdous, Rifat Shahriyar

A stroke is a potentially fatal brain attack that causes an interruption in the blood supply to the brain. As a result, brain cells start to die due to a lack of oxygen and nutrients. After a stroke, every minute is critical: a million or more brain cells perish every minute during a stroke. Prompt identification of a stroke can prevent lasting brain damage or even save the patient's life. Doctors advise computed tomography (CT) imaging of the brain for earlier stroke detection. If doctors delay CT diagnosis or make erroneous diagnoses, the consequences can be life-threatening. For that reason, automatic diagnosis of stroke from a brain CT scan image would benefit stroke patients. This study modifies three pre-trained convolutional neural network (CNN) models, InceptionV3, MobileNetV2, and Xception, by updating the top layer of each model using transfer learning on brain CT images. A new ensemble convolutional neural network (ENSNET) model is proposed for automatic brain stroke prediction from brain CT scan images. ENSNET is the average of two of the modified CNN models, InceptionV3 and Xception. To assess performance, we relied on the following metrics: accuracy, precision, recall, F1-score, confusion matrix, accuracy versus epoch, loss versus epoch, and the receiver operating characteristic (ROC) curve. The accuracy of the modified InceptionV3 is 97.48%, of the modified MobileNetV2 83.29%, and of the modified Xception 96.11%. The proposed ensemble model ENSNET performs better than the other models in diagnosing stroke from brain CT scans, providing 98.86% accuracy, 97.71% precision, 98.46% recall, 98.08% F1-score, and 98.74% area under the ROC curve (AUC). The proposed ENSNET model can therefore detect strokes from brain CT images more successfully than the other models.
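
The averaging step and the reported F1-score can be illustrated with a small sketch. It assumes that "average" means averaging the two networks' per-class probabilities and taking the larger one; the abstract does not spell this out. The probability values and the function names are hypothetical, and only the precision and recall figures come from the abstract.

```python
def ensemble_average(probs_a, probs_b):
    """Average two models' per-class probabilities, sample by sample."""
    return [
        [(pa + pb) / 2.0 for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(probs_a, probs_b)
    ]


def predict_labels(probs):
    """Pick the class index with the highest averaged probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical softmax outputs for three scans; columns are [normal, stroke].
inception_probs = [[0.90, 0.10], [0.40, 0.60], [0.55, 0.45]]
xception_probs = [[0.80, 0.20], [0.30, 0.70], [0.25, 0.75]]

averaged = ensemble_average(inception_probs, xception_probs)
labels = predict_labels(averaged)  # the third scan flips to "stroke" after averaging

# Consistency check on the reported ENSNET precision and recall (in percent).
reported_f1 = f1_score(97.71, 98.46)
print(labels, round(reported_f1, 2))
```

On the third hypothetical scan the two models disagree, and the averaged probabilities decide the label. The F1 value recomputed from the reported precision and recall rounds to 98.08, which agrees with the figure in the abstract.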

Volume 6, Article 100368

Citations: 0

An investigation of Susceptible–Exposed–Infectious–Recovered (SEIR) tuberculosis model dynamics with pseudo-recovery and psychological effect

Healthcare analytics (New York, N.Y.)

Pub Date : 2024-12-01 Epub Date: 2024-09-17 DOI: 10.1016/j.health.2024.100361

Yudi Ari Adi, Suparman

Tuberculosis is one of the most pressing issues of the modern era, posing a severe health risk to humans in recent decades. This study proposes a Susceptible–Exposed–Infectious–Recovered (SEIR) tuberculosis epidemic transmission model with psychological effects and pseudo-recovery. We consider a compartmental mathematical model in which the entire population is divided into four compartments based on their natural features. The model is validated, and parameter values are estimated, using Indonesian data from 2002 to 2022. To investigate their epidemiological significance, we proved the positivity and boundedness of solutions, as well as the local and global stability of equilibria. Sensitivity analysis is used to identify the parameters with the greatest influence on the basic reproduction number, $R_0$. The bifurcation tools of center manifold theory are used to conduct a bifurcation study, and mathematical conditions that ensure a forward bifurcation are given. We performed numerical simulations that support our theoretical findings.
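
As a rough illustration of the kind of compartmental simulation and sensitivity analysis described here, the sketch below integrates a textbook SEIR model with forward Euler steps. It is not the paper's model: births, deaths, pseudo-recovery, and the psychological effect are left out, the parameter values are invented, and all function names are hypothetical. For this simplified model the basic reproduction number is beta/gamma.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, rec0, days, dt=0.1):
    """Integrate a textbook SEIR model with forward Euler steps.

    beta: transmission rate, sigma: rate of leaving the exposed class,
    gamma: recovery rate. No births, deaths, or pseudo-recovery here.
    """
    s, e, i, r = s0, e0, i0, rec0
    n = s + e + i + r
    history = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_exposed = beta * s * i / n
        new_infectious = sigma * e
        new_recovered = gamma * i
        s += dt * (-new_exposed)
        e += dt * (new_exposed - new_infectious)
        i += dt * (new_infectious - new_recovered)
        r += dt * new_recovered
        history.append((k * dt, s, e, i, r))
    return history


def basic_reproduction_number(beta, gamma):
    """R0 for this simplified SEIR model without births or deaths."""
    return beta / gamma


def sensitivity_index(params, name, rel_step=1e-6):
    """Normalized forward sensitivity index (p / R0) * dR0/dp, by finite difference."""
    base = basic_reproduction_number(**params)
    bumped = dict(params)
    bumped[name] = params[name] * (1.0 + rel_step)
    derivative = (basic_reproduction_number(**bumped) - base) / (params[name] * rel_step)
    return derivative * params[name] / base


# Illustrative parameter values only; they are not estimates from the paper.
beta, sigma, gamma = 0.5, 0.2, 0.1
trajectory = simulate_seir(beta, sigma, gamma, 990.0, 0.0, 10.0, 0.0, days=200)
r_naught = basic_reproduction_number(beta, gamma)
index_beta = sensitivity_index({"beta": beta, "gamma": gamma}, "beta")
index_gamma = sensitivity_index({"beta": beta, "gamma": gamma}, "gamma")
final_time, s_end, e_end, i_end, r_end = trajectory[-1]
```

In this toy model the sensitivity index is close to +1 for beta and close to -1 for gamma, meaning a 1% rise in the transmission rate raises the reproduction number by about 1%, while a 1% rise in the recovery rate lowers it by about 1%. A model with more parameters would give a less trivial ranking.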

Volume 6, Article 100361

Citations: 0

Artificial intelligence and diagnostic healthcare using computer vision and medical imaging

Healthcare analytics (New York, N.Y.)

Pub Date : 2024-12-01 Epub Date: 2024-06-25 DOI: 10.1016/j.health.2024.100352

Gaurav Dhiman, Wattana Viriyasitavat, Atulya K. Nagar, Oscar Castillo

Citations: 0

A hierarchical Bayesian approach for identifying socioeconomic factors influencing self-rated health in Japan

Healthcare analytics (New York, N.Y.)

Pub Date : 2024-12-01 Epub Date: 2024-10-25 DOI: 10.1016/j.health.2024.100367

Makoto Nakakita, Teruo Nakatsuma

This study identifies socioeconomic factors that potentially influence self-rated health (SRH), an important indicator of health status, in the Japanese population. We used a panel data logit model to simultaneously estimate the effects of personal attributes, living environment, and social conditions. To achieve a stable estimation of the panel data logit model, we applied hierarchical Bayesian modeling and estimated the model with the Markov chain Monte Carlo (MCMC) method. Furthermore, we used the ancillary-sufficiency interweaving strategy (ASIS) algorithm to improve the efficiency of the MCMC method for the panel data logit model. The results indicate that SRH within the Japanese population is affected by demographic and socioeconomic factors (e.g., age, marital status, educational background, and employment status) and by daily habits such as frequency of drinking alcohol. We also obtained results that differed from those of previous studies. Differences in national character among countries may be reflected in these results. Since SRH is a subjective measure of health status and often differs from actual health status, it is crucial to remove the influence of national character on SRH when evaluating the actual health status of individuals within a population. The study findings provide important insights into addressing these factors to better understand SRH in the Japanese context.

Volume 6, Article 100367

Citations: 0

A comparative analysis of machine learning algorithms with tree-structured parzen estimator for liver disease prediction

Healthcare analytics (New York, N.Y.)

Pub Date : 2024-12-01 Epub Date: 2024-08-16 DOI: 10.1016/j.health.2024.100358

Rakibul Islam, Azrin Sultana, MD. Nuruzzaman Tuhin

The liver is one of the body's most essential organs; it supports metabolism and helps keep the body healthy. Successful treatment and better patient outcomes depend on early and correct diagnosis and identification of Liver Disease (LD). This study proposes a system for predicting LD that combines Machine Learning (ML) algorithms, namely Decision Tree, Random Forest, Extra Tree Classifier (ETC), LightGBM, and AdaBoost, with the Tree-Structured Parzen Estimator (TPE) method for hyperparameter tuning. No previous study in the literature has used ML algorithms with TPE to predict LD. For this research, the Indian Liver Patients' Dataset with 583 instances and 11 attributes was used. In pre-processing, techniques such as upsampling were used to address the class imbalance problem, normalization was employed to scale the dataset, and feature selection was applied to choose important features. The proposed model was analyzed and compared using 10-fold cross-validation with various evaluation metrics, including accuracy, precision, recall, and F1-score. The proposed model achieved its best accuracy, 95.8%, when using ETC with TPE.
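
The upsampling and 10-fold cross-validation steps can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' pipeline: the 400/183 class split is invented (only the total of 583 records comes from the abstract), the helper names are hypothetical, and the abstract does not say whether upsampling was done before or after splitting. The sketch upsamples inside each training fold, a common precaution against copies of one record appearing on both sides of a split.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[fold::k] for fold in range(k)]


def upsample_minority(labels, indices, seed=0):
    """Randomly repeat indices of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for idx in indices:
        by_class.setdefault(labels[idx], []).append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extra)
    return balanced


# Hypothetical labels: 583 records with an invented 400/183 class split.
labels = [1] * 400 + [0] * 183
folds = k_fold_indices(len(labels), k=10)

splits = []
for held_out in range(10):
    test_idx = folds[held_out]
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    # Upsample the training part only, so no duplicated record reaches the test fold.
    splits.append((upsample_minority(labels, train_idx), test_idx))

# Class counts in the first balanced training set.
first_train_counts = Counter(labels[i] for i in splits[0][0])
```

Each of the ten `(train, test)` pairs could then be passed to any classifier, with the metrics averaged over the folds.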

Volume 6, Article 100358. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2772442524000601/pdfft?md5=3aa72f3755c5377eba838fab77bd6aa3&pid=1-s2.0-S2772442524000601-main.pdf

Citations: 0

An integrated location–allocation model for reducing disparities and increasing accessibility to public health screening centers

Healthcare analytics (New York, N.Y.)

Pub Date : 2024-12-01 Epub Date: 2024-06-19 DOI: 10.1016/j.health.2024.100349

João Flávio de Freitas Almeida, Lásara Fabrícia Rodrigues, Luiz Ricardo Pinto, Francisco Carlos Cardoso de Campos

The tests for tracking diseases in newborns available through the National Neonatal Screening Program of the Brazilian Unified Health Care System cover six diseases. Mass spectrometry equipment is needed to expand testing and to detect additional diseases more efficiently and effectively. However, only four neonatal screening centers have equipment capable of carrying out the extended test, and any expansion of health service capacity should consider both the rationalization of costs and the comprehensiveness and accessibility of care for the population. This study uses analytics to estimate the cost and service level of centralized and distributed logistics networks for performing the expanded newborn test throughout Brazil. We evaluate the accessibility of the current infrastructure for the neonatal screening program and propose a novel location–allocation model to create a more integrated infrastructure that reduces disparities and increases the accessibility of neonatal screening services.
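
The abstract does not describe the model's formulation. To show what a location–allocation problem looks like in its simplest form, the sketch below uses a classic stand-in, the p-median problem: choose a fixed number of sites to open and assign each demand point to its nearest open site so that total demand-weighted distance is minimal. The regions, sites, volumes, and distances are invented, the function names are hypothetical, and the exhaustive search is only practical for tiny instances.

```python
from itertools import combinations


def assign_and_cost(open_sites, demand, distance):
    """Allocate each demand point to its nearest open site; return total weighted distance."""
    allocation = {}
    total = 0.0
    for point, weight in demand.items():
        nearest = min(open_sites, key=lambda site: distance[point][site])
        allocation[point] = nearest
        total += weight * distance[point][nearest]
    return allocation, total


def best_locations(candidates, demand, distance, n_open):
    """Try every set of n_open sites and keep the cheapest one (tiny instances only)."""
    best = None
    for open_sites in combinations(candidates, n_open):
        allocation, total = assign_and_cost(open_sites, demand, distance)
        if best is None or total < best[2]:
            best = (open_sites, allocation, total)
    return best


# Invented toy data: weekly sample volumes and travel distances in km.
demand = {"north": 30, "south": 50, "east": 20}
distance = {
    "north": {"A": 10, "B": 80, "C": 40},
    "south": {"A": 90, "B": 15, "C": 50},
    "east": {"A": 60, "B": 55, "C": 5},
}
sites, allocation, cost = best_locations(["A", "B", "C"], demand, distance, n_open=2)
print(sites, allocation, cost)
```

With these toy numbers, opening sites B and C gives the lowest total of 2050 sample-km. A realistic model would add equipment costs, capacity limits, and an equity term, and would be solved with a mixed-integer programming solver instead of enumeration.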

Volume 6, Article 100349. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2772442524000510/pdfft?md5=8d14260b36fde15e3bb57df49d356689&pid=1-s2.0-S2772442524000510-main.pdf

Citations: 0

A comparative study of machine learning models with LASSO and SHAP feature selection for breast cancer prediction

Healthcare analytics (New York, N.Y.)

Pub Date : 2024-12-01 Epub Date: 2024-06-25 DOI: 10.1016/j.health.2024.100353

Md. Shazzad Hossain Shaon, Tasmin Karim, Md. Shahriar Shakil, Md. Zahid Hasan

In recent decades, breast cancer has become the most prevalent cancer among women worldwide and a significant contributor to female mortality. Early identification of breast cancer might drastically decrease patient mortality and greatly improve the chance of effective treatment. Machine learning models have become crucial for classifying cancer and for improving the accuracy and efficiency of diagnostic and treatment strategies. This study therefore focuses on early detection of breast cancer using a variety of machine learning algorithms and aims to identify the most effective feature selection process on an amalgamated dataset. Initially, we evaluated five traditional models and two meta-models on separate datasets. To find the most valuable features, the study used the Least Absolute Shrinkage and Selection Operator (LASSO) and SHapley Additive exPlanations (SHAP) selection methods and analyzed them with a wide range of performance metrics. We then applied these models to the combined dataset and observed that the merged dataset was significantly beneficial for breast cancer diagnosis. The analysis of the feature selection strategies showed that the majority of models performed more accurately with SHAP. Notably, three traditional models and two meta-classifiers obtained an accuracy of 99.82%, outperforming state-of-the-art methods. This advance lays a foundation for refining diagnostic tools and for further progress in this field.

Volume 6, Article 100353. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2772442524000558/pdfft?md5=86753ff6e5dca7c27f447a4a08fa5813&pid=1-s2.0-S2772442524000558-main.pdf

Citations: 0

A longitudinal mixed effects model for assessing mortality trends during vaccine rollout

Healthcare analytics (New York, N.Y.)

Pub Date : 2024-12-01 Epub Date: 2024-06-13 DOI: 10.1016/j.health.2024.100347

Qin Shao, Mounika Polavarapu, Lafleur Small, Shipra Singh, Quoc Nguyen, Kevin Shao

The rapid spread of coronavirus disease 2019 (COVID-19) initially presented unprecedented challenges for clinicians, policymakers, and healthcare systems, as there was limited evidence on the efficacy of various control measures. This study endeavors to provide a detailed and comprehensive overview of the global progression of COVID-19 mortality in the context of vaccine rollout, using public surveillance data from 145 countries sourced from the World Health Organization and the World Bank. The primary focus is to analyze shifts in the trend of new COVID-19 mortality worldwide before and after the introduction of COVID-19 vaccines. To achieve this, we propose a longitudinal mixed effects model aimed at elucidating the relationship between mortality trend and vaccination rollout, alongside other pertinent covariates. Our modeling approach seeks to accommodate variations in the timing of COVID-19 vaccine rollout among countries, as well as the correlation of observations from within the same country. Our findings highlight the significant impact of new cases, cardiovascular death rate, senior population, stringency index, and reproduction rate on mortality. However, we find that the impact of vaccination is not statistically significant, as evidenced by a relatively large $p$-value. Furthermore, the study reveals substantial disparities in mortality rates among countries across four income groups.

{"title":"A longitudinal mixed effects model for assessing mortality trends during vaccine rollout","authors":"Qin Shao , Mounika Polavarapu , Lafleur Small , Shipra Singh , Quoc Nguyen , Kevin Shao","doi":"10.1016/j.health.2024.100347","DOIUrl":"10.1016/j.health.2024.100347","url":null,"abstract":"<div><p>The rapid spread of coronavirus disease 2019 (COVID-19) initially presented unprecedented challenges for clinicians, policymakers, and healthcare systems, as there was limited evidence on the efficacy of various control measures. This study endeavors to provide a detailed and comprehensive overview of the global progression of the COVID-19 mortality in the context of vaccine rollout, utilizing public surveillance data from 145 countries sourced from the World Health Organization and the World Bank. The primary focus is to analyze shifts in the trend of new COVID-19 mortality worldwide before and after the introduction of COVID-19 vaccines. To achieve this, we propose a longitudinal mixed effects model aimed at elucidating the relationship between mortality trend and vaccination rollout, alongside other pertinent covariates. Our modeling approach seeks to accommodate variations in the timing of COVID-19 vaccine rollout among countries, as well as the correlation of observations from within the same country. Our findings highlight the significant impact of new cases, cardiovascular death rate, senior population, stringency index, and reproduction rate on mortality. However, we find that the impact of vaccination is not statistically significant, as evidenced by a relatively large <span><math><mi>p</mi></math></span>-value. 
Furthermore, the study reveals substantial disparities in mortality rates among countries across four income groups.</p></div>","PeriodicalId":73222,"journal":{"name":"Healthcare analytics (New York, N.Y.)","volume":"6 ","pages":"Article 100347"},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2772442524000492/pdfft?md5=ae79d48a8a53e7a4841d3c82370b0bf0&pid=1-s2.0-S2772442524000492-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141401659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Assessing the impact on quality of prediction and inference from balancing in multilevel logistic regression

Healthcare analytics (New York, N.Y.)

Pub Date : 2024-12-01 Epub Date: 2024-08-22 DOI: 10.1016/j.health.2024.100359

Carolina Gonzalez-Canas , Gustavo A. Valencia-Zapata , Ana Maria Estrada Gomez , Zachary Hass

The primary goal of this research is to examine the impact of balancing data on prediction quality and inference in multilevel logistic regression models. Logistic regression is a valuable approach for modeling the binary outcomes that arise in health applications. The class imbalance problem, where one of the two outcome categories occurs much more often than the other, is common in healthcare data, such as when modeling the risk factors for rare diseases. The issue is particularly relevant for medical data that combine individual measurements with other data sources measured at a geographic region level, such as environmental risk factors. For this work, both prediction and model interpretation are of interest. A simulation model is proposed to test the impact of balancing strategies on the multilevel logistic model's parameter estimation, inference, and predictive performance. The simulated data emulate characteristics of a Gestational Diabetes Mellitus (GDM) dataset from Indiana's Medicaid program. Several datasets were simulated with varying levels of complexity, involving the balance of the outcome variable and predictors. These datasets exhibited high- or low-frequency occurrences in specific intersections of variables, often called ‘cells.’ The impact of the balancing strategies on prediction and inference was assessed using different techniques, such as the equivalence test (TOST), power analysis, and predictive measures. To the best of our knowledge, this is the first research that explores the impact of using balanced samples on coefficient estimation and prediction measures in multilevel logistic modeling, and it finds evidence of the benefits of using balanced samples in this context.
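The abstract does not say which balancing strategies were tested, so the following is only a minimal sketch of one common option, random undersampling of the majority class, written with the Python standard library. The function name `undersample`, the `region` and `y` fields, and the toy data are all hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def undersample(rows, label_key="y", seed=0):
    """Randomly drop rows so every outcome class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    # Group the rows by their outcome label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    # Size of the rarest class; every class is cut down to this size.
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 5 positive and 95 negative outcomes spread over 4 regions.
rows = [{"region": i % 4, "y": 1 if i < 5 else 0} for i in range(100)]
balanced = undersample(rows)
counts = Counter(row["y"] for row in balanced)
print(counts[0], counts[1])
```

Note that undersampling within a multilevel dataset can leave some regions with very few rows, which is one reason the effect of balancing on coefficient estimates needs to be checked rather than assumed.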



Citations: 0

A data envelopment analysis model for optimizing transfer time of ischemic stroke patients under endovascular thrombectomy

Healthcare analytics (New York, N.Y.)

Pub Date : 2024-12-01 Epub Date: 2024-09-19 DOI: 10.1016/j.health.2024.100364

Mirpouya Mirmozaffari, Noreen Kamal

This study applies Data Envelopment Analysis (DEA) to optimize transfer times and futile transfers of eligible ischemic stroke patients receiving Endovascular Thrombectomy (EVT) from Primary Stroke Centers (PSCs) in Nova Scotia. The study aims to assess healthcare delivery in Nova Scotia over two periods. It seeks to improve stroke care for rural populations by examining nine inputs, including age and the distance between each PSC and the Comprehensive Stroke Centre (CSC) that provided EVT treatment, against a single output variable: whether or not EVT is performed. In the first phase, 115 patients from ten PSCs were treated as Decision-Making Units (DMUs) under an input-oriented Variable Returns to Scale (VRS) model, assisted by super-efficiency analysis, using the Python-based PyDEA tool, which places no restriction on the number of DMUs, inputs, or outputs. In the second phase, eight PSCs with low patient numbers were merged into four DMUs, each consisting of two PSCs. The PSCs merged into each DMU have few patients and are geographically close. The remaining two PSCs were kept separate because they had sufficient patient volume. In the first phase, VRS generated more reasonable efficiency scores for evaluation, while in the second phase, Constant Returns to Scale (CRS) outperformed VRS. In the initial stage of the second phase, the ten PSCs were considered as six DMUs using input-oriented CRS and VRS models for the 115 patients. Super-efficiency measures were applied in this stage to further improve the evaluation process. In the second part of the second phase, the first period (2018–2019) was compared with the second period (2020–2021) using the Malmquist Productivity Index (MPI) under both CRS and VRS to evaluate the relative efficiency and productivity change of the six DMUs over time.
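To make the CRS efficiency score and the Malmquist Productivity Index concrete, here is a minimal sketch for the special case of one input and one output, where the input-oriented CRS score has a closed form and no linear-programming solver is needed. It does not use PyDEA and is not the authors' nine-input model; the function names, the three units, and their numbers are hypothetical.

```python
from math import sqrt


def crs_efficiency(x, y, frontier):
    """Input-oriented CRS efficiency for a unit with input x and output y.

    `frontier` is a list of (input, output) pairs forming the reference set.
    With a single input and a single output, the score equals the unit's
    output/input ratio divided by the best ratio in the reference set.
    """
    best = max(out / inp for inp, out in frontier)
    return (y / x) / best


def malmquist(unit_t0, unit_t1, frontier_t0, frontier_t1):
    """Malmquist index: geometric mean of the unit's efficiency change
    measured against the period-0 frontier and against the period-1 frontier."""
    x0, y0 = unit_t0
    x1, y1 = unit_t1
    change_t0 = crs_efficiency(x1, y1, frontier_t0) / crs_efficiency(x0, y0, frontier_t0)
    change_t1 = crs_efficiency(x1, y1, frontier_t1) / crs_efficiency(x0, y0, frontier_t1)
    return sqrt(change_t0 * change_t1)


# Hypothetical data: (mean transfer time in minutes, EVT procedures performed).
period_1 = {"A": (120.0, 6.0), "B": (90.0, 6.0), "C": (150.0, 5.0)}
period_2 = {"A": (100.0, 6.0), "B": (90.0, 6.0), "C": (150.0, 6.0)}

frontier_1 = list(period_1.values())
frontier_2 = list(period_2.values())

scores = {name: crs_efficiency(x, y, frontier_1) for name, (x, y) in period_1.items()}
mpi_a = malmquist(period_1["A"], period_2["A"], frontier_1, frontier_2)
print(scores, mpi_a)
```

An index above 1 indicates that the unit's productivity improved between the two periods. With several inputs, as in the study, each score is instead obtained by solving a linear program per DMU, and VRS adds a convexity constraint to that program.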



Citations: 0