Healthcare analytics (New York, N.Y.): Latest Articles

A recommender system with multi-objective hybrid Harris Hawk optimization for feature selection and disease diagnosis
Pub Date : 2025-01-31 DOI: 10.1016/j.health.2025.100384
Madhusree Kuanr, Puspanjali Mohapatra
This study proposes a health recommender system for health risk analysis and disease prediction that identifies the most influential disease-causing factors using a hybrid Genetic–Harris Hawk optimization multi-objective feature selection approach. The system uses the Tree-based Pipeline Optimization Tool (TPOT), an automated machine learning tool, to recommend the most suitable prediction model, with the best classifier in terms of classification accuracy, for a disease given the selected features. It also recommends the top three disease-causing features for a particular disease, which can be used to analyze a person’s health risk. The proposed system is compared with competing prediction approaches based on Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and autoencoders, and it outperforms them in terms of classification accuracy.
Volume 7, Article 100384
Citations: 0
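As a rough illustration of the multi-objective feature selection idea in the abstract above, the sketch below keeps only the non-dominated feature subsets when two objectives are minimized together: classification error and number of selected features. The candidate subsets, their error values, and the function names are hypothetical. This is not the authors' Genetic–Harris Hawk optimizer, only the kind of Pareto filter such a method might rely on.

```python
from typing import List, Tuple

# Each candidate: (feature names, classification error, number of features).
Candidate = Tuple[Tuple[str, ...], float, int]

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

# Hypothetical candidates, for illustration only.
candidates = [
    (("age", "bmi", "glucose"), 0.10, 3),
    (("age", "glucose"), 0.12, 2),
    (("age", "bmi", "glucose", "bp"), 0.10, 4),  # same error as the first, more features
    (("glucose",), 0.20, 1),
]
front = pareto_front(candidates)
```

Here the four-feature subset is dropped because the three-feature subset reaches the same error with fewer features; the remaining three subsets represent different trade-offs between the two objectives.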
An application of natural language processing for hypoglycemic event identification in patients with diabetes mellitus
Pub Date : 2025-01-21 DOI: 10.1016/j.health.2024.100381
J.E. Camacho-Cogollo , Cristhian Felipe Patiño Zambrano , Christian Lochmuller , Claudia C. Colmenares-Mejia , Nicolas Rozo , Mario A. Isaza-Ruget , Paul Rodriguez , Andrés García
The therapeutic goal for diabetes mellitus is to maintain normal blood glucose levels, but in some cases, hypoglycemia may occur as a consequence of treatment. Identifying patients with hypoglycemia is critical to preventing adverse events and mortality. However, hypoglycemic events are often not accurately documented in electronic health records (EHRs). This study presents a retrospective analysis of the EHRs of patients with diabetes mellitus. We hypothesize that text analytics and machine learning can identify possible hypoglycemic incidents from unstructured physician notes in EHRs. The analysis is implemented in Python and also considers words that describe symptoms related to hypoglycemia. It involves searching physicians’ notes for keywords and applying supervised classification methods to 146,542 records. Natural language processing (NLP) and machine learning algorithms are used to identify possible hypoglycemic events and related symptoms in physicians’ notes. A multi-layer perceptron (MLP) model produces the best classification performance among all the models tested in this study, with an accuracy of 0.87. We show that the NLP approach can effectively automate the text-based detection of potential hypoglycemic events, and can subsequently be used to make informed decisions about potential patient risks.
Volume 7, Article 100381
Citations: 0
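The keyword-search step mentioned in the abstract above can be sketched with a regular expression over free-text notes. The keyword list, the sample notes, and the function names are assumptions made for illustration; the study's actual vocabulary and the supervised classifiers it applies afterwards are not reproduced here.

```python
import re
from typing import List

# Hypothetical keyword list; the study's actual vocabulary is not given in the abstract.
KEYWORDS = ["hypoglycemia", "hypoglycemic", "low blood sugar", "sweating", "tremor"]
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE
)

def find_keywords(note: str) -> List[str]:
    """Return the lower-cased keywords found in a note, in order of appearance."""
    return [m.group(1).lower() for m in PATTERN.finditer(note)]

def flag_note(note: str) -> bool:
    """Flag a note as a candidate hypoglycemic event if any keyword appears."""
    return bool(find_keywords(note))

# Made-up notes, for illustration only.
notes = [
    "Patient reports sweating and tremor after insulin dose.",
    "Routine follow-up, no complaints.",
    "Possible HYPOGLYCEMIA overnight.",
]
flags = [flag_note(n) for n in notes]
```

A flag of this kind only marks a note as a candidate. In a pipeline like the one described, such matches would more plausibly serve as features or as a pre-filter for the classifier than as a final decision.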
An automated information extraction model for unstructured discharge letters using large language models and GPT-4
Pub Date : 2025-01-10 DOI: 10.1016/j.health.2024.100378
Robert M. Siepmann , Giulia Baldini , Cynthia S. Schmidt , Daniel Truhn , Gustav Anton Müller-Franzes , Amin Dada , Jens Kleesiek , Felix Nensa , René Hosch
The administrative burden of manually extracting clinical information from discharge letters is a common challenge in healthcare. This study explores the use of Large Language Models (LLMs), specifically Generative Pretrained Transformer 4 (GPT-4) by OpenAI, for automated extraction of diagnoses, medications, and allergies from discharge letters. Data for this study were sourced from two healthcare institutions in Germany, comprising discharge letters for ten patients from each institution. The first experiment was conducted using a standardized prompt for information extraction. However, challenges were encountered, and the prompt was fine-tuned in a second experiment to improve the results. We further tested whether open-source LLMs can achieve similar results. In the first experiment, primary diagnoses were identified with 85% accuracy and secondary diagnoses with 55.8%. Medications and allergies were extracted with 85.9% and 100% accuracy, respectively. The International Classification of Diseases, 10th revision (ICD-10) codes for the identified diagnoses achieved an accuracy of 85% for primary diagnoses and 60.7% for secondary diagnoses. Anatomical Therapeutic Chemical (ATC) codes were identified with an accuracy of 78.8%. In contrast, open-source LLMs did not provide similar levels of accuracy and could not consistently fill the template. With prompt fine-tuning in the second experiment, the primary diagnoses, secondary diagnoses, and medications could be predicted with 95%, 88.9%, and 92.2% accuracy, respectively. GPT-4 shows excellent potential for automated extraction of crucial diagnostic and medication information from discharge letters, presumably lowering the administrative burden for healthcare professionals and improving patient outcomes.
Volume 7, Article 100378
Citations: 0
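The per-field accuracies quoted in the abstract above imply comparing each extracted value with a reference annotation. A minimal sketch of such scoring follows, assuming one value per letter and exact matching after light normalization. The abstract does not state the matching rule the authors used, and the sample values below are invented.

```python
from typing import List

def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())

def field_accuracy(extracted: List[str], reference: List[str]) -> float:
    """Fraction of letters whose extracted value matches the reference value."""
    if len(extracted) != len(reference):
        raise ValueError("extracted and reference must have the same length")
    if not reference:
        raise ValueError("no letters to score")
    hits = sum(normalize(e) == normalize(r) for e, r in zip(extracted, reference))
    return hits / len(reference)

# Hypothetical primary diagnoses for four letters (not data from the study).
reference = ["Pneumonia", "Heart failure", "Appendicitis", "Stroke"]
extracted = ["pneumonia", "Heart  failure", "Gastritis", "Stroke"]
acc = field_accuracy(extracted, reference)  # 3 of 4 values match
```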
An optimal control model with sensitivity analysis for COVID-19 transmission using logistic recruitment rate
Pub Date : 2025-01-08 DOI: 10.1016/j.health.2024.100375
Jonner Nainggolan , Moch. Fandi Ansori , Hengki Tasman
This study proposes an optimal control model for COVID-19 spread, incorporating a logistic recruitment rate. The analysis shows that the disease-free equilibrium exists when the population-existing threshold exceeds 1. The stability of this equilibrium is determined by the basic reproduction number R0: the disease-free equilibrium is stable when R0 is less than or equal to 1 and unstable when R0 is greater than 1. Furthermore, an endemic equilibrium exists and is stable when R0 exceeds 1. To identify influential factors in COVID-19 spread, sensitivity index and sensitivity analyses of R0 are conducted. The model integrates both prevention and treatment controls. Numerical simulations show that the prevention control is more effective than the treatment control in reducing COVID-19 spread. Moreover, the simultaneous implementation of prevention and treatment controls outperforms either control applied alone in mitigating COVID-19 spread. Finally, sensitivity analysis conducted with constant controls shows the contributions of the controls to disease dynamics.
Volume 7, Article 100375
Citations: 0
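The sensitivity index of R0 mentioned in the abstract above is commonly taken to be the normalized forward sensitivity index, (dR0/dp) * (p / R0), for a parameter p. The sketch below estimates it numerically. The formula R0 = beta / (gamma + mu) and the parameter values are placeholders for a simple SIR-type model, not the expression derived in the paper.

```python
from typing import Callable, Dict

def r0(params: Dict[str, float]) -> float:
    """Hypothetical basic reproduction number: beta / (gamma + mu)."""
    return params["beta"] / (params["gamma"] + params["mu"])

def sensitivity_index(
    f: Callable[[Dict[str, float]], float],
    params: Dict[str, float],
    name: str,
    rel_step: float = 1e-6,
) -> float:
    """Normalized forward sensitivity index (df/dp) * (p / f), via a central difference."""
    h = params[name] * rel_step
    up = dict(params, **{name: params[name] + h})
    down = dict(params, **{name: params[name] - h})
    derivative = (f(up) - f(down)) / (2 * h)
    return derivative * params[name] / f(params)

params = {"beta": 0.4, "gamma": 0.1, "mu": 0.02}  # illustrative values only
indices = {name: sensitivity_index(r0, params, name) for name in params}
```

For this placeholder formula the index of beta is 1 (a 1% rise in beta raises R0 by 1%), while the indices of gamma and mu are negative, so increasing either parameter lowers R0.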
Deterministic compartmental model for optimal control strategies of Giardiasis infection with saturating incidence and environmental dynamics
Pub Date : 2025-01-07 DOI: 10.1016/j.health.2025.100383
Stephen Edward , Nyimvua Shaban
This study develops a deterministic compartmental model that tracks Giardiasis’s direct and indirect transmission dynamics. The study begins by constructing a model incorporating four constant controls: health education, screening, hospitalization, and sanitation. The analytical results of the model are investigated and presented. The positivity of the solutions and the existence of invariant regions were established. The model exhibits a unique disease-free equilibrium and multiple endemic equilibria. The effective reproduction number was derived using the Next-Generation Matrix (NGM) approach, and its implications for the stability of the equilibria were explored. Local stability of the disease-free equilibrium was confirmed using the Routh–Hurwitz criteria, while global stability results were also presented. Sensitivity analysis was conducted based on the effective reproduction number, identifying the most influential parameters. We introduce an optimal control problem to curb the spread of Giardiasis. We rigorously establish the existence of optimal control solutions and analytically characterize these solutions using Pontryagin’s Maximum Principle. We conduct numerical simulations to evaluate the effectiveness of various control strategies. The results are promising, showing that the simultaneous implementation of all four control measures, education, screening, treatment, and sanitation, can lead to a significant reduction in disease cases, thereby offering a reassuring solution to the spread of Giardiasis.
Volume 7, Article 100383
Citations: 0
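The Routh–Hurwitz criteria cited in the abstract above give algebraic conditions under which every root of a characteristic polynomial has a negative real part, which is what local stability requires. A minimal sketch for the cubic case follows. The degree of the polynomial in the paper is not stated in the abstract, so the cubic form and the example coefficients are assumptions.

```python
def routh_hurwitz_cubic(a1: float, a2: float, a3: float) -> bool:
    """Stability test for x^3 + a1*x^2 + a2*x + a3.

    All roots have negative real parts exactly when
    a1 > 0, a3 > 0 and a1 * a2 > a3.
    """
    return a1 > 0 and a3 > 0 and a1 * a2 > a3

# (x + 1)(x + 2)(x + 3) = x^3 + 6x^2 + 11x + 6: all roots negative.
stable = routh_hurwitz_cubic(6, 11, 6)

# (x - 1)(x + 2)(x + 3) = x^3 + 4x^2 + x - 6: one positive root.
unstable = routh_hurwitz_cubic(4, 1, -6)
```

In a model like the one described, the coefficients would be functions of the model parameters, and the conditions would typically be shown to hold whenever the effective reproduction number is below 1.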
An exploration of machine learning approaches for early Autism Spectrum Disorder detection
Pub Date : 2025-01-06 DOI: 10.1016/j.health.2024.100379
Nawshin Haque, Tania Islam, Md Erfan
Autism Spectrum Disorder is a neurodevelopmental condition impacting an individual’s repetitive behaviours, social skills, verbal and nonverbal communication abilities, and capacity for acquiring new knowledge. Manifesting typically in early childhood, specifically between 6 months and 5 years, the symptoms of autism exhibit a progressive nature over time. This study explores the application of Logistic Regression, Support Vector Classifier, K-Nearest Neighbour, Decision Tree, and Random Forest for predicting Autism in children and toddlers by leveraging advancements in machine learning. The efficacy of these techniques is evaluated using publicly accessible datasets specific to both age groups. The findings indicate remarkable performance, with the toddler dataset achieving a mean Intersection over Union (mIoU) of 100% for Support Vector Classifier and 99.80% for Logistic Regression. Similarly, the children dataset demonstrates outstanding results, achieving an mIoU of 100% for Support Vector Classifier and 99.96% for Logistic Regression. Furthermore, all algorithms achieved 100% accuracy on the children (age 4–11) dataset collected from real-world sources. Logistic Regression, Random Forest, Support Vector Classifier, and Decision Tree attained 100% accuracy and mIoU with the real-world dataset. These results underscore the potential of machine learning in aiding the early detection of ASD in children and toddlers, offering promising avenues for future research and clinical applications.
Volume 7, Article 100379
Citations: 0
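The abstract above reports mean Intersection over Union (mIoU) alongside accuracy. For classification labels, one common reading is the per-class ratio TP / (TP + FP + FN), averaged over classes. The sketch below follows that reading. Whether the authors computed mIoU this way is an assumption, and the labels are made up.

```python
from typing import List, Sequence

def mean_iou(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average over classes of TP / (TP + FP + FN)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    ious: List[float] = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # The denominator is positive: class c occurs in y_true or y_pred.
        ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)

# Made-up labels: 1 = ASD traits present, 0 = absent.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]
miou = mean_iou(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With one error in five predictions, accuracy is 0.8 while mIoU is 2/3, which shows that the two metrics coincide only when predictions are perfect.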
A large-scale risk assessment and classification model for pneumococcus using Finnish national health data
Pub Date : 2025-01-03 DOI: 10.1016/j.health.2025.100382
Viljami Männikkö , Juha Turunen , Heidi Åhman , Esa Harju
Streptococcus pneumoniae, or pneumococcus, poses a significant health risk, particularly to infants, the elderly, and individuals with underlying medical conditions. In Finland, pneumococcal vaccination is part of the national immunization program, with vaccination provided to young children and only selected at-risk adult populations included. This study aims to leverage the Finnish national electronic health record system, Kanta, to analyze treatment histories and identify individuals at increased risk for disease to improve vaccination strategies. Kanta provides a comprehensive, nationwide database of patient treatment histories, which can be utilized to track individual risk factors and disease episodes. We analyzed health data from 96,200 Finnish residents with risk factors for pneumococcal disease following guidelines from the Finnish Institute for Health and Welfare and the World Health Organization. We prioritize vaccination for those at the greatest risk by categorizing individuals based on their identified risk factors. This study demonstrates the potential for using national health record data to conduct large-scale risk analyses, allowing for more targeted and efficient vaccination strategies. The novelty of our approach lies in the automatic identification of high-risk individuals, which can inform public health initiatives and enhance the monitoring of pneumococcal disease risk at a population level.
Volume 7, Article 100382
Citations: 0
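The categorization step described in the abstract above, grouping individuals by their identified risk factors, can be sketched as a lookup that assigns each person the highest tier among the factors recorded for them. Every factor name and tier below is a placeholder chosen for illustration; the guideline categories actually used in the study are not listed in the abstract.

```python
from typing import Iterable

# Hypothetical mapping from a recorded risk factor to a tier (higher = greater risk).
# These names and tiers are placeholders, not the categories used in the study.
RISK_TIER = {
    "asplenia": 3,
    "immunosuppression": 3,
    "chronic_lung_disease": 2,
    "diabetes": 2,
    "smoking": 1,
}
TIER_LABEL = {0: "no identified risk", 1: "low", 2: "moderate", 3: "high"}

def classify(risk_factors: Iterable[str]) -> str:
    """Label a person by the highest tier among their recorded risk factors."""
    tier = max((RISK_TIER.get(f, 0) for f in risk_factors), default=0)
    return TIER_LABEL[tier]

# Invented records, for illustration only.
people = {
    "p1": ["diabetes", "smoking"],
    "p2": ["asplenia"],
    "p3": [],
}
labels = {pid: classify(factors) for pid, factors in people.items()}
```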
A comparative assessment of machine learning models and algorithms for osteosarcoma cancer detection and classification
Pub Date : 2025-01-02 DOI: 10.1016/j.health.2024.100380
Amoakoh Gyasi-Agyei
Osteosarcoma is a bone-forming tumor that is more common in children and young adults than in older adults. Timely detection and classification of its type are crucial to proper treatment and possible survival. Machine learning (ML) models trained on disease datasets are more effective in detection and classification than conventional methods whose hand-crafted features depend heavily on pathologists’ expertise. A publicly available raw osteosarcoma dataset was explored and then preprocessed using different combinations of data denoising techniques (including principal component analysis, mutual information gain, analysis of variance and Kendall’s rank correlation analysis) and data augmentation to derive seven different datasets. Using the seven derived datasets and eight ML algorithms, this study designed and performed an extensive comparative analysis of seven sets of ML models (altogether over 160 models) with their hyperparameters optimized using grid search. The performance differences between the learned ML models were then validated using repeated stratified 10-fold cross-validation and 5×2 cross-validation paired t-tests to select the best model for the task. The empirical model based on the extra trees algorithm, fitted to a dataset that was class-balanced via random oversampling and had multicollinearity removed via principal component analysis, proved to be the best: it detected and classified osteosarcoma cancer in 10 ms with 97.8% area under the receiver operating characteristic curve and acceptably low false alarm and misdetection rates. Thus, the proposed models can be cutting-edge techniques for automated detection and classification of osteosarcoma tumors to aid timely diagnosis, prognosis, and treatment.
Volume 7, Article 100380
Citations: 0
An efficient blood supply chain network model with multiple echelons for managing outdated products
Pub Date : 2024-12-18 DOI: 10.1016/j.health.2024.100377
Agus Mansur, Ivan Darma Wangsa, Novrianty Rizky, Iwan Vanany
This study examines the lack of coordination between blood production and inventories in blood supply chain networks. Prior studies neglect to optimize operational costs through blood production, inventory, and waste. We propose a mixed-integer linear programming approach that addresses multiple echelons, blood types, and blood bag shelf life. The model is developed by determining the facility locations, assigning regional blood banks, and allocating the right products. Indonesia's blood supply chain is used as a case study to evaluate the applicability of the proposed model using optimization software. A sensitivity analysis is performed on production rate and patient demand to assess how these factors affect the overall cost of expired products. The results show that the proposed method's total cost and expired products are 4.69%–5.60% and 4.71%–5.75%, respectively.
Citations: 0
An enhanced machine learning approach with stacking ensemble learner for accurate liver cancer diagnosis using feature selection and gene expression data
Pub Date : 2024-12-12 DOI: 10.1016/j.health.2024.100373
Amena Mahmoud, Eiko Takaoka
Liver cancer is a significant global health concern, necessitating accurate and timely diagnosis for effective treatment. In recent years, machine learning approaches have emerged as promising tools for improving liver cancer classification using gene expression data. This study presents an advanced machine learning approach for liver cancer diagnosis using gene expression data, combining feature selection techniques with a stacking ensemble learning model. Our method addresses the challenges of high dimensionality and complex patterns in genomic data to improve diagnostic accuracy and interpretability. We employed a feature selection process to identify the most relevant gene expressions associated with liver cancer. This approach reduced the dimensionality of the data while preserving crucial biological information. The selected features were then used to train a stacking ensemble model, which combined multiple base learners, including a Multi-Layer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), with an Extreme Gradient Boosting (XGBoost) meta-learner to make final predictions. The stacking ensemble achieved an accuracy of 97%, outperforming individual machine learning algorithms and traditional diagnostic methods. Furthermore, the model demonstrated high sensitivity (96.8%) and specificity (98.1%), which are crucial for early detection and for minimizing false positives.
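For readers less familiar with the two metrics reported above, the sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. It is an illustration only: the function name `sensitivity_specificity` and the counts are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of actual positives that are detected
    specificity = TN / (TN + FP): share of actual negatives that are cleared
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts, for illustration only
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=95, fp=5)
print(sens, spec)
```

High sensitivity means few missed cancers, which supports early detection; high specificity means few healthy samples flagged as cancer, which keeps false positives low.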
Citations: 0