Healthcare analytics (New York, N.Y.): latest articles


An application of natural language processing for hypoglycemic event identification in patients with diabetes mellitus

Healthcare analytics (New York, N.Y.)

Pub Date : 2025-01-21 DOI: 10.1016/j.health.2024.100381

J.E. Camacho-Cogollo , Cristhian Felipe Patiño Zambrano , Christian Lochmuller , Claudia C. Colmenares-Mejia , Nicolas Rozo , Mario A. Isaza-Ruget , Paul Rodriguez , Andrés García

The therapeutic goal for diabetes mellitus is to maintain normal blood glucose levels, but in some cases, hypoglycemia may occur as a consequence of treatment. Identifying patients with hypoglycemia is critical to preventing adverse events and mortality. However, hypoglycemic events are often not accurately documented in electronic health records (EHRs). This study presents a retrospective analysis of the EHRs of patients with diabetes mellitus. We hypothesize that text analytics and machine learning can identify possible hypoglycemic incidents from unstructured physician notes in electronic health records. Our analysis applies these techniques using the Python programming language as a tool. It also considers words that describe symptoms related to hypoglycemia. The analysis involves searching physicians' notes for keywords and applying supervised classification methods to 146,542 records. Natural language processing (NLP) and machine learning algorithms are used to identify possible hypoglycemic events and related symptoms in physicians’ notes. A multi-layer perceptron (MLP) model produces the best classification performance among all the models tested in this study, with an obtained accuracy of 0.87. We show that the NLP approach can effectively identify and automate the text-based detection process of potential hypoglycemic events, and can subsequently be used to make informed decisions about potential patient risks.
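The keyword-search step described in this abstract can be illustrated with a minimal sketch. The term list, the `flag_note` helper, and the sample notes below are hypothetical; the abstract does not publish the study's actual keywords or preprocessing, so this shows only the general idea of flagging notes before a supervised classifier is applied.

```python
import re

# Hypothetical keyword list; the study's actual terms are not given in the abstract.
HYPOGLYCEMIA_TERMS = [
    "hypoglycemia",
    "hypoglycaemia",
    "low blood sugar",
    "diaphoresis",
    "tremor",
]

# One case-insensitive pattern; word boundaries avoid matches inside longer words.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in HYPOGLYCEMIA_TERMS) + r")\b",
    re.IGNORECASE,
)


def flag_note(note: str) -> list[str]:
    """Return the sorted, de-duplicated keywords found in one physician note."""
    return sorted({match.group(1).lower() for match in PATTERN.finditer(note)})


# Made-up notes for illustration only.
notes = [
    "Patient reports tremor and diaphoresis after insulin dose.",
    "Routine follow-up, glucose well controlled.",
]
flags = [flag_note(note) for note in notes]
```

In a pipeline like the one the abstract outlines, flags of this kind could serve as candidate labels or features for the classifiers; how the authors actually combined them is not stated here.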

Volume : 7 Pages : Article 100381

Citations: 0

An automated information extraction model for unstructured discharge letters using large language models and GPT-4

Healthcare analytics (New York, N.Y.)

Pub Date : 2025-01-10 DOI: 10.1016/j.health.2024.100378

Robert M. Siepmann , Giulia Baldini , Cynthia S. Schmidt , Daniel Truhn , Gustav Anton Müller-Franzes , Amin Dada , Jens Kleesiek , Felix Nensa , René Hosch

The administrative burden of manually extracting clinical information from discharge letters is a common challenge in healthcare. This study aims to explore the use of Large Language Models (LLMs), specifically Generative Pretrained Transformer 4 (GPT-4) by OpenAI, for automated extraction of diagnoses, medications, and allergies from discharge letters. Data for this study were sourced from two healthcare institutions in Germany, comprising discharge letters for ten patients from each institution. The first experiment is conducted using a standardized prompt for information extraction. However, challenges were encountered, and the prompt was fine-tuned in a second experiment to improve the results. We further tested whether open-source LLMs can achieve similar results. In the first experiment, primary diagnoses were identified with 85% accuracy and secondary diagnoses with 55.8%. Medications and allergies were extracted with 85.9% and 100% accuracy, respectively. The International Classification of Diseases, 10th revision (ICD-10) codes for the identified diagnoses achieved an accuracy of 85% for primary diagnoses and 60.7% for secondary diagnoses. Anatomical Therapeutic Chemical (ATC) codes were identified with an accuracy of 78.8%. On the other hand, open-source LLMs did not provide similar levels of accuracy and could not consistently fill the template. With prompt fine-tuning in the second experiment, the primary diagnoses, secondary diagnoses, and medications could be predicted with 95%, 88.9%, and 92.2% accuracy, respectively. GPT-4 shows excellent potential for automated extraction of crucial diagnostic and medication information from discharge letters, presumably lowering the administrative burden for healthcare professionals and improving patient outcomes.

Volume : 7 Pages : Article 100378

Citations: 0

An optimal control model with sensitivity analysis for COVID-19 transmission using logistic recruitment rate

Healthcare analytics (New York, N.Y.)

Pub Date : 2025-01-08 DOI: 10.1016/j.health.2024.100375

Jonner Nainggolan , Moch. Fandi Ansori , Hengki Tasman

This study proposes an optimal control model for COVID-19 spread, incorporating a logistic recruitment rate. The observations show the disease-free equilibrium exists when the population-existing threshold exceeds 1. The stability of equilibrium is determined by the basic reproduction number $R_0$. This implies that equilibrium is stable when $R_0$ is less than or equal to 1, but it is unstable when the value is greater than 1. Furthermore, an endemic equilibrium and stability is recorded when $R_0$ exceeds 1. To identify influential factors in COVID-19 spread, sensitivity index and sensitivity analyses of $R_0$ are conducted. The model perfectly integrates both prevention and therapy controls. As a result, numerical simulations show that the prevention control is more effective than the treatment control in reducing COVID-19 spread. Moreover, the simultaneous implementation of prevention and treatment controls outperforms individual control methods in mitigating COVID-19 spread. Finally, sensitivity analysis conducted with constant controls shows the contributions of the controls to disease dynamics.
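A sensitivity index of the basic reproduction number is commonly computed as the normalized forward index, (∂R0/∂p)·(p/R0), for each parameter p. The sketch below assumes a textbook expression R0 = beta / (gamma + mu) purely for illustration; the paper's own expression and parameter values are not given in the abstract, and every name and number here is hypothetical.

```python
def r0(params):
    """Hypothetical textbook form of R0, not the expression derived in the paper."""
    return params["beta"] / (params["gamma"] + params["mu"])


def sensitivity_index(f, params, name, rel_step=1e-6):
    """Normalized forward sensitivity index of f with respect to params[name].

    The partial derivative is approximated with a central difference.
    """
    p = params[name]
    h = p * rel_step
    up = dict(params, **{name: p + h})
    down = dict(params, **{name: p - h})
    derivative = (f(up) - f(down)) / (2 * h)
    return derivative * p / f(params)


# Illustrative parameter values only.
params = {"beta": 0.4, "gamma": 0.1, "mu": 0.02}
indices = {name: sensitivity_index(r0, params, name) for name in params}
```

For this assumed form the index of `beta` is 1 (a 1% rise in `beta` raises R0 by about 1%), while the indices of `gamma` and `mu` are negative, so raising either lowers R0.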

Volume : 7 Pages : Article 100375

Citations: 0

Deterministic compartmental model for optimal control strategies of Giardiasis infection with saturating incidence and environmental dynamics

Healthcare analytics (New York, N.Y.)

Pub Date : 2025-01-07 DOI: 10.1016/j.health.2025.100383

Stephen Edward , Nyimvua Shaban

This study develops a deterministic compartmental model that tracks Giardiasis’s direct and indirect transmission dynamics. The study begins by constructing a model incorporating four constant controls: health education, screening, hospitalization, and sanitation. The analytical results of the model are investigated and presented. The positivity of the solutions and the existence of invariant regions were established. The model exhibits a unique disease-free equilibrium and multiple endemic equilibria. The effective reproduction number was derived using the Next-Generation Matrix (NGM) approach, and its implications for the stability of the equilibria were explored. Local stability of the disease-free equilibrium was confirmed using the Routh–Hurwitz criteria, while global stability results were also presented. Sensitivity analysis was conducted based on the effective reproduction number, identifying the most influential parameters. We introduce an optimal control problem to curb the spread of Giardiasis. We rigorously establish the existence of optimal control solutions and analytically characterize these solutions using Pontryagin’s Maximum Principle. We conduct numerical simulations to evaluate the effectiveness of various control strategies. The results are promising, showing that the simultaneous implementation of all four control measures, education, screening, treatment, and sanitation, can lead to a significant reduction in disease cases, thereby offering a reassuring solution to the spread of Giardiasis.
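In the Next-Generation Matrix approach, the reproduction number is the spectral radius of F·V⁻¹, where F collects new-infection terms and V collects transition terms for the infected compartments. The sketch below works this out for a hypothetical two-compartment system (infected hosts plus an environmental pathogen reservoir). The matrices, parameter names, and values are assumptions for illustration and are not taken from the paper.

```python
import cmath


def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x, y):
    """Multiply two 2x2 matrices."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus, from the characteristic polynomial."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + root) / 2), abs((trace - root) / 2))


# Hypothetical parameters: direct and environmental transmission rates,
# recovery, natural death, pathogen shedding, and pathogen decay.
beta_h, beta_e = 0.3, 0.05
gamma, mu, shed, delta = 0.1, 0.02, 2.0, 0.5

F = [[beta_h, beta_e], [0.0, 0.0]]       # new infections enter the host class only
V = [[gamma + mu, 0.0], [-shed, delta]]  # removal from hosts, shedding and decay

reproduction_number = spectral_radius_2x2(matmul_2x2(F, inverse_2x2(V)))
```

For these assumed matrices the result equals beta_h/(gamma + mu) + beta_e·shed/((gamma + mu)·delta), which separates the direct and environmental contributions to transmission.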

Volume : 7 Pages : Article 100383

Citations: 0

An exploration of machine learning approaches for early Autism Spectrum Disorder detection

Healthcare analytics (New York, N.Y.)

Pub Date : 2025-01-06 DOI: 10.1016/j.health.2024.100379

Nawshin Haque, Tania Islam, Md Erfan

Autism Spectrum Disorder is a neurodevelopmental condition impacting an individual’s repetitive behaviours, social skills, verbal and nonverbal communication abilities, and capacity for acquiring new knowledge. Manifesting typically in early childhood, specifically between 6 months and 5 years, the symptoms of autism exhibit a progressive nature over time. This study explores the application of Logistic Regression, Support Vector Classifier, K-Nearest Neighbour, Decision Tree, and Random Forest for predicting Autism in children and toddlers by leveraging advancements in machine learning. The efficacy of these techniques is evaluated using publicly accessible datasets specific to both age groups. The findings indicate remarkable performance, with the toddler dataset achieving a mean Intersection over Union (mIoU) of 100% for Support Vector Classifier and 99.80% for Logistic Regression. Similarly, the children dataset demonstrates outstanding results, achieving an mIoU of 100% for Support Vector Classifier and 99.96% for Logistic Regression. Furthermore, all algorithms achieved 100% accuracy on the children (age 4–11) dataset collected from real-world sources. Logistic Regression, Random Forest, Support Vector Classifier, and Decision Tree attained 100% accuracy and mIoU with the real-world dataset. These results underscore the potential of machine learning in aiding the early detection of ASD in children and toddlers, offering promising avenues for future research and clinical applications.
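For readers less familiar with mIoU as a classification metric, the sketch below shows one common definition: per-class IoU = TP / (TP + FP + FN), averaged over classes. Whether the paper computes mIoU exactly this way is an assumption, and the label vectors are made up for illustration.

```python
def mean_iou(y_true, y_pred):
    """Mean of per-class IoU = TP / (TP + FP + FN) over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    ious = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels: 1 = ASD traits present, 0 = absent.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]

miou = mean_iou(y_true, y_pred)
acc = accuracy(y_true, y_pred)
```

Under this definition each misclassified sample counts against both its true class and its predicted class, so mIoU generally sits at or below accuracy, and the two coincide at 100% only when every prediction is correct.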

Volume : 7 Pages : Article 100379

Citations: 0

A large-scale risk assessment and classification model for pneumococcus using Finnish national health data

Healthcare analytics (New York, N.Y.)

Pub Date : 2025-01-03 DOI: 10.1016/j.health.2025.100382

Viljami Männikkö , Juha Turunen , Heidi Åhman , Esa Harju

Streptococcus pneumoniae, or pneumococcus, poses a significant health risk, particularly to infants, the elderly, and individuals with underlying medical conditions. In Finland, pneumococcal vaccination is part of the national immunization program, with vaccination provided to young children and only selected at-risk adult populations included. This study aims to leverage the Finnish national electronic health record system, Kanta, to analyze treatment histories and identify individuals at increased risk for disease to improve vaccination strategies. Kanta provides a comprehensive, nationwide database of patient treatment histories, which can be utilized to track individual risk factors and disease episodes. We analyzed health data from 96,200 Finnish residents with risk factors for pneumococcal disease following guidelines from the Finnish Institute for Health and Welfare and the World Health Organization. We prioritize vaccination for those at the greatest risk by categorizing individuals based on their identified risk factors. This study demonstrates the potential for using national health record data to conduct large-scale risk analyses, allowing for more targeted and efficient vaccination strategies. The novelty of our approach lies in the automatic identification of high-risk individuals, which can inform public health initiatives and enhance the monitoring of pneumococcal disease risk at a population level.

Volume : 7 Pages : Article 100382

Citations: 0

A comparative assessment of machine learning models and algorithms for osteosarcoma cancer detection and classification

Healthcare analytics (New York, N.Y.)

Pub Date : 2025-01-02 DOI: 10.1016/j.health.2024.100380

Amoakoh Gyasi-Agyei

Osteosarcoma is a bone-forming tumor that is more common in children and young adults than in adults. Timely detection and classification of its type is crucial to its proper treatment and possible survival. Machine learning (ML) models trained on disease datasets are more effective in detection and classification than the conventional methods with hand-crafted features highly dependent on pathologists’ expertise. A publicly available raw osteosarcoma dataset was explored and then preprocessed using different combinations of data denoising techniques (including principal component analysis, mutual information gain, analysis of variance and Kendall’s rank correlation analysis) and data augmentation to derive seven different datasets. Using the seven derived datasets and eight ML algorithms, this study designed and performed an extensive comparative analysis of seven sets of ML models (altogether over 160 models) with their hyperparameters optimized using grid search. The performance differences between the learned ML models were then validated using repeated stratified 10-fold cross-validation and 5x2 cross-validation paired t-tests to select the best model for our task. The empirical model based on the extra trees algorithm and fitted to class-balanced dataset via random oversampling and multicollinearity removed via principal component analysis proved to be the best, as it detected and classified osteosarcoma cancer in 10 ms with 97.8% area under the receiver operating characteristics curve and acceptably low false alarm and misdetection. Thus, the proposed models can be cutting-edge techniques for automated detection and classification of osteosarcoma tumors to aid timely diagnosis, prognosis, and treatment.
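The 5x2 cross-validation paired t-test mentioned in this abstract is usually attributed to Dietterich. Two models are compared over five replications of 2-fold cross-validation, and the statistic is the first score difference divided by the root of the mean per-replication variance. The sketch below assumes that standard formulation; the score differences are invented, and the paper's own implementation is not described in the abstract.

```python
import math


def five_by_two_cv_t(diffs):
    """Dietterich's 5x2cv paired t statistic.

    diffs: five (fold 1, fold 2) pairs, each value being the difference in
    score between the two models on that fold.
    """
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected 5 replications of 2 folds")
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    # Undefined (division by zero) if the two folds agree in every replication.
    return diffs[0][0] / math.sqrt(sum(variances) / 5)


# Invented score differences (model A minus model B).
diffs = [(0.02, 0.04), (0.03, 0.01), (0.02, 0.02), (0.05, 0.03), (0.01, 0.03)]
t_stat = five_by_two_cv_t(diffs)

# Two-sided critical value of Student's t with 5 degrees of freedom, alpha = 0.05.
T_CRITICAL = 2.571
significant = abs(t_stat) > T_CRITICAL
```

With these invented differences the statistic stays below the critical value, so the two models would not be judged significantly different at the 5% level.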

Volume : 7 Pages : Article 100380

Citations: 0

