
BMC Medical Informatics and Decision Making: Latest Articles

Informatics assessment of COVID-19 data collection: an analysis of UK Biobank questionnaire data.
IF 3.3, Zone 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2024-10-31, DOI: 10.1186/s12911-024-02743-5
Craig S Mayer

Background: There have been many efforts to expand existing data collection initiatives to include COVID-19 related data. One program that did so is UK Biobank, a large-scale research and biomedical data collection resource that added several COVID-19 related data fields, including questionnaires (exposures and symptoms), viral testing, and serological data. This study aimed to analyze these COVID-19 data to understand how they were collected and how they can be used to attribute COVID-19 and to analyze differences across cohorts and time periods.

Methods: A cohort of COVID-19 infected individuals was defined from the UK Biobank population using viral testing, diagnosis, and self-reported data. Changes over time, from March 2020 to October 2021, in total case counts and in case counts by identification source (diagnosis from EHR, measurement from viral testing, and self-report from questionnaire) were also analyzed. For the questionnaires, the structure and dynamics were analyzed, including the number and type of questions asked, how often and by how many individuals the questions were answered, and what responses were given. In addition, the number of individuals who provided responses for the different time segments covered by the questionnaire was calculated, along with how often responses changed. The analysis included changes in population-level responses over time. The analyses were repeated for COVID and non-COVID individuals, and the responses were compared.
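
The cohort definition described above amounts to taking the union of participant identifiers across the three identification sources and counting how many sources flagged each participant. A minimal Python sketch, assuming each source has already been reduced to a set of participant IDs (the IDs, source labels, and function name below are made up for illustration and are not UK Biobank field names):

```python
from collections import Counter

# Hypothetical participant IDs per identification source (illustrative only).
sources = {
    "diagnosis": {101, 102, 103, 104},
    "viral_testing": {103, 104, 105},
    "self_report": {104, 106},
}


def build_cohort(sources):
    """Return the union cohort and, per participant, the number of sources."""
    hits = Counter()
    for ids in sources.values():
        hits.update(ids)  # a set contributes each participant at most once
    return set(hits), hits


cohort, hits = build_cohort(sources)
per_source = {name: len(ids) for name, ids in sources.items()}
multi_source = sum(1 for n in hits.values() if n > 1)

print(len(cohort), per_source, multi_source)
```

Because a participant can be flagged by more than one source, the per-source counts add up to more than the size of the cohort, which is the same pattern the abstract reports for the real data.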

Results: There were 62 042 distinct participants who had COVID-19, with 49 120 identified through diagnosis, 30 553 identified through viral testing, and 934 identified through self-reporting; many were identified by more than one method. Overall case counts and the distribution of case data sources changed substantially over time. Of the 9 952 participants who completed the exposure questionnaire, 6 899 responded for every time period it covered, and responses changed considerably over time. The most common change concerned employment situation, which 74.78% of individuals changed between the first and last time of asking. At the population level, face mask usage increased in each successive time period. There were decreases in nearly every COVID-19 symptom from the first to the second questionnaire. When comparing COVID to non-COVID participants, COVID participants were more commonly keyworkers (COVID: 33.76%, non-COVID: 15.00%) and more often lived with young people attending school (61.70%, 45.32%).

Conclusion: To develop a robust cohort of COVID-19 participants from the UK Biobank population, multiple types of data were needed. The differences based on time and exposures show the importance of comprehensive data capture and the utility of COVID-19 related questionnaire data.

BMC Medical Informatics and Decision Making, vol. 24, no. 1, p. 321. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529153/pdf/
Citations: 0
A modified multiple-criteria decision-making approach based on a protein-protein interaction network to diagnose latent tuberculosis.
IF 3.3, Zone 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2024-10-30, DOI: 10.1186/s12911-024-02668-z
Somayeh Ayalvari, Marjan Kaedi, Mohammadreza Sehhati

Background: DNA microarrays provide informative data for transcriptional profiling and identifying gene expression signatures to help prevent progression of latent tuberculosis infection (LTBI) to active disease. However, constructing a prognostic model for distinguishing LTBI from active tuberculosis (ATB) is very challenging due to the noisy nature of data and lack of a generally stable analysis approach.

Methods: In the present study, we proposed an accurate predictive model based on data fusion at the decision level. The results of filter and wrapper feature selection techniques were combined with multiple-criteria decision-making (MCDM) methods to select 10 genes from six microarray datasets that are likely to be the most discriminative for diagnosing tuberculosis cases. As the main contribution of this study, the final ranking function was constructed by combining a protein-protein interaction (PPI) network with an MCDM method (the Decision-making Trial and Evaluation Laboratory, or DEMATEL) to improve the feature ranking approach.
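
DEMATEL itself is a standard procedure: a direct-influence matrix is normalized to N and turned into a total-relation matrix T = N(I - N)^-1, from which each factor's prominence (influence given plus received) and net relation (given minus received) are read off. The sketch below is a generic pure-Python illustration with a made-up 3x3 influence matrix. It does not reproduce how the authors derived influences from the PPI network, and it approximates the matrix inverse with the equivalent power series N + N^2 + N^3 + ...; the function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dematel(direct, tol=1e-12, max_iter=10_000):
    """Return the total-relation matrix plus prominence and relation scores."""
    n = len(direct)
    # Normalize by the largest row or column sum so the power series converges.
    scale = max(max(sum(row) for row in direct),
                max(sum(direct[i][j] for i in range(n)) for j in range(n)))
    norm = [[v / scale for v in row] for row in direct]
    total = [row[:] for row in norm]  # running sum N + N^2 + ...
    power = [row[:] for row in norm]  # current power N^k
    for _ in range(max_iter):
        power = matmul(power, norm)
        total = [[t + p for t, p in zip(tr, pr)]
                 for tr, pr in zip(total, power)]
        if max(max(row) for row in power) < tol:
            break
    given = [sum(row) for row in total]
    received = [sum(total[i][j] for i in range(n)) for j in range(n)]
    prominence = [g + r for g, r in zip(given, received)]
    relation = [g - r for g, r in zip(given, received)]
    return total, prominence, relation


# Made-up direct-influence scores between three candidate features.
direct = [
    [0, 3, 2],
    [1, 0, 3],
    [2, 1, 0],
]
total, prominence, relation = dematel(direct)
ranking = sorted(range(len(direct)), key=lambda i: prominence[i], reverse=True)
print(ranking, [round(r, 3) for r in relation])
```

A positive relation score marks a factor that mostly influences others, a negative one a factor that is mostly influenced; the relation scores always sum to zero because every entry of T is counted once as given and once as received.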

Results: By applying decision-level data fusion to the 10 selected genes, fusing random forest (RF) and k-nearest neighbors (KNN) classifiers according to Yager's theory, the proposed algorithm reached a sensitivity of 0.97, specificity of 0.90, and accuracy of 0.95. Finally, with the help of cumulative clustering, the genes involved in the diagnosis of latent and active tuberculosis were identified.
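
For readers unfamiliar with evidence-theoretic fusion, the following sketch shows Yager's combination rule for two classifiers over the frame {LTBI, ATB}. It differs from Dempster's rule in that conflicting mass is assigned to the whole frame (ignorance) instead of being normalized away. How the authors converted RF and KNN outputs into mass functions is not stated in the abstract, so the mass values and names here are purely illustrative assumptions.

```python
from itertools import product


def yager_combine(m1, m2, frame):
    """Combine two mass functions; conflicting mass goes to the whole frame."""
    combined = {}
    conflict = 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + pa * pb
        else:
            conflict += pa * pb  # the two sources disagree completely
    combined[frame] = combined.get(frame, 0.0) + conflict
    return combined


LTBI, ATB = frozenset({"LTBI"}), frozenset({"ATB"})
FRAME = LTBI | ATB

# Hypothetical masses for one sample (each mass function sums to 1).
rf_mass = {LTBI: 0.7, ATB: 0.2, FRAME: 0.1}
knn_mass = {LTBI: 0.6, ATB: 0.3, FRAME: 0.1}

fused = yager_combine(rf_mass, knn_mass, FRAME)
decision = max((LTBI, ATB), key=lambda s: fused.get(s, 0.0))
print({"/".join(sorted(k)): round(v, 2) for k, v in fused.items()})
```

With these inputs the fused masses are 0.55 for LTBI, 0.11 for ATB, and 0.34 for the full frame, so the sample would be labelled LTBI while a sizeable share of the belief is explicitly kept as uncertainty.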

Conclusions: The combination of MCDM methods and PPI networks can significantly improve the diagnosis of different states of tuberculosis.

Clinical trial number: Not applicable.

BMC Medical Informatics and Decision Making, vol. 24, no. 1, p. 319. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11523813/pdf/
Citations: 0
A novel explainable machine learning-based healthy ageing scale.
IF 3.3, Zone 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2024-10-29, DOI: 10.1186/s12911-024-02714-w
Katarina Gašperlin Stepančič, Ana Ramovš, Jože Ramovš, Andrej Košir

Background: Ageing is one of the most important challenges in our society. Evaluating how one is ageing is important in many respects, from giving personalized recommendations to providing insight for long-term care eligibility. Machine learning can be utilized for that purpose; however, user reservations towards "black-box" predictions call for increased transparency and explainability of results. This study aimed to explore the potential of developing a machine learning-based healthy ageing scale that provides explainable results that could be trusted and understood by informal carers.

Methods: In this study, we used data from 696 older adults collected via personal field interviews as part of independent research. Explanatory factor analysis was used to find candidate healthy ageing aspects. For visualization of key aspects, a web annotation application was developed. Key aspects were selected by gerontologists, who later used the web annotation application to evaluate healthy ageing for each older adult on a Likert scale. Logistic Regression, Decision Tree Classifier, Random Forest, KNN, SVM and XGBoost were used for multi-class machine learning. AUC OvO, AUC OvR, F1, Precision and Recall were used for evaluation. Finally, SHAP was applied to the best model's predictions to make them explainable.

Results: The experimental results show that human annotations of healthy ageing could be modelled using machine learning, with XGBoost showing superior performance among the algorithms tested. XGBoost achieved a macro-averaged AUC OvO of 0.92 and a macro-averaged F1 of 0.76. SHAP was applied to generate local explanations for predictions, showing how each feature influences the prediction.
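
Macro-averaging, as reported above, means the metric is computed separately for each class (here, each Likert rating) and the per-class values are averaged with equal weight, regardless of how many older adults received each rating. A minimal pure-Python sketch of macro-averaged F1 on made-up labels follows; the function name and data are illustrative, not the study's code or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical ratings on a three-point scale.
annotated = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

score = macro_f1(annotated, predicted)
print(round(score, 4))
```

In this toy case the per-class F1 values are 0.5, 0.8 and 2/3, so the macro average is about 0.656.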

Conclusion: The resulting explainable predictions are a step toward practical implementation of the scale in decision support systems. A decision support system that incorporates an explainable model could reduce user reluctance towards the use of AI in healthcare and provide explainable and trusted insights to informal carers or healthcare providers as a basis for shaping tangible actions to improve ageing. Furthermore, the cooperation with gerontology specialists throughout the process indicates that expert knowledge is integrated into the model.

BMC Medical Informatics and Decision Making, vol. 24, no. 1, p. 317. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520378/pdf/
Citations: 0
Predicting clinical events characterizing the progression of amyotrophic lateral sclerosis via machine learning approaches using routine visits data: a feasibility study.
IF 3.3, Zone 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2024-10-29, DOI: 10.1186/s12911-024-02719-5
Alessandro Guazzo, Michele Atzeni, Elena Idi, Isotta Trescato, Erica Tavazzi, Enrico Longato, Umberto Manera, Adriano Chió, Marta Gromicho, Inês Alves, Mamede de Carvalho, Martina Vettoretti, Barbara Di Camillo

Background: Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that results in death within a short time span (3-5 years). One of the major challenges in treating ALS is its highly heterogeneous disease progression and the lack of effective prognostic tools to forecast it. The main aim of this study was, then, to test the feasibility of predicting relevant clinical outcomes that characterize the progression of ALS with a two-year prediction horizon via artificial intelligence techniques using routine visits data.

Methods: Three classification problems were considered: predicting death (binary problem), predicting death or percutaneous endoscopic gastrostomy (PEG) (multiclass problem), and predicting death or non-invasive ventilation (NIV) (multiclass problem). Two supervised learning models, a logistic regression (LR) and a deep learning multilayer perceptron (MLP), were trained ensuring technical robustness and reproducibility. Moreover, to provide insights into model explainability and result interpretability, model coefficients for LR and Shapley values for both LR and MLP were considered to characterize the relationship between each variable and the outcome.
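
The link between LR coefficients and Shapley values mentioned above can be made concrete on a toy model. For a model that is linear in its inputs (such as logistic regression on the log-odds scale), and with absent features replaced by a fixed baseline, the exact Shapley value of feature i reduces to coefficient_i * (x_i - baseline_i). The sketch below verifies this by brute-force enumeration of coalitions. The weights, patient values, and baseline are invented, and the study itself presumably relied on a dedicated library rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical logistic-regression parameters (log-odds scale).
weights = [0.8, -0.5, 0.3]
intercept = -1.0


def log_odds(features):
    return intercept + sum(w * f for w, f in zip(weights, features))


patient = [2.0, 1.0, 4.0]   # made-up feature values for one patient
baseline = [1.0, 1.5, 2.0]  # made-up reference values, e.g. cohort means

phi = shapley_values(log_odds, patient, baseline)
print([round(p, 3) for p in phi])
```

The attributions add up to the difference between the patient's log-odds and the baseline log-odds, which is the efficiency property that makes Shapley values comparable across a linear LR and a non-linear MLP. Exact enumeration grows exponentially with the number of features, so it is only practical for small illustrations like this one.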

Results: On the one hand, predicting death was successful as both models yielded F1 scores and accuracy well above 0.7. The model explainability analysis performed for this outcome allowed for the understanding of how different methodological approaches consider the input variables when performing the prediction. On the other hand, predicting death alongside PEG or NIV proved to be much more challenging (F1 scores and accuracy in the 0.4-0.6 interval).

Conclusions: Predicting death due to ALS proved to be feasible. However, predicting PEG or NIV in a multiclass fashion proved to be unfeasible with these data, regardless of the complexity of the methodological approach. The observed results suggest a potential ceiling on the amount of information extractable from the database, e.g., due to the intrinsic difficulty of the prediction tasks at hand or to the absence of crucial predictors that are not currently collected during routine practice.

BMC Medical Informatics and Decision Making, vol. 24, Suppl 4, p. 318. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11523576/pdf/
Citations: 0
Pilot deployment of a machine-learning enhanced prediction of need for hemorrhage resuscitation after trauma - the ShockMatrix pilot study.
IF 3.3, Zone 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2024-10-28, DOI: 10.1186/s12911-024-02723-9
Tobias Gauss, Jean-Denis Moyer, Clelia Colas, Manuel Pichon, Nathalie Delhaye, Marie Werner, Veronique Ramonda, Theophile Sempe, Sofiane Medjkoune, Julie Josse, Arthur James, Anatole Harrois

Importance: Decision-making in trauma patients remains challenging and often results in deviation from guidelines. Machine-Learning (ML) enhanced decision-support could improve hemorrhage resuscitation.

Aim: To develop an ML-enhanced decision support tool to predict Need for Hemorrhage Resuscitation (NHR) (part I) and to test the collection of the predictor variables in real time in a smartphone app (part II).

Design, setting, and participants: An ML model was developed from a registry to predict NHR relying exclusively on prehospital predictors. Several models and imputation techniques were tested. The feasibility of collecting the model's predictors in a customized smartphone app during prealert and of generating a prediction was assessed in four level-1 trauma centers, and the predictions were compared to the gestalt of the trauma leader.

Main outcomes and measures: In part 1, the model output was NHR, defined by 1) at least one RBC transfusion in resuscitation, 2) transfusion ≥ 4 RBC within 6 h, 3) any hemorrhage control procedure within 6 h or 4) death from hemorrhage within 24 h. The performance metric was the F4-score, which was compared to reference scores (RED FLAG, ABC). In part 2, the model and clinician predictions were compared using Likelihood Ratios (LR).
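
Both metrics named above follow directly from a 2x2 confusion table. The F-beta score with beta = 4 weights recall far more heavily than precision, and the likelihood ratios are LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below uses invented counts, not study data, and the function names are illustrative.

```python
def f_beta(tp, fp, fn, beta):
    """F-beta from counts: (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP)."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-); assumes specificity is strictly between 0 and 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


# Hypothetical confusion table: 10 patients with NHR, 90 without.
tp, fp, fn, tn = 8, 4, 2, 86

f4 = f_beta(tp, fp, fn, beta=4)
f1 = f_beta(tp, fp, fn, beta=1)
lr_pos, lr_neg = likelihood_ratios(tp, fp, fn, tn)
print(round(f4, 3), round(f1, 3), round(lr_pos, 1), round(lr_neg, 3))
```

With these counts F4 is about 0.79 while F1 is about 0.73, because F4 penalizes the two missed cases much more than the four false alarms. A metric of this kind suits a setting where a missed hemorrhage is presumably costlier than an unnecessary alert.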

Results: From 36,325 eligible patients in the registry (Nov 2010-May 2022), 28,614 were included in the model development (Part 1). Median age was 36 [25-52], median ISS was 13 [5-22], and 3249/28614 (11%) met the definition of NHR. An XGBoost model with nine prehospital variables generated the best predictive performance for NHR, with an F4-score of 0.76 [0.73-0.78]. Over a 3-month period (Aug-Oct 2022), 139 of 391 eligible patients were included in part II (38.5%), 22/139 with NHR. Clinician satisfaction was high, no workflow disruption was observed, and LRs were comparable between the model and the clinicians.

Conclusions and relevance: The ShockMatrix pilot study developed a simple ML-enhanced NHR prediction tool demonstrating a comparable performance to clinical reference scores and clinicians. Collecting the predictor variables in real-time on prealert was feasible and caused no workflow disruption.

Citations: 0
Face and content validity of the EMPOWER-UP questionnaire: a generic measure of empowerment in relational decision-making and problem-solving.
IF 3.3 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-10-28 DOI: 10.1186/s12911-024-02727-5
Emilie Haarslev Schröder Marqvorsen, Line Lund, Sigrid Normann Biener, Mette Due-Christensen, Gitte R Husted, Rikke Jørgensen, Anne Sophie Mathiesen, Mette Linnet Olesen, Morten Aagaard Petersen, François Pouwer, Bodil Rasmussen, Mette Juel Rothmann, Thordis Thomsen, Kirsty Winkley, Vibeke Zoffmann

Background: Decision-making and problem-solving processes are powerful activities occurring daily across all healthcare settings. Their empowering potential is seldom fully exploited, and they may even be perceived as disempowering. We developed the EMPOWER-UP questionnaire to enable assessment of healthcare users' perception of empowerment across health conditions, healthcare settings, and healthcare providers' professional backgrounds. This article reports the initial development of EMPOWER-UP, including face and content validation.

Methods: Four grounded theories explaining barriers and enablers to empowerment in relational decision-making and problem-solving were reviewed to generate a preliminary item pool, which was subsequently reduced using constant comparison. Preliminary items were evaluated for face and content validity using an expert panel of seven researchers and cognitive interviews in Danish and English with 29 adults diagnosed with diabetes, cancer, or schizophrenia.

Results: A preliminary pool of 139 items was reduced to 46. Independent feedback from expert panel members resulted in further item reduction and modifications supporting content validity and strengthening the potential for generic use. Forty-one preliminary items were evaluated through 29 cognitive interviews, resulting in a 36-item draft questionnaire deemed to have good face and content validity and generic potential.

Conclusions: Face and content validation using an expert panel and cognitive interviews resulted in a 36-item draft questionnaire with a potential for evaluating empowerment in user-provider interactions regardless of health conditions, healthcare settings, and healthcare providers' professional backgrounds.

Citations: 0
Accelerated hazard prediction based on age time-scale for women diagnosed with breast cancer using a deep learning method.
IF 3.3 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-10-28 DOI: 10.1186/s12911-024-02725-7
Zahra Ramezani, Jamshid Yazdani Charati, Reza Alizadeh-Navaei, Mohammad Eslamijouybari

Breast cancer is the most common cancer in women. Previous studies have estimated and predicted proportional hazard rates and survival in breast cancer. This study predicts the accelerated hazards (AH) rate by age category in breast cancer patients using deep learning methods. The AH has a time-dependent structure whose rate changes with time and with variable effects. We collected data on 1225 female patients with breast cancer at the Mandarin University of Medical Sciences. The patients' demographic and clinical characteristics were recorded, including family history, age, history of tobacco use, hysterectomy, age at first menstruation, gravidity, number of breastfeeding episodes, disease grade, marital status, and survival status. Initially, we predicted three age groups of patients: ≤ 40, 41-60, and ≥ 61 years. We then discuss the prediction of the accelerated risk value by age category for each breast cancer patient through deep learning, and the importance of variables using LightGBM. Improving the clinical management and treatment of breast cancer requires advanced methods such as time-dependent AH calculation. When the behavioral effect is assumed to be a time-scale change between hazard functions, the AH model is more appropriate for randomized clinical trials. The results demonstrate that the proposed model performs well in predicting AH by age category from breast cancer patients' demographic and clinical characteristics.
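The abstract does not state the model equation. For orientation, the standard textbook forms of the proportional hazards (PH), accelerated failure time (AFT), and accelerated hazards (AH) models are shown below; the symbols $\lambda_0$ (baseline hazard), $x$ (covariate vector), and $\beta$ (coefficients) are notation assumed here, not taken from the paper.

```latex
\begin{align*}
\text{PH:}  \quad & \lambda(t \mid x) = \lambda_0(t)\, \exp(\beta^\top x) \\
\text{AFT:} \quad & \lambda(t \mid x) = \lambda_0\!\bigl(t\, \exp(\beta^\top x)\bigr)\, \exp(\beta^\top x) \\
\text{AH:}  \quad & \lambda(t \mid x) = \lambda_0\!\bigl(t\, \exp(\beta^\top x)\bigr)
\end{align*}
```

In the AH form the covariates only rescale time inside the baseline hazard, so every group has the same hazard at $t = 0$. That is consistent with the abstract's remark that the model suits randomized trials, where arms are comparable at baseline and the effect emerges over time.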

Citations: 0
Predicting the onset of Alzheimer's disease and related dementia using electronic health records: findings from the Cache County Study on Memory in Aging (1995-2008).
IF 3.3 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-10-28 DOI: 10.1186/s12911-024-02728-4
Karen C Schliep, Jeffrey Thornhill, JoAnn T Tschanz, Julio C Facelli, Truls Østbye, Michelle K Sorweid, Ken R Smith, Michael Varner, Richard D Boyce, Christine J Cliatt Brown, Huong Meeks, Samir Abdelrahman

Introduction: Clinical notes, biomarkers, and neuroimaging have proven valuable in dementia prediction models. Whether commonly available structured clinical data can predict dementia is an emerging area of research. We aimed to predict gold-standard, research-based diagnoses of dementia including Alzheimer's disease (AD) and/or Alzheimer's disease related dementias (ADRD), in addition to ICD-based AD and/or ADRD diagnoses, in a well-phenotyped, population-based cohort using a machine learning approach.

Methods: Administrative healthcare data (k = 163 diagnostic features), in addition to census/vital record sociodemographic data (k = 6 features), were linked to the Cache County Study (CCS, 1995-2008).

Results: Among successfully linked UPDB-CCS participants (n = 4206), 522 (12.4%) had incident dementia (AD alone, AD comorbid with ADRD, or ADRD alone) as per the CCS "gold standard" assessments. Random Forest models, with a 1-year prediction window, achieved the best performance with an Area Under the Curve (AUC) of 0.67. Accuracy declined for dementia subtypes: AD/ADRD (AUC = 0.65); ADRD (AUC = 0.49). Accuracy improved when using ICD-based dementia diagnoses (AUC = 0.77).

Discussion: Commonly available structured clinical data (without labs, notes, or prescription information) demonstrate modest ability to predict "gold-standard" research-based AD/ADRD diagnoses, corroborated by prior research. Using ICD diagnostic codes to identify dementia as done in the majority of machine learning dementia prediction models, as compared to "gold-standard" dementia diagnoses, can result in higher accuracy, but whether these models are predicting true dementia warrants further research.
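The AUC values quoted in the results can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case. A minimal sketch of that definition follows; the function name and the toy labels and scores are hypothetical and unrelated to the study data.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts, over all (case, non-case) pairs, how often the case has the
    higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 (case, non-case) pairs are ranked correctly.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

On this reading, an AUC of 0.49 (the ADRD-only subtype) is indistinguishable from random ranking, while 0.67 and 0.77 indicate modest and moderate discrimination respectively.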

Citations: 0
Mapping the landscape of machine learning models used for predicting transfusions in surgical procedures: a scoping review.
IF 3.3 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-10-25 DOI: 10.1186/s12911-024-02729-3
Olivier Duranteau, Florian Blanchard, Benjamin Popoff, Faridi S van Etten-Jamaludin, Turgay Tuna, Benedikt Preckel

Massive transfusion of blood products poses challenges in determining the need for transfusion and the appropriate volume of blood products. This review explores the use of machine learning (ML) models to predict transfusion risk during surgical procedures, focusing on the methodology, variables, and software employed to predict transfusion. This scoping review investigates the development and current state of machine learning models for predicting transfusion risk during surgical procedures, aiming to inform physicians about the field's progress and potential directions. The review was conducted using the databases Cochrane, Embase, and PubMed. The search included keywords related to blood transfusion, statistical models, and surgical procedures. Peer-reviewed articles were included, while literature reviews, case reports, and non-human studies were excluded. A total of 40 studies met the inclusion criteria. The most frequently studied biological variables included haemoglobin, platelet count, international normalized ratio (INR), activated partial thromboplastin time (aPTT), fibrinogen, creatinine, white blood cells, and albumin. Clinical variables of importance included age, sex, surgery type, blood pressure, weight, surgery duration, American Society of Anesthesiologists (ASA) status, blood loss, and body mass index (BMI). The software employed varied, with Python, R, SPSS, and SAS being the most commonly used. Logistic regression was the predominant methodology, used in 20 studies. Our scoping review highlights the need for improved reporting and transparency in the methodology, variables, and software used. Future research should focus on providing detailed descriptions and open access to the code of the respective models, promoting reproducibility, and enhancing the clinical relevance of transfusion risk prediction models.

Citations: 0
Deep learning assisted cancer disease prediction from gene expression data using WT-GAN.
IF 3.3 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-10-24 DOI: 10.1186/s12911-024-02712-y
U Ravindran, C Gunavathi

Several diverse fields, including the healthcare system and drug development sectors, have benefited immensely from the adoption of deep learning (DL), a subset of artificial intelligence (AI) and machine learning (ML). Cancer makes up a significant percentage of the illnesses that cause early human mortality across the globe, and this share is likely to rise in the coming years, especially when non-communicable illnesses are not considered. As a result, cancer patients would greatly benefit from precise and timely diagnosis and prediction. DL has become a common technique in healthcare owing to the abundance of computational power. Gene expression datasets are frequently used in major DL-based applications for illness detection, notably in cancer therapy. The quantity of medical data, on the other hand, is often insufficient to meet deep learning requirements. Microarray gene expression datasets are used for training despite their extreme dimensionality, limited number of samples, and sparsely available information. Data augmentation is commonly used to expand the training sample size for gene data. In the proposed work, the Wasserstein Tabular Generative Adversarial Network (WT-GAN) model is used in the data augmentation process to generate synthetic data. A correlation-based feature selection technique selects the most relevant characteristics based on threshold values. Deep FNN and ML algorithms train on and classify the gene expression samples. The augmented data give better classification results (> 97%) when using WT-GAN for cancer diagnosis.
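The abstract does not specify which correlation-based selection variant was used. One simple reading, sketched below, keeps every feature whose absolute Pearson correlation with the class label reaches a threshold. The function names, the threshold of 0.5, and the toy expression values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(columns: dict[str, list[float]],
                    labels: list[float],
                    threshold: float = 0.5) -> list[str]:
    """Keep feature names whose |correlation with the label| >= threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, labels)) >= threshold]


# Toy expression matrix (hypothetical): one informative gene, one constant
# gene, and one gene uncorrelated with the label.
toy_columns = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [5.0, 5.0, 5.0, 5.0],
    "gene_c": [2.0, 1.0, 2.0, 1.0],
}
toy_labels = [0, 0, 1, 1]
selected = select_features(toy_columns, toy_labels, threshold=0.5)
print(selected)
```

A filter of this kind is cheap enough to run on thousands of genes before training, which matters when the sample count is small relative to the feature count, as the abstract notes for microarray data.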

Citations: 0