Problem: Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer, largely because it is usually diagnosed at an advanced stage. The five-year survival rate after diagnosis is less than 10%, but it can reach up to 70% when the disease is detected early. Early diagnosis of PDAC therefore enables timely intervention and improves survival. The challenge is to develop a reliable, data privacy-aware machine learning approach that can accurately diagnose pancreatic cancer from biomarkers.
Aim: The study aims to diagnose pancreatic cancer while preserving the confidentiality of patient records. In addition, it aims to guide researchers and clinicians in developing innovative methods for diagnosing pancreatic cancer.
Methods: Machine learning, a branch of artificial intelligence, can identify patterns by analyzing large datasets. The study pre-processed a dataset of urine biomarkers with operations such as filling in missing values, cleaning outliers, and feature selection. The data was encrypted using the Fernet encryption algorithm to ensure confidentiality. Ten separate machine learning models were applied to predict individuals with PDAC, and performance metrics such as F1 score, recall, precision, and accuracy were used to evaluate them.
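The abstract names the preprocessing steps but not their exact rules, and the Fernet step relies on the third-party `cryptography` package, so the sketch below covers only the cleaning stage. It uses median imputation for missing values and an interquartile-range fence for outliers, which is one common choice and an assumption here, not the paper's published procedure:

```python
import statistics

def preprocess(values):
    """Illustrative cleaning for one biomarker column; None marks a missing entry.

    Assumed rules (the paper does not publish its exact ones):
    1) fill missing entries with the column median,
    2) drop points outside the IQR fence [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    """
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    filled = [med if v is None else v for v in values]

    q1, _, q3 = statistics.quantiles(filled, n=4)  # quartiles of the imputed column
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in filled if lo <= v <= hi]

# One missing value imputed, one implausible spike (120.0) removed by the fence.
cleaned = preprocess([4.1, None, 3.9, 4.4, 120.0, 4.0])
```

The same two passes would be applied per biomarker column before feature selection and model training.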
Results: Among the 590 clinical records analyzed, 199 (33.7%) belonged to patients with pancreatic cancer, 208 (35.3%) to patients with non-cancerous pancreatic disorders (such as benign hepatobiliary disease), and 183 (31%) to healthy individuals. The LGBM algorithm was the most effective, achieving an accuracy of 98.8%; the accuracy of the other algorithms ranged from 86% to 98%. To understand which features the model relies on, a feature importance analysis was performed, showing that plasma_CA19_9, REG1A, TFF1, and LYVE1 have high importance levels. LIME analysis further examined which features drive the model's decision-making process.
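The abstract reports the trained model's feature importances plus LIME explanations; neither is reproduced here. As a model-agnostic stand-in, the sketch below implements permutation importance in pure Python: shuffle one feature column at a time and measure the drop in accuracy. The threshold classifier, biomarker values, and cutoff are all hypothetical illustrations, not the paper's model:

```python
import random

def accuracy(model, X, y):
    """Fraction of rows the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_features, seed=0):
    """Drop in accuracy when one feature column is shuffled; a bigger drop
    means the model leans more heavily on that feature."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        scores.append(base - accuracy(model, X_perm, y))
    return scores

# Toy stand-in model: predicts cancer (1) when the first biomarker exceeds a
# cutoff; the second feature is pure noise, so shuffling it cannot hurt accuracy.
model = lambda row: 1 if row[0] > 37.0 else 0
X = [[55.0, 0.2], [12.0, 0.9], [80.0, 0.5], [5.0, 0.1], [60.0, 0.7], [20.0, 0.3]]
y = [1, 0, 1, 0, 1, 0]
importances = permutation_importance(model, X, y, n_features=2)
```

In practice one would read `feature_importances_` from the fitted LightGBM model and pass it to LIME's tabular explainer for per-patient explanations; the permutation approach above is simply the library-free analogue.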
Conclusions: This research outlines a data privacy-aware machine learning tool for predicting PDAC. The results indicate a promising approach for clinical application. Future research should expand the dataset and validate the approach in diverse populations.
Background: Artificial intelligence (AI) is increasingly used for prevention, diagnosis, monitoring, and treatment of cardiovascular diseases. Despite the potential for AI to improve care, ethical concerns and mistrust in AI-enabled healthcare exist among the public and medical community. Given the rapid, transformative growth of AI in cardiovascular care, we conducted a literature review to identify key ethical and trust barriers and facilitators from the perspectives of patients and healthcare providers, with the aim of informing practice guidelines and regulatory policies that facilitate ethical and trustworthy use of AI in medicine.
Methods: In this rapid literature review, we searched six bibliographic databases to identify publications discussing transparency, trust, or ethical concerns (outcomes of interest) associated with AI-based medical devices (interventions of interest) in the context of cardiovascular care from patients', caregivers', or healthcare providers' perspectives. The search was completed on May 24, 2022, and was not limited by date or study design.
Results: After reviewing 7,925 papers from six databases and 3,603 papers identified through citation chasing, 145 articles were included. Key ethical concerns included privacy, security, or confidentiality issues (n = 59, 40.7%); risk of healthcare inequity or disparity (n = 36, 24.8%); risk of patient harm (n = 24, 16.6%); accountability and responsibility concerns (n = 19, 13.1%); problematic informed consent and potential loss of patient autonomy (n = 17, 11.7%); and issues related to data ownership (n = 11, 7.6%). Major trust barriers included data privacy and security concerns, potential risk of patient harm, perceived lack of transparency about AI-enabled medical devices, concerns about AI replacing human aspects of care, concerns about prioritizing profits over patients' interests, and lack of robust evidence related to the accuracy and limitations of AI-based medical devices. Ethical and trust facilitators included ensuring data privacy and data validation, conducting clinical trials in diverse cohorts, providing appropriate training and resources to patients and healthcare providers and improving their engagement in different phases of AI implementation, and establishing further regulatory oversights.
Conclusion: This review revealed key ethical concerns as well as barriers to and facilitators of trust in AI-enabled medical devices from patients' and healthcare providers' perspectives. Successful integration of AI into cardiovascular care necessitates implementation of mitigation strategies. These strategies should focus on enhanced regulatory oversight of the use of patient data and on promoting transparency around the use of AI in patient care.
Background: The integrity of clinical research and machine learning models in healthcare heavily relies on the quality of underlying clinical laboratory data. However, the preprocessing of this data to ensure its reliability and accuracy remains a significant challenge due to variations in data recording and reporting standards.
Methods: We developed lab2clean, a novel algorithm aimed at automating and standardizing the cleaning of retrospective clinical laboratory results data. lab2clean was implemented as two R functions specifically designed to enhance data conformance and plausibility by standardizing result formats and validating result values. The functionality and performance of the algorithm were evaluated using two extensive electronic medical record (EMR) databases, encompassing various clinical settings.
Results: lab2clean effectively reduced the variability of laboratory results and identified potentially erroneous records. Upon deployment, it standardized and validated large volumes of laboratory records quickly and effectively. The evaluation highlighted significant improvements in the conformance and plausibility of lab results, confirming the algorithm's efficacy on large-scale datasets.
Conclusions: lab2clean addresses the challenge of preprocessing and cleaning clinical laboratory data, a critical step in ensuring high-quality data for research outcomes. It offers a straightforward, efficient tool for researchers, improving the quality of clinical laboratory data, a major portion of healthcare data, thereby enhancing the reliability and reproducibility of clinical research outcomes and clinical machine learning models. Future developments aim to broaden its functionality and accessibility, solidifying its vital role in healthcare data management.
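lab2clean itself is a pair of R functions whose exact rules are not reproduced in the abstract. As a language-neutral illustration of its two ideas, the Python sketch below shows a conformance step (coercing free-text results such as decimal-comma numbers or comparator-prefixed values like "<5" to a canonical form) and a plausibility step (range validation). The regex and the range bounds are assumptions for illustration, not lab2clean's actual logic:

```python
import re

def standardize_result(raw):
    """Conformance step: coerce a free-text lab result to a canonical numeric form.

    Handles decimal commas, stray whitespace, and comparator prefixes like '<5'.
    Returns (comparator, value), or None if the string is not numeric.
    (Illustrative only; the lab2clean R functions implement richer rules.)
    """
    s = raw.strip().replace(",", ".")
    m = re.fullmatch(r"(<=?|>=?)?\s*(\d+(?:\.\d+)?)", s)
    if not m:
        return None
    return m.group(1) or "=", float(m.group(2))

def is_plausible(value, low, high):
    """Plausibility step: flag values outside a plausible range for the test."""
    return low <= value <= high

# '7,2' (decimal comma) and ' <5 ' both normalize; a text result does not.
print(standardize_result("7,2"))        # ('=', 7.2)
print(standardize_result(" <5 "))       # ('<', 5.0)
print(standardize_result("hemolyzed"))  # None
```

Records that fail either step would be routed to review rather than silently dropped, which matches the goal of identifying potentially erroneous records.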
Background: The worldwide prevalence of type 2 diabetes mellitus in adults is increasing rapidly. This study aimed to identify the factors affecting the survival of prediabetic patients by comparing the Cox proportional hazards (CPH) model and the random survival forest (RSF) model.
Method: This prospective cohort study was performed on 746 prediabetic individuals in southwest Iran. The demographic, lifestyle, and clinical data of the participants were recorded. The CPH and RSF models were used to model the patients' survival, and the concordance index (C-index) and time-dependent receiver operating characteristic (ROC) curve were employed to compare their performance.
Results: The 5-year cumulative T2DM incidence was 12.73%. In the CPH model, NAFLD (HR = 1.74, 95% CI: 1.06, 2.85), FBS (HR = 1.008, 95% CI: 1.005, 1.012), and increased abdominal fat (HR = 1.02, 95% CI: 1.01, 1.04) were directly associated with diabetes occurrence in prediabetic patients. The RSF model identified FBS, waist circumference, depression, NAFLD, afternoon sleep, and female gender as the most important predictors of diabetes. The C-index indicated that the RSF model had higher concordance than the CPH model, and by the weighted Brier score the RSF model had less error than the Kaplan-Meier and CPH models.
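Because a Cox model is linear on the log-hazard scale, a per-unit hazard ratio such as the reported FBS estimate compounds multiplicatively over larger increments. A minimal sketch (the 10-unit increment is an arbitrary illustration, not a quantity from the study):

```python
import math

def scale_hazard_ratio(hr_per_unit, k):
    """HR for a k-unit covariate increase.

    The Cox log-hazard is linear in the covariate, so a k-unit increase adds
    k * beta on the log scale, i.e. the HR is hr_per_unit ** k.
    """
    return math.exp(k * math.log(hr_per_unit))

# The reported per-unit FBS hazard ratio of 1.008, scaled to a 10-unit increase:
hr_10 = scale_hazard_ratio(1.008, 10)
print(round(hr_10, 3))  # 1.083
```

So a seemingly small per-unit HR of 1.008 still implies roughly an 8% higher hazard per 10-unit rise in FBS, which is why such coefficients remain clinically meaningful.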
Conclusion: Our findings show that the incidence of diabetes was alarmingly high in Iran. The results suggested that several demographic and clinical factors are associated with diabetes occurrence in prediabetic patients. The high-risk population needs special measures for screening and care programs.
Background: Predictive modeling based on multi-omics data, which incorporates several types of omics data for the same patients, has shown potential to outperform single-omics predictive modeling. Most research in this domain focuses on incorporating numerous data types, despite the complexity and cost of acquiring them. The prevailing assumption is that increasing the number of data types necessarily improves predictive performance. However, the integration of less informative or redundant data types could potentially hinder this performance. Therefore, identifying the most effective combinations of omics data types that enhance predictive performance is critical for cost-effective and accurate predictions.
Methods: In this study, we systematically evaluated the predictive performance of all 31 possible combinations including at least one of five genomic data types (mRNA, miRNA, methylation, DNAseq, and copy number variation) using 14 cancer datasets with right-censored survival outcomes, publicly available from the TCGA database. We employed various prediction methods and up-weighted clinical data in every model to leverage their predictive importance. Harrell's C-index and the integrated Brier Score were used as performance measures. To assess the robustness of our findings, we performed a bootstrap analysis at the level of the included datasets. Statistical testing was conducted for key results, limiting the number of tests to ensure a low risk of false positives.
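Harrell's C-index used here has a simple definition for right-censored data: among comparable pairs (those where the earlier time is an observed event), count how often the higher predicted risk accompanies the shorter survival, with risk ties counted as one half. A minimal pure-Python sketch with toy data (the risk scores and times are illustrative, not from the study):

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored survival data.

    A pair (i, j) is comparable when the subject with the earlier time had an
    observed event (events[i] == 1). It is concordant when the higher predicted
    risk goes with the shorter survival time; ties in risk count 1/2.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Perfectly ranked toy cohort: higher predicted risk, earlier observed event.
c = harrell_c_index(times=[2, 5, 8, 11], events=[1, 1, 0, 1],
                    risk_scores=[0.9, 0.6, 0.4, 0.1])
print(c)  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the mRNA-only and multi-omics models are compared.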
Results: Contrary to expectations, we found that using only mRNA data or a combination of mRNA and miRNA data was sufficient for most cancer types. For some cancer types, the additional inclusion of methylation data led to improved prediction results. Far from enhancing performance, the introduction of more data types most often resulted in a decline in performance, which varied between the two performance measures.
Conclusions: Our findings challenge the prevailing notion that combining multiple omics data types in multi-omics survival prediction improves predictive performance. Thus, the widespread approach in multi-omics prediction of incorporating as many data types as possible should be reconsidered to avoid suboptimal prediction results and unnecessary expenditure.