
BMC Medical Informatics and Decision Making: Latest Articles

Robust and consistent biomarker candidates identification by a machine learning approach applied to pancreatic ductal adenocarcinoma metastasis.
IF 3.3 | Zone 3 (Medicine) | Q1 Medicine | Pub Date: 2024-06-20 | DOI: 10.1186/s12911-024-02578-0
Tanakamol Mahawan, Teifion Luckett, Ainhoa Mielgo Iza, Natapol Pornputtapong, Eva Caamaño Gutiérrez

Background: Machine Learning (ML) plays a crucial role in biomedical research. Nevertheless, it still has limitations in data integration and reproducibility. To address these challenges, robust methods are needed. Pancreatic ductal adenocarcinoma (PDAC), a highly aggressive cancer with low early detection and survival rates, is used as a case study. PDAC lacks reliable diagnostic biomarkers, especially metastatic biomarkers, and this remains an unmet need. In this study, we propose an ML-based approach for discovering disease biomarkers, apply it to the identification of a PDAC metastatic composite biomarker candidate, and demonstrate the advantages of harnessing data resources.

Methods: We utilised primary tumour RNAseq data from five public repositories, pooling samples to maximise statistical power and integrating data by correcting for technical variance. Data were split into training and validation sets. The training dataset underwent variable selection via a 10-fold cross-validation process that combined three algorithms in 100 models per fold. Genes found in at least 80% of models and five folds were considered robust and were used to build a consensus multivariate model. A random forest model was constructed using the selected genes from the training dataset and tested in the validation set. We also assessed the goodness of prediction by recalibrating a model using only the validation data. The biological context and relevance of signals were explored through enrichment and pathway analyses using QIAGEN Ingenuity Pathway Analysis and GeneMANIA.
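
The consensus selection rule described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `robust_genes`, the toy gene names, and the reading of the rule (a gene must appear in at least 80% of the models within a fold, in at least five folds) are assumptions made for illustration only.

```python
from collections import Counter

def robust_genes(fold_selections, model_frac=0.8, min_folds=5):
    """Return genes selected in >= model_frac of models in >= min_folds folds.

    fold_selections: list of folds; each fold is a list of per-model gene sets.
    The thresholds are an assumed reading of the abstract, not the authors' code.
    """
    folds_passed = Counter()
    for models in fold_selections:
        # Count in how many models of this fold each gene was selected.
        counts = Counter(g for selected in models for g in set(selected))
        threshold = model_frac * len(models)
        for gene, n in counts.items():
            if n >= threshold:
                folds_passed[gene] += 1
    return sorted(g for g, k in folds_passed.items() if k >= min_folds)

# Toy data (hypothetical): 10 folds, 10 models per fold.
folds = []
for i in range(10):
    models = []
    for m in range(10):
        genes = {"GENE_A"}                 # selected by every model in every fold
        if i < 5 and m < 9:
            genes.add("GENE_B")            # 90% of models, but only in 5 folds
        if i < 4:
            genes.add("GENE_C")            # every model, but only in 4 folds
        models.append(genes)
    folds.append(models)

consensus = robust_genes(folds)
print(consensus)
```

In this toy run GENE_C is dropped because it passes the per-fold threshold in only four folds, which is the kind of instability the consensus step is meant to filter out.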

Results: We developed a pipeline that can detect robust signatures to build composite biomarkers. We tested the pipeline in PDAC, exploiting transcriptomics data from different sources, and proposed a composite biomarker candidate composed of fifteen consistently selected genes that showed very promising predictive capability. Biological contextualisation revealed links with cancer progression and metastasis, underscoring their potential relevance. All code is available on GitHub.

Conclusion: This study establishes a robust framework for identifying composite biomarkers across various disease contexts. We demonstrate its potential by proposing a plausible composite biomarker candidate for PDAC metastasis. By reusing data from public repositories, we highlight the sustainability of our research and the wider applications of our pipeline. The preliminary findings shed light on a promising validation and application path.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11191155/pdf/
Citations: 0
A dynamic online nomogram for predicting renal outcomes of idiopathic membranous nephropathy.
IF 3.3 | Zone 3 (Medicine) | Q1 Medicine | Pub Date: 2024-06-19 | DOI: 10.1186/s12911-024-02568-2
Feng Wang, Jiayi Xu, Fumei Wang, Xu Yang, Yang Xia, Hongli Zhou, Na Yi, Congcong Jiao, Xuesong Su, Beiru Zhang, Hua Zhou, Yanqiu Wang

Background: Because spontaneous remission is common in idiopathic membranous nephropathy (IMN) and immunosuppressive therapy has adverse effects, it is important to assess the risk of progressive loss of renal function before deciding whether and when to initiate immunosuppressive therapy. Therefore, this study aimed to establish a risk prediction model for patient prognosis and treatment response to help clinicians decide on the best treatment regimen.

Methods: From September 2019 to December 2020, a total of 232 newly diagnosed IMN patients from three hospitals in Liaoning Province were enrolled. Logistic regression analysis was used to select the risk factors affecting prognosis, and a dynamic online nomogram prognostic model was constructed based on extreme gradient boosting, random forest, and logistic regression machine learning algorithms. Receiver operating characteristic curves, calibration curves, and decision curve analysis were used to assess the performance and clinical utility of the developed model.
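
The discrimination measure behind the receiver operating characteristic analysis (AUROC) can be sketched in plain Python. This is an illustrative computation using the pairwise-comparison definition (equivalent to the Mann-Whitney U statistic), not the study's code; the function name `auroc` and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks for six patients (1 = poor renal outcome).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.6, 0.2]
auc = auroc(labels, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; a rank-based implementation would be preferable for large datasets.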

Results: A total of 130 patients were in the training cohort and 102 patients in the validation cohort. Logistic regression analysis identified four risk factors: course ≥ 6 months, UTP, D-dimer and sPLA2R-Ab. The random forest algorithm showed the best performance with the highest AUROC (0.869). The nomogram had excellent discrimination ability, calibration ability and clinical practicability in both the training cohort and the validation cohort.

Conclusions: The dynamic online nomogram model can effectively assess the prognosis and treatment response of IMN patients. This will help clinicians assess the patient's prognosis more accurately, communicate with the patient in advance, and jointly select the most appropriate treatment plan.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11186104/pdf/
Citations: 0
Current status of digital health interventions in the health system in Burkina Faso.
IF 3.3 | Zone 3 (Medicine) | Q1 Medicine | Pub Date: 2024-06-19 | DOI: 10.1186/s12911-024-02574-4
Bry Sylla, Boukary Ouedraogo, Salif Traore, Ousseni Ouedraogo, Léon Gueswendé Blaise Savadogo, Gayo Diallo

Background: Digital health is being used as an accelerator to improve the traditional healthcare system, aiding countries in achieving their sustainable development goals. Burkina Faso aims to harmonize its digital health interventions to guide its digital health strategy for the coming years. The current assessment represents upstream work to steer the development of this strategic plan.

Methods: This was a quantitative, descriptive study conducted between September 2022 and April 2023. It involved a two-part survey: a self-administered questionnaire distributed to healthcare information managers in facilities, and direct interviews conducted with software developers. This was complemented by a documentary review of the country's strategic and standards documents on digital transformation.

Results: Burkina Faso possesses a relatively comprehensive collection of governance documents pertaining to digital transformation. The study identified a total of 35 digital health interventions. Analysis showed that 89% of funding originated from technical and financial partners as well as the private sector. While the use of open-source technologies for the development of the applications, software, or platforms used to implement these digital health interventions is well established (77%), there remains a deficiency in the integration of data from different platforms. Furthermore, the classification of digital health interventions revealed an uneven distribution between the different elements across domains: the health system, the classification of digital health interventions (DHI), and the subsystems of the National Health Information System (NHIS). Most digital health intervention projects are still in the pilot phase (66%), with isolated electronic patient record initiatives remaining incomplete. Within the public sector, these records typically take the form of electronic registers or isolated specialty records in a hospital. Within the private sector, tool implementation varies based on expressed needs. Challenges persist in adhering to interoperability norms and standards during tool design, with minimal utilization of the data generated by the implemented tools.

Conclusion: This study provides an insightful overview of the digital health environment in Burkina Faso and highlights significant challenges regarding intervention strategies. The findings serve as a foundational resource for developing the digital health strategic plan. By addressing the identified shortcomings, this plan will provide a framework for guiding future digital health initiatives effectively.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11186100/pdf/
Citations: 0
Hematoma expansion prediction based on SMOTE and XGBoost algorithm.
IF 3.3 | Zone 3 (Medicine) | Q1 Medicine | Pub Date: 2024-06-19 | DOI: 10.1186/s12911-024-02561-9
Yan Li, Chaonan Du, Sikai Ge, Ruonan Zhang, Yiming Shao, Keyu Chen, Zhepeng Li, Fei Ma

Hematoma expansion (HE) is a high-risk symptom with a high rate of occurrence in patients who have suffered spontaneous intracerebral hemorrhage (ICH) after a major accident or illness. Correctly predicting the occurrence of HE in advance is critical for helping doctors determine the next step of medical treatment. Most existing studies focus only on the occurrence of HE within 6 h after the onset of ICH, while in reality a considerable number of patients develop HE after the first 6 h but within 24 h. In this study, on the recommendation of medical doctors, we focus on predicting the occurrence of HE within 24 h, as well as in each 6 h interval within 24 h. Based on demographics and information extracted from computed tomography (CT) images, we used the XGBoost method to predict the occurrence of HE within 24 h. To address the highly imbalanced data set, a frequent problem in medical data analysis, we used the SMOTE algorithm for data augmentation. To evaluate our method, we used a data set consisting of 582 patient records and compared the results of the proposed method with those of several other machine learning methods. Our experiments show that XGBoost achieved the best prediction performance on the balanced dataset processed by the SMOTE algorithm, with an accuracy of 0.82 and an F1-score of 0.82. Moreover, our proposed method predicts the occurrence of HE within 6, 12, 18 and 24 h with accuracies of 0.89, 0.82, 0.87 and 0.94, respectively, indicating that HE occurrence within 24 h can be predicted accurately by the proposed method.
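
The core idea of SMOTE, synthesizing new minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors, can be sketched as follows. This is a simplified illustration in plain Python, not the authors' implementation or a library API; the function name `smote`, its parameters, and the toy feature vectors are hypothetical. In practice, oversampling is usually applied to the training split only, so that synthetic points do not leak into evaluation data.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by neighbor interpolation.

    minority: list of feature vectors (lists of floats) from the rare class.
    """
    rng = random.Random(seed)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment x -> neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic

# Hypothetical two-feature minority class (e.g. HE-positive patients).
minority = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
new_points = smote(minority, n_new=4, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, the augmented class stays inside the region already occupied by the minority data rather than adding arbitrary noise.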

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11186182/pdf/
Citations: 0
Correction: Extracting patient lifestyle characteristics from Dutch clinical text with BERT models.
IF 3.3 | Zone 3 (Medicine) | Q2 Medical Informatics | Pub Date: 2024-06-17 | DOI: 10.1186/s12911-024-02575-3
Hielke Muizelaar, Marcel Haas, Koert van Dortmont, Peter van der Putten, Marco Spruit
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11184856/pdf/
Citations: 0
GEN-RWD Sandbox: bridging the gap between hospital data privacy and external research insights with distributed analytics.
IF 3.5 | Zone 3 (Medicine) | Q1 Medicine | Pub Date: 2024-06-17 | DOI: 10.1186/s12911-024-02549-5
Benedetta Gottardelli, Roberto Gatta, Leonardo Nucciarelli, Andrada Mihaela Tudor, Erica Tavazzi, Mauro Vallati, Stefania Orini, Nicoletta Di Giorgi, Andrea Damiani

Background: Artificial intelligence (AI) has become a pivotal tool in advancing contemporary personalised medicine, with the goal of tailoring treatments to individual patient conditions. This has heightened the demand for access to diverse data from clinical practice and daily life for research, posing challenges due to the sensitive nature of medical information, including genetics and health conditions. Regulations like the Health Insurance Portability and Accountability Act (HIPAA) in the U.S. and the General Data Protection Regulation (GDPR) in Europe aim to strike a balance between data security, privacy, and the imperative for access.

Results: We present the Gemelli Generator - Real World Data (GEN-RWD) Sandbox, a modular multi-agent platform designed for distributed analytics in healthcare. Its primary objective is to empower external researchers to leverage hospital data while upholding privacy and ownership, obviating the need for direct data sharing. Docker compatibility adds an extra layer of flexibility, and scalability is assured through modular design, facilitating combinations of Proxy and Processor modules with various graphical interfaces. Security and reliability are reinforced through components such as an Identity and Access Management (IAM) agent and a blockchain-based notarisation module. Certification processes verify the identities of information senders and receivers.
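
The general principle of distributed analytics, sending the computation to the data and returning only aggregates, can be sketched as below. This is not the GEN-RWD Sandbox API: the function names `local_summary` and `combine` and the toy values are hypothetical, and the sketch omits the authentication, notarisation, and orchestration layers the abstract describes.

```python
def local_summary(values):
    """Runs inside a hospital: returns only aggregates, never raw rows."""
    return {
        "n": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
    }

def combine(summaries):
    """Runs at the coordinator: pooled mean and sample variance
    computed from per-site aggregates alone."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance

# Hypothetical measurements held separately by two sites.
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0]
mean, variance = combine([local_summary(site_a), local_summary(site_b)])
print(mean, variance)
```

The pooled statistics match what a single analysis over all five values would give, yet neither site discloses an individual record. Aggregates over very small groups can still be identifying, so real deployments typically add minimum cell-size checks or similar safeguards.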

Conclusions: The GEN-RWD Sandbox architecture achieves a good level of usability while ensuring a blend of flexibility, scalability, and security. Featuring a user-friendly graphical interface catering to diverse technical expertise, its external accessibility enables personnel outside the hospital to use the platform. Overall, the GEN-RWD Sandbox emerges as a comprehensive solution for healthcare distributed analytics, maintaining a delicate equilibrium between accessibility, scalability, and security.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11184891/pdf/
Citations: 0
WeChat assisted electronic symptom measurement for patients with adenomyosis.
IF 3.5, Zone 3 (Medicine), Q1 Medicine. Pub Date: 2024-06-17. DOI: 10.1186/s12911-024-02570-8
Wei Xu, Xin Zhang, Fan Xu, Yuan Yuan, Ying Tang, Qiuling Shi

Purpose: Symptom assessment is central to appropriate adenomyosis management. Using a WeChat mini-program-based portal, we aimed to establish a valid adenomyosis symptom assessment scale (AM-SAS) that identifies symptom management needs precisely and promptly and, ultimately, flags disease recurrence.

Methods: A combination of in-depth interviews with patients with adenomyosis and natural language processing of WeChat clinician-patient group communication was used to generate a pool of symptom items related to adenomyosis. An expert panel shortened the list to form the provisional AM-SAS. The AM-SAS was built into a WeChat mini-program and sent to patients to examine its psychometric validity and clinical applicability through classical test theory and item response theory.

Results: A total of 338 patients with adenomyosis (29 for interviews, 179 for development, and 130 for external validation) and 86 gynecologists were included. Compliance with the WeChat-based symptom evaluation exceeded 90%. The AM-SAS demonstrated unidimensionality through Rasch analysis, good internal consistency (all Cronbach's alphas above 0.8), and test-retest reliability (intraclass correlation coefficients ranging from 0.65 to 0.84). Symptom severity scores differed between patients in the anemic and normal hemoglobin groups (3.04 ± 3.17 vs. 5.68 ± 3.41, P < 0.001). In external validation, the AM-SAS detected differences in symptom burden and physical status between patients with and without relapse.
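As a hedged illustration of the internal-consistency statistic reported above, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score matrix. The function name `cronbach_alpha` and the demo scores are hypothetical and are not taken from the AM-SAS study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variance is used for both the items and the totals.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*scores))                      # one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 0-10 symptom ratings: 5 respondents, 3 items.
demo_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
    [3, 3, 4],
]
demo_alpha = cronbach_alpha(demo_scores)
```

A value above 0.8, the threshold quoted in the abstract, is conventionally read as good internal consistency.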

Conclusion: The electronic PRO-based AM-SAS is a valuable instrument for monitoring AM-related symptoms. As an outcome measure of multiple symptoms in clinical trials, the AM-SAS may identify patients who need extensive care after discharge and capture significant beneficial changes in patients that might otherwise be overlooked.

Trial registration: This trial was approved by the institutional review board of Chongqing Medical University and three participating hospitals (Medical Ethics Committee of Nanchong Central Hospital, Medical Ethics Committee of Affiliated Hospital of Southwest Medical University, and Medical Ethics Committee of Haifu Hospital) and registered in the Chinese Clinical Trial Registry (registration number ChiCTR2000038590; date of registration 26/10/2020).

Citations: 0
Collaborative learning from distributed data with differentially private synthetic data.
IF 3.5, Zone 3 (Medicine), Q1 Medicine. Pub Date: 2024-06-14. DOI: 10.1186/s12911-024-02563-7
Lukas Prediger, Joonas Jälkö, Antti Honkela, Samuel Kaski

Background: Consider a setting where multiple parties holding sensitive data aim to collaboratively learn population-level statistics, but pooling the sensitive data sets is not possible due to privacy concerns and the parties are unable to engage in centrally coordinated joint computation. We study the feasibility of combining privacy-preserving synthetic data sets in place of the original data for collaborative learning on real-world health data from the UK Biobank.

Methods: We perform an empirical evaluation based on an existing prospective cohort study from the literature. Multiple parties were simulated by splitting the UK Biobank cohort along assessment centers, for which we generate synthetic data using differentially private generative modelling techniques. We then apply the original study's Poisson regression analysis on the combined synthetic data sets and evaluate the effects of 1) the size of local data set, 2) the number of participating parties, and 3) local shifts in distributions, on the obtained likelihood scores.
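The abstract does not say which differentially private generative models were used, so the sketch below swaps in a much simpler, well-known technique for illustration only: each party releases a Laplace-noised histogram of one categorical attribute, samples synthetic records from it, and the synthetic sets are then pooled. All names (`laplace_noise`, `dp_synthetic_sample`, `pooled_synthetic`) and the example data are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_synthetic_sample(records, categories, epsilon, n_synth, rng):
    """Sample n_synth synthetic values from an epsilon-DP noisy histogram."""
    counts = {c: 0 for c in categories}
    for value in records:
        counts[value] += 1
    # Adding or removing one record changes one count by 1 (L1 sensitivity 1),
    # so Laplace noise with scale 1/epsilon gives epsilon-DP counts.
    noisy = [max(counts[c] + laplace_noise(1 / epsilon, rng), 0.0)
             for c in categories]
    if sum(noisy) == 0:                  # degenerate case: fall back to uniform
        noisy = [1.0] * len(categories)
    # Clamping and sampling are post-processing and do not weaken the guarantee.
    return rng.choices(categories, weights=noisy, k=n_synth)


def pooled_synthetic(parties, categories, epsilon, n_synth, seed=0):
    """Each party generates synthetic data locally; only those sets are pooled."""
    # Assumption: random.Random is used for reproducibility of the sketch;
    # a real deployment would need a cryptographically secure noise source.
    rng = random.Random(seed)
    pooled = []
    for records in parties.values():
        pooled.extend(dp_synthetic_sample(records, categories, epsilon, n_synth, rng))
    return pooled
```

A downstream analysis, such as the Poisson regression mentioned above, would then be fitted on the pooled synthetic records instead of any party's raw data.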

Results: We discover that parties engaging in the collaborative learning via shared synthetic data obtain more accurate estimates of the regression parameters compared to using only their local data. This finding extends to the difficult case of small heterogeneous data sets. Furthermore, the more parties participate, the larger and more consistent the improvements become up to a certain limit. Finally, we find that data sharing can especially help parties whose data contain underrepresented groups to perform better-adjusted analysis for said groups.

Conclusions: Based on our results we conclude that sharing of synthetic data is a viable method for enabling learning from sensitive data without violating privacy constraints even if individual data sets are small or do not represent the overall population well. Lack of access to distributed sensitive data is often a bottleneck in biomedical research, which our study shows can be alleviated with privacy-preserving collaborative learning methods.

Citations: 0
Surprising and novel multivariate sequential patterns using odds ratio for temporal evolution in healthcare.
IF 3.5, Zone 3 (Medicine), Q1 Medicine. Pub Date: 2024-06-13. DOI: 10.1186/s12911-024-02566-4
Isidoro J Casanova, Manuel Campos, Jose M Juarez, Antonio Gomariz, Bernardo Canovas-Segura, Marta Lorente-Ros, Jose A Lorente

Background: Pattern mining techniques are helpful tools when extracting new knowledge in real practice, but the overwhelming number of patterns is still a limiting factor in the health-care domain. Current efforts concerning the definition of measures of interest for patterns are focused on reducing the number of patterns and quantifying their relevance (utility/usefulness). However, although the temporal dimension plays a key role in medical records, few efforts have been made to extract temporal knowledge about the patient's evolution from multivariate sequential patterns.

Methods: In this paper, we propose a method to extract a new type of pattern in the clinical domain called Jumping Diagnostic Odds Ratio Sequential Patterns (JDORSP). The aim of this method is to employ the odds ratio to identify a concise set of sequential patterns that represent a patient's state with a statistically significant protection factor (i.e., a pattern associated with patients that survive) and those extensions whose evolution suddenly changes the patient's clinical state, thus making the sequential patterns a statistically significant risk factor (i.e., a pattern associated with patients that do not survive), or vice versa.
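To make the idea of a "jump" concrete, here is a minimal sketch, not the authors' algorithm, that computes a diagnostic odds ratio with a 95% confidence interval from a 2x2 table and checks whether a pattern and its extension fall on opposite sides of 1. The cell layout, the Woolf (log) interval, the Haldane-Anscombe zero-cell correction, and all function names and counts are assumptions made for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: non-survivors with the pattern     b: survivors with the pattern
    c: non-survivors without the pattern  d: survivors without the pattern
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction avoids division by zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def classify(a, b, c, d):
    """'risk' if the whole CI is above 1, 'protection' if below, else neither."""
    _, lower, upper = odds_ratio_ci(a, b, c, d)
    if lower > 1:
        return "risk"
    if upper < 1:
        return "protection"
    return "not significant"


def is_jumping(pattern_table, extension_table):
    """True when the pattern and its extension are significant in opposite directions."""
    kinds = {classify(*pattern_table), classify(*extension_table)}
    return kinds == {"risk", "protection"}


# Hypothetical cohort of 100 patients (50 non-survivors, 50 survivors).
pattern = (20, 40, 30, 10)    # pattern seen mostly in survivors
extension = (18, 4, 32, 46)   # extended pattern seen mostly in non-survivors
```

With these invented counts the pattern classifies as a protection factor and its extension as a risk factor, which is the kind of reversal the method is described as looking for.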

Results: The results of our experiments highlight that our method reduces the number of sequential patterns obtained with state-of-the-art pattern reduction methods by over 95%. Only by achieving this drastic reduction can medical experts carry out a comprehensive clinical evaluation of the patterns that might be considered medical knowledge regarding the temporal evolution of the patients. We have evaluated the surprisingness and relevance of the sequential patterns with clinicians, and the most interesting fact is the high surprisingness of the extensions of the patterns that become a protection factor, that is, the patients that recover after several days of being at high risk of dying.

Conclusions: Our proposed method with which to extract JDORSP generates a set of interpretable multivariate sequential patterns with new knowledge regarding the temporal evolution of the patients. The number of patterns is greatly reduced when compared to those generated by other methods and measures of interest. An additional advantage of this method is that it does not require any parameters or thresholds, and that the reduced number of patterns allows a manual evaluation.

Citations: 0
Quantitative prediction of postpartum hemorrhage in cesarean section on machine learning.
IF 3.5, Zone 3 (Medicine), Q1 Medicine. Pub Date: 2024-06-13. DOI: 10.1186/s12911-024-02571-7
Meng Wang, Gao Yi, Yunjia Zhang, Mei Li, Jin Zhang

Background: Cesarean section-induced postpartum hemorrhage (PPH) can cause anemia and hypovolemic shock in pregnant women. Predicting PPH in advance therefore helps obstetricians and anesthesiologists prepare preventive measures. However, existing work on PPH prediction focuses on whether PPH occurs rather than on the amount of bleeding. To this end, this work studies quantitative PPH prediction with machine learning (ML).

Methods: The study cohort was selected from individuals with PPH who were hospitalized at Shijiazhuang Obstetrics and Gynecology Hospital from 2020 to 2022. From this cohort, we built a dataset of 6,144 subjects covering clinical parameters, anesthesia operation records, laboratory examination results, and other information in the electronic medical record system. Using this dataset, we applied six ML models (logistic regression, linear regression, gradient boosting, XGBoost, multilayer perceptron, and random forest) to automatically predict the amount of bleeding during cesarean section. Eighty percent of the dataset was used for model training and 20% for verification. The models were evaluated and iteratively improved using root mean squared error (RMSE) and mean absolute error (MAE). We also used permutation importance and partial dependence plots (PDP) to examine the models' feasibility.
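The evaluation protocol described above (an 80/20 split scored with MAE and RMSE) can be sketched as follows. This is a minimal standard-library illustration, not the study's code; the function names and the example bleeding amounts are hypothetical.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and hold out test_fraction of them for verification."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference; penalises large misses more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical observed vs. predicted bleeding amounts for three patients.
observed = [100, 200, 300]
predicted = [110, 190, 330]
demo_mae = mean_absolute_error(observed, predicted)
demo_rmse = root_mean_squared_error(observed, predicted)
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which is consistent with the abstract reporting an RMSE (33.75) above the MAE (21.7).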

Result: The experimental results show that random forest achieved the highest accuracy for PPH amount prediction among the ML methods compared. Random forest reached a mean absolute error of 21.7 (a prediction error below 5.4%) and a root mean squared error of 33.75 (a prediction error below 9.3%). The results also identify the indicators that contributed most to PPH prediction: Ca, hemoglobin, white blood cells, platelets, Na, and K.

Conclusion: ML methods, especially random forest, can effectively predict the amount of PPH during cesarean section. ML-based prediction of PPH amounts can give clinicians early warning, thus reducing complications and improving the safety of cesarean sections. Furthermore, permutation importance, complemented by PDP, promises to give clinicians a transparent indication of individual risk prediction.

Citations: 0