Diagnostic and prognostic research: latest articles

Quantifying the versatility of routinely measured prognostic factors.
IF 2.6 Pub Date : 2025-12-04 DOI: 10.1186/s41512-025-00206-7
Hamish Innes, Philip J Johnson

Background: Some prognostic factors (PF) are "versatile" insofar as they predict diverse health outcomes (age is an exemplar par excellence). In this study, we sought to quantify the versatility of commonly measured PFs.

Methods: Participants in the UK Biobank (UKB) study were followed from enrolment until the date of outcome or censoring. Over 800 incident adverse outcomes were considered, each defined by a unique 3-digit ICD code (A00, A01, A02, etc.). Twenty-four routine PFs, including renal function, liver function and blood count biomarkers, featured in this analysis. Cox regression was used to determine the association of each PF with time to each outcome event. For each PF, the number of statistically significant associations, the direction of each association (positive/negative) and the median log hazard ratio (LHR) were determined. Data were visualised using volcano plots.

Results: The analysis included up to 502,408 UKB participants followed for 12.4 years. The PFs with the greatest number of statistically significant associations were age (563/836; median LHR: 0.47), waist-hip ratio (530/836; LHR: 0.37) and hand-grip strength (416/836; LHR: 0.27). Conversely, the PFs with the fewest significant associations were diastolic blood pressure (138/835; LHR: 0.11) and total protein (138/835; LHR: 0.11). A positive correlation was observed between the number of outcome events a PF was associated with and the average effect size of those associations.

Conclusion: A wide spectrum exists between the most and least versatile PFs. In addition to age, waist-hip ratio and hand-grip strength exhibit high versatility. Understanding PF versatility has implications for optimising the development and performance of prognostic models.
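
The per-PF summary described in the Methods (number of significant associations, their direction, and a median LHR) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `summarise_versatility`, the input layout, the 0.05 significance threshold, and the choice to take the median of absolute LHRs over significant associations only are all assumptions, since the abstract does not state them.

```python
from statistics import median


def summarise_versatility(results, alpha=0.05):
    """Summarise Cox regression results per prognostic factor (PF).

    `results` is a list of (pf, outcome, log_hr, p_value) tuples, one per
    PF-outcome model. The layout and the alpha threshold are assumptions.
    """
    by_pf = {}
    for pf, _outcome, log_hr, p_value in results:
        by_pf.setdefault(pf, []).append((log_hr, p_value))

    summary = {}
    for pf, rows in by_pf.items():
        # Keep only associations below the (assumed) significance threshold.
        significant = [log_hr for log_hr, p_value in rows if p_value < alpha]
        summary[pf] = {
            "n_tested": len(rows),
            "n_significant": len(significant),
            "n_positive": sum(1 for lhr in significant if lhr > 0),
            "n_negative": sum(1 for lhr in significant if lhr < 0),
            # Assumption: median of |LHR| across significant associations.
            "median_abs_lhr": (
                median(abs(lhr) for lhr in significant) if significant else None
            ),
        }
    return summary
```

Called on a table with one row per PF-outcome pair, the function returns one dictionary per PF, which is the kind of summary a volcano plot or a ranking of PFs by versatility could be built from.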

Citations: 0
Why we do need explainable AI for healthcare.
IF 2.6 Pub Date : 2025-12-02 DOI: 10.1186/s41512-025-00209-4
Giovanni Cinà, Tabea E Röber, Rob Goedhart, Ş İlker Birbil

The recent uptake of certified Artificial Intelligence (AI) tools for healthcare applications has renewed the debate around their adoption. Explainable AI, the sub-discipline promising to render AI devices more transparent and trustworthy, has also come under scrutiny as part of this discussion. Some experts in the medical AI space question the reliability of Explainable AI techniques, expressing concerns about their use and their inclusion in guidelines and standards. Revisiting such criticisms, this article offers a balanced perspective on the utility of Explainable AI, focusing on the specificity of clinical applications of AI and placing them in the context of healthcare interventions. Against its detractors and despite valid concerns, we argue that the Explainable AI research program is still central to human-machine interaction and ultimately a useful tool against loss of control, a danger that cannot be prevented by rigorous clinical validation alone.

Citations: 0
Performance of clinical prediction models for chronic kidney disease among people with diabetes: external validation using the Canadian Primary Care Sentinel Surveillance Network (CPCSSN).
IF 2.6 Pub Date : 2025-11-11 DOI: 10.1186/s41512-025-00208-5
Jason E Black, David Jt Campbell, Paul E Ronksley, Kerry A McBrien, Tyler S Williamson

Background: Several clinical prediction models that predict the risk of chronic kidney disease (CKD) in people with diabetes have been developed; however, these models lack external validation demonstrating accurate predictions in Canadian primary care. We externally validated existing clinical prediction models for CKD in Canadian primary care data, overall and across subgroups defined by sex/gender, age, comorbidities, and neighbourhood-level deprivation.

Methods: We conducted a retrospective cohort study using data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) electronic medical record database (2014-2019). We identified models that use demographic, health behaviour, clinical and diabetes-related characteristics to predict incident CKD based on two recent systematic reviews and included models with sufficient predictors in CPCSSN (≤ 1 unavailable) and eGFR-based CKD definitions. We included adult patients (18+) with diabetes without an existing diagnosis of CKD. We identified incident cases of CKD within 5 years based on ≥ 2 laboratory values corresponding to eGFR < 60 mL/min/1.73 m² separated by ≥ 90 days and ≤ 1 year. For each model, we estimated the discrimination, precision, recall, and calibration within CPCSSN.

Results: Among 37,604 patients with diabetes, 14.6% met diagnostic criteria for CKD within 5 years. Overall performance of the 13 included CKD prediction models in CPCSSN was mixed: three models displayed moderate to strong discrimination (areas under the receiver-operating characteristic curves [AUROCs] > 0.70), whereas other AUROCs were as low as 0.508. After model updating, calibrations were heterogeneous, with most models displaying some miscalibration. Some subgroups displayed considerable differences in performance: discriminative performance (AUROC) declined with increasing age and number of comorbidities, whereas the precision and recall improved with increasing age and number of comorbidities. We observed no difference in performance according to sex/gender or deprivation quintile.

Conclusions: Three models displayed moderate to strong performance predicting CKD among CPCSSN patients. Next, these models should be evaluated for their impact on practitioner and patient outcomes when implemented in clinical practice. If successful, these models hold promise in achieving widespread adoption to help identify those at highest risk of CKD and guide therapies that may prevent or delay CKD and related sequelae (e.g., end-stage renal disease) among people with diabetes.
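
The laboratory-based case definition in the Methods (at least two eGFR values below 60 mL/min/1.73 m², separated by at least 90 days and at most 1 year) can be expressed as a small check. This is a sketch under stated assumptions rather than the study's implementation: the function name `meets_ckd_definition` and the input layout are hypothetical, "1 year" is taken as 365 days, and restricting results to the 5-year follow-up window is left to the caller.

```python
from datetime import date


def meets_ckd_definition(egfr_results, threshold=60.0,
                         min_gap_days=90, max_gap_days=365):
    """Return True if any two low eGFR results are suitably spaced.

    `egfr_results` is a list of (date, egfr_value) tuples for one patient.
    Assumption: "<= 1 year" is interpreted as <= 365 days.
    """
    # Dates of all results below the eGFR threshold, in chronological order.
    low_dates = sorted(d for d, value in egfr_results if value < threshold)
    for i, first in enumerate(low_dates):
        for second in low_dates[i + 1:]:
            gap = (second - first).days
            if min_gap_days <= gap <= max_gap_days:
                return True
    return False


# Two low values 151 days apart satisfy the spacing rule.
example = [(date(2015, 1, 1), 55.0), (date(2015, 6, 1), 58.0)]
print(meets_ckd_definition(example))
```

Checking every pair of low results, rather than only consecutive ones, avoids missing a qualifying pair when an extra low value falls less than 90 days after the first.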

Citations: 0
Will large language models transform clinical prediction?
IF 2.6 Pub Date : 2025-11-06 DOI: 10.1186/s41512-025-00211-w
Yusuf Yildiz, Goran Nenadic, Meghna Jani, David A Jenkins

Objective: Large language models (LLMs) are attracting increasing interest in healthcare. This commentary evaluates the potential of LLMs to improve clinical prediction models (CPMs) for diagnostic and prognostic tasks, with a focus on their ability to process longitudinal electronic health record (EHR) data.

Findings: LLMs show promise in handling multimodal and longitudinal EHR data and can support multi-outcome predictions for diverse health conditions. However, methodological, validation, infrastructural, and regulatory challenges remain. These include inadequate methods for time-to-event modelling, poor calibration of predictions, limited external validation, and bias affecting underrepresented groups. High infrastructure costs and the absence of clear regulatory frameworks further hinder adoption.

Implications: Further work and interdisciplinary collaboration are needed to support equitable and effective integration of LLMs into clinical prediction. Developing temporally aware, fair, and explainable models should be a priority for transforming clinical prediction workflows.

Citations: 0
Targeted test evaluation: five suggestions for refining the framework for designing diagnostic accuracy studies with clear study hypotheses.
IF 2.6 Pub Date : 2025-11-04 DOI: 10.1186/s41512-025-00207-6
Werner Vach

Five years ago, Korevaar and colleagues proposed a framework for designing diagnostic accuracy studies, focusing on the definition of clear study hypotheses. This proposal filled a gap and was well received by the scientific community. In this commentary, I suggest five potential refinements. They aim at increasing the flexibility of the framework while preserving its logical consistency. The refinements address the following five topics: (1) the relationship between minimal criteria and the choice of the null hypothesis region; (2) the potential to allow compensation between sensitivity and specificity; (3) the possibility of using pairs other than sensitivity and specificity; (4) the potential phrasing as an estimation problem; (5) the advantages of directly moving to a comparative accuracy study.
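
To make the idea of a study hypothesis built on minimal criteria concrete, the sketch below tests whether observed sensitivity and specificity both exceed prespecified minimal values. It is an illustration only, not the procedure of the framework or of the commentary: the function names, the use of an exact one-sided binomial test for each measure, and the 0.05 level applied to each test are assumptions made here.

```python
from math import comb


def binomial_upper_tail(successes, n, p0):
    """Exact P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


def meets_minimal_criteria(tp, fn, tn, fp, min_sens, min_spec, alpha=0.05):
    """Test H0: sensitivity <= min_sens or specificity <= min_spec.

    Assumption: H0 is rejected only if both one-sided exact binomial
    tests are significant at level alpha.
    """
    p_sens = binomial_upper_tail(tp, tp + fn, min_sens)  # diseased group
    p_spec = binomial_upper_tail(tn, tn + fp, min_spec)  # non-diseased group
    return (p_sens < alpha and p_spec < alpha), p_sens, p_spec


# Hypothetical 2x2 counts: 95/100 true positives, 90/100 true negatives.
passed, p_sens, p_spec = meets_minimal_criteria(
    tp=95, fn=5, tn=90, fp=10, min_sens=0.8, min_spec=0.8
)
print(passed, p_sens, p_spec)
```

Requiring both tests to reject corresponds to a rectangular null hypothesis region in the sensitivity-specificity plane; refinement (2) in the commentary concerns relaxing that shape so that one measure can compensate for the other.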

Citations: 0
Calibration of cause-specific absolute risk for external validation using each cause-specific hazards model in the presence of competing events.
IF 2.6 Pub Date : 2025-10-14 DOI: 10.1186/s41512-025-00197-5
Sarwar I Mozumder, Sarah Booth, Richard D Riley, Mark J Rutherford, Paul C Lambert

Background: When developing or validating prognostic models, it is typical to assess calibration between predicted and observed risks, either in the development dataset or in an external sample. For competing risks data, correct specification of more than one model may be required to ensure well-calibrated predicted risks for the event of interest. Furthermore, interest may lie in the predicted risks of the event of interest, of competing events and of all causes combined. Therefore, calibration must be assessed simultaneously using various measures.

Methods: We focus on the calibration of prediction models for external validation using a cause-specific hazards approach. We propose that miscalibration for cause-specific hazard models be assessed using components specific to each model through the complement of the cause-specific survival alongside the assessment of the calibration of the cause-specific absolute risks. We simulated a range of scenarios to illustrate how to identify which model(s) are mis-specified in an external validation setting. Calibration plots and calibration statistics (calibration slope, calibration-in-the-large) are presented alongside performance measures such as the Brier score and Index of Prediction Accuracy. We use pseudo-observations to calculate observed risks and generate a smooth calibration curve with restricted cubic splines. We fitted flexible parametric survival models to the simulated data to flexibly estimate baseline cause-specific hazards for the prediction of individual cause-specific absolute risks.

Results: Our simulations illustrate that miscalibration due to changes in the baseline cause-specific hazards in external validation data is better identified using components from each cause-specific model. A miscalibrated model for one cause could lead to poor calibration of the predicted absolute risks for each cause of interest, including the all-cause absolute risk. This is because prediction of a single cause-specific absolute risk is affected by the effects of variables on both the cause of interest and the competing events.

Conclusions: If accurate predictions for both all-cause and each cause-specific absolute risks are of interest, this is best achieved by developing and validating models via the cause-specific hazards approach. For each cause-specific model, researchers should evaluate calibration plots separately using the complement of the cause-specific survival function to reveal the cause of any miscalibration. However, this also requires careful consideration of dependent censoring which must be sufficiently accounted for.
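
The point made in the Results, that one cause-specific absolute risk depends on every cause-specific model, follows from the standard competing-risks relationship between hazards and absolute risk. The notation below is assumed here for illustration and is not taken from the abstract: $h_k(t \mid x)$ is the cause-specific hazard for cause $k$ given covariates $x$, $H_k(t \mid x)$ its cumulative hazard, and $K$ the number of causes.

```latex
% Cumulative cause-specific hazard for cause k
H_k(t \mid x) = \int_0^t h_k(u \mid x)\, du

% Cause-specific absolute risk (cumulative incidence) for cause k:
% depends on the hazards of ALL causes through the overall survival term
F_k(t \mid x) = \int_0^t h_k(u \mid x)\,
    \exp\!\Big(-\sum_{j=1}^{K} H_j(u \mid x)\Big)\, du

% Complement of the cause-specific survival for cause k:
% depends on the model for cause k only
1 - S_k(t \mid x) = 1 - \exp\!\big(-H_k(t \mid x)\big)

% All-cause absolute risk
F(t \mid x) = \sum_{k=1}^{K} F_k(t \mid x)
            = 1 - \exp\!\Big(-\sum_{j=1}^{K} H_j(t \mid x)\Big)
```

Because $F_k$ involves the sum of all cumulative hazards while $1 - S_k$ involves only $H_k$, a calibration plot based on $1 - S_k$ can isolate which cause-specific model is mis-specified, whereas miscalibration in $F_k$ alone cannot distinguish between an error in the model for cause $k$ and an error in a competing-event model.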

Citations: 0
Counterfactual prediction from machine learning models: transportability and joint analysis for model development and evaluation using multi-source data.
IF 2.6 Pub Date : 2025-10-02 DOI: 10.1186/s41512-025-00201-y
Sarah C Voter, Issa J Dahabreh, Christopher B Boyer, Habib Rahbar, Despina Kontos, Jon A Steingrimsson

Background: When a machine learning model is developed and evaluated in a setting where the treatment assignment process differs from the setting of intended model deployment, failure to account for this difference can lead to suboptimal model development and biased estimates of model performance.

Methods: We consider the setting where data from a randomized trial and an observational study emulating the trial are available for machine learning model development and evaluation. We provide two approaches for estimating the model and assessing model performance under a hypothetical treatment strategy in the target population underlying the observational study. The first approach uses counterfactual predictions from the observational study only and relies on the assumption of conditional exchangeability between treated and untreated individuals (no unmeasured confounding). The second approach leverages the exchangeability between treatment groups in the trial (supported by study design) to "transport" estimates from the trial to the population underlying the observational study, relying on an additional assumption of conditional exchangeability between the populations underlying the observational study and the randomized trial.

Results: We examine the assumptions underlying both approaches for fitting the model and estimating performance in the target population and provide estimators for both objectives. We then develop a joint estimation strategy that combines data from the trial and the observational study, and discuss benchmarking of the trial and observational results.

Conclusions: Both the observational and transportability analyses can be used to fit a model and estimate performance under a counterfactual treatment strategy in the population underlying the observational data, but they rely on different assumptions. In either case, the assumptions are untestable, and deciding which method is more appropriate requires careful contextual consideration. If all assumptions hold, then combining the data from the observational study and the randomized trial can be used for more efficient estimation.
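
The two approaches described in the Methods can be written as two identification formulas for the same target quantity. The notation is an assumption made here for illustration and may differ from the paper's: $S = 1$ denotes the trial and $S = 0$ the observational study, $A$ is treatment, $X$ covariates, $Y^a$ the potential outcome under treatment $a$, $h(X)$ the model's prediction, and $L$ a loss function such as squared error. Consistency and positivity are assumed throughout.

```latex
% Target: expected loss of the model under treatment strategy a
% in the population underlying the observational study
\psi(a) = \mathrm{E}\big[\, L\big(Y^{a}, h(X)\big) \mid S = 0 \,\big]

% Approach 1 (observational data only), assuming
% Y^a \perp A \mid X, S = 0  (no unmeasured confounding)
\psi(a) = \mathrm{E}\Big[\, \mathrm{E}\big[ L\big(Y, h(X)\big) \mid X, A = a, S = 0 \big] \;\Big|\; S = 0 \,\Big]

% Approach 2 (transport from the trial), assuming
% Y^a \perp A \mid X, S = 1  (supported by randomization) and
% Y^a \perp S \mid X         (exchangeability between populations)
\psi(a) = \mathrm{E}\Big[\, \mathrm{E}\big[ L\big(Y, h(X)\big) \mid X, A = a, S = 1 \big] \;\Big|\; S = 0 \,\Big]
```

The two expressions differ only in which data source supplies the inner conditional expectation. When all assumptions hold, both identify the same $\psi(a)$, which is what makes pooling the trial and observational data for more efficient estimation possible.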

Citations: 0
Predicting venous thromboembolism among hospitalized adults: a protocol for development and validation of an implementable real-time prognostic model.
IF 2.6 Pub Date : 2025-09-08 DOI: 10.1186/s41512-025-00205-8
Henry J Domenico, Benjamin F Tillman, Shari L Just, Yeji Ko, Amanda S Mixon, Asli Weitkamp, Jonathan S Schildcrout, Colin Walsh, Thomas Ortel, Benjamin French

Background: Hospital-acquired venous thromboembolism (HA-VTE) is a leading cause of morbidity and mortality among hospitalized adults. Numerous prognostic models have been developed to identify those patients with elevated risk of HA-VTE. None, however, has met the necessary criteria to guide clinical decision-making. This study outlines a protocol for refining and validating a general-purpose prognostic model for HA-VTE, designed for real-time automation within the electronic health record (EHR) system.

Methods: A retrospective cohort of 132,561 inpatient encounters (89,586 individual patients) at a large academic medical center will be collected, along with clinical and demographic data available as part of routine care. Data for temporal, geographic, and domain external validation cohorts will also be collected. Logistic regression will be used to predict the occurrence of HA-VTE during an inpatient encounter. Variables considered for model inclusion will be based on previously demonstrated association with HA-VTE and on their availability in both retrospective EHR data and routine clinical care. Least absolute shrinkage and selection operator (LASSO) with tenfold cross-validation will be used for initial variable selection. Variables selected by the LASSO procedure, along with those deemed necessary by clinicians, will be used in an unpenalized multivariable logistic regression model. Discrimination and calibration will be reported for the derivation and validation cohorts. Discrimination will be measured using Harrell's C statistic. Calibration will be measured using the calibration intercept, calibration slope, Brier score, integrated calibration index, and visual examination of a non-linear calibration curve. Model reporting will adhere to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis guidelines for clinical prediction models using machine learning methods (TRIPOD + AI).

Discussion: We describe methods for developing, evaluating, and validating a prognostic model for HA-VTE using routinely collected EHR data. By combining best practices in statistical development and validation, knowledge engineering, and clinical domain knowledge, the resulting model should be well suited for real-time clinical implementation. Although this protocol describes our development of a model for HA-VTE, the general approach can be applied to other clinical outcomes.
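
Two of the performance measures named in the Methods can be sketched in a few lines. The example below computes the Brier score and the C statistic (which, for a binary outcome, equals the area under the ROC curve) from observed outcomes and predicted risks. It is a minimal illustration with made-up function names, not the protocol's analysis code, and the pairwise loop is intended for small inputs only.

```python
from typing import Sequence


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    if len(y) != len(p) or not y:
        raise ValueError("y and p must be non-empty and of equal length")
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Share of (event, non-event) pairs in which the event has higher risk.

    Tied predictions count as half a concordant pair.
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))
```

For outcomes `[0, 0, 1, 1]` with predicted risks `[0.1, 0.4, 0.35, 0.8]`, three of the four event/non-event pairs are concordant, so the C statistic is 0.75.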

Citations: 0
Development and internal validation of a prediction model for post-COVID-19 condition 2 years after infection-results of the CORFU study.
IF 2.6 Pub Date : 2025-09-01 DOI: 10.1186/s41512-025-00203-w
Dorthe Odyl Klein, Nick Wilmes, Sophie F Waardenburg, Gouke J Bonsel, Erwin Birnie, Marieke Sjn Wintjens, Stella Cm Heemskerk, Emma Bnj Janssen, Chahinda Ghossein-Doha, Michiel C Warlé, Lotte Mc Jacobs, Bea Hemmen, Jeanine A Verbunt, Bas Ct van Bussel, Susanne van Santen, Bas Ljh Kietselaer, Gwyneth Jansen, Folkert W Asselbergs, Marijke Linschoten, Juanita A Haagsma, S M J van Kuijk

Background: A subset of COVID-19 patients develops post-COVID-19 condition (PCC). This condition results in disability in numerous areas of patients' lives and a reduced health-related quality of life, with societal impact including work absences and increased healthcare utilization. There is a scarcity of models predicting PCC, especially those considering the severity of the initial severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and incorporating long-term follow-up data. Therefore, we developed and internally validated a prediction model for PCC 2 years after SARS-CoV-2 infection in a cohort of COVID-19 patients.

Methods: Data from the CORona Follow-Up (CORFU) study were used. This research initiative integrated data from multiple Dutch COVID-19 cohort studies. We utilized 2-year follow-up data collected via questionnaires between October 1, 2021 and December 31, 2022. Participants were former COVID-19 patients, approximately 2 years after SARS-CoV-2 infection. Candidate predictors were selected based on literature and availability across cohorts. The outcome of interest was the prevalence of PCC at 2 years after the initial infection. Logistic regression with backward stepwise elimination identified significant predictors such as sex, BMI and initial disease severity. The model was internally validated using bootstrapping. Model performance was quantified in terms of model fit, discrimination and calibration.

Results: In total, 904 former COVID-19 patients were included in the analysis. The cohort included 146 (16.2%) non-hospitalized patients, 511 (56.5%) patients admitted to a ward, and 247 (27.3%) patients admitted to the intensive care unit (ICU). Of all participants, 551 (61.0%) suffered from PCC. We included 20 candidate predictors in the multivariable analysis. The final model, after backward elimination, identified sex, body mass index (BMI), ward admission, ICU admission, and comorbidities such as arrhythmia, asthma, angina pectoris, previous stroke, hernia, osteoarthritis, and rheumatoid arthritis as predictors of post-COVID-19 condition. Nagelkerke's R-squared value for the model was 0.19. The optimism-adjusted AUC was 71.2%, and calibration was good across predicted probabilities.

Conclusions: This internally validated prediction model demonstrated moderate discriminative ability to predict PCC 2 years after COVID-19 based on sex, BMI, initial disease severity, and a collection of comorbidities.
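
The abstract reports bootstrap internal validation and an optimism-adjusted AUC without giving the procedure. One common way to obtain such a figure is the optimism bootstrap, sketched below. It is an assumption that the study used this exact variant, and `fit`, `score` and `optimism_corrected` are hypothetical names introduced for illustration. In practice `fit` would need to repeat the whole modelling procedure, including the backward elimination, on each bootstrap sample.

```python
import random
from typing import Callable, Sequence, TypeVar

Row = TypeVar("Row")
Model = TypeVar("Model")


def optimism_corrected(
    data: Sequence[Row],
    fit: Callable[[Sequence[Row]], Model],
    score: Callable[[Model, Sequence[Row]], float],
    n_boot: int = 200,
    seed: int = 0,
) -> float:
    """Apparent performance minus the average bootstrap optimism."""
    if n_boot < 1:
        raise ValueError("n_boot must be at least 1")
    rng = random.Random(seed)
    # Apparent performance: model fitted and scored on the same data.
    apparent = score(fit(data), data)
    optimism = 0.0
    for _ in range(n_boot):
        # Resample rows with replacement, keeping the original sample size.
        sample = [rng.choice(data) for _ in data]
        model = fit(sample)
        # Optimism: performance in the bootstrap sample minus performance
        # of the same model in the original data.
        optimism += score(model, sample) - score(model, data)
    return apparent - optimism / n_boot
```

Here `score` could be any performance measure, such as the AUC, so the same routine would apply to calibration measures as well.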

Citations: 0
Calibrating multiplex serology for Helicobacter pylori.
IF 2.6 Pub Date : 2025-08-11 DOI: 10.1186/s41512-025-00202-x
Emmanuelle A Dankwa, Martyn Plummer, Daniel Chapman, Rima Jeske, Julia Butt, Michael Hill, Tim Waterboer, Iona Y Millwood, Ling Yang, Christiana Kartsonaki

Background: Helicobacter pylori (H. pylori) is a bacterium that colonizes the stomach and is a major risk factor for gastric cancer, with an estimated 89% of non-cardia gastric cancer cases worldwide attributable to H. pylori. Prospective studies provide reliable evidence for quantifying the association between gastric cancer and H. pylori, as they circumvent the risk of a false negative due to possible reduction in antibody levels before cancer development.

Methods: In a large-scale prospective study within the China Kadoorie Biobank, H. pylori infection is being analysed as a risk factor for gastric cancer. The presence of infection is typically determined by serological tests. The immunoblot test, although well established, is more labour intensive and uses a larger amount of plasma than the alternative high-throughput multiplex serology test. Immunoblot outputs a binary positive/negative serostatus classification, while multiplex serology outputs a vector of continuous antigen measurements. When mapping such multidimensional continuous measurements onto a binary classification, statistical challenges arise in defining classification cut-offs and accounting for the differences in infection evidence provided by different antigens. We discuss these challenges and propose a novel solution to optimize the translation of the continuous measurements from multiplex serology into probabilities of H. pylori infection, using classification algorithms (Bayesian additive regression trees (BART), multidimensional monotone BART, logistic regression, random forest and elastic net). We (i) calibrate and apply classification models to predict probabilities of H. pylori infection given multiplex measurements, (ii) compare the predictive performance of the models using immunoblot as reference, (iii) discuss reasons for the differences in predictive performance and (iv) apply the calibrated models to gain insights on the relative strengths of infection evidence provided by the various antigens.

Results: All models showed high discriminative ability, with area under the curve (AUC) estimates of at least 95% on both the training and test data. There was no substantial difference between the performance of the models on the training and test data.

Conclusions: Classification algorithms can be used to calibrate the H. pylori multiplex serology test to the immunoblot test in the China Kadoorie Biobank. This study furthers our understanding of the applicability of classification algorithms to the context of serologic tests.
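
To make concrete what turning continuous antigen measurements into a probability of infection means, the sketch below applies a logistic model, one of the algorithms the abstract names, to a vector of measurements. The coefficients and intercept are placeholders that would have to be estimated against immunoblot results; none of the values or names come from the study.

```python
import math
from typing import Sequence


def infection_probability(
    measurements: Sequence[float],
    coefficients: Sequence[float],
    intercept: float,
) -> float:
    """Logistic mapping from antigen measurements to a probability.

    coefficients and intercept are hypothetical values, assumed to have
    been fitted beforehand with immunoblot serostatus as the reference.
    """
    if len(measurements) != len(coefficients):
        raise ValueError("one coefficient is required per measurement")
    linear = intercept + sum(b * x for b, x in zip(coefficients, measurements))
    # Numerically stable sigmoid: avoids overflow for large |linear|.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)
```

Unlike a single cut-off per antigen, a model of this kind lets antigens contribute unequally through their coefficients, which is the kind of difference in infection evidence the abstract refers to.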

Citations: 0