Diagnostic and prognostic research: latest articles

The continuous net benefit: assessing the clinical utility of prediction models when informing a continuum of decisions.
IF 2.6 Pub Date : 2026-02-17 DOI: 10.1186/s41512-026-00224-z
Jose Benitez-Aurioles, Laure Wynants, Niels Peek, Patrick Goodley, Philip Crosbie, Matthew Sperrin
Citations: 0
The impact of violation of the proportional hazards assumption on the discrimination of the Cox proportional hazards model.
IF 2.6 Pub Date : 2026-02-12 DOI: 10.1186/s41512-026-00223-0
Peter C Austin, Daniele Giardiello

Background: The Cox proportional hazards regression model is frequently used to estimate an individual's probability of experiencing an outcome within a specified prediction horizon. A key assumption of this model is that of proportional hazards. An important component of validating a prediction model is assessing its discrimination. Discrimination refers to the ability of predicted risk to separate those who do and do not experience the event. The impact of violation of the proportional hazards assumption on the discrimination of risk estimates obtained from a Cox model has not been examined.

Methods: We used Monte Carlo simulations to assess the impact of the magnitude of the violation of the proportional hazards assumption on the discrimination of a Cox model as assessed using the time-varying area under the curve and on predictive accuracy as assessed using the time-varying index of predictive accuracy.

Results: Compared to settings in which the proportional hazards assumption was satisfied, discrimination and predictive accuracy decreased in settings in which the log-hazard ratio was positively associated with time. Conversely, compared to settings in which the proportional hazards assumption was satisfied, discrimination and predictive accuracy increased in settings in which the log-hazard ratio was negatively associated with time. Compared with the use of a Cox regression model, the use of accelerated failure time parametric survival models, Royston and Parmar's spline-based parametric survival models, and generalized linear models using pseudo-observations did not result in estimates with improved discrimination or predictive accuracy in settings in which the proportional hazards assumption was violated.

Conclusions: Violation of the proportional hazards assumption had an effect on the discrimination of predictions obtained using a Cox regression model.
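To make the assumption under study concrete, the following is a minimal sketch of the Cox model for a single predictor and of one way the log-hazard ratio can depend on time. The linear-in-$g(t)$ form, the function $g$ and the symbols $\beta_0$ and $\gamma$ are illustrative assumptions; the abstract does not state which functional form the simulations used.

```latex
% Proportional hazards: the log-hazard ratio \beta does not depend on t
h(t \mid x) = h_0(t)\,\exp(\beta x)

% A violation: the log-hazard ratio varies with time
h(t \mid x) = h_0(t)\,\exp\{\beta(t)\,x\}, \qquad \beta(t) = \beta_0 + \gamma\, g(t)
```

With $g$ increasing in $t$, $\gamma > 0$ corresponds to a log-hazard ratio positively associated with time, $\gamma < 0$ to a negative association, and $\gamma = 0$ recovers proportional hazards.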

Citations: 0
Protocol for development of a reporting guideline (TRIPOD-Code) for code repositories associated with diagnostic and prognostic prediction model studies.
IF 2.6 Pub Date : 2026-02-10 DOI: 10.1186/s41512-025-00217-4
Tom Pollard, Thomas Sounack, Catherine A Gao, Leo Anthony Celi, Charlotta Lindvall, Hyeonhoon Lee, Hyung-Chul Lee, Karel G M Moons, Gary S Collins
Citations: 0
A comparison of methodological approaches to developing clinical prediction models for individuals living with multiple long-term conditions: a protocol for a systematic review.
IF 2.6 Pub Date : 2026-02-06 DOI: 10.1186/s41512-026-00221-2
Lauren A Anderson, Joie Ensor, Clare L Gillies, Selina T Lock, Kamlesh Khunti, Laura J Gray
Citations: 0
Machine learning-based COVID-19 prognostic models lag behind in reporting quality: findings from a TRIPOD/TRIPOD + AI systematic review.
IF 2.6 Pub Date : 2026-02-03 DOI: 10.1186/s41512-026-00218-x
Ioannis Partheniadis, Persefoni Talimtzi, Adriani Nikolakopoulou, Anna-Bettina Haidich

Background: Reporting of COVID-19 prognostic models frequently falls short of established standards. The TRIPOD checklist and its 2024 AI extension (TRIPOD + AI) provide a comprehensive framework for assessing reporting quality. We therefore evaluated and compared reporting completeness in conventional versus machine-learning models.

Methods: Studies reporting the development, and internal and external validation of prognostic prediction models for COVID-19 using either conventional or machine learning-based algorithms were included. Literature searches were conducted in MEDLINE, Epistemonikos.org, and Scopus (up to July 31, 2024). Studies using conventional statistical methods were evaluated under TRIPOD, while machine learning-based studies were assessed using TRIPOD + AI. Data extraction followed TRIPOD and TRIPOD + AI checklists, measuring adherence per article and per checklist item. The protocol was prospectively registered at the Open Science Framework ( https://osf.io/kg9yw ).

Results: A total of 53 studies describing 71 prognostic models were identified. Overall, adherence to both guidelines was low, with significantly poorer compliance among machine learning-based studies (TRIPOD + AI) compared to conventional model studies (TRIPOD) (28.4% vs. 38.1%, 95% CI of difference: 4.1-15.4). No study fully adhered to abstract reporting requirements, and appropriate titles were included in only a minority of cases (29.0%, 95% CI: 16.1-46.6 for TRIPOD; 13.6%, 95% CI: 4.8-33.3 for TRIPOD + AI). Sample size calculations were not fully reported in any study. Reporting of methods and results sections was poor across both frameworks.

Conclusion: Lower adherence among machine learning studies reflects the relatively recent publication of the TRIPOD + AI guidelines (April 2024), which postdate many of the included studies. Both conventional and machine learning-based prediction models showed insufficient reporting, with major gaps in model description and performance reporting. Greater compliance with reporting guidelines is critical to improving the clarity, reproducibility, and clinical value of prediction model research.
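The reported difference in adherence (28.4% vs. 38.1%, 95% CI of difference: 4.1-15.4) can be sanity-checked with a few lines of arithmetic. The sketch below assumes the interval is a symmetric normal-approximation (Wald) interval on the percentage-point scale, which the abstract does not state; the helper name `se_from_ci` is hypothetical.

```python
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Back out a standard error from a symmetric normal-approximation interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (upper - lower) / (2 * z)


# Figures taken from the abstract (percentage points).
diff = 38.1 - 28.4              # TRIPOD minus TRIPOD + AI adherence
se = se_from_ci(4.1, 15.4)      # implied standard error of the difference
z_stat = diff / se              # implied test statistic under the Wald assumption

print(f"difference = {diff:.1f}, implied SE = {se:.2f}, z = {z_stat:.2f}")
```

The interval's midpoint (9.75) agrees with the point difference (9.7) up to rounding, which is what a symmetric interval would give.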

Citations: 0
Performance of the Leicester risk assessment and Leicester practice risk scores for assessing the risk of undiagnosed type 2 diabetes or prediabetes in diverse populations: protocol for a systematic review of published validations and updates.
IF 2.6 Pub Date : 2026-01-15 DOI: 10.1186/s41512-026-00219-w
Louise Haddon, Joie Ensor, Kamlesh Khunti, Laura J Gray

Background: Approximately one million adults in the UK are estimated to have undiagnosed type 2 diabetes mellitus (T2DM), with a further 5.1 million adults with nondiabetic hyperglycaemia (NDH) that does not meet the threshold for a diabetes diagnosis. The Leicester Risk Assessment score (LRA) and Leicester Practice Risk score (LPR) are diagnostic risk prediction models that estimate an individual's risk of undiagnosed T2DM and NDH, developed for use in community and primary care settings respectively. The LRA is also used as a prognostic model; neither model has been updated since development. This study will systematically review all applications of these models as diagnostic and prognostic tools and any published updates to evaluate their performance in different populations. This review has been registered with PROSPERO (CRD420251005841).

Methods: We will implement a citation search strategy to search Scopus, Web of Science and Google Scholar, restricted to full-text, English-language papers. Eligible papers will validate, update or modify either model. Data will be extracted using a form based on the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist; missing information will be sought from authors or estimated from other available information where possible. Meta-analysis of predictive performance measures will be completed if sufficient data exist. Subgroup and sensitivity analyses will be used to explore between-study heterogeneity and the impact of risk of bias.

Discussion: This review will identify studies that have implemented, modified or validated the LRA and LPR for the risk of undiagnosed T2DM and NDH in different populations. This will allow summary measures of model performance, including the level of uncertainty, to be calculated, making the review highly relevant to individuals and stakeholders who recommend and implement these models. Review conclusions will also inform the potential update and recalibration of the models. This will ultimately lead to improved outcomes through earlier diagnosis and management.

Citations: 0
Prediction models developed using artificial intelligence: similar predictive performances with highly varying predictions for individuals - an illustration in deep vein thrombosis.
IF 2.6 Pub Date : 2026-01-08 DOI: 10.1186/s41512-025-00216-5
Maerziya Yusufujiang, Constanza L Andaur Navarro, Johanna Aa Damen, Toshihiko Takada, Geert-Jan Geersing, Lotty Hooft, Ewoud Schuit, Karel Gm Moons, Valentijn Mt de Jong, Maarten van Smeden

Objectives: The rise in popularity and off-the-shelf availability of machine learning (ML) and AI-based methodology for developing new prediction models gives developers ample choice when comparing and selecting the best-performing model out of many possible models. Many studies have shown that, in such comparisons on any particular dataset, the difference in performance between models developed using different techniques (e.g. logistic regression vs. random forest or neural networks) can often be small, especially when looking at crude performance measures such as the area under the ROC curve. This may lead to the conclusion that such models are essentially exchangeable and that model selection is arbitrary. However, as we will illustrate using a dataset on deep venous thrombosis, prediction models with similar discriminative performance may nonetheless generate different outcome probability estimates for individual patients and potentially lead to meaningfully different decision making.

Methods: We developed diagnostic prediction models to predict the presence of deep venous thrombosis (DVT) in a large dataset of patients with leg symptoms suspected of having DVT, using five modelling techniques: unpenalized logistic regression (ULR), ridge logistic regression (RLR), random forests (RF), support vector machine (SVM) and neural network (NN). Age, sex, d-dimer, history of DVT, diagnosis alternative to DVT, and having cancer were used as a fixed set of predictors. Model performance was evaluated in terms of discrimination, calibration, and stability of individual risk prediction for a set of patients across the models.

Results: Of the 6,087 suspected patients, 1,146 (19%) were diagnosed with DVT based on leg ultrasound (reference test). Three prediction models (ULR, RLR, NN) had similar discrimination, with AUC point estimates of 0.84. However, the 6,087 individuals' estimated probabilities of DVT varied substantially across the five modelling techniques, highlighting differences in prediction stability. Notably, the RF model tended to overestimate individual risks, while the SVM model tended to underestimate them compared to the other models. While the estimated probabilities were more similar for ULR, RLR and NN, classification measures (sensitivity, specificity, positive and negative predictive value) did differ because of differences in the estimated probabilities of individuals near the risk threshold, illustrating that differences, even when relatively small, could potentially lead to different clinical decisions.

Conclusions: Prediction models developed with different modelling techniques yielded very different outcome probabilities for individuals, even though the models had similar discriminative performance in this low-dimensional setting. Part of this variation can be explained by differences in calibration, but also by modelling choices, as estimated risks also differed between modelling techniques with similar calibration performance. Our findings therefore highlight the impact of the choice of modelling technique on model performance, on individuals' estimated probabilities and, in turn, on risk-based clinical decisions.
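The central point of this abstract, that equal discrimination does not imply equal individual risks, can be shown with a toy example. The sketch below is not the study's analysis: the data are simulated, the second "model" is simply a monotone transform of the first, and the 0.2 risk threshold is an arbitrary assumption. Because the AUC depends only on the ranking of predictions, both sets of predictions have the same AUC while assigning different probabilities and classifications.

```python
import numpy as np


def auc(y, p):
    """AUC as P(prediction for a case > prediction for a non-case); ties count 1/2."""
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


rng = np.random.default_rng(0)
p_a = rng.uniform(0.01, 0.99, size=500)   # simulated risks from hypothetical model A
y = rng.binomial(1, p_a)                  # outcomes generated from model A's risks
p_b = p_a ** 2                            # hypothetical model B: same ranking, lower risks

threshold = 0.2                           # assumed decision threshold
reclassified = float(np.mean((p_a >= threshold) != (p_b >= threshold)))

print("AUC A:", round(auc(y, p_a), 3), "AUC B:", round(auc(y, p_b), 3))
print("share classified differently at the threshold:", round(reclassified, 3))
```

In this construction model B is miscalibrated by design, so it illustrates only the calibration-driven part of the variation the abstract describes.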
Citations: 0
A decomposition of Fisher's information to inform sample size for developing or updating fair and precise clinical prediction models - part 2: time-to-event outcomes.
IF 2.6 Pub Date : 2025-12-16 DOI: 10.1186/s41512-025-00204-9
Richard D Riley, Gary S Collins, Lucinda Archer, Rebecca Whittle, Amardeep Legha, Laura Kirton, Paula Dhiman, Mohsen Sadatsafavi, Nicola J Adderley, Joseph Alderman, Glen P Martin, Joie Ensor

Background: For clinical prediction models developed using time-to-event data (i.e. with censoring and different lengths of follow-up), previous research has focused on the sample size needed to minimise overfitting and to estimate the overall risk precisely. However, instability of individual-level risk estimates may still be large.

Methods: We propose using a decomposition of Fisher's information matrix to help examine and calculate the sample size required for developing a model that aims for precise and fair risk estimates. We propose a six-step process which can be used either before data collection or when an existing dataset is available. Steps 1 to 5 require researchers to specify the overall risk in the target population at a key time-point of interest; an assumed pragmatic 'core model' in the form of an exponential regression model; the (anticipated) joint distribution of core predictors included in that model; and the distribution of censoring times. The 'core model' can be specified directly or based on a specified C-index and relative effects of (standardised) predictors. The joint distribution of predictors may be available directly in an existing dataset, in a pilot study or in a synthetic dataset provided by other researchers.

Results: We derive closed-form solutions that decompose the variance of an individual's estimated event rate into Fisher's unit information matrix, predictor values and total sample size; this allows researchers to calculate and examine uncertainty distributions around individual risk estimates and misclassification probabilities for specified sample sizes. We provide an illustrative example in breast cancer and emphasise the importance of clinical context, including any risk thresholds for decision-making, and examine fairness concerns for pre- and postmenopausal women. Lastly, in two empirical evaluations, we provide reassurance that uncertainty interval widths based on our exponential approach are close to those obtained using more flexible parametric models.

Conclusions: Our approach allows users to identify the (target) sample size required to develop a prediction model for time-to-event outcomes, via the pmstabilityss module. It aims to facilitate models with improved trust, reliability and fairness in individual-level predictions.
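
The variance decomposition described in the Results can be illustrated with a minimal sketch. It is not the pmstabilityss implementation; it assumes an exponential model with hazard exp(x'beta) and administrative censoring at a single fixed time, for which the expected per-person (unit) information is the average of P(event observed | x) times x x'. The function names, the simulated predictors and the coefficient values are all hypothetical.

```python
import numpy as np


def unit_information(X, beta, censor_time):
    """Expected per-person Fisher information for an exponential model.

    Assumes hazard exp(x'beta) and administrative censoring at `censor_time`,
    so each person's contribution is P(event observed | x) * x x'.
    """
    X = np.asarray(X, dtype=float)
    rate = np.exp(X @ beta)
    p_event = 1.0 - np.exp(-rate * censor_time)
    return (X * p_event[:, None]).T @ X / X.shape[0]


def log_rate_variance(x_new, unit_info, n):
    """Approximate Var(x_new' beta_hat) as x_new' I^{-1} x_new / n."""
    x_new = np.asarray(x_new, dtype=float)
    return float(x_new @ np.linalg.solve(unit_info, x_new)) / n


def risk_interval(x_new, beta, unit_info, n, horizon, z=1.96):
    """Wald interval on the log-rate scale, mapped to risk by `horizon`."""
    lp = float(np.asarray(x_new, dtype=float) @ beta)
    se = np.sqrt(log_rate_variance(x_new, unit_info, n))

    def to_risk(v):
        return 1.0 - np.exp(-np.exp(v) * horizon)

    return to_risk(lp - z * se), to_risk(lp), to_risk(lp + z * se)


# Hypothetical inputs: an intercept plus two standardised predictors.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5000), rng.standard_normal((5000, 2))])
beta = np.array([-3.0, 0.5, 0.3])
info = unit_information(X, beta, censor_time=5.0)
x_new = np.array([1.0, 1.0, -0.5])
for n in (500, 2000):
    print(n, risk_interval(x_new, beta, info, n, horizon=5.0))
```

Because the sample size enters only through the division by n, the same unit information matrix can be reused to see how the uncertainty interval for one individual narrows as the planned sample size grows.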

Citations: 0
Risk of bias in machine learning and statistical models to predict height or weight: a systematic review in fetal and paediatric medicine.
IF 2.6 Pub Date : 2025-12-15 DOI: 10.1186/s41512-025-00215-6
Neil R Lawrence, Irina Bacila, Joseph Tonge, Anthea Tucker, Jeremy Dawson, Zi-Qiang Lang, Nils P Krone, Paula Dhiman, Gary S Collins

Background: Prediction of suboptimal growth allows early intervention that can improve outcomes for developing fetuses as well as infants and children. We investigate the risk of bias in statistical or machine learning models to predict the height or weight of a fetus, infant or child under 20 years of age to inform the current standard of research and provide insight into why equations developed over 30 years ago are still recommended for use by national professional bodies.

Methods: We systematically searched MEDLINE and EMBASE for peer-reviewed original research studies published in 2022. We included studies if they developed or validated a multivariable model to predict height or weight of an individual using two or more variables, excluding studies assessing imaging or using genetics or metabolomics information. Risk of bias was assessed for all prediction models and analyses using the Prediction model Risk Of Bias ASsessment Tool (PROBAST).

Results: Sixty-four studies were included, in which we assessed the development of 180 models and validation of 61 models. Sample size was only considered in 10% of developed models and 13% of validated models. Despite height and weight being continuous variables, 77% of models developed predicted a dichotomised outcome variable.

Registration: The review was registered on PROSPERO, the International Prospective Register of Systematic Reviews (ID: CRD42023421146), on 26/4/2023.

Citations: 0
Natriuretic peptides testing and survival prediction models for chronic heart failure: a systematic review of added prognostic value.
IF 2.6 Pub Date : 2025-12-09 DOI: 10.1186/s41512-025-00210-x
Charlotte A Smith, Kathryn S Taylor, Nicholas R Jones, Dominik Roth, Amy Magona, Nia Roberts, Clare J Taylor, F D Richard Hobbs, Maria D L A Vazquez-Montes

Background: High natriuretic peptide levels are associated with a poor outcome in adults with chronic heart failure (CHF). However, the incremental prediction accuracy of multivariable prognostic models after adding B-type natriuretic peptide (BNP) and/or N-terminal proBNP (NT-proBNP) remains unclear.

Methods: We carried out a systematic review with narrative analysis of added-value studies of BNP and NT-proBNP in CHF prognostication. Primary clinical studies investigating prognostic model development or validation in adult participants with CHF were included. Any studies of individual factors' association with patient outcomes, treatment efficacy, or those using patients with transplant/ventricular assist devices, ≥ 10% of patients with advanced HF, or significant comorbidities, HF secondary to congenital/reversible conditions, or ≥ 33% of patients with valvular HF were excluded. The databases MEDLINE, Embase, Science Citation Index, and Cochrane Prognosis Methods Group Database were searched from January 1990 to February 2024. Predictive performance was measured in terms of discrimination and calibration, the added value in terms of the c-statistic difference before and after adding BNP and/or NT-proBNP to a base model, and the risk reclassification, namely, net reclassification index (NRI) and integrated discrimination improvement (IDI). Risk of bias assessment used the Prediction model Risk Of Bias ASsessment Tool (PROBAST).

Results: Fourteen added-value studies comprising a total of 50,949 individuals were included. Both BNP and NT-proBNP consistently improved mortality prediction performance, but studies presented only separate before-and-after c-statistics, without formally testing for statistically significant differences. Meta-analysis was impossible due to missing data on the change in predictive performance and data heterogeneity. All studies reported discrimination. Few reported calibration, NRI, and IDI. All studies except one were deemed to be at high risk of bias, whereas 50% showed high applicability to the review question; only 14% scored high for applicability concerns, and the rest were unclear.

Conclusions: Improving consistency in researching and reporting the added value of natriuretic peptide testing to predict mortality in chronic heart failure patients could facilitate summarizing and interpreting the results more meaningfully.

Registration: This review is a refinement of the methods and a search update of the review of added-value biomarkers in HF prognosis (PROSPERO registration number: CRD42019086993).
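
The added-value measures named in the Methods (change in c-statistic, NRI and IDI) can be illustrated with a minimal sketch. It is a simplification that assumes a binary outcome at a fixed horizon with no censoring, whereas the reviewed studies concern survival models, where a censoring-aware concordance index would normally be used. The function names and the toy data are hypothetical, and the NRI shown is the category-free (continuous) version.

```python
import numpy as np


def c_statistic(y, p):
    """Share of event/non-event pairs ranked correctly (ties count 0.5)."""
    y = np.asarray(y, dtype=bool)
    p = np.asarray(p, dtype=float)
    diff = p[y][:, None] - p[~y][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def idi(y, p_base, p_new):
    """Integrated discrimination improvement.

    Mean risk change in events minus mean risk change in non-events.
    """
    y = np.asarray(y, dtype=bool)
    change = np.asarray(p_new, dtype=float) - np.asarray(p_base, dtype=float)
    return float(change[y].mean() - change[~y].mean())


def continuous_nri(y, p_base, p_new):
    """Category-free NRI: net upward moves in events plus net downward moves in non-events."""
    y = np.asarray(y, dtype=bool)
    p_base = np.asarray(p_base, dtype=float)
    p_new = np.asarray(p_new, dtype=float)
    up, down = p_new > p_base, p_new < p_base
    nri_events = up[y].mean() - down[y].mean()
    nri_nonevents = down[~y].mean() - up[~y].mean()
    return float(nri_events + nri_nonevents)


# Toy data: risks from a base model and from the same model plus a marker.
y = [1, 1, 0, 0]
p_base = [0.6, 0.4, 0.5, 0.3]
p_new = [0.8, 0.6, 0.4, 0.2]
delta_c = c_statistic(y, p_new) - c_statistic(y, p_base)
print(delta_c, idi(y, p_base, p_new), continuous_nri(y, p_base, p_new))
```

Reporting the difference in c-statistics together with its uncertainty, rather than the two values separately, is the kind of consistency the Conclusions call for; this sketch computes only the point estimates.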

Citations: 0