
Diagnostic and prognostic research: latest articles

Calibrating multiplex serology for Helicobacter pylori.
IF 2.6 Pub Date : 2025-08-11 DOI: 10.1186/s41512-025-00202-x
Emmanuelle A Dankwa, Martyn Plummer, Daniel Chapman, Rima Jeske, Julia Butt, Michael Hill, Tim Waterboer, Iona Y Millwood, Ling Yang, Christiana Kartsonaki

Background: Helicobacter pylori (H. pylori) is a bacterium that colonizes the stomach and is a major risk factor for gastric cancer, with an estimated 89% of non-cardia gastric cancer cases worldwide attributable to H. pylori. Prospective studies provide reliable evidence for quantifying the association between gastric cancer and H. pylori, as they circumvent the risk of a false negative due to possible reduction in antibody levels before cancer development.

Methods: In a large-scale prospective study within the China Kadoorie Biobank, H. pylori infection is being analysed as a risk factor for gastric cancer. The presence of infection is typically determined by serological tests. The immunoblot test, although well established, is more labour intensive and uses a larger amount of plasma than the alternative high-throughput multiplex serology test. Immunoblot outputs a binary positive/negative serostatus classification, while multiplex outputs a vector of continuous antigen measurements. When mapping such multidimensional continuous measurements onto a binary classification, statistical challenges arise in defining classification cut-offs and accounting for the differences in infection evidence provided by different antigens. We discuss these challenges and propose a novel solution to optimize the translation of the continuous measurements from multiplex serology into probabilities of H. pylori infection, using classification algorithms (Bayesian additive regression trees (BART), multidimensional monotone BART, logistic regression, random forest and elastic net). We (i) calibrate and apply classification models to predict probabilities of H. pylori infection given multiplex measurements, (ii) compare the predictive performance of the models using immunoblot as reference, (iii) discuss reasons for the differences in predictive performance and (iv) apply the calibrated models to gain insights on the relative strengths of infection evidence provided by the various antigens.

Results: All models showed high discriminative ability with at least 95% area under the curve (AUC) estimates on the training and test data. There was no substantial difference between the performance of models on the training and test data.

Conclusions: Classification algorithms can be used to calibrate the H. pylori multiplex serology test to the immunoblot test in the China Kadoorie Biobank. This study furthers our understanding of the applicability of classification algorithms to the context of serologic tests.
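
The calibration idea in this abstract can be illustrated with a minimal Python sketch: a logistic model turns a vector of continuous antigen measurements into a probability of infection, and a rank-based AUC compares those probabilities with a binary reference such as immunoblot. The antigen names, intercept and weights below are hypothetical placeholders, not values from the study, and the sketch covers only the logistic regression case rather than BART, random forest or elastic net.

```python
import math

# Hypothetical coefficients for illustration only; not taken from the study.
INTERCEPT = -2.0
WEIGHTS = {"antigen_a": 1.5, "antigen_b": 0.5}


def infection_probability(measurements):
    """Map a dict of continuous antigen measurements to a probability (logistic model)."""
    linear = INTERCEPT + sum(WEIGHTS[name] * value for name, value in measurements.items())
    return 1.0 / (1.0 + math.exp(-linear))


def auc(scores, labels):
    """Rank-based AUC: chance that a random positive scores above a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up samples: (multiplex measurements, reference serostatus 1/0).
samples = [
    ({"antigen_a": 3.0, "antigen_b": 1.0}, 1),
    ({"antigen_a": 0.2, "antigen_b": 0.1}, 0),
    ({"antigen_a": 1.0, "antigen_b": 2.0}, 1),
    ({"antigen_a": 0.5, "antigen_b": 0.4}, 0),
]
probabilities = [infection_probability(m) for m, _ in samples]
reference = [y for _, y in samples]
print(auc(probabilities, reference))
```

In practice the coefficients would be estimated from samples that have both multiplex and immunoblot results, and discrimination would be checked on held-out data as the abstract describes.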

Diagnostic and prognostic research 9(1):17. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12337413/pdf/
Citations: 0
Predicting patient outcomes and risk for revision surgery after hip and knee replacement surgery: study protocol for a comparison of modelling approaches using the Swiss National Joint Registry (SIRIS).
IF 2.6 Pub Date : 2025-08-04 DOI: 10.1186/s41512-025-00200-z
Léonie Hofstetter, Nathalie Schweyckart, Christof Seiler, Christian Brand, Laura C Rosella, Mazda Farshad, Milo A Puhan, Cesar A Hincapié

Background: Prediction of postoperative patient-reported outcomes and risk for revision surgery after total hip arthroplasty (THA) or total knee arthroplasty (TKA) can inform clinical decision-making, health resource allocation, and care planning. Machine learning (ML) algorithms are increasingly used as an alternative to traditional logistic regression (LR) prediction, but there is uncertainty about their superiority in overall model performance. The aim of this study is to compare the predictive performance of LR with different ML approaches for predicting patient outcomes and risk for revision surgery after THA and TKA.

Methods: A population-based historical cohort study will be conducted using routinely collected data from all primary and revision THA and TKA procedures performed in Switzerland and registered in the Swiss National Joint Registry (SIRIS). Patients of age ≥ 18 years with surgery for primary osteoarthritis from 01 January 2015 up to 31 December 2023 will be included. Outcomes of interest will be (1) 12-month postoperative poor pain outcome (defined as < 50% improvement of pain or < 3 absolute reduction in pain on an 11-point (0 to 10) numeric rating scale) and poor satisfaction outcome, and (2) early revision within 5 years after primary surgery. Prespecified predictor variables will include demographic characteristics, comorbidity score, patient-reported health status measures, and surgical variables. Measures of overall predictive accuracy, discrimination, and calibration will be used to compare predictive performance, and decision curve analysis performed to evaluate the clinical usefulness of models. The models will be internally validated using cross-validation and externally validated using geographical validation. Development of the models will be informed by the updated Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD + AI) statement.

Discussion: This study will develop, validate, and compare prediction models for postoperative patient-reported outcomes and risk for revision surgery after THA and TKA using SIRIS data.
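
The poor pain outcome defined in the Methods can be written as a small rule. The sketch below is one reading of that definition, assuming that either criterion alone (less than 50% improvement, or less than 3 points absolute reduction on the 0 to 10 scale) is enough to classify the outcome as poor. The function name and argument names are hypothetical.

```python
def poor_pain_outcome(baseline, followup):
    """Return True if the 12-month pain outcome counts as 'poor'.

    Assumed reading: poor if the absolute reduction is below 3 points
    OR the relative improvement is below 50% of the baseline score.
    """
    for score in (baseline, followup):
        if not 0 <= score <= 10:
            raise ValueError("scores must lie on the 0 to 10 numeric rating scale")
    reduction = baseline - followup
    if reduction < 3:
        return True  # absolute criterion; also covers a baseline of 0
    # Here reduction >= 3, so baseline >= 3 and the division is safe.
    return reduction / baseline < 0.5


print(poor_pain_outcome(8, 5))  # 3-point drop, but only 37.5% improvement
print(poor_pain_outcome(6, 3))  # 3-point drop and exactly 50% improvement
```

Checking the absolute criterion first means the percentage is never computed for a baseline of 0, where relative improvement is undefined.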

Diagnostic and prognostic research 9(1):16. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12320300/pdf/
Citations: 0
A comparison of modeling approaches for static and dynamic prediction of central line-associated bloodstream infections using electronic health records (part 2): random forest models.
Pub Date : 2025-07-21 DOI: 10.1186/s41512-025-00194-8
Elena Albu, Shan Gao, Pieter Stijnen, Frank E Rademakers, Christel Janssens, Veerle Cossey, Yves Debaveye, Laure Wynants, Ben Van Calster

Objective: Prognostic outcomes related to hospital admissions typically do not suffer from censoring, and can be modeled either categorically or as time-to-event. Competing events are common but often ignored. We compared the performance of static and dynamic random forest (RF) models to predict the risk of central line-associated bloodstream infections (CLABSI) using different outcome operationalizations.

Methods: We included data from 27,478 admissions to the University Hospitals Leuven, covering 30,862 catheter episodes (970 CLABSI, 1466 deaths and 28,426 discharges) to build static and dynamic RF models for binary (CLABSI vs no CLABSI), multinomial (CLABSI, discharge, death or no event), survival (time to CLABSI) and competing risks (time to CLABSI, discharge or death) outcomes to predict the 7-day CLABSI risk. Static models used information at the onset of the catheter episode, while dynamic models updated predictions daily for 30 days (landmark 0-30). We evaluated model performance across 100 train/test splits.

Results: Performance of binary, multinomial and competing risks models was similar: AUROC was 0.74 for predictions at catheter onset, rose to 0.77 for predictions at landmark 5, and decreased thereafter. Survival models overestimated the risk of CLABSI (E:O ratios between 1.2 and 1.6), and had AUROCs about 0.01 lower than other models. Binary and multinomial models had lowest computation times. Models including multiple outcome events (multinomial and competing risks) display a different internal structure compared to binary and survival models, choosing different variables for early splits in trees.

Discussion and conclusion: In the absence of censoring, complex modelling choices do not considerably improve the predictive performance compared to a binary model for CLABSI prediction in our studied settings. Survival models censoring the competing events at their time of occurrence should be avoided.
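
The binary and multinomial outcome operationalizations at a given landmark can be sketched as follows. This is an illustrative reading, not the authors' code: the `Episode` class, the field names and the convention that an episode ending on or before the landmark day is no longer at risk are all assumptions.

```python
from dataclasses import dataclass

HORIZON_DAYS = 7  # prediction horizon stated in the abstract


@dataclass
class Episode:
    event_day: float  # days from catheter onset to the first event
    event_type: str   # "CLABSI", "discharge" or "death"


def binary_label(episode, landmark):
    """1 if CLABSI occurs within the horizon after the landmark, else 0.

    Returns None when the episode has already ended (assumed: not at risk).
    """
    if episode.event_day <= landmark:
        return None
    within = episode.event_day <= landmark + HORIZON_DAYS
    return 1 if within and episode.event_type == "CLABSI" else 0


def multinomial_label(episode, landmark):
    """First event type within the horizon, or 'no event' if none occurs in it."""
    if episode.event_day <= landmark:
        return None
    if episode.event_day <= landmark + HORIZON_DAYS:
        return episode.event_type
    return "no event"


episode = Episode(event_day=9, event_type="CLABSI")
for landmark in (0, 5, 9):
    print(landmark, binary_label(episode, landmark), multinomial_label(episode, landmark))
```

The binary label folds discharge and death into the "no CLABSI" class, whereas the multinomial label keeps them apart, which is the distinction the abstract compares.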

Diagnostic and prognostic research 9(1):21. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12278561/pdf/
Citations: 0
A comparison of modeling approaches for static and dynamic prediction of central-line bloodstream infections using electronic health records (part 1): regression models.
Pub Date : 2025-07-21 DOI: 10.1186/s41512-025-00199-3
Shan Gao, Elena Albu, Hein Putter, Pieter Stijnen, Frank E Rademakers, Veerle Cossey, Yves Debaveye, Christel Janssens, Ben Van Calster, Laure Wynants

Background: Hospitals register information in the electronic health records (EHRs) continuously until discharge or death. As such, there is no censoring for in-hospital outcomes. We aimed to compare different static and dynamic regression modeling approaches to predict central line-associated bloodstream infections (CLABSIs) in EHR while accounting for competing events precluding CLABSI.

Methods: We analyzed data from 30,862 catheter episodes at University Hospitals Leuven from 2012 and 2013 to predict 7-day risk of CLABSI. Competing events are discharge and death. Static models using information at catheter onset included logistic, multinomial logistic, Cox, cause-specific hazard, and Fine-Gray regression. Dynamic models updated predictions daily up to 30 days after catheter onset (i.e., landmarks 0 to 30 days) and included landmark supermodel extensions of the static models, separate Fine-Gray models per landmark time, and regularized multi-task learning (RMTL). Model performance was assessed using 100 random 2:1 train-test splits.

Results: The Cox model performed worst of all static models in terms of area under the receiver operating characteristic curve (AUROC) and calibration. Dynamic landmark supermodels reached peak AUROCs between 0.741 and 0.747 at landmark 5. The Cox landmark supermodel had the worst AUROCs (≤ 0.731) and calibration up to landmark 7. Separate Fine-Gray models per landmark performed worst for later landmarks, when the number of patients at risk was low.

Conclusions: Categorical and time-to-event approaches had similar performance in the static and dynamic settings, except Cox models. Ignoring competing risks caused problems for risk prediction in the time-to-event framework (Cox), but not in the categorical framework (logistic regression).
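
The evaluation scheme of 100 random 2:1 train-test splits can be sketched in a few lines of Python. The abstract does not say whether the splitting unit was the catheter episode, the admission or the patient, so the sketch splits generic unit indices. The function name, the seed and the default arguments are hypothetical.

```python
import random


def repeated_splits(n_units, n_repeats=100, train_fraction=2 / 3, seed=0):
    """Yield (train_indices, test_indices) for repeated random 2:1 splits."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    indices = list(range(n_units))
    n_train = round(n_units * train_fraction)
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


# Each split would be used to fit a model on `train` and score it on `test`;
# performance is then summarised across all repeats.
for train, test in repeated_splits(n_units=30, n_repeats=3):
    print(len(train), len(test))
```

If several episodes belong to the same patient, splitting at patient level would avoid the same person appearing in both sets; that choice is left open here.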

Diagnostic and prognostic research 9(1):20. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12278581/pdf/
Citations: 0
A decomposition of Fisher's information to inform sample size for developing or updating fair and precise clinical prediction models for individual risk-part 1: binary outcomes.
Pub Date : 2025-07-08 DOI: 10.1186/s41512-025-00193-9
Richard D Riley, Gary S Collins, Rebecca Whittle, Lucinda Archer, Kym I E Snell, Paula Dhiman, Laura Kirton, Amardeep Legha, Xiaoxuan Liu, Alastair K Denniston, Frank E Harrell, Laure Wynants, Glen P Martin, Joie Ensor

Background: When using a dataset to develop or update a clinical prediction model, small sample sizes increase concerns of overfitting, instability, poor predictive performance and a lack of fairness. For models estimating the risk of a binary outcome, previous research has outlined sample size calculations that target low overfitting and a precise overall risk estimate. However, more guidance is needed for targeting precise and fair individual-level risk estimates.

Methods: We propose a decomposition of Fisher's information matrix to help examine sample sizes required for developing or updating a model, aiming for precise and fair individual-level risk estimates. We outline a five-step process for use before data collection or when an existing dataset or pilot study is available. It requires researchers to specify the overall risk in the target population, the (anticipated) distribution of key predictors in the model and an assumed 'core model' either specified directly (i.e. a logistic regression equation is provided) or based on a specified C-statistic and relative effects of (standardised) predictors.

Results: We produce closed-form solutions that decompose the variance of an individual's risk estimate into the Fisher's unit information matrix, predictor values and the total sample size. This allows researchers to quickly calculate and examine the anticipated precision of individual-level predictions and classifications for specified sample sizes. The information can be presented to key stakeholders (e.g. health professionals, patients, grant funders) to inform target sample sizes for prospective data collection or whether an existing dataset is sufficient. Our proposal is implemented in our new software module pmstabilityss. We provide two real examples and emphasise the importance of clinical context, including any risk thresholds for decision making and fairness checks.

Conclusions: Our approach helps researchers examine potential sample sizes required to target precise and fair individual-level predictions when developing or updating prediction models for binary outcomes.
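
The decomposition described in the Results can be sketched with standard large-sample theory for logistic regression. The notation below is an assumption for illustration and may differ from the paper's: $\mathbf{x}_i$ is the predictor vector (including the intercept) of individual $i$, $p_i$ the true risk, $\eta_i = \operatorname{logit}(p_i)$ the linear predictor, $\mathcal{I}$ the unit (per-participant) Fisher information matrix and $n$ the total sample size.

```latex
\begin{align}
\mathcal{I} &= \mathbb{E}\!\left[\, p(\mathbf{x})\{1 - p(\mathbf{x})\}\, \mathbf{x}\mathbf{x}^{\top} \right], \\
\operatorname{var}(\hat{\boldsymbol{\beta}}) &\approx (n\,\mathcal{I})^{-1}, \\
\operatorname{var}(\hat{\eta}_i) &\approx \frac{1}{n}\, \mathbf{x}_i^{\top} \mathcal{I}^{-1} \mathbf{x}_i, \\
\operatorname{var}(\hat{p}_i) &\approx \{p_i(1 - p_i)\}^{2}\, \frac{1}{n}\, \mathbf{x}_i^{\top} \mathcal{I}^{-1} \mathbf{x}_i .
\end{align}
```

The last line follows from the delta method applied to the inverse logit. It separates the three ingredients named in the abstract: the unit information matrix, the individual's predictor values and the total sample size, so the anticipated precision for any individual can be recomputed for different values of $n$.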

Diagnostic and prognostic research 9(1):14. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12235806/pdf/
Citations: 0
Development and temporal evaluation of sex-specific models to predict 4-year atherosclerotic cardiovascular disease risk based on age and neighbourhood characteristics in South Limburg, the Netherlands.
Pub Date : 2025-07-02 DOI: 10.1186/s41512-025-00198-4
Anke Bruninx, Lianne Ippel, Rob Willems, Andre Dekker, Iñigo Bermejo

Background: To improve screening for atherosclerotic cardiovascular disease (ASCVD), we aimed to develop and temporally evaluate sex-specific models to predict 4-year ASCVD risk in South Limburg based on age and neighbourhood characteristics concerning home address.

Methods: We included 40- to 70-year-olds living in South Limburg on 1 January 2015 for model development, and 40- to 70-year-olds living in South Limburg on 1 January 2016 for model evaluation. We randomly sampled people selected in 1 year and in both years to create development and evaluation data sets. Follow-up of ASCVD and competing events (overall mortality excluding ASCVD) lasted until 31 December 2020. Candidate predictors were the individual's age, the neighbourhood's socio-economic status, and the neighbourhood's particulate matter concentration. Using the evaluation data sets, we compared two model types (subdistribution and cause-specific hazard models) and eight model structures. Discrimination was assessed by the area under the receiver operating characteristic curve (AUROC). Calibration was assessed by calculating overall expected-observed ratios (E/O). Calibration plots were additionally produced for the final models.
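The two evaluation measures named in the Methods above can be illustrated with a minimal Python sketch. It assumes a plain binary outcome and ignores censoring and competing events, which the study itself accounted for, so it is a simplification and not the authors' procedure. The function names and the toy data are hypothetical.

```python
def auroc(risks, outcomes):
    """AUROC as the share of event/non-event pairs ranked correctly (ties count half)."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return concordant / (len(pos) * len(neg))


def expected_observed_ratio(risks, outcomes):
    """Overall E/O: sum of predicted risks divided by the number of observed events."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("E/O is undefined when no events are observed")
    return sum(risks) / observed


# Hypothetical toy data: predicted risks and observed outcomes (1 = event).
toy_risks = [0.10, 0.40, 0.35, 0.80]
toy_outcomes = [0, 0, 1, 1]

print(round(auroc(toy_risks, toy_outcomes), 4))                    # 0.75
print(round(expected_observed_ratio(toy_risks, toy_outcomes), 4))  # 0.825
```

An E/O below 1 means the model predicts fewer events than were observed, and an AUROC of 0.5 corresponds to ranking no better than chance.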

Results: The development data sets consisted of 67,549 males (4-year cumulative ASCVD incidence: 3.08%) and 67,947 females (4-year cumulative ASCVD incidence: 1.50%). The evaluation data sets consisted of 66,068 males (4-year cumulative ASCVD incidence: 3.22%) and 66,231 females (4-year cumulative ASCVD incidence: 1.49%). For males, the AUROC of the final model equalled 0.6548. The E/O equalled 0.9466. For females, the AUROC equalled 0.6744. The E/O equalled 0.9838.

Conclusions: The resulting models show promise for further research and may be used for ASCVD screening in the future.

Citations: 0
Ischemic modified albumin and thiol levels in Coronavirus disease 19: a systematic review and meta-analysis.
Pub Date : 2025-06-23 DOI: 10.1186/s41512-025-00196-6
Asma Mousavi, Shayan Shojaei, Peyvand Parhizkar, Razman Arabzadeh Bahri, Sanam Alilou, Hanieh Radkhah

Background: The COVID-19 pandemic has imposed a significant global health burden. Identifying prognostic markers for COVID-19 and its severity could contribute to improved patient outcomes by reducing morbidity and mortality. This systematic review and meta-analysis aimed to evaluate the relationship between ischemic-modified albumin (IMA) and thiol levels, both indicators of oxidative stress, in patients diagnosed with COVID-19.

Method: We conducted a comprehensive search across PubMed, Scopus, Embase, and Web of Science for eligible original studies. We assessed IMA and thiol levels in COVID-19 patients, examining their association with both disease severity and mortality. A random-effects analysis was conducted to estimate the standardized mean difference (SMD) and confidence intervals (CI).
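As a rough illustration of the analysis described above, the sketch below computes Hedges's g for one study from group summaries and pools several studies under a random-effects model. The abstract does not name the between-study variance estimator, so the DerSimonian-Laird estimator used here is an assumption, as are the function names and the normal-approximation 95% interval.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Return (g, variance of g) for two independent groups."""
    df = n_1 + n_2 - 2
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / df)
    d = (mean_1 - mean_2) / pooled_sd          # Cohen's d
    correction = 1 - 3 / (4 * df - 1)          # small-sample correction factor
    var_d = (n_1 + n_2) / (n_1 * n_2) + d**2 / (2 * (n_1 + n_2))
    return correction * d, correction**2 * var_d


def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooling (assumed estimator): (pooled, 95% CI, tau^2)."""
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / v for v in variances]
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)         # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * gi for wi, gi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2
```

In use, each study's means, standard deviations, and group sizes would first be passed through `hedges_g`, and the resulting effects and variances then passed to `pool_random_effects`.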

Results: Sixteen studies comprising 2,010 COVID-19 patients and 982 controls were included. A diagnosis of COVID-19 was associated with significantly elevated IMA levels (Hedges's g = 1.02, 95% CI: 0.45 to 1.60) and reduced total thiol levels (Hedges's g = -1.08, 95% CI: -2.10 to -0.07). However, native thiol levels did not differ significantly between infected patients and healthy participants. Subgroup analysis showed significantly lower total thiol levels in patients with critical and severe COVID-19, as well as lower native thiol levels specifically in critical COVID-19 patients. IMA levels were significantly higher across the critical, severe, and moderate COVID-19 groups.

Conclusion: Elevated IMA and reduced thiol levels may serve as novel markers for predicting COVID-19 severity and prognosis. Further research is needed to explore therapeutic interventions that target oxidative imbalance in COVID-19 patients.

Citations: 0
Expert panel as reference standard procedure in diagnostic accuracy studies: a systematic scoping review and methodological guidance.
Pub Date : 2025-05-13 DOI: 10.1186/s41512-025-00195-7
Bas E Kellerhuis, Kevin Jenniskens, Mike P T Kusters, Ewoud Schuit, Lotty Hooft, Karel G M Moons, Johannes B Reitsma

Background: In diagnostic accuracy studies, when no reference standard test is available, a group of experts, combined in an expert panel, is often used to assess the presence of the target condition using multiple relevant pieces of patient information. Based on the expert panel's judgment, the accuracy of a test or model can be determined. Methodological choices in the design and analysis of the expert panel procedure, as well as the quality of reporting, have been shown to vary considerably between studies. This review maps the current landscape of expert panels used as reference standard in diagnostic accuracy or model studies.

Methods: PubMed was systematically searched for eligible studies published between June 1, 2012, and October 1, 2022. Data extraction was performed by one author and, in cases of doubt, checked by another author. Study characteristics, expert panel characteristics, and expert panel methodology were extracted. Articles were included if the diagnostic accuracy of an index test or diagnostic model was assessed using an expert panel as reference standard and the study was reported in English, Dutch, or German.

Results: After initial identification of 4078 studies, 318 were included for data extraction. Expert panels were used across numerous medical domains, of which oncology was the most common (20%). The number of experts judging the presence of the target condition in each patient was 2 or fewer in 29%, 3 or 4 in 55%, and 5 or more in 16% of the 318 studies. The expert panel types used were an independent panel (i.e., each expert returns a judgement without conferring with other experts in the panel) in 33% of studies, a consensus panel (i.e., each case is discussed by the expert panel) in 27%, a staged approach (i.e., each expert independently returns a judgement and discordant cases are discussed in a consensus meeting) in 11%, and a tiebreaker approach (i.e., each expert independently returns a judgement and discordant cases are assessed by another expert) in 8%. The exact expert panel decision approach was unclear or not reported in 21% of studies. In 5% of studies, information about the experts' remaining uncertainty about the presence or absence of the target condition was collected for each participant.

Conclusions: There is large heterogeneity in the composition of expert panels and the way that expert panels are used as reference standard in diagnostic research. Key methodological characteristics of expert panels are frequently not reported, making it difficult to replicate or reproduce results, and potentially masking biasing factors. There is a clear need for more guidance on how to perform an expert panel procedure and specific extensions of the STARD and TRIPOD reporting guidelines when using an expert panel.

Citations: 0
A scoping review of machine learning models to predict risk of falls in elders, without using sensor data.
Pub Date : 2025-05-06 DOI: 10.1186/s41512-025-00190-y
Angelo Capodici, Claudio Fanconi, Catherine Curtin, Alessandro Shapiro, Francesca Noci, Alberto Giannoni, Tina Hernandez-Boussard

Objectives: This scoping review assesses machine learning (ML) tools that predict falls from information in health records, without using any sensor data. The aim was to assess the available evidence on innovative techniques to improve fall prevention management.

Methods: Studies were included if they focused on predicting fall risk with machine learning in elderly populations and were written in English. Thirteen variables were extracted, including population characteristics (community dwelling, inpatients, age range, main pathology, ethnicity/race). The number and type of variables used in the final models were also extracted.

Results: A total of 6331 studies were retrieved, and 19 articles met criteria for data extraction. The performance reported by authors was commonly high in terms of accuracy (e.g., greater than 0.70). The most represented features included cardiovascular status and mobility assessments. Common gaps identified included a lack of transparent reporting and insufficient fairness assessments.

Conclusions: This review provides evidence that falls can be predicted using ML without sensors if the amount and quality of the data are adequate. However, further studies are needed to validate these models in diverse groups and populations.

Citations: 0
Can we develop real-world prognostic models using observational healthcare data? Large-scale experiment to investigate model sensitivity to database and phenotypes.
Pub Date : 2025-04-17 DOI: 10.1186/s41512-025-00191-x
Jenna M Reps, Peter R Rijnbeek, Patrick B Ryan

Background: Large observational healthcare databases are frequently used to develop models to be implemented in real-world clinical practice populations. For example, these databases were used to develop COVID severity models that guided interventions such as whom to prioritize for vaccination during the pandemic. However, the clinical setting and observational databases often differ in the types of patients (case mix), and it is a nontrivial process to identify patients with medical conditions (phenotyping) in these databases. In this study, we investigate how sensitive a model's performance is to the choice of development database, population, and outcome phenotype.

Methods: We developed > 450 different logistic regression models for nine prediction tasks across seven databases with a range of suitable population and outcome phenotypes. Performance stability within tasks was calculated by applying each model to data created by permuting the database, population, or outcome phenotype. We investigate performance (AUROC, scaled Brier, and calibration-in-the-large) stability and individual risk estimate stability.
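Two of the performance measures listed above can be sketched in a few lines of Python. The definitions below are common simplified forms and are assumptions, not necessarily those used in the study: the scaled Brier score takes a model that always predicts the observed prevalence as its reference, and calibration-in-the-large is computed as mean observed risk minus mean predicted risk. The function names are hypothetical.

```python
def brier_score(risks, outcomes):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(risks, outcomes)) / len(outcomes)


def scaled_brier(risks, outcomes):
    """1 - Brier / reference Brier; 0 = no better than prevalence, 1 = perfect."""
    prevalence = sum(outcomes) / len(outcomes)
    reference = prevalence * (1 - prevalence)  # Brier score of the prevalence-only model
    if reference == 0:
        raise ValueError("outcomes must contain both events and non-events")
    return 1 - brier_score(risks, outcomes) / reference


def calibration_in_the_large(risks, outcomes):
    """Mean observed risk minus mean predicted risk (0 = no systematic over/underprediction)."""
    n = len(outcomes)
    return sum(outcomes) / n - sum(risks) / n
```

Under these definitions, a positive calibration-in-the-large value indicates that the model underpredicts risk on average in the data it is applied to.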

Results: In general, changing the outcome definition or population phenotype had little impact on model discrimination at validation. However, validation discrimination was unstable when the database changed. Calibration and Brier performance were unstable when the population, outcome definition, or database changed. This may be problematic if a model developed using observational data is implemented in a real-world setting.

Conclusions: These results highlight the importance of validating a model developed using observational data in the clinical setting prior to using it for decision-making. Calibration and Brier score should be evaluated to prevent miscalibrated risk estimates being used to aid clinical decisions.

Citations: 0