
Diagnostic and prognostic research: latest articles

Risk prediction tools for pressure injury occurrence: an umbrella review of systematic reviews reporting model development and validation methods.
Pub Date : 2025-01-14 DOI: 10.1186/s41512-024-00182-4
Bethany Hillier, Katie Scandrett, April Coombe, Tina Hernandez-Boussard, Ewout Steyerberg, Yemisi Takwoingi, Vladica Velickovic, Jacqueline Dinnes

Background: Pressure injuries (PIs) place a substantial burden on healthcare systems worldwide. Risk stratification of those who are at risk of developing PIs allows preventive interventions to be focused on patients who are at the highest risk. The considerable number of risk assessment scales and prediction models available underscores the need for a thorough evaluation of their development, validation, and clinical utility. Our objectives were to identify and describe available risk prediction tools for PI occurrence, their content and the development and validation methods used.

Methods: The umbrella review was conducted according to Cochrane guidance. MEDLINE, Embase, CINAHL, EPISTEMONIKOS, Google Scholar, and reference lists were searched to identify relevant systematic reviews. The risk of bias was assessed using adapted AMSTAR-2 criteria. Results were described narratively. All included reviews contributed to building a comprehensive list of risk prediction tools.

Results: We identified 32 eligible systematic reviews, only seven of which described the development and validation of risk prediction tools for PI. Nineteen reviews assessed the prognostic accuracy of the tools and 11 assessed clinical effectiveness. Of the seven reviews reporting model development and validation, six included only machine learning models. Two reviews included external validations of models, although only one review reported any details on external validation methods or results. This was also the only review to report measures of both discrimination and calibration. Five reviews presented measures of discrimination, such as the area under the curve (AUC), sensitivities, specificities, F1 scores, and G-means. In the four reviews that assessed risk of bias using the PROBAST tool, all models but one were found to be at high or unclear risk of bias.
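Apart from the AUC, the discrimination measures named above can all be derived from a 2×2 confusion matrix at a chosen risk threshold. The following is a minimal Python sketch of those standard definitions. The function name `classification_metrics` and the counts are illustrative assumptions and are not taken from any of the included reviews.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Threshold-based measures from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives and true negatives at one fixed risk threshold.
    """
    sensitivity = tp / (tp + fn)   # proportion of events flagged as high risk
    specificity = tn / (tn + fp)   # proportion of non-events flagged as low risk
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
    }

# Hypothetical counts for illustration only (not data from the review).
metrics = classification_metrics(tp=40, fp=30, fn=10, tn=120)
print(metrics)
```

All four quantities depend on the chosen threshold, and none of them describes calibration, that is, how closely predicted risks agree with observed event frequencies.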

Conclusions: Available tools do not meet current standards for the development or reporting of risk prediction models. The majority of tools have not been externally validated. Standardised and rigorous approaches to risk prediction model development and validation are needed.

Trial registration: The protocol was registered on the Open Science Framework ( https://osf.io/tepyk ).

Journal: Diagnostic and prognostic research, vol. 9, no. 1, p. 2. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730812/pdf/

Rehabilitation outcomes after comprehensive post-acute inpatient rehabilitation following moderate to severe acquired brain injury - study protocol for an overall prognosis study based on routinely collected health data.
Pub Date : 2025-01-07 DOI: 10.1186/s41512-024-00183-3
Uwe M Pommerich, Peter W Stubbs, Jørgen Feldbæk Nielsen

Background: The initial theme of the PROGRESS framework for prognosis research is termed overall prognosis research. Its aim is to describe the most likely course of health conditions in the context of current care. These average group-level prognoses may be used to inform patients, health policies, trial designs, or further prognosis research. Acquired brain injury, such as stroke, traumatic brain injury or encephalopathy, is a major cause of disability and functional limitations worldwide. Rehabilitation aims to maximize independent functioning and meaningful participation in society post-injury. While some observational studies allow the overall prognosis for the level of independent functioning to be inferred, the context in which rehabilitation is provided is rarely described. The aim of this protocol is to provide a detailed account of the clinical context to aid the interpretation of our upcoming overall prognosis study.

Methods: The study will occur at a Danish post-acute inpatient rehabilitation facility providing specialised inpatient rehabilitation for individuals with moderate to severe acquired brain injury. Routinely collected electronic health data will be extracted from the healthcare provider's database and deterministically linked on an individual level to construct the study cohort. The study period spans from March 2011 to December 2022. Four outcomes will measure the level of functioning. Rehabilitation needs will also be described. Outcomes and rehabilitation needs will be described for the entire cohort, across rehabilitation complexity levels and stratified for relevant demographic and clinical parameters. Descriptive statistics will be used to estimate average prognoses for the level of functioning at discharge from post-acute rehabilitation. The patterns of missing data will be investigated.

Discussion: This protocol is intended to provide transparency in our upcoming study based on routinely collected clinical data. It will aid the interpretation of the overall prognosis estimates within the context of our current clinical practice and allow readers to assess potential sources of bias independently.

Journal: Diagnostic and prognostic research, vol. 9, no. 1, p. 1. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11706155/pdf/

Validation of prognostic models predicting mortality or ICU admission in patients with COVID-19 in low- and middle-income countries: a global individual participant data meta-analysis.
Pub Date : 2024-12-19 DOI: 10.1186/s41512-024-00181-5
Johanna A A Damen, Banafsheh Arshi, Maarten van Smeden, Silvia Bertagnolio, Janet V Diaz, Ronaldo Silva, Soe Soe Thwin, Laure Wynants, Karel G M Moons

Background: We evaluated the performance of prognostic models for predicting mortality or ICU admission in hospitalized patients with COVID-19 in the World Health Organization (WHO) Global Clinical Platform, a repository of individual-level clinical data of patients hospitalized with COVID-19, including in low- and middle-income countries (LMICs).

Methods: We identified eligible multivariable prognostic models for predicting overall mortality and ICU admission during hospital stay in patients with confirmed or suspected COVID-19 from a living review of COVID-19 prediction models. These models were evaluated using data contributed to the WHO Global Clinical Platform for COVID-19 from nine LMICs (Burkina Faso, Cameroon, Democratic Republic of Congo, Guinea, India, Niger, Nigeria, Zambia, and Zimbabwe). Model performance was assessed in terms of discrimination and calibration.

Results: Out of 144 eligible models, 140 were excluded due to a high risk of bias, predictors unavailable in LMICs, or insufficient model description. Among 11,338 participants, the remaining models showed good discrimination for predicting in-hospital mortality (3 models), with areas under the curve (AUCs) ranging between 0.76 (95% CI 0.71-0.81) and 0.84 (95% CI 0.77-0.89). An AUC of 0.74 (95% CI 0.70-0.78) was found for predicting ICU admission risk (one model). All models showed signs of miscalibration and overfitting, with extensive heterogeneity between countries.
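For a binary outcome, the AUC reported above can be read as the probability that a randomly chosen patient with the event received a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The sketch below implements that pairwise definition in plain Python. The function name `auc` and the four example patients are illustrative assumptions, not data or code from the study, and no confidence interval is computed.

```python
def auc(risks, outcomes):
    """Concordance (c-statistic) by direct pairwise comparison.

    risks    -- predicted probabilities, one per patient
    outcomes -- 1 if the event occurred, 0 otherwise
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("AUC needs at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))

# Hypothetical predictions for four patients (illustration only).
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(example_auc)  # 0.75: three of the four event/non-event pairs are concordant
```

Because the AUC depends only on the ranking of predicted risks, a model can discriminate well and still be miscalibrated, which matches the pattern described in the results.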

Conclusions: Among the available COVID-19 prognostic models, only a few could be validated on data collected from LMICs, mainly due to limited predictor availability. Despite their discriminative ability, selected models for mortality prediction or ICU admission showed varying and suboptimal calibration.

Journal: Diagnostic and prognostic research, vol. 8, no. 1, p. 17. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11656577/pdf/

Reported prevalence and comparison of diagnostic approaches for Candida africana: a systematic review with meta-analysis.
Pub Date : 2024-12-05 DOI: 10.1186/s41512-024-00180-6
Bwambale Jonani, Emmanuel Charles Kasule, Herman Roman Bwire, Gerald Mboowa

This systematic review and meta-analysis evaluated reported prevalence and diagnostic methods for identifying Candida africana, an opportunistic yeast associated with vaginal and oral candidiasis. A comprehensive literature search yielded 53 studies meeting the inclusion criteria, 2 of which were case studies. The pooled prevalence of C. africana among 20,571 participants was 0.9% (95% CI: 0.7-1.3%), with significant heterogeneity observed (I² = 79%, p < 0.01). Subgroup analyses revealed regional variations, with North America showing the highest prevalence (4.6%, 95% CI: 1.8-11.2%). The majority (84.52%) of C. africana isolates came from vaginal samples, followed by 8.37% from oral samples, 3.77% from urine, 2.09% from glans penis swabs, and 0.42% each from rectal swabs, nasal swabs, and respiratory tract expectorations. No C. africana has been isolated from nail samples. Hyphal wall protein 1 (HWP1) gene PCR was the most used diagnostic method for identifying C. africana and accounted for 70% of the identified isolates. A comparison of methods revealed that the Vitek-2 system consistently failed to differentiate C. africana from Candida albicans, whereas MALDI-TOF misidentified several isolates compared with HWP1 PCR. Factors beyond diagnostic methodology may influence C. africana detection rates. We highlight the importance of adapting molecular methods for resource-limited settings or developing equally accurate but more accessible alternatives for the identification and differentiation of highly similar and cryptic Candida species such as C. africana.
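The I² statistic quoted above is conventionally computed from Cochran's Q. The abstract reports neither Q nor the estimator used, so the expression below is the standard textbook definition rather than the authors' stated computation; k denotes the number of pooled studies and w_i, the inverse-variance weights, is notation introduced here.

```latex
Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}\right)^2,
\qquad
I^2 = \max\!\left(0,\; \frac{Q - (k - 1)}{Q}\right) \times 100\%
```

Here \hat{\theta}_i is the prevalence estimate of study i (often on a transformed scale) and \hat{\theta} is the pooled estimate. Under this definition, I² = 79% means that roughly four fifths of the observed variation between study estimates is attributed to genuine between-study differences rather than sampling error.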

Journal: Diagnostic and prognostic research, vol. 8, no. 1, p. 16. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11619109/pdf/

The relative data hungriness of unpenalized and penalized logistic regression and ensemble-based machine learning methods: the case of calibration.
Pub Date : 2024-11-05 DOI: 10.1186/s41512-024-00179-z
Peter C Austin, Douglas S Lee, Bo Wang

Background: Machine learning methods are increasingly being used to predict clinical outcomes. Optimism is the difference in model performance between derivation and validation samples. The term "data hungriness" refers to the sample size needed for a modelling technique to generate a prediction model with minimal optimism. Our objective was to compare the relative data hungriness of different statistical and machine learning methods when assessed using calibration.

Methods: We used Monte Carlo simulations to assess the effect of the number of events per variable (EPV) on the optimism of six learning methods when assessing model calibration: unpenalized logistic regression, ridge regression, lasso regression, bagged classification trees, random forests, and stochastic gradient boosting machines using trees as the base learners. We performed simulations in two large cardiovascular datasets, each of which comprised an independent derivation and validation sample: patients hospitalized with acute myocardial infarction and patients hospitalized with heart failure. We used six data-generating processes, each based on one of the six learning methods. We chose sample sizes such that the number of EPV ranged from 10 to 200 in increments of 10. We applied the six prediction methods in each of the simulated derivation samples and evaluated calibration in the simulated validation samples using the integrated calibration index, the calibration intercept, and the calibration slope. We also examined Nagelkerke's R², the scaled Brier score, and the c-statistic.
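Two of the calibration measures named above have simple regression definitions: the calibration slope is the coefficient obtained when the observed outcome is regressed on the logit of the predicted risk, and the calibration intercept is the intercept of the same model with that logit entered as an offset (slope fixed at 1). The sketch below fits both by Newton-Raphson in plain Python. It is an illustration of the usual definitions, not the authors' code; the function names and the toy data are assumptions, and the integrated calibration index is not covered.

```python
import math

def _sigmoid(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def _logit(p):
    """Log-odds of a predicted risk; p must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def calibration_slope(risks, outcomes, iterations=50):
    """Slope b from the logistic model: outcome ~ a + b * logit(risk)."""
    lp = [_logit(p) for p in risks]
    a, b = 0.0, 1.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(lp, outcomes):
            mu = _sigmoid(a + b * x)
            w = mu * (1.0 - mu)
            g_a += y - mu              # score for the intercept
            g_b += (y - mu) * x        # score for the slope
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 information matrix against the score.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return b

def calibration_intercept(risks, outcomes, iterations=50):
    """Intercept a from the model: outcome ~ a + offset(logit(risk))."""
    lp = [_logit(p) for p in risks]
    a = 0.0
    for _ in range(iterations):
        g = h = 0.0
        for x, y in zip(lp, outcomes):
            mu = _sigmoid(a + x)
            g += y - mu
            h += mu * (1.0 - mu)
        a += g / h
    return a

# Toy validation sample (hypothetical): within each risk group the observed
# event proportion equals the predicted risk, so calibration is perfect.
calibrated_risks = [0.2] * 5 + [0.5] * 4 + [0.8] * 5
outcomes = [1, 0, 0, 0, 0] + [1, 1, 0, 0] + [1, 1, 1, 1, 0]

# Same patients, but predictions made too extreme by doubling the log-odds.
too_extreme = [_sigmoid(2.0 * _logit(p)) for p in calibrated_risks]

slope_ok = calibration_slope(calibrated_risks, outcomes)   # approximately 1.0
slope_overfit = calibration_slope(too_extreme, outcomes)   # approximately 0.5
intercept_ok = calibration_intercept(calibrated_risks, outcomes)  # approximately 0.0
```

A slope below 1 in a validation sample indicates predictions that are too extreme, which is the typical signature of overfitting. In this framing, optimism is the gap between such a measure in the derivation sample and the same measure in the validation sample.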

Results: Across all 12 scenarios (2 diseases × 6 data-generating processes), penalized logistic regression displayed very low optimism even when the number of EPV was very low. Random forests and bagged trees tended to be the most data hungry and displayed the greatest optimism.

Conclusions: When assessed using calibration, penalized logistic regression was substantially less data hungry than methods from the machine learning literature.

Journal: Diagnostic and prognostic research, vol. 8, no. 1, p. 15. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539735/pdf/

Understanding overfitting in random forest for probability estimation: a visualization and simulation study.
Pub Date : 2024-09-27 DOI: 10.1186/s41512-024-00177-1
Lasai Barreñada, Paula Dhiman, Dirk Timmerman, Anne-Laure Boulesteix, Ben Van Calster

Background: Random forests have become popular for clinical risk prediction modeling. In a case study on predicting ovarian malignancy, we observed training AUCs close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behavior of random forests for probability estimation by (1) visualizing data space in three real-world case studies and (2) a simulation study.

Methods: For the case studies, multinomial risk estimates were visualized using heatmaps in a 2-dimensional subspace. The simulation study included 48 logistic data-generating mechanisms (DGM), varying the predictor distribution, the number of predictors, the correlation between predictors, the true AUC, and the strength of true predictors. For each DGM, 1000 training datasets of size 200 or 4000 with binary outcomes were simulated, and random forest models were trained with minimum node size 2 or 20 using the ranger R package, resulting in 192 scenarios in total. Model performance was evaluated on large test datasets (N = 100,000).
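The scenario count stated above follows from crossing the 48 data-generating mechanisms with the two training set sizes and the two minimum node sizes. A short Python sketch of that arithmetic follows; the integer DGM labels are placeholders, since the abstract does not list the individual factor levels that make up the 48 mechanisms.

```python
from itertools import product

dgm_ids = range(1, 49)          # 48 data-generating mechanisms (placeholder labels)
training_sizes = (200, 4000)    # training set sizes named in the abstract
min_node_sizes = (2, 20)        # minimum node sizes named in the abstract

# Full factorial grid: one scenario per (DGM, training size, node size) triple.
scenarios = list(product(dgm_ids, training_sizes, min_node_sizes))
n_scenarios = len(scenarios)
print(n_scenarios)  # 192 = 48 * 2 * 2
```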

Results: The visualizations suggested that the model learned "spikes of probability" around events in the training set. A cluster of events created a bigger peak or plateau (signal), whereas isolated events created local peaks (noise). In the simulation study, median training AUCs were between 0.97 and 1 unless there were 4 binary predictors or 16 binary predictors with a minimum node size of 20. The median discrimination loss, i.e., the difference between the median test AUC and the true AUC, was 0.025 (range 0.00 to 0.13). Median training AUCs had Spearman correlations of around 0.70 with discrimination loss. Median test AUCs were higher with higher events per variable, higher minimum node size, and binary predictors. Median training calibration slopes were always above 1 and were not correlated with median test slopes across scenarios (Spearman correlation -0.11). Median test slopes were higher with higher true AUC, higher minimum node size, and higher sample size.

Conclusions: Random forests learn local probability peaks that often yield near perfect training AUCs without strongly affecting AUCs on test data. When the aim is probability estimation, the simulation results go against the common recommendation to use fully grown trees in random forest models.

Diagnostic and prognostic research 8(1):14. Published 2024-09-27. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11437774/pdf/
Citations: 0
A review of methods for the analysis of diagnostic tests performed in sequence.
Pub Date : 2024-09-03 DOI: 10.1186/s41512-024-00175-3
Thomas R Fanshawe, Brian D Nicholson, Rafael Perera, Jason L Oke

Background: Many clinical pathways for the diagnosis of disease are based on diagnostic tests that are performed in sequence. The performance of the full diagnostic sequence is dictated by the diagnostic performance of each test in the sequence as well as the conditional dependence between them, given true disease status. Resulting estimates of performance, such as the sensitivity and specificity of the test sequence, are key parameters in health-economic evaluations. We conducted a methodological review of statistical methods for assessing the performance of diagnostic tests performed in sequence, with the aim of guiding data analysts towards classes of methods that may be suitable given the design and objectives of the testing sequence.
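
As an illustration of how conditional dependence enters the performance of a test sequence, the sketch below uses the standard two-test result for a "positive only if both tests are positive" rule, with a covariance term between the two test results within each disease group. The function name and the numbers are hypothetical, and this is one textbook formulation rather than a method attributed to the review.

```python
def series_accuracy(se1, sp1, se2, sp2, cov_pos=0.0, cov_neg=0.0):
    """Sensitivity and specificity of a two-test sequence that is called
    positive only if both tests are positive.

    cov_pos: covariance between the two test results among the diseased.
    cov_neg: covariance between the two test results among the non-diseased.
    Both are zero under conditional independence.
    """
    # P(both positive | diseased)
    se = se1 * se2 + cov_pos
    # 1 - P(both positive | non-diseased)
    sp = 1.0 - ((1.0 - sp1) * (1.0 - sp2) + cov_neg)
    if not (0.0 <= se <= 1.0 and 0.0 <= sp <= 1.0):
        raise ValueError("covariance outside the range allowed by the inputs")
    return se, sp

# Made-up accuracies: independence versus positive dependence among the diseased.
independent = series_accuracy(0.9, 0.7, 0.8, 0.9)
dependent = series_accuracy(0.9, 0.7, 0.8, 0.9, cov_pos=0.05)
print(independent, dependent)
```

In this toy case, ignoring positive dependence among diseased patients would understate the sensitivity of the sequence.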

Methods: We searched PubMed, Scopus and Web of Science for relevant papers describing methodology for analysing sequences of diagnostic tests. Papers were classified by the characteristics of the method used, and these were used to group methods into themes. We illustrate some of the methods using data from a cohort study of repeat faecal immunochemical testing for colorectal cancer in symptomatic patients, to highlight the importance of allowing for conditional dependence in test sequences and adjustment for an imperfect reference standard.

Results: Five overall themes were identified, detailing methods for combining multiple tests in sequence, estimating conditional dependence, analysing sequences of diagnostic tests used for risk assessment, analysing test sequences in conjunction with an imperfect or incomplete reference standard, and meta-analysis of test sequences.

Conclusions: This methodological review can be used to help researchers identify suitable analytic methods for studies that use diagnostic tests performed in sequence.

Diagnostic and prognostic research 8(1):8. Published 2024-09-03. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11370044/pdf/
Citations: 0
Practical and analytical considerations when performing interim analyses in diagnostic test accuracy studies.
Pub Date : 2024-08-20 DOI: 10.1186/s41512-024-00174-4
Susannah Fleming, Lazaro Mwandigha, Thomas R Fanshawe

Interim analysis is a common methodology in randomised clinical trials but has received less attention in studies of diagnostic test accuracy. In such studies, early termination for futility may be beneficial if early evidence indicates that a diagnostic test is unlikely to achieve a clinically useful level of diagnostic performance, as measured by the sensitivity and specificity. In this paper, we describe relevant practical and analytical considerations when planning and performing interim analysis in diagnostic accuracy studies, focusing on stopping rules for futility. We present an adaptation of the exact group sequential method for diagnostic testing, with R code provided for implementing this method in practice. The method is illustrated using two simulated data sets and data from a published diagnostic accuracy study for point-of-care testing for SARS-CoV-2. The considerations described in this paper can be used to guide decisions as to when an interim analysis in a diagnostic accuracy study is suitable and highlight areas for further methodological development.
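
The idea of stopping for futility on sensitivity can be illustrated with a single exact binomial check. The Python sketch below is not the paper's exact group sequential method (which handles multiple looks and is provided by the authors in R). It is a simplified one-look illustration, and the function names, the target sensitivity, and the threshold `alpha` are all assumptions.

```python
from math import comb

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def stop_for_futility(true_pos, n_diseased, target_sens, alpha=0.05):
    """Flag futility when seeing this few true positives would be unlikely
    if the test really had the target sensitivity.

    Single look only: no adjustment for repeated interim analyses.
    """
    return binom_cdf(true_pos, n_diseased, target_sens) < alpha

# Made-up interim data: 20 reference-standard positives, target sensitivity 0.9.
print(stop_for_futility(12, 20, 0.9))  # 12/20 detected
print(stop_for_futility(18, 20, 0.9))  # 18/20 detected
```

The same check could be applied to specificity by counting true negatives among reference-standard negatives.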

Diagnostic and prognostic research 8(1):12. Published 2024-08-20. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11334588/pdf/
Citations: 0
Systematic review of methods used in prediction models with recurrent event data.
Pub Date : 2024-08-06 DOI: 10.1186/s41512-024-00173-5
Victoria Watson, Catrin Tudur Smith, Laura J Bonnett

Background: Patients who suffer from chronic conditions or diseases are susceptible to experiencing repeated events of the same type (e.g. seizures), termed 'recurrent events'. Prediction models can be used to predict the risk of recurrence so that intervention or management can be tailored accordingly, but statistical methodology can vary. The objective of this systematic review was to identify and describe statistical approaches that have been applied for the development and validation of multivariable prediction models with recurrent event data. A secondary objective was to informally assess the characteristics and quality of analysis approaches used in the development and validation of prediction models of recurrent event data.

Methods: Searches were run in MEDLINE in 2019 using a search strategy that included index terms and phrases related to recurrent events and prediction models. To be included in the review, studies must have developed or validated a multivariable clinical prediction model for recurrent event outcome data, specifically modelling the recurrent events and the timing between them. The statistical analysis methods used to analyse the recurrent event data in the clinical prediction model were extracted to answer the primary aim of the systematic review. In addition, items such as the event rate and any discrimination and calibration statistics used to assess model performance were extracted for the secondary aim of the review.

Results: A total of 855 publications were identified using the developed search strategy and 301 of these are included in our systematic review. The Andersen-Gill method was identified as the most commonly applied method in the analysis of recurrent events, which was used in 152 (50.5%) studies. This was closely followed by frailty models which were used in 116 (38.5%) included studies. Of the 301 included studies, only 75 (24.9%) internally validated their model(s) and three (1.0%) validated their model(s) in an external dataset.
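
The Andersen-Gill approach is usually fitted to data in counting-process form, where each patient contributes one row per at-risk interval. The Python sketch below shows that data layout for a single patient. The function name and the example times are hypothetical, and no model fitting is attempted.

```python
def to_counting_process(event_times, follow_up_end):
    """Turn one patient's event times into (start, stop, event) rows.

    Each recurrent event closes an interval with event = 1; any remaining
    follow-up after the last event is a censored interval with event = 0.
    """
    rows = []
    start = 0
    for t in sorted(event_times):
        if not start < t <= follow_up_end:
            raise ValueError("event times must be distinct, positive, and within follow-up")
        rows.append((start, t, 1))
        start = t
    if start < follow_up_end:
        rows.append((start, follow_up_end, 0))
    return rows

# Made-up patient: events (e.g. seizures) at months 3 and 7, followed for 10 months.
print(to_counting_process([3, 7], follow_up_end=10))
# [(0, 3, 1), (3, 7, 1), (7, 10, 0)]
```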

Conclusions: This review identified a variety of methods that are used in practice when developing or validating prediction models for recurrent events. The variability of the approaches identified is cause for concern, as it indicates possible immaturity in the field and highlights the need for more methodological research to bring greater consistency to recurrent event analysis. Further work is required to ensure publications report all required information and use robust statistical methods for model development and validation.

PROSPERO registration: CRD42019116031.

Diagnostic and prognostic research 8(1):13. Published 2024-08-06. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11302841/pdf/
Citations: 0
Development of a prediction model of conversion to Alzheimer's disease in people with mild cognitive impairment: the statistical analysis plan of the INTERCEPTOR project.
Pub Date : 2024-07-25 DOI: 10.1186/s41512-024-00172-6
Flavia L Lombardo, Patrizia Lorenzini, Flavia Mayer, Marco Massari, Paola Piscopo, Ilaria Bacigalupo, Antonio Ancidoni, Francesco Sciancalepore, Nicoletta Locuratolo, Giulia Remoli, Simone Salemme, Stefano Cappa, Daniela Perani, Patrizia Spadin, Fabrizio Tagliavini, Alberto Redolfi, Maria Cotelli, Camillo Marra, Naike Caraglia, Fabrizio Vecchio, Francesca Miraglia, Paolo Maria Rossini, Nicola Vanacore

Background: In recent years, significant efforts have been directed towards the research and development of disease-modifying therapies for dementia. These drugs focus on prodromal (mild cognitive impairment, MCI) and/or early stages of Alzheimer's disease (AD). Literature evidence indicates that a considerable proportion of individuals with MCI do not progress to dementia. Identifying individuals at higher risk of developing dementia is essential for appropriate management, including the prescription of new disease-modifying therapies expected to become available in clinical practice in the near future.

Methods: The ongoing INTERCEPTOR study is a multicenter, longitudinal, interventional, non-therapeutic cohort study designed to enroll 500 individuals with MCI aged 50-85 years. The primary aim is to identify a biomarker or a set of biomarkers able to accurately predict the conversion from MCI to AD dementia within 3 years of follow-up. The biomarkers investigated in this study are neuropsychological tests (mini-mental state examination (MMSE) and delayed free recall), brain glucose metabolism ([18F]FDG-PET), MRI volumetry of the hippocampus, EEG brain connectivity, cerebrospinal fluid (CSF) markers (p-tau, t-tau, Aβ1-42, Aβ1-42/1-40 ratio, Aβ1-42/p-Tau ratio) and APOE genotype. The baseline visit includes a full cognitive and neuropsychological evaluation, as well as the collection of clinical and socio-demographic information. Prognostic models will be developed using Cox regression, incorporating individual characteristics and biomarkers through stepwise selection. Model performance will be evaluated in terms of discrimination and calibration and subjected to internal validation using the bootstrapping procedure. The final model will be visually represented as a nomogram.
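
The abstract states only that discrimination will be evaluated, without naming a measure here. One common choice for Cox models is Harrell's concordance index, sketched below in plain Python as an assumption for illustration. The function name and the toy data are hypothetical, and the bootstrap validation step is not shown.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and a
    shorter time than subject j. It is concordant when the subject with the
    earlier event has the higher predicted risk; tied risks count one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Made-up data: follow-up in years, 1 = converted to AD dementia, 0 = censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
print(harrell_c(times, events, risks))  # 0.8
```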

Discussion: This paper contains a detailed description of the statistical analysis plan to ensure the reproducibility and transparency of the analysis. The prognostic model developed in this study aims to identify the population with MCI at higher risk of developing AD dementia, potentially eligible for drug prescriptions. The nomogram could provide a valuable tool for clinicians for risk stratification and early treatment decisions.

Trial registration: ClinicalTrials.gov NCT03834402. Registered on February 8, 2019.

Diagnostic and prognostic research 8(1):11. Published 2024-07-25. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11271065/pdf/
Citations: 0