Adaptive sample size determination for the development of clinical prediction models.

Evangelia Christodoulou, Maarten van Smeden, Michael Edlinger, Dirk Timmerman, Maria Wanitschek, Ewout W Steyerberg, Ben Van Calster
Diagnostic and Prognostic Research, pages: 6. Journal Article. Published 2021-03-22. DOI: 10.1186/s41512-021-00096-5 (https://doi.org/10.1186/s41512-021-00096-5)
Citations: 12

Abstract

Background: We suggest an adaptive sample size calculation method for developing clinical prediction models, in which model performance is monitored sequentially as new data come in.

Methods: We illustrate the approach using data for the diagnosis of ovarian cancer (n = 5914, 33% event fraction) and obstructive coronary artery disease (CAD; n = 4888, 44% event fraction). We used logistic regression to develop a prediction model consisting only of a priori selected predictors and assumed linear relations for continuous predictors. We mimicked prospective patient recruitment by developing the model on 100 randomly selected patients, and we used bootstrapping to internally validate the model. We then sequentially added 50 random new patients at a time, re-estimating model performance at each step, until we reached a sample size of 3000. We examined the sample size required to satisfy the following stopping rule: a calibration slope ≥ 0.9 and optimism in the c-statistic (or AUC) ≤ 0.02 at two consecutive sample sizes. This procedure was repeated 500 times. We also investigated the impact of alternative modeling strategies: modeling nonlinear relations for continuous predictors and correcting for bias in the model estimates (Firth's correction).
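The sequential procedure described in the Methods can be sketched roughly as follows. This is a minimal illustration on simulated data, not the authors' code: the function names, the simulated predictors, the small number of bootstrap resamples, and the choice to report the second of the two consecutive sample sizes are all assumptions made here. Optimism is estimated with a Harrell-style bootstrap (performance in the bootstrap sample minus performance of the same model in the original sample), which is one common way to do internal validation.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Logistic regression by Newton-Raphson; returns [intercept, coefficients]."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xd @ beta, -30, 30)))
        w = p * (1.0 - p)
        # The tiny ridge term is only there for numerical stability.
        hess = Xd.T @ (Xd * w[:, None]) + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, Xd.T @ (y - p))
    return beta

def linear_predictor(beta, X):
    return beta[0] + X @ beta[1:]

def c_statistic(y, score):
    """AUC from the rank-sum formula, using average ranks for ties."""
    _, inv, counts = np.unique(score, return_inverse=True, return_counts=True)
    ranks = (np.cumsum(counts) - (counts - 1) / 2.0)[inv]
    pos = y == 1
    n1, n0 = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n1 * (n1 + 1) / 2.0) / (n1 * n0)

def bootstrap_validate(X, y, n_boot, rng):
    """Return (mean calibration slope, mean optimism in the c-statistic)."""
    n = len(y)
    slopes, optimism = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)               # resample with replacement
        beta_b = fit_logistic(X[idx], y[idx])     # model built on the bootstrap sample
        lp_boot = linear_predictor(beta_b, X[idx])
        lp_orig = linear_predictor(beta_b, X)     # same model applied to the original sample
        optimism.append(c_statistic(y[idx], lp_boot) - c_statistic(y, lp_orig))
        # Calibration slope: coefficient of the linear predictor when refitted to y.
        slopes.append(fit_logistic(lp_orig[:, None], y)[1])
    return float(np.mean(slopes)), float(np.mean(optimism))

def adaptive_sample_size(X, y, start=100, step=50, max_n=3000, n_boot=200,
                         min_slope=0.9, max_optimism=0.02, seed=0):
    """Smallest n at which the stopping rule holds for two consecutive sample sizes."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))               # mimic a random recruitment order
    previous_ok = False
    for n in range(start, min(max_n, len(y)) + 1, step):
        idx = order[:n]
        slope, opt = bootstrap_validate(X[idx], y[idx], n_boot, rng)
        ok = slope >= min_slope and opt <= max_optimism
        if ok and previous_ok:
            return n                              # assumption: report the second of the two
        previous_ok = ok
    return None                                   # rule never satisfied within max_n

# Hypothetical simulated cohort (not the ovarian cancer or CAD data).
rng_sim = np.random.default_rng(1)
X_sim = rng_sim.normal(size=(1500, 5))
lp_sim = -0.5 + X_sim @ np.array([0.8, -0.6, 0.5, 0.3, 0.0])
y_sim = (rng_sim.random(1500) < 1.0 / (1.0 + np.exp(-lp_sim))).astype(float)

n_stop = adaptive_sample_size(X_sim, y_sim, max_n=1500, n_boot=50)
print("stopping sample size:", n_stop)
```

Repeating the call with different `seed` values would give a distribution of stopping sample sizes, analogous to the 500 repetitions described above.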

Results: Better discrimination was achieved in the ovarian cancer data (c-statistic 0.9 with 7 predictors) than in the CAD data (c-statistic 0.7 with 11 predictors). Adequate calibration and limited optimism in discrimination were achieved after a median of 450 patients (interquartile range 450-500) for the ovarian cancer data (22 events per parameter [EPP], interquartile range 20-24) and 850 patients (750-900) for the CAD data (33 EPP, 30-35). A stricter criterion, requiring AUC optimism ≤ 0.01, was met with a median of 500 (23 EPP) and 1500 (59 EPP) patients, respectively. These sample sizes were much higher than those implied by the well-known 10 EPP rule of thumb and slightly higher than those given by a recently published fixed sample size calculation method by Riley et al. Higher sample sizes were required when nonlinear relationships were modeled, and lower sample sizes when Firth's correction was used.
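To put the EPP figures in context, the arithmetic can be sketched as below. This assumes one parameter per predictor (7 and 11, intercept not counted) and uses the overall event fractions of 33% and 44%; neither assumption is confirmed by the abstract, so the computed EPP values will only approximate the reported medians, which come from 500 repetitions with sample-specific event counts. Under these assumptions the 10 EPP rule of thumb corresponds to roughly 210 and 250 patients, well below the reported medians of 450 and 850.

```python
import math

def events_per_parameter(n, event_fraction, n_parameters):
    """Expected number of events divided by the number of model parameters."""
    return n * event_fraction / n_parameters

def n_for_target_epp(target_epp, event_fraction, n_parameters):
    """Smallest sample size whose expected events reach the target EPP."""
    return math.ceil(target_epp * n_parameters / event_fraction)

# (event fraction, assumed number of parameters, reported median stopping size)
datasets = {"ovarian cancer": (0.33, 7, 450), "CAD": (0.44, 11, 850)}

for name, (fraction, k, n_median) in datasets.items():
    epp = events_per_parameter(n_median, fraction, k)
    n_rule = n_for_target_epp(10, fraction, k)
    print(f"{name}: approx. {epp:.1f} EPP at n = {n_median}; "
          f"10 EPP rule needs about {n_rule} patients")
```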

Conclusions: Adaptive sample size determination can be a useful supplement to fixed a priori sample size calculations, because it allows the sample size to be tailored dynamically to the specific prediction modeling context.
