Chi Gao, P. Choudhury, P. Maas, R. Tamimi, H. Eliassen, N. Chatterjee, M. García-Closas, P. Kraft
Abstract PR02: Validation of breast cancer risk prediction model using Nurses Health and Nurse Health II Studies
Background: Adding genetic and other biomarkers to breast cancer risk prediction models could markedly improve model discrimination; however, these expanded models have not been validated in a range of populations. In particular, the calibration of these new models (how well the predicted absolute risks match observed risks) has not been established. Good calibration is essential to confirm the utility of these risk models in precision prevention and treatment programs. Large cohort studies provide an ideal setting to validate risk models, as they can be used to validate both relative and absolute risks. In practice, however, genetic and biomarker data are often not available in the full cohort, but only in a subsample of cases and controls. When the rules for sampling cases and controls into the subsample are known, inverse-probability-of-sampling (IPW) weights can be used to estimate empirical absolute risks. When the sampling rules are unknown or complicated, the IPW weights can be estimated by regressing selection into the subsample on matching and other inclusion criteria.

Methods: We evaluated the performance of recently published breast cancer risk prediction models [Maas et al. JAMA Oncol 2016] in the Nurses' Health Study (NHS) and Nurses' Health Study II (NHSII). We first assessed a prediction model that includes only questionnaire data (BMI, hormone replacement therapy (HRT), alcohol consumption, smoking status, height, parity, age at menarche and menopause, age at first birth, and family history of breast cancer). These data are available on all subjects in the NHS and NHSII blood subcohorts: 32,826 women in NHS (with disease follow-up from 1990 to 2012) and 29,611 women in NHSII (1999 to 2013). We will then validate a model that includes both questionnaire data and a polygenic risk score based on 92 established risk SNPs. Genetic data are available on case-control samples nested within the blood subcohorts: 2,308 breast cancer cases and 3,344 controls from NHS, and 612 breast cancer cases and 933 controls from NHSII. We estimated IPW weights among controls using logistic regression in the blood subcohorts, with selection as a control as the outcome and the following predictors: age at baseline, menopausal status, HRT, length of HRT use for premenopausal women at baseline, and length of follow-up time. We used the iCARE software package (Maas P, Chatterjee N, Wheeler W et al. 2015) to calculate predicted 5- and 10-year absolute risks of breast cancer based on the published models, empirical 5- and 10-year incidence across deciles of predicted risk, and Hosmer-Lemeshow goodness-of-fit and AUC statistics.

Results: For the risk model without genetic information, predicted risks in the blood subcohorts ranged from 6.5/1,000 (1st decile) to 20.1/1,000 (10th decile) for NHS. Although empirical risks increased across deciles at approximately the same rate as predicted risks, empirical risks were higher than predicted (Hosmer-Lemeshow p [...]). Due to matching and selection on control status, the baseline distribution of questionnaire risk factors differed between the blood subcohorts and the controls from the nested case-control samples. The IPW-weighted distribution in controls closely matched the distribution in the full subcohorts, suggesting that the weights were well estimated. We will present IPW-based validation of the risk model in the nested case-control samples (work in progress).

Conclusions: These results confirm that breast cancer risk prediction models can discriminate between high-risk and low-risk women, but they also highlight that the accuracy of absolute risk estimates can vary across populations. Findings from this study can inform both model improvement and model application. Moreover, using IPW weights to approximate a full cohort analysis offers a potential way to use nested case-control studies in future validation analyses.

This abstract is also being presented as Poster A05.

Citation Format: Chi Gao, Parichoy Pal Choudhury, Paige Maas, Rulla Tamimi, Heather Eliassen, Nilanjan Chatterjee, Montserrat Garcia-Closas, Peter Kraft. Validation of breast cancer risk prediction model using Nurses Health and Nurse Health II Studies. [abstract]. In: Proceedings of the AACR Special Conference: Improving Cancer Risk Prediction for Prevention and Early Detection; Nov 16-19, 2016; Orlando, FL. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2017;26(5 Suppl):Abstract nr PR02.
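
The weight-estimation step described in the Methods (regress selection into the subsample on the sampling criteria, then weight each sampled control by the inverse of its fitted selection probability) can be illustrated with a minimal sketch. This is not the authors' code and does not use iCARE: the data are simulated, the two predictors (`age`, `hrt`) and all coefficients are hypothetical, and the logistic regression is fitted with a plain Newton-Raphson loop in NumPy. It only shows why an IPW-weighted covariate distribution in sampled controls can be expected to track the full subcohort.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column; y is a 0/1 vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Simulated "subcohort" (all values are hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 20000
age = rng.normal(55.0, 7.0, n)   # baseline age in years
hrt = rng.binomial(1, 0.4, n)    # 1 = HRT user at baseline

# Selection into the control sample depends on age and HRT use,
# mimicking matching-driven sampling whose exact rules are unknown.
true_logit = -2.0 + 0.05 * (age - 55.0) + 0.8 * hrt
sampled = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Step 1: model the probability of being sampled, given the criteria.
X = np.column_stack([np.ones(n), (age - 55.0) / 7.0, hrt])
beta = fit_logistic(X, sampled.astype(float))
p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))

# Step 2: each sampled control gets weight 1 / (fitted sampling probability).
in_sample = sampled == 1
weights = 1.0 / p_hat[in_sample]

# Step 3: compare covariate distributions (full cohort, unweighted, weighted).
full_hrt = hrt.mean()
naive_hrt = hrt[in_sample].mean()
ipw_hrt = np.average(hrt[in_sample], weights=weights)

full_age = age.mean()
naive_age = age[in_sample].mean()
ipw_age = np.average(age[in_sample], weights=weights)

print(f"HRT prevalence  full={full_hrt:.3f}  naive={naive_hrt:.3f}  IPW={ipw_hrt:.3f}")
print(f"Mean age        full={full_age:.2f}  naive={naive_age:.2f}  IPW={ipw_age:.2f}")
```

In this simulation the unweighted sampled controls over-represent older women and HRT users, while the weighted summaries move back toward the full-cohort values. The same weights could then be carried into a weighted estimate of empirical incidence within deciles of predicted risk, which is the use the abstract describes.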