{"title":"Prediction Under Interventions: Evaluation of Counterfactual Performance Using Longitudinal Observational Data.","authors":"Ruth H Keogh, Nan Van Geloven","doi":"10.1097/ede.0000000000001713","DOIUrl":null,"url":null,"abstract":"Predictions under interventions are estimates of what a person's risk of an outcome would be if they were to follow a particular treatment strategy, given their individual characteristics. Such predictions can give important input to medical decision-making. However, evaluating the predictive performance of interventional predictions is challenging. Standard ways of evaluating predictive performance do not apply when using observational data, because prediction under interventions involves obtaining predictions of the outcome under conditions that are different from those that are observed for a subset of individuals in the validation dataset. This work describes methods for evaluating counterfactual performance of predictions under interventions for time-to-event outcomes. This means we aim to assess how well predictions would match the validation data if all individuals had followed the treatment strategy under which predictions are made. We focus on counterfactual performance evaluation using longitudinal observational data, and under treatment strategies that involve sustaining a particular treatment regime over time. We introduce an estimation approach using artificial censoring and inverse probability weighting that involves creating a validation dataset mimicking the treatment strategy under which predictions are made. We extend measures of calibration, discrimination (c-index and cumulative/dynamic AUCt) and overall prediction error (Brier score) to allow assessment of counterfactual performance. The methods are evaluated using a simulation study, including scenarios in which the methods should detect poor performance. Applying our methods in the context of liver transplantation shows that our procedure allows quantification of the performance of predictions supporting crucial decisions on organ allocation.","PeriodicalId":11779,"journal":{"name":"Epidemiology","volume":"51 1","pages":""},"PeriodicalIF":4.7000,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Epidemiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1097/ede.0000000000001713","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH","Score":null,"Total":0}
引用次数: 0
Abstract
Predictions under interventions are estimates of what a person's risk of an outcome would be if they were to follow a particular treatment strategy, given their individual characteristics. Such predictions can give important input to medical decision-making. However, evaluating the predictive performance of interventional predictions is challenging. Standard ways of evaluating predictive performance do not apply when using observational data, because prediction under interventions involves obtaining predictions of the outcome under conditions that are different from those that are observed for a subset of individuals in the validation dataset. This work describes methods for evaluating counterfactual performance of predictions under interventions for time-to-event outcomes. This means we aim to assess how well predictions would match the validation data if all individuals had followed the treatment strategy under which predictions are made. We focus on counterfactual performance evaluation using longitudinal observational data, and under treatment strategies that involve sustaining a particular treatment regime over time. We introduce an estimation approach using artificial censoring and inverse probability weighting that involves creating a validation dataset mimicking the treatment strategy under which predictions are made. We extend measures of calibration, discrimination (c-index and cumulative/dynamic AUCt) and overall prediction error (Brier score) to allow assessment of counterfactual performance. The methods are evaluated using a simulation study, including scenarios in which the methods should detect poor performance. Applying our methods in the context of liver transplantation shows that our procedure allows quantification of the performance of predictions supporting crucial decisions on organ allocation.
期刊介绍:
Epidemiology publishes original research from all fields of epidemiology. The journal also welcomes review articles and meta-analyses, novel hypotheses, descriptions and applications of new methods, and discussions of research theory or public health policy. We give special consideration to papers from developing countries.