Abstract Randomized controlled trials (RCTs) are often considered the gold standard for estimating causal effect, but they may lack external validity when the population eligible to the RCT is substantially different from the target population. Having at hand a sample of the target population of interest allows us to generalize the causal effect. Identifying the treatment effect in the target population requires covariates to capture all treatment effect modifiers that are shifted between the two sets. Standard estimators then use either weighting (IPSW), outcome modeling (G-formula), or combine the two in doubly robust approaches (AIPSW). However, such covariates are often not available in both sets. In this article, after proving L 1 {L}^{1} -consistency of these three estimators, we compute the expected bias induced by a missing covariate, assuming a Gaussian distribution, a continuous outcome, and a semi-parametric model. Under this setting, we perform a sensitivity analysis for each missing covariate pattern and compute the sign of the expected bias. We also show that there is no gain in linearly imputing a partially unobserved covariate. Finally, we study the substitution of a missing covariate by a proxy. We illustrate all these results on simulations, as well as semi-synthetic benchmarks using data from the Tennessee student/teacher achievement ratio (STAR), and a real-world example from critical care medicine.
{"title":"Causal effect on a target population: A sensitivity analysis to handle missing covariates","authors":"B. Colnet, J. Josse, G. Varoquaux, Erwan Scornet","doi":"10.1515/jci-2021-0059","DOIUrl":"https://doi.org/10.1515/jci-2021-0059","url":null,"abstract":"Abstract Randomized controlled trials (RCTs) are often considered the gold standard for estimating causal effect, but they may lack external validity when the population eligible to the RCT is substantially different from the target population. Having at hand a sample of the target population of interest allows us to generalize the causal effect. Identifying the treatment effect in the target population requires covariates to capture all treatment effect modifiers that are shifted between the two sets. Standard estimators then use either weighting (IPSW), outcome modeling (G-formula), or combine the two in doubly robust approaches (AIPSW). However, such covariates are often not available in both sets. In this article, after proving L 1 {L}^{1} -consistency of these three estimators, we compute the expected bias induced by a missing covariate, assuming a Gaussian distribution, a continuous outcome, and a semi-parametric model. Under this setting, we perform a sensitivity analysis for each missing covariate pattern and compute the sign of the expected bias. We also show that there is no gain in linearly imputing a partially unobserved covariate. Finally, we study the substitution of a missing covariate by a proxy. We illustrate all these results on simulations, as well as semi-synthetic benchmarks using data from the Tennessee student/teacher achievement ratio (STAR), and a real-world example from critical care medicine.","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"19 1","pages":"372 - 414"},"PeriodicalIF":1.4,"publicationDate":"2021-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88798547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Johann A. Gagnon-Bartsch, Adam C. Sales, Edward Wu, Anthony F. Botelho, John A. Erickson, Luke W. Miratrix, N. Heffernan
Abstract Randomized controlled trials (RCTs) admit unconfounded design-based inference – randomization largely justifies the assumptions underlying statistical effect estimates – but often have limited sample sizes. However, researchers may have access to big observational data on covariates and outcomes from RCT nonparticipants. For example, data from A/B tests conducted within an educational technology platform exist alongside historical observational data drawn from student logs. We outline a design-based approach to using such observational data for variance reduction in RCTs. First, we use the observational data to train a machine learning algorithm predicting potential outcomes using covariates and then use that algorithm to generate predictions for RCT participants. Then, we use those predictions, perhaps alongside other covariates, to adjust causal effect estimates with a flexible, design-based covariate-adjustment routine. In this way, there is no danger of biases from the observational data leaking into the experimental estimates, which are guaranteed to be exactly unbiased regardless of whether the machine learning models are “correct” in any sense or whether the observational samples closely resemble RCT samples. We demonstrate the method in analyzing 33 randomized A/B tests and show that it decreases standard errors relative to other estimators, sometimes substantially.
{"title":"Precise unbiased estimation in randomized experiments using auxiliary observational data","authors":"Johann A. Gagnon-Bartsch, Adam C. Sales, Edward Wu, Anthony F. Botelho, John A. Erickson, Luke W. Miratrix, N. Heffernan","doi":"10.1515/jci-2022-0011","DOIUrl":"https://doi.org/10.1515/jci-2022-0011","url":null,"abstract":"Abstract Randomized controlled trials (RCTs) admit unconfounded design-based inference – randomization largely justifies the assumptions underlying statistical effect estimates – but often have limited sample sizes. However, researchers may have access to big observational data on covariates and outcomes from RCT nonparticipants. For example, data from A/B tests conducted within an educational technology platform exist alongside historical observational data drawn from student logs. We outline a design-based approach to using such observational data for variance reduction in RCTs. First, we use the observational data to train a machine learning algorithm predicting potential outcomes using covariates and then use that algorithm to generate predictions for RCT participants. Then, we use those predictions, perhaps alongside other covariates, to adjust causal effect estimates with a flexible, design-based covariate-adjustment routine. In this way, there is no danger of biases from the observational data leaking into the experimental estimates, which are guaranteed to be exactly unbiased regardless of whether the machine learning models are “correct” in any sense or whether the observational samples closely resemble RCT samples. We demonstrate the method in analyzing 33 randomized A/B tests and show that it decreases standard errors relative to other estimators, sometimes substantially.","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"146 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2021-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86836047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract We present a method for assessing the sensitivity of the true causal effect to unmeasured confounding. The method requires the analyst to set two intuitive parameters. Otherwise, the method is assumption free. The method returns an interval that contains the true causal effect and whose bounds are arbitrarily sharp, i.e., practically attainable. We show experimentally that our bounds can be tighter than those obtained by the method of Ding and VanderWeele, which, moreover, requires to set one more parameter than our method. Finally, we extend our method to bound the natural direct and indirect effects when there are measured mediators and unmeasured exposure–outcome confounding.
{"title":"Simple yet sharp sensitivity analysis for unmeasured confounding","authors":"J. Peña","doi":"10.1515/jci-2021-0041","DOIUrl":"https://doi.org/10.1515/jci-2021-0041","url":null,"abstract":"Abstract We present a method for assessing the sensitivity of the true causal effect to unmeasured confounding. The method requires the analyst to set two intuitive parameters. Otherwise, the method is assumption free. The method returns an interval that contains the true causal effect and whose bounds are arbitrarily sharp, i.e., practically attainable. We show experimentally that our bounds can be tighter than those obtained by the method of Ding and VanderWeele, which, moreover, requires to set one more parameter than our method. Finally, we extend our method to bound the natural direct and indirect effects when there are measured mediators and unmeasured exposure–outcome confounding.","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"42 1","pages":"1 - 17"},"PeriodicalIF":1.4,"publicationDate":"2021-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74560465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Perfect adaptation in a dynamical system is the phenomenon that one or more variables have an initial transient response to a persistent change in an external stimulus but revert to their original value as the system converges to equilibrium. With the help of the causal ordering algorithm, one can construct graphical representations of dynamical systems that represent the causal relations between the variables and the conditional independences in the equilibrium distribution. We apply these tools to formulate sufficient graphical conditions for identifying perfect adaptation from a set of first-order differential equations. Furthermore, we give sufficient conditions to test for the presence of perfect adaptation in experimental equilibrium data. We apply this method to a simple model for a protein signalling pathway and test its predictions in both simulations and using real-world protein expression data. We demonstrate that perfect adaptation can lead to misleading orientation of edges in the output of causal discovery algorithms.
{"title":"Causality and independence in perfectly adapted dynamical systems","authors":"Tineke Blom, J. Mooij","doi":"10.1515/jci-2021-0005","DOIUrl":"https://doi.org/10.1515/jci-2021-0005","url":null,"abstract":"Abstract Perfect adaptation in a dynamical system is the phenomenon that one or more variables have an initial transient response to a persistent change in an external stimulus but revert to their original value as the system converges to equilibrium. With the help of the causal ordering algorithm, one can construct graphical representations of dynamical systems that represent the causal relations between the variables and the conditional independences in the equilibrium distribution. We apply these tools to formulate sufficient graphical conditions for identifying perfect adaptation from a set of first-order differential equations. Furthermore, we give sufficient conditions to test for the presence of perfect adaptation in experimental equilibrium data. We apply this method to a simple model for a protein signalling pathway and test its predictions in both simulations and using real-world protein expression data. We demonstrate that perfect adaptation can lead to misleading orientation of edges in the output of causal discovery algorithms.","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"18 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2021-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87517386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-01-26DOI: 10.12987/9780300255881-031
{"title":"Popular IV Designs","authors":"","doi":"10.12987/9780300255881-031","DOIUrl":"https://doi.org/10.12987/9780300255881-031","url":null,"abstract":"","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"78 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2021-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86622800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-01-26DOI: 10.12987/9780300255881-035
{"title":"Data Exercise: Survey of Adult Service Providers","authors":"","doi":"10.12987/9780300255881-035","DOIUrl":"https://doi.org/10.12987/9780300255881-035","url":null,"abstract":"","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"79 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2021-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81426386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}