Assessing surrogate heterogeneity in real world data using meta-learners
Rebecca Knowlton, Layla Parast
Journal of Causal Inference 2026;14(1):20250033. Pub Date: 2026-02-23 (eCollection 2026-01-01). DOI: 10.1515/jci-2025-0033. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12924684/pdf/
Surrogate markers are most commonly studied within the context of randomized clinical trials. However, the need for alternative outcomes also extends to real-world public health and social science research, where randomized trials are often impractical. While standard methods for evaluating surrogate markers largely rely on the assumption of randomized treatment, there is a significant gap in applying these techniques to observational data, where the central challenge shifts to managing confounding. The few methods that do allow for non-randomized treatment/exposure do not offer a way to examine surrogate heterogeneity with respect to patient characteristics. In this paper, we propose a framework to assess surrogate heterogeneity in non-randomized data and implement this framework using meta-learners. Our approach allows us to quantify heterogeneity in surrogate strength with respect to patient characteristics while accommodating confounders through the use of flexible, off-the-shelf machine learning methods. In addition, we use our framework to identify covariate profiles where the surrogate is a valid replacement of the primary outcome. We examine the performance of our methods via a simulation study and an application assessing heterogeneity in hemoglobin A1c as a surrogate for fasting plasma glucose.
{"title":"Assessing surrogate heterogeneity in real world data using meta-learners.","authors":"Rebecca Knowlton, Layla Parast","doi":"10.1515/jci-2025-0033","DOIUrl":"10.1515/jci-2025-0033","url":null,"abstract":"<p><p>Surrogate markers are most commonly studied within the context of randomized clinical trials. However, the need for alternative outcomes also extends to real-world public health and social science research, where randomized trials are often impractical. While standard methods for evaluating surrogate markers largely rely on the assumption of randomized treatment, there is a significant gap in applying these techniques to observational data, where the central challenge shifts to managing confounding. The few methods that do allow for non-randomized treatment/exposure do not offer a way to examine surrogate heterogeneity with respect to patient characteristics. In this paper, we propose a framework to assess surrogate heterogeneity in non-randomized data and implement this framework using meta-learners. Our approach allows us to quantify heterogeneity in surrogate strength with respect to patient characteristics while accommodating confounders through the use of flexible, off-the-shelf machine learning methods. In addition, we use our framework to identify covariate profiles where the surrogate is a valid replacement of the primary outcome. We examine the performance of our methods via a simulation study and application to examine heterogeneity in the surrogacy of hemoglobin A1c as a surrogate for fasting plasma glucose.</p>","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"14 1","pages":"20250033"},"PeriodicalIF":1.8,"publicationDate":"2026-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12924684/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147272512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating average causal effects with incomplete exposure and confounders
Lan Wen, Glen McGee
Journal of Causal Inference 2026;14(1):20230083. Pub Date: 2026-02-20 (eCollection 2026-01-01). DOI: 10.1515/jci-2023-0083. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12922761/pdf/
Standard methods for estimating average causal effects require complete observations of the exposure and confounders. In observational studies, however, missing data are ubiquitous. Motivated by a study on the effect of prescription opioids on mortality, we propose methods for estimating average causal effects when exposures and potential confounders may be missing. We consider missingness at random and additionally propose several specific missing not at random (MNAR) assumptions. Under our proposed MNAR assumptions, we show that the average causal effects are identified from the observed data and derive corresponding influence functions, which form the basis of our proposed estimators. Our simulations show that standard multiple imputation techniques paired with a complete-data estimator are unbiased when data are missing at random (MAR) but can be biased otherwise. For each of the MNAR assumptions, we instead propose doubly robust targeted maximum likelihood estimators (TMLE), allowing misspecification of either (i) the outcome models or (ii) the exposure and missingness models. The proposed methods are suitable for any outcome type, and we apply them to a motivating study that examines the effect of prescription opioid usage on all-cause mortality using data from the National Health and Nutrition Examination Survey (NHANES).
{"title":"Estimating average causal effects with incomplete exposure and confounders.","authors":"Lan Wen, Glen McGee","doi":"10.1515/jci-2023-0083","DOIUrl":"https://doi.org/10.1515/jci-2023-0083","url":null,"abstract":"<p><p>Standard methods for estimating average causal effects require complete observations of the exposure and confounders. In observational studies, however, missing data are ubiquitous. Motivated by a study on the effect of prescription opioids on mortality, we propose methods for estimating average causal effects when exposures and potential confounders may be missing. We consider missingness at random and additionally propose several specific missing not at random (MNAR) assumptions. Under our proposed MNAR assumptions, we show that the average causal effects are identified from the observed data and derive corresponding influence functions, which form the basis of our proposed estimators. Our simulations show that standard multiple imputation techniques paired with a complete data estimator is unbiased when data are missing at random (MAR) but can be biased otherwise. For each of the MNAR assumptions, we instead propose doubly robust targeted maximum likelihood estimators (TMLE), allowing misspecification of either (i) the outcome models or (ii) the exposure and missingness models. The proposed methods are suitable for any outcome types, and we apply them to a motivating study that examines the effect of prescription opioid usage on all-cause mortality using data from the National Health and Nutrition Examination Survey (NHANES).</p>","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"14 1","pages":"20230083"},"PeriodicalIF":1.8,"publicationDate":"2026-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12922761/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147272525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semiparametric discovery and estimation of interaction in mixed exposures using stochastic interventions
David B McCoy, Alan Hubbard, Mark van der Laan, Alejandro Schuler
Journal of Causal Inference 2026;14(1). Pub Date: 2026-01-01 (Epub 2026-01-19). DOI: 10.1515/jci-2024-0058. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920007/pdf/
Understanding the complex interactions among multiple environmental exposures is critical for assessing their combined impact on health outcomes. This study introduces InterXshift, a novel semiparametric method that provides a nonparametric definition of interaction and facilitates both the discovery and efficient estimation of interaction effects in mixed exposures. Leveraging stochastic shift interventions and ensemble machine learning, InterXshift identifies and quantifies interactions through a model-independent target parameter, estimated using targeted maximum likelihood estimation (TMLE) and cross-validation. The approach contrasts expected outcomes from joint interventions against those from individual exposures, enabling the detection of synergistic and antagonistic interactions. Validation through simulations and application to the National Institute of Environmental Health Sciences (NIEHS) Mixtures Workshop data demonstrate InterXshift's efficacy in accurately identifying true interaction directions and consistently highlighting significant impacts. We apply our methodology to National Health and Nutrition Examination Survey (NHANES) data to understand the interaction effect (if any) of furan exposure on leukocyte telomere length. This method enhances the analysis of multi-exposure interactions within high-dimensional datasets, offering robust methodological improvements for elucidating complex exposure dynamics in environmental health research. Additionally, we provide an open-source implementation of InterXshift in the InterXshift R package, facilitating its adoption and application by the research community.
{"title":"Semiparametric discovery and estimation of interaction in mixed exposures using stochastic interventions.","authors":"David B McCoy, Alan Hubbard, Mark van der Laan, Alejandro Schuler","doi":"10.1515/jci-2024-0058","DOIUrl":"10.1515/jci-2024-0058","url":null,"abstract":"<p><p>Understanding the complex interactions among multiple environmental exposures is critical for assessing their combined impact on health outcomes. This study introduces InterXshift, a novel semiparametric method that provides a nonparametric definition of interaction and facilitates both the discovery and efficient estimation of interaction effects in mixed exposures. Leveraging stochastic shift interventions and ensemble machine learning, InterXshift identifies and quantifies interactions through a model-independent target parameter, estimated using targeted maximum likelihood estimation (TMLE) and cross-validation. The approach contrasts expected outcomes from joint interventions against those from individual exposures, enabling the detection of synergistic and antagonistic interactions. Validation through simulations and application to the National Institute of Environmental Health Sciences (NIEHS) Mixtures Workshop data demonstrate InterXshift's efficacy in accurately identifying true interaction directions and consistently highlighting significant impacts. We apply our methodology to National Health and Nutrition Examination Survey (NHANES) data to understand the interaction effect (if any) of furan exposure on leukocyte telomere length. This method enhances the analysis of multi-exposure interactions within high-dimensional datasets, offering robust methodological improvements for elucidating complex exposure dynamics in environmental health research. Additionally, we provide an opensource implementation of InterXshift in the InterXshift R package, facilitating its adoption and application by the research community.</p>","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"14 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920007/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147272467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bridging binarization: causal inference with dichotomized continuous exposures
Kaitlyn Lee, Alan Hubbard, Alejandro Schuler
Journal of Causal Inference 2026;14(1). Pub Date: 2026-01-01 (Epub 2026-01-13). DOI: 10.1515/jci-2024-0049. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920005/pdf/
The average treatment effect (ATE) is a common parameter estimated in the causal inference literature, but it is only defined for binary exposures. Thus, despite concerns raised by some researchers, many studies seeking to estimate the causal effect of a continuous exposure create a new binary exposure variable by dichotomizing the continuous values into two categories. In this paper, we affirm binarization as a statistically valid method for answering causal questions about continuous exposures by showing the equivalence between the binarized ATE and the difference in the average outcomes of two specific modified treatment policies. These policies impose cut-offs corresponding to the binarized exposure variable and assume preservation of relative self-selection. Relative self-selection is the ratio of the probability density of an individual having an exposure equal to one value of the continuous exposure variable versus another. The policies assume that, for any two values of the exposure variable with non-zero probability density after the cut-off, this ratio will remain unchanged. Through this equivalence, we clarify the assumptions underlying binarization and discuss how to properly interpret the resulting estimator. Additionally, we introduce a new target parameter that can be computed after binarization that considers the observed world as a benchmark. We argue that this parameter addresses more relevant causal questions than the traditional binarized ATE parameter. We present a simulation study to illustrate the implications of these assumptions when analyzing data and to demonstrate how to correctly implement estimators of the parameters discussed. Finally, to further illustrate the underlying assumptions, we apply the method to evaluate the effect on birth outcomes of a California law that seeks to limit exposure to oil and gas wells.
{"title":"Bridging binarization: causal inference with dichotomized continuous exposures.","authors":"Kaitlyn Lee, Alan Hubbard, Alejandro Schuler","doi":"10.1515/jci-2024-0049","DOIUrl":"10.1515/jci-2024-0049","url":null,"abstract":"<p><p>The average treatment effect (ATE) is a common parameter estimated in causal inference literature, but it is only defined for binary exposures. Thus, despite concerns raised by some researchers, many studies seeking to estimate the causal effect of a continuous exposure create a new binary exposure variable by dichotomizing the continuous values into two categories. In this paper, we affirm binarization as a statistically valid method for answering causal questions about continuous exposures by showing the equivalence between the binarized ATE and the difference in the average outcomes of two specific modified treatment policies. These policies impose cut-offs corresponding to the binarized exposure variable and assume preservation of relative self-selection. Relative self-selection is the ratio of the probability density of an individual having an exposure equal to one value of the continuous exposure variable versus another. The policies assume that, for any two values of the exposure variable with non-zero probability density after the cut-off, this ratio will remain unchanged. Through this equivalence, we clarify the assumptions underlying binarization and discuss how to properly interpret the resulting estimator. Additionally, we introduce a new target parameter that can be computed after binarization that considers the observed world as a benchmark. We argue that this parameter addresses more relevant causal questions than the traditional binarized ATE parameter. We present a simulation study to illustrate the implications of these assumptions when analyzing data and to demonstrate how to correctly implement estimators of the parameters discussed. Finally, we present an application of this method to evaluate the effect of a law in the state of California which seeks to limit exposures to oil and gas wells on birth outcomes to further illustrate the underlying assumptions.</p>","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"14 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920005/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147272510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discovery of critical thresholds in mixed exposures and estimation of policy intervention effects
David B McCoy, Alan Hubbard, Mark van der Laan, Alejandro Schuler
Journal of Causal Inference 2026;14(1). Pub Date: 2026-01-01 (Epub 2026-01-09). DOI: 10.1515/jci-2024-0056. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920006/pdf/
Regulations of chemical exposures often focus on individual substances, neglecting the amplified toxicity that can arise from multiple concurrent exposures. We propose a novel methodology to identify critical thresholds in multivariate exposure spaces and estimate the effects of policy interventions that limit exposures within these thresholds. Our approach employs a recursive partitioning algorithm integrated with targeted maximum likelihood estimation (TMLE) to discover regions in the exposure space where the expected outcome is minimized or maximized. To address potential overfitting bias from using the same data for threshold discovery and effect estimation, we utilize cross-validated TMLE (CV-TMLE), which ensures asymptotic unbiasedness and efficiency. Simulation studies demonstrate convergence to the optimal exposure region and accurate estimation of intervention effects. We apply our method to synthetic mixture data, successfully identifying true interactions, and to NHANES data, discovering harmful metal exposures affecting telomere length. Our approach provides a flexible and interpretable framework for policy-makers to assess the impact of exposure regulations, and we offer an open-source implementation in the CVtreeMLE R package.
{"title":"Discovery of critical thresholds in mixed exposures and estimation of policy intervention effects.","authors":"David B McCoy, Alan Hubbard, Mark van der Laan, Alejandro Schuler","doi":"10.1515/jci-2024-0056","DOIUrl":"10.1515/jci-2024-0056","url":null,"abstract":"<p><p>Regulations of chemical exposures often focus on individual substances, neglecting the amplified toxicity that can arise from multiple concurrent exposures. We propose a novel methodology to identify critical thresholds in multivariate exposure spaces and estimate the effects of policy interventions that limit exposures within these thresholds. Our approach employs a recursive partitioning algorithm integrated with targeted maximum likelihood estimation (TMLE) to discover regions in the exposure space where the expected outcome is minimized or maximized. To address potential overfitting bias from using the same data for threshold discovery and effect estimation, we utilize cross-validated TMLE (CV-TMLE), which ensures asymptotic unbiasedness and efficiency. Simulation studies demonstrate convergence to the optimal exposure region and accurate estimation of intervention effects. We apply our method to synthetic mixture data, successfully identifying true interactions, and to NHANES data, discovering harmful metal exposures affecting telomere length. Our approach provides a flexible and interpretable framework for policy-makers to assess the impact of exposure regulations, and we offer an open-source implementation in the CVtreeMLE R package.</p>","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"14 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920006/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147272492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Orthogonal prediction of counterfactual outcomes
Stijn Vansteelandt, Paweł Morzywołek
Journal of Causal Inference 2025;13(1):20240051. Pub Date: 2025-11-27 (eCollection 2025-01-01). DOI: 10.1515/jci-2024-0051. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12658738/pdf/
Orthogonal meta-learners, such as DR-learner (Kennedy EH. Towards optimal doubly robust estimation of heterogeneous causal effects. arXiv preprint arXiv:2004.14497 2020), R-learner (Nie X, Wager S. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika 2021;108:299-319) and IF-learner (Curth A, Alaa AM, van der Schaar M. Estimating structural target functions using machine learning and influence functions. arXiv preprint arXiv:2008.06461 2020), are increasingly used to estimate conditional average treatment effects. They are hoped to improve convergence rates relative to naïve meta-learners (e.g., T-, S- and X-learner (Künzel SR, Sekhon JS, Bickel PJ, Yu B. Metalearners for estimating heterogeneous treatment effects using machine learning. Proc Natl Acad Sci 2019;116:4156-65)) through de-biasing procedures that involve applying standard learners to specifically transformed outcome data. This leads them to disregard the possibly constrained outcome space, which can be particularly problematic for dichotomous outcomes: these typically get transformed to values that are no longer constrained to the unit interval, which may cause instability and makes it difficult for standard learners to guarantee predictions within the unit interval. To address this, we construct a non-orthogonal imputation-learner and an orthogonal 'i-learner' for the prediction of counterfactual outcomes, which respect the outcome space. These are more generally expected to outperform existing learners, even when the outcome is unconstrained, as we confirm empirically in simulation studies and an analysis of critical care data. Our development also sheds broader light onto the construction of orthogonal learners for other estimands.
{"title":"Orthogonal prediction of counterfactual outcomes.","authors":"Stijn Vansteelandt, Paweł Morzywołek","doi":"10.1515/jci-2024-0051","DOIUrl":"10.1515/jci-2024-0051","url":null,"abstract":"<p><p>Orthogonal meta-learners, such as DR-learner (Kennedy EH. Towards optimal doubly robust estimation of heterogeneous causal effects. arXiv preprint arXiv:2004.14497 2020), R-learner (Nie X, Wager S. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika 2021;108:299-319) and IF-learner (Curth A, Alaa AM, van der Schaar M. Estimating structural target functions using machine learning and influence functions. arXiv preprint arXiv:2008.06461 2020), are increasingly used to estimate conditional average treatment effects. They are hoped to improve convergence rates relative to naïve meta-learners (e.g., T-, S- and X-learner (Künzel SR, Sekhon JS, Bickel PJ, Yu B. Metalearners for estimating heterogeneous treatment effects using machine learning. Proc Natl Acad Sci 2019;116:4156-65)) through de-biasing procedures that involve applying standard learners to specifically transformed outcome data. This leads them to disregard the possibly constrained outcome space, which can be particularly problematic for dichotomous outcomes: these typically get transformed to values that are no longer constrained to the unit interval, which may cause instability and makes it difficult for standard learners to guarantee predictions within the unit interval. To address this, we construct a non-orthogonal imputation-learner and an orthogonal 'i-learner' for the prediction of counterfactual outcomes, which respect the outcome space. These are more generally expected to outperform existing learners, even when the outcome is unconstrained, as we confirm empirically in simulation studies and an analysis of critical care data. Our development also sheds broader light onto the construction of orthogonal learners for other estimands.</p>","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"13 1","pages":"20240051"},"PeriodicalIF":1.8,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12658738/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145649827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalized coarsened confounding for causal effects: a large-sample framework
Debashis Ghosh, Lei Wang
Journal of Causal Inference 2026;14(1):20250002. Pub Date: 2025-11-17 (eCollection 2026-01-01). DOI: 10.1515/jci-2025-0002. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777950/pdf/
Causal inference methods are widely used for the rigorous analysis of observational studies and for policy evaluation. In this article, we consider a class of generalized coarsened confounding procedures. At a high level, these procedures can be viewed as performing a clustering of confounding variables, followed by treatment effect and attendant variance estimation using the confounder strata. In addition, we propose two new algorithms for generalized coarsened confounding. While previous authors have developed some statistical properties for one special case in our class of procedures, we instead develop a general asymptotic framework. We provide asymptotic results for the average causal effect estimator and give conditions for consistency. In addition, we provide an asymptotic justification for the variance formulae for coarsened exact matching. A bias correction technique is proposed, and we apply the proposed methodology to data from two well-known observational studies.
{"title":"Generalized coarsened confounding for causal effects: a large-sample framework.","authors":"Debashis Ghosh, Lei Wang","doi":"10.1515/jci-2025-0002","DOIUrl":"10.1515/jci-2025-0002","url":null,"abstract":"<p><p>There has been widespread use of causal inference methods for the rigorous analysis of observational studies and to identify policy evaluations. In this article, we consider a class of generalized coarsened procedures for confounding. At a high level, these procedures can be viewed as performing a clustering of confounding variables, followed by treatment effect and attendant variance estimation using the confounder strata. In addition, we propose two new algorithms for generalized coarsened confounding. While previous authors have developed some statistical properties for one special case in our class of procedures, we instead develop a general asymptotic framework. We provide asymptotic results for the average causal effect estimator as well as providing conditions for consistency. In addition, we provide an asymptotic justification for the variance formulae for coarsened exact matching. A bias correction technique is proposed, and we apply the proposed methodology to data from two well-known observational studies.</p>","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"14 1","pages":"20250002"},"PeriodicalIF":1.8,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777950/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145935745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Role of placebo samples in observational studies
Ting Ye, Qijia He, Shuxiao Chen, Bo Zhang
Journal of Causal Inference 2025;13(1). Pub Date: 2025-01-01 (Epub 2025-03-05). DOI: 10.1515/jci-2023-0020. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12345972/pdf/
In an observational study, it is common to leverage known null effects to detect bias. One such strategy is to set aside a placebo sample - a subset of data immune from the hypothesized cause-and-effect relationship. Existence of an effect in the placebo sample raises concerns about unmeasured confounding bias while absence of it helps corroborate the causal conclusion. This paper describes a framework for using a placebo sample to detect and remove bias. We state the identification assumptions and develop estimation and inference methods based on outcome regression, inverse probability weighting, and doubly-robust approaches. Simulation studies investigate the finite-sample performance of the proposed methods. We illustrate the methods using an empirical study of the effect of the earned income tax credit on infant health.
{"title":"Role of placebo samples in observational studies.","authors":"Ting Ye, Qijia He, Shuxiao Chen, Bo Zhang","doi":"10.1515/jci-2023-0020","DOIUrl":"10.1515/jci-2023-0020","url":null,"abstract":"<p><p>In an observational study, it is common to leverage known null effects to detect bias. One such strategy is to set aside a placebo sample - a subset of data immune from the hypothesized cause-and-effect relationship. Existence of an effect in the placebo sample raises concerns about unmeasured confounding bias while absence of it helps corroborate the causal conclusion. This paper describes a framework for using a placebo sample to detect and remove bias. We state the identification assumptions and develop estimation and inference methods based on outcome regression, inverse probability weighting, and doubly-robust approaches. Simulation studies investigate the finite-sample performance of the proposed methods. We illustrate the methods using an empirical study of the effect of the earned income tax credit on infant health.</p>","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"13 1","pages":""},"PeriodicalIF":1.8,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12345972/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144849417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating Boolean relationships in Configurational Comparative Methods
Luna De Souter
Journal of Causal Inference 2024. Pub Date: 2024-01-01. DOI: 10.1515/jci-2023-0014
Configurational Comparative Methods (CCMs) aim to learn causal structures from datasets by exploiting Boolean sufficiency and necessity relationships. One important challenge for these methods is that such Boolean relationships are often not satisfied in real-life datasets, as these datasets usually contain noise. Hence, CCMs infer models that only approximately fit the data, introducing a risk of inferring incorrect or incomplete models, especially when data are also fragmented (have limited empirical diversity). To minimize this risk, evaluation measures for sufficiency and necessity should be sensitive to all relevant evidence. This article points out that the standard evaluation measures in CCMs, consistency and coverage, neglect certain evidence for these Boolean relationships. Correspondingly, two new measures, contrapositive consistency and contrapositive coverage, which are equivalent to the binary classification measures specificity and negative predictive value, respectively, are introduced to the CCM context as additions to consistency and coverage. A simulation experiment demonstrates that the introduced contrapositive measures indeed help to identify correct CCM models.
Comparison of open-source software for producing directed acyclic graphs
Amy J Pitts, Charlotte R Fowler
Journal of Causal Inference 2024;12(1). Pub Date: 2024-01-01 (Epub 2024-01-10). DOI: 10.1515/jci-2023-0031. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10869111/pdf/
Many software packages have been developed to assist researchers in drawing directed acyclic graphs (DAGs), each with unique functionality and usability. We examine five of the most commonly used software packages for generating DAGs: TikZ, DAGitty, ggdag, dagR, and igraph. For each package, we provide a general description of its background, analysis and visualization capabilities, and user-friendliness. Additionally, to compare the packages, we produce two DAGs in each one, the first featuring a simple confounding structure and the second a more complex structure with three confounders and a mediator. We provide recommendations for when to use each software package depending on the user's needs.
{"title":"Comparison of open-source software for producing directed acyclic graphs.","authors":"Amy J Pitts, Charlotte R Fowler","doi":"10.1515/jci-2023-0031","DOIUrl":"10.1515/jci-2023-0031","url":null,"abstract":"<p><p>Many software packages have been developed to assist researchers in drawing directed acyclic graphs (DAGs), each with unique functionality and usability. We examine five of the most common software to generate DAGs: Ti<i>k</i>Z, DAGitty, ggdag, dagR, and igraph. For each package, we provide a general description of its background, analysis and visualization capabilities, and user-friendliness. Additionally in order to compare packages, we produce two DAGs in each software, the first featuring a simple confounding structure, while the second includes a more complex structure with three confounders and a mediator. We provide recommendations for when to use each software depending on the user's needs.</p>","PeriodicalId":48576,"journal":{"name":"Journal of Causal Inference","volume":"12 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10869111/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139742392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}