Journal of Causal Inference: Latest Articles

Assessing surrogate heterogeneity in real world data using meta-learners.
IF 1.8 | Zone 4 (Medicine) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-02-23 | eCollection Date: 2026-01-01 | DOI: 10.1515/jci-2025-0033
Rebecca Knowlton, Layla Parast

Surrogate markers are most commonly studied within the context of randomized clinical trials. However, the need for alternative outcomes also extends to real-world public health and social science research, where randomized trials are often impractical. While standard methods for evaluating surrogate markers largely rely on the assumption of randomized treatment, there is a significant gap in applying these techniques to observational data, where the central challenge shifts to managing confounding. The few methods that do allow for non-randomized treatment/exposure do not offer a way to examine surrogate heterogeneity with respect to patient characteristics. In this paper, we propose a framework to assess surrogate heterogeneity in non-randomized data and implement this framework using meta-learners. Our approach allows us to quantify heterogeneity in surrogate strength with respect to patient characteristics while accommodating confounders through the use of flexible, off-the-shelf machine learning methods. In addition, we use our framework to identify covariate profiles where the surrogate is a valid replacement for the primary outcome. We examine the performance of our methods via a simulation study and an application examining heterogeneity in the strength of hemoglobin A1c as a surrogate for fasting plasma glucose.

Citations: 0
Estimating average causal effects with incomplete exposure and confounders.
IF 1.8 | Zone 4 (Medicine) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-02-20 | eCollection Date: 2026-01-01 | DOI: 10.1515/jci-2023-0083
Lan Wen, Glen McGee

Standard methods for estimating average causal effects require complete observations of the exposure and confounders. In observational studies, however, missing data are ubiquitous. Motivated by a study on the effect of prescription opioids on mortality, we propose methods for estimating average causal effects when exposures and potential confounders may be missing. We consider missingness at random and additionally propose several specific missing not at random (MNAR) assumptions. Under our proposed MNAR assumptions, we show that the average causal effects are identified from the observed data and derive corresponding influence functions, which form the basis of our proposed estimators. Our simulations show that standard multiple imputation techniques paired with a complete-data estimator are unbiased when data are missing at random (MAR) but can be biased otherwise. For each of the MNAR assumptions, we instead propose doubly robust targeted maximum likelihood estimators (TMLE), allowing misspecification of either (i) the outcome models or (ii) the exposure and missingness models. The proposed methods are suitable for any outcome type, and we apply them to a motivating study that examines the effect of prescription opioid usage on all-cause mortality using data from the National Health and Nutrition Examination Survey (NHANES).
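
The double-robustness property mentioned in this abstract can be illustrated in a much simpler setting than the paper's. The following is only a sketch under stated assumptions: it uses fully observed data (no missingness models), a plain augmented inverse probability weighting (AIPW) estimating function rather than TMLE, and an invented toy population (`P_W`, `TRUE_PI`, `true_mu` and all numbers are hypothetical). It evaluates the population mean of the AIPW function exactly, which suggests why the true effect of 2 is recovered when either the outcome model or the exposure model is correct, but not when both are wrong.

```python
# Toy population (all numbers hypothetical): binary confounder W, binary exposure A.
P_W = {0: 0.5, 1: 0.5}          # P(W = w)
TRUE_PI = {0: 0.2, 1: 0.8}      # P(A = 1 | W = w)


def true_mu(a, w):
    """True outcome regression E[Y | A = a, W = w]; the true ATE is 2."""
    return 1.0 + 2.0 * a + 3.0 * w


def aipw_population_mean(mu_hat, pi_hat):
    """Exact population mean of the AIPW estimating function.

    The function is linear in Y, so Y can be replaced by E[Y | A, W]
    when averaging over the joint distribution of (W, A).
    """
    total = 0.0
    for w, p_w in P_W.items():
        for a in (0, 1):
            p_a = TRUE_PI[w] if a == 1 else 1.0 - TRUE_PI[w]
            y = true_mu(a, w)
            plug_in = mu_hat(1, w) - mu_hat(0, w)
            if a == 1:
                correction = (y - mu_hat(1, w)) / pi_hat(w)
            else:
                correction = -(y - mu_hat(0, w)) / (1.0 - pi_hat(w))
            total += p_w * p_a * (plug_in + correction)
    return total


def wrong_mu(a, w):
    return 0.0      # badly misspecified outcome model


def wrong_pi(w):
    return 0.5      # exposure model that ignores the confounder


def correct_pi(w):
    return TRUE_PI[w]


ate_outcome_ok = aipw_population_mean(true_mu, wrong_pi)      # outcome model right
ate_exposure_ok = aipw_population_mean(wrong_mu, correct_pi)  # exposure model right
ate_both_wrong = aipw_population_mean(wrong_mu, wrong_pi)     # neither right
print(ate_outcome_ok, ate_exposure_ok, ate_both_wrong)
```

In this toy setup the first two quantities equal the true effect and the third does not. The estimators in the paper additionally involve missingness models, which this sketch leaves out entirely.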

Citations: 0
Semiparametric discovery and estimation of interaction in mixed exposures using stochastic interventions.
IF 1.8 | Zone 4 (Medicine) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-01 | Epub Date: 2026-01-19 | DOI: 10.1515/jci-2024-0058
David B McCoy, Alan Hubbard, Mark van der Laan, Alejandro Schuler

Understanding the complex interactions among multiple environmental exposures is critical for assessing their combined impact on health outcomes. This study introduces InterXshift, a novel semiparametric method that provides a nonparametric definition of interaction and facilitates both the discovery and efficient estimation of interaction effects in mixed exposures. Leveraging stochastic shift interventions and ensemble machine learning, InterXshift identifies and quantifies interactions through a model-independent target parameter, estimated using targeted maximum likelihood estimation (TMLE) and cross-validation. The approach contrasts expected outcomes from joint interventions against those from individual exposures, enabling the detection of synergistic and antagonistic interactions. Validation through simulations and application to the National Institute of Environmental Health Sciences (NIEHS) Mixtures Workshop data demonstrates InterXshift's efficacy in accurately identifying true interaction directions and consistently highlighting significant impacts. We apply our methodology to National Health and Nutrition Examination Survey (NHANES) data to understand the interaction effect (if any) of furan exposure on leukocyte telomere length. This method enhances the analysis of multi-exposure interactions within high-dimensional datasets, offering robust methodological improvements for elucidating complex exposure dynamics in environmental health research. Additionally, we provide an open-source implementation of InterXshift in the InterXshift R package, facilitating its adoption and application by the research community.
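
The contrast between joint and individual interventions described in this abstract can be sketched with a small plug-in calculation. This is an illustration only, not the InterXshift R package: the outcome regression `outcome_mean` is assumed known (the paper estimates it with ensemble learning and TMLE), covariates are omitted, the intervention is a simple additive shift, and the exact target parameter in the paper may be defined differently. All names and numbers are hypothetical.

```python
def outcome_mean(a1, a2):
    """Hypothetical outcome regression E[Y | A1 = a1, A2 = a2] (assumed known)."""
    return 1.0 + a1 + 2.0 * a2 + 0.5 * a1 * a2


def mean_under_shift(exposures, d1, d2):
    """Average outcome if every unit's exposures were shifted by (d1, d2)."""
    values = [outcome_mean(a1 + d1, a2 + d2) for a1, a2 in exposures]
    return sum(values) / len(values)


def shift_interaction(exposures, d1, d2):
    """Joint-shift effect minus the sum of the two single-shift effects."""
    baseline = mean_under_shift(exposures, 0.0, 0.0)
    joint = mean_under_shift(exposures, d1, d2) - baseline
    only_1 = mean_under_shift(exposures, d1, 0.0) - baseline
    only_2 = mean_under_shift(exposures, 0.0, d2) - baseline
    return joint - only_1 - only_2


# Hypothetical observed exposure pairs (A1, A2).
observed = [(0.2, 1.0), (1.5, 0.3), (0.7, 2.2), (1.1, 0.9)]
interaction = shift_interaction(observed, d1=1.0, d2=2.0)
print(interaction)
```

With this outcome function the contrast reduces to 0.5 * d1 * d2 whatever the exposure distribution, so a positive value reflects the product term alone; a purely additive outcome function would give zero.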

Citations: 0
Bridging binarization: causal inference with dichotomized continuous exposures.
IF 1.8 | Zone 4 (Medicine) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-01 | Epub Date: 2026-01-13 | DOI: 10.1515/jci-2024-0049
Kaitlyn Lee, Alan Hubbard, Alejandro Schuler

The average treatment effect (ATE) is a common parameter estimated in the causal inference literature, but it is only defined for binary exposures. Thus, despite concerns raised by some researchers, many studies seeking to estimate the causal effect of a continuous exposure create a new binary exposure variable by dichotomizing the continuous values into two categories. In this paper, we affirm binarization as a statistically valid method for answering causal questions about continuous exposures by showing the equivalence between the binarized ATE and the difference in the average outcomes of two specific modified treatment policies. These policies impose cut-offs corresponding to the binarized exposure variable and assume preservation of relative self-selection. Relative self-selection is the ratio of the probability density of an individual having an exposure equal to one value of the continuous exposure variable versus another. The policies assume that, for any two values of the exposure variable with non-zero probability density after the cut-off, this ratio will remain unchanged. Through this equivalence, we clarify the assumptions underlying binarization and discuss how to properly interpret the resulting estimator. Additionally, we introduce a new target parameter that can be computed after binarization that considers the observed world as a benchmark. We argue that this parameter addresses more relevant causal questions than the traditional binarized ATE parameter. We present a simulation study to illustrate the implications of these assumptions when analyzing data and to demonstrate how to correctly implement estimators of the parameters discussed. Finally, to further illustrate the underlying assumptions, we apply this method to evaluate the effect on birth outcomes of a California state law that seeks to limit exposure to oil and gas wells.
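
The equivalence stated in this abstract can be checked numerically on a toy example. The sketch below rests on several assumptions not in the source: the exposure is discrete with four levels rather than continuous, there is a single binary covariate `W`, the outcome regression `outcome_mean` is known, and all numbers are invented. Each policy truncates the conditional exposure distribution at the cut-off and renormalizes it, which leaves the ratio between any two retained exposure values (the relative self-selection) unchanged.

```python
CUTOFF = 3  # hypothetical threshold: exposure >= CUTOFF counts as "high"

P_W = {0: 0.5, 1: 0.5}                      # P(W = w)
DENSITY = {                                  # P(A = a | W = w)
    0: {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2},
    1: {1: 0.4, 2: 0.2, 3: 0.1, 4: 0.3},
}


def outcome_mean(a, w):
    """Hypothetical E[Y | A = a, W = w]."""
    return 1.0 * a + 2.0 * w


def truncated_density(w, keep):
    """Policy density: restrict to kept exposure values and renormalize."""
    mass = sum(p for a, p in DENSITY[w].items() if keep(a))
    return {a: p / mass for a, p in DENSITY[w].items() if keep(a)}


def policy_mean(keep):
    """Mean outcome if exposures were drawn from the truncated density."""
    return sum(
        P_W[w] * sum(q * outcome_mean(a, w)
                     for a, q in truncated_density(w, keep).items())
        for w in P_W
    )


def binarized_ate():
    """Standardized difference in E[Y | B, W], with B = 1 if A >= CUTOFF."""
    ate = 0.0
    for w, p_w in P_W.items():
        means = {}
        for b in (0, 1):
            cells = [(a, p) for a, p in DENSITY[w].items()
                     if (a >= CUTOFF) == bool(b)]
            mass = sum(p for _, p in cells)
            means[b] = sum(p * outcome_mean(a, w) for a, p in cells) / mass
        ate += p_w * (means[1] - means[0])
    return ate


policy_contrast = (policy_mean(lambda a: a >= CUTOFF)
                   - policy_mean(lambda a: a < CUTOFF))
ate_binarized = binarized_ate()

# Relative self-selection between exposure values 3 and 4 in stratum W = 0.
ratio_before = DENSITY[0][3] / DENSITY[0][4]
after = truncated_density(0, lambda a: a >= CUTOFF)
ratio_after = after[3] / after[4]
print(policy_contrast, ate_binarized, ratio_before, ratio_after)
```

The two contrasts agree because the binarized conditional mean is already an average of the outcome regression over the truncated, renormalized exposure distribution. A policy that moved everyone to a single exposure value would change the density ratios and generally give a different answer.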

Citations: 0
Discovery of critical thresholds in mixed exposures and estimation of policy intervention effects.
IF 1.8 | Zone 4 (Medicine) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2026-01-01 | Epub Date: 2026-01-09 | DOI: 10.1515/jci-2024-0056
David B McCoy, Alan Hubbard, Mark van der Laan, Alejandro Schuler

Regulations of chemical exposures often focus on individual substances, neglecting the amplified toxicity that can arise from multiple concurrent exposures. We propose a novel methodology to identify critical thresholds in multivariate exposure spaces and estimate the effects of policy interventions that limit exposures within these thresholds. Our approach employs a recursive partitioning algorithm integrated with targeted maximum likelihood estimation (TMLE) to discover regions in the exposure space where the expected outcome is minimized or maximized. To address potential overfitting bias from using the same data for threshold discovery and effect estimation, we utilize cross-validated TMLE (CV-TMLE), which ensures asymptotic unbiasedness and efficiency. Simulation studies demonstrate convergence to the optimal exposure region and accurate estimation of intervention effects. We apply our method to synthetic mixture data, successfully identifying true interactions, and to NHANES data, discovering harmful metal exposures affecting telomere length. Our approach provides a flexible and interpretable framework for policy-makers to assess the impact of exposure regulations, and we offer an open-source implementation in the CVtreeMLE R package.

Citations: 0
Orthogonal prediction of counterfactual outcomes.
IF 1.8 | Zone 4 (Medicine) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-11-27 | eCollection Date: 2025-01-01 | DOI: 10.1515/jci-2024-0051
Stijn Vansteelandt, Paweł Morzywołek

Orthogonal meta-learners, such as DR-learner (Kennedy EH. Towards optimal doubly robust estimation of heterogeneous causal effects. arXiv preprint arXiv:2004.14497 2020), R-learner (Nie X, Wager S. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika 2021;108:299-319) and IF-learner (Curth A, Alaa AM, van der Schaar M. Estimating structural target functions using machine learning and influence functions. arXiv preprint arXiv:2008.06461 2020), are increasingly used to estimate conditional average treatment effects. The hope is that they improve convergence rates relative to naïve meta-learners (e.g., T-, S- and X-learner (Künzel SR, Sekhon JS, Bickel PJ, Yu B. Metalearners for estimating heterogeneous treatment effects using machine learning. Proc Natl Acad Sci 2019;116:4156-65)) through de-biasing procedures that involve applying standard learners to specifically transformed outcome data. This leads them to disregard the possibly constrained outcome space, which can be particularly problematic for dichotomous outcomes: these typically get transformed to values that are no longer constrained to the unit interval, which may cause instability and make it difficult for standard learners to guarantee predictions within the unit interval. To address this, we construct a non-orthogonal imputation-learner and an orthogonal 'i-learner' for the prediction of counterfactual outcomes, which respect the outcome space. These are more generally expected to outperform existing learners, even when the outcome is unconstrained, as we confirm empirically in simulation studies and an analysis of critical care data. Our development also sheds broader light onto the construction of orthogonal learners for other estimands.
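
The problem with transformed dichotomous outcomes described in this abstract is easy to reproduce by hand. The sketch below uses a generic doubly robust (AIPW-type) pseudo-outcome for the counterfactual mean under treatment; it is not claimed to be the exact transformation used by any of the cited learners, and the nuisance estimates `mu1_hat` and `pi_hat` are invented numbers.

```python
def dr_pseudo_outcome(y, a, mu1_hat, pi_hat):
    """Doubly robust pseudo-outcome for the counterfactual mean E[Y(1) | X].

    y: observed binary outcome, a: observed binary treatment,
    mu1_hat: estimated P(Y = 1 | A = 1, X), pi_hat: estimated P(A = 1 | X).
    """
    return mu1_hat + a * (y - mu1_hat) / pi_hat


# Hypothetical patients with a small estimated propensity score.
phi_event = dr_pseudo_outcome(y=1, a=1, mu1_hat=0.6, pi_hat=0.2)
phi_no_event = dr_pseudo_outcome(y=0, a=1, mu1_hat=0.6, pi_hat=0.2)
phi_untreated = dr_pseudo_outcome(y=1, a=0, mu1_hat=0.6, pi_hat=0.2)
print(phi_event, phi_no_event, phi_untreated)
```

For the two treated patients the pseudo-outcome falls well outside the unit interval, above 1 when the event occurs and below 0 when it does not, even though the quantity being predicted is a probability. A standard learner regressing these values on covariates therefore has no built-in reason to return predictions in [0, 1].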

Citations: 0
Generalized coarsened confounding for causal effects: a large-sample framework.
IF 1.8 | Zone 4 (Medicine) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-11-17 | eCollection Date: 2026-01-01 | DOI: 10.1515/jci-2025-0002
Debashis Ghosh, Lei Wang

There has been widespread use of causal inference methods for the rigorous analysis of observational studies and to identify policy evaluations. In this article, we consider a class of generalized coarsened procedures for confounding. At a high level, these procedures can be viewed as performing a clustering of confounding variables, followed by treatment effect and attendant variance estimation using the confounder strata. In addition, we propose two new algorithms for generalized coarsened confounding. While previous authors have developed some statistical properties for one special case in our class of procedures, we instead develop a general asymptotic framework. We provide asymptotic results for the average causal effect estimator as well as conditions for its consistency. In addition, we provide an asymptotic justification for the variance formulae for coarsened exact matching. A bias correction technique is proposed, and we apply the proposed methodology to data from two well-known observational studies.
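
The two-step recipe described in this abstract (group units by coarsened confounders, then estimate the effect within strata) can be sketched as follows. This is a minimal illustration under assumptions of my own: fixed-width binning of a single confounder stands in for the clustering step, strata are weighted by their size, and the function name, data and bin width are hypothetical. The paper's two new algorithms, variance formulae and bias correction are not reproduced here.

```python
from collections import defaultdict


def coarsened_stratified_effect(units, bin_width):
    """Bin the confounder, then average within-stratum mean differences.

    units: iterable of (confounder, treated_flag, outcome) tuples.
    Strata lacking either treated or control units are dropped.
    Returns (estimate, list_of_dropped_strata).
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, a, y in units:
        strata[int(x // bin_width)][a].append(y)

    weighted_sum, n_used, dropped = 0.0, 0, []
    for key, groups in sorted(strata.items()):
        controls, treated = groups[0], groups[1]
        if not controls or not treated:
            dropped.append(key)          # no overlap in this stratum
            continue
        diff = sum(treated) / len(treated) - sum(controls) / len(controls)
        n = len(controls) + len(treated)
        weighted_sum += n * diff         # weight by stratum size
        n_used += n
    if n_used == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted_sum / n_used, dropped


# Hypothetical data: (age, treated, outcome).
data = [
    (12, 1, 5.0), (15, 0, 3.0), (18, 0, 3.0),
    (21, 1, 9.0), (24, 1, 7.0), (27, 0, 5.0),
    (33, 1, 10.0),
]
estimate, dropped = coarsened_stratified_effect(data, bin_width=10)
print(estimate, dropped)
```

Coarser bins retain more units but compare less similar ones within a stratum, while finer bins do the opposite and drop more strata for lack of overlap.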

Citations: 0
Role of placebo samples in observational studies.
IF 1.8 | Zone 4 (Medicine) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-01-01 | Epub Date: 2025-03-05 | DOI: 10.1515/jci-2023-0020
Ting Ye, Qijia He, Shuxiao Chen, Bo Zhang

In an observational study, it is common to leverage known null effects to detect bias. One such strategy is to set aside a placebo sample, a subset of the data that is immune to the hypothesized cause-and-effect relationship. The existence of an effect in the placebo sample raises concerns about unmeasured confounding bias, while its absence helps corroborate the causal conclusion. This paper describes a framework for using a placebo sample to detect and remove bias. We state the identification assumptions and develop estimation and inference methods based on outcome regression, inverse probability weighting, and doubly robust approaches. Simulation studies investigate the finite-sample performance of the proposed methods. We illustrate the methods using an empirical study of the effect of the earned income tax credit on infant health.

Citations: 0
Evaluating Boolean relationships in Configurational Comparative Methods
IF 1.4 Zone 4 Medicine Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-01-01 DOI: 10.1515/jci-2023-0014
Luna De Souter
Configurational Comparative Methods (CCMs) aim to learn causal structures from datasets by exploiting Boolean sufficiency and necessity relationships. One important challenge for these methods is that such Boolean relationships are often not satisfied in real-life datasets, as these datasets usually contain noise. Hence, CCMs infer models that only approximately fit the data, introducing a risk of inferring incorrect or incomplete models, especially when data are also fragmented (have limited empirical diversity). To minimize this risk, evaluation measures for sufficiency and necessity should be sensitive to all relevant evidence. This article points out that the standard evaluation measures in CCMs, consistency and coverage, neglect certain evidence for these Boolean relationships. Correspondingly, two new measures, contrapositive consistency and contrapositive coverage, which are equivalent to the binary classification measures specificity and negative predictive value, respectively, are introduced to the CCM context as additions to consistency and coverage. A simulation experiment demonstrates that the introduced contrapositive measures indeed help to identify correct CCM models.
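The four measures named in this abstract can be illustrated with a small sketch for crisp-set (binary) data. It is not taken from the article: the formulas for consistency and coverage are the usual crisp-set ratios, the contrapositive measures are written out only from the abstract's statement that they equal specificity and negative predictive value, and the names `Counts` and `sufficiency_measures` are hypothetical. The sketch assumes a candidate sufficient condition X is treated as a binary "prediction" of the outcome Y.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Case counts in a 2x2 table of a binary condition X and outcome Y (hypothetical helper)."""
    both: int      # X = 1, Y = 1
    x_only: int    # X = 1, Y = 0
    y_only: int    # X = 0, Y = 1
    neither: int   # X = 0, Y = 0


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a measure is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def sufficiency_measures(c: Counts) -> dict[str, float]:
    """Evaluate 'X is sufficient for Y' with the four measures named in the abstract."""
    return {
        # Share of X-cases that also show Y (precision, when X "predicts" Y).
        "consistency": _ratio(c.both, c.both + c.x_only),
        # Share of Y-cases that are covered by X (recall).
        "coverage": _ratio(c.both, c.both + c.y_only),
        # Assumed form: share of not-Y cases that are also not-X (specificity).
        "contrapositive_consistency": _ratio(c.neither, c.neither + c.x_only),
        # Assumed form: share of not-X cases that are also not-Y (negative predictive value).
        "contrapositive_coverage": _ratio(c.neither, c.neither + c.y_only),
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from the article.
    for name, value in sufficiency_measures(Counts(40, 10, 20, 30)).items():
        print(f"{name}: {value:.3f}")
```

Consistency and coverage use only the cases with X = 1 or Y = 1 in their numerators, so the count of cases where both are absent never enters them; the contrapositive measures are the ones that draw on that cell, which is one way to read the abstract's remark that the standard measures neglect certain evidence.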
Citations: 0
Comparison of open-source software for producing directed acyclic graphs.
IF 1.4 Zone 4 Medicine Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-01-01 Epub Date: 2024-01-10 DOI: 10.1515/jci-2023-0031
Amy J Pitts, Charlotte R Fowler

Many software packages have been developed to assist researchers in drawing directed acyclic graphs (DAGs), each with unique functionality and usability. We examine five of the most common software packages for generating DAGs: TikZ, DAGitty, ggdag, dagR, and igraph. For each package, we provide a general description of its background, analysis and visualization capabilities, and user-friendliness. Additionally, to compare the packages, we produce two DAGs in each: the first features a simple confounding structure, while the second includes a more complex structure with three confounders and a mediator. We provide recommendations for when to use each package depending on the user's needs.

Citations: 0