Double machine learning and automated confounder selection: A cautionary tale

IF 1.7 | Zone 4 (Medicine) | Q2 Mathematics, Interdisciplinary Applications | Journal of Causal Inference | Pub Date: 2021-08-25 | DOI: 10.1515/jci-2022-0078

Paul Hünermund, Beyers Louw, Itamar Caspi

Citations: 7

Abstract

Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Although the ability to handle a large number of potential covariates can make selection-on-observables assumptions more plausible, it also raises the risk that endogenous variables are included, which would violate conditional independence. This article demonstrates that DML is very sensitive to the inclusion of even a few “bad controls” in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way.
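The sensitivity to bad controls can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experiment: the data-generating process, its coefficients, and the function names `simulate`, `ols_predict`, and `dml_plr` are all hypothetical. It uses a cross-fitted partialling-out estimator, but swaps plain least squares in for the machine-learning nuisance learners (such as the lasso) so that the example depends only on NumPy. The bad control here is a collider, caused by both treatment and outcome; other causal structures would produce different biases.

```python
import numpy as np

def simulate(n, theta, rng):
    """Hypothetical DGP: x is a valid confounder, c is a collider (bad control)."""
    x = rng.normal(size=n)                   # confounder: affects d and y
    d = 0.8 * x + rng.normal(size=n)         # treatment
    y = theta * d + x + rng.normal(size=n)   # outcome, true effect = theta
    c = d + y + rng.normal(size=n)           # collider: caused by d and y
    return y, d, x, c

def ols_predict(w_train, t_train, w_test):
    """Fit least squares with an intercept on the training fold, predict on the test fold."""
    a_train = np.column_stack([np.ones(len(w_train)), w_train])
    beta, *_ = np.linalg.lstsq(a_train, t_train, rcond=None)
    a_test = np.column_stack([np.ones(len(w_test)), w_test])
    return a_test @ beta

def dml_plr(y, d, w, n_folds=2, rng=None):
    """Cross-fitted partialling-out estimate of the coefficient on d.

    Assumption: linear nuisance models stand in for ML learners.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Residualize outcome and treatment using models fit on the other folds.
        y_res[test] = y[test] - ols_predict(w[train], y[train], w[test])
        d_res[test] = d[test] - ols_predict(w[train], d[train], w[test])
    # Final stage: regress outcome residuals on treatment residuals.
    return float(d_res @ y_res / (d_res @ d_res))

rng = np.random.default_rng(0)
theta = 1.0
y, d, x, c = simulate(20_000, theta, rng)

theta_good = dml_plr(y, d, x[:, None], rng=rng)               # controls: x only
theta_bad = dml_plr(y, d, np.column_stack([x, c]), rng=rng)   # controls: x and collider c

print(f"true effect:            {theta:.3f}")
print(f"estimate, good control: {theta_good:.3f}")
print(f"estimate, bad control:  {theta_bad:.3f}")
```

With only the confounder in the covariate set, the estimate should land close to the true effect. Adding the single collider pulls it far away, even though the estimator and sample are otherwise unchanged.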




Source journal

Journal of Causal Inference (Decision Sciences: Statistics, Probability and Uncertainty)

CiteScore

1.90

Self-citation rate

14.30%

Number of publications

Review time

86 weeks

Journal description: Journal of Causal Inference (JCI) publishes papers on theoretical and applied causal research across the range of academic disciplines that use quantitative tools to study causality.