Distilling interpretable causal trees from causal forests
Patrick Rehill
arXiv:2408.01023 [econ.EM], published 2024-08-02
Citations: 0
Abstract
Machine learning methods for estimating treatment effect heterogeneity
promise greater flexibility than existing methods that test a few pre-specified
hypotheses. However, extracting insights from complicated machine learning
models can be challenging. A high-dimensional distribution of conditional
average treatment effects may give accurate, individual-level estimates, yet
the underlying patterns, and therefore the implications of the analysis, can
be hard to discern.
This paper proposes the Distilled Causal Tree, a method for distilling a
single, interpretable causal tree from a causal forest. The method compares
favourably with existing approaches for extracting a single tree, particularly
on noisy or high-dimensional data with many correlated features; in most
simulations it even outperforms the base causal forest. Its estimates are
doubly robust and asymptotically normal, just as those of the causal forest are.
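The core idea of distillation, fitting a single shallow tree to the individual-level effect estimates produced by a more complex teacher model, can be illustrated with a minimal sketch. This is not the paper's actual Distilled Causal Tree algorithm: scikit-learn has no causal forest, so a simple T-learner (two outcome regressions) stands in as the teacher, and the simulated data, effect function, and all parameter choices below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Simulated data with a heterogeneous treatment effect driven by X[:, 0].
rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.normal(size=(n, p))
T = rng.integers(0, 2, size=n)             # randomized binary treatment
tau = np.where(X[:, 0] > 0, 2.0, 0.5)      # true conditional effect (illustrative)
Y = X[:, 1] + tau * T + rng.normal(scale=0.5, size=n)

# Teacher: a T-learner stand-in for the causal forest. One outcome model per
# treatment arm; the difference in predictions estimates the CATE.
m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[T == 1], Y[T == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[T == 0], Y[T == 0])
cate_hat = m1.predict(X) - m0.predict(X)   # individual-level estimates

# Student: one shallow, interpretable tree distilled from the teacher's
# high-dimensional CATE estimates.
student = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, cate_hat)
distilled = student.predict(X)
```

The student tree exposes a few human-readable splits (here it should recover the split on the first feature) while the teacher supplies the per-individual targets; the paper's method instead distills directly from the causal forest and retains its double robustness and asymptotic normality.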