{"title":"自然实验估算器基准:新数据集和双重稳健算法","authors":"R. Teal Witter, Christopher Musco","doi":"arxiv-2409.04500","DOIUrl":null,"url":null,"abstract":"Estimating the effect of treatments from natural experiments, where\ntreatments are pre-assigned, is an important and well-studied problem. We\nintroduce a novel natural experiment dataset obtained from an early childhood\nliteracy nonprofit. Surprisingly, applying over 20 established estimators to\nthe dataset produces inconsistent results in evaluating the nonprofit's\nefficacy. To address this, we create a benchmark to evaluate estimator accuracy\nusing synthetic outcomes, whose design was guided by domain experts. The\nbenchmark extensively explores performance as real world conditions like sample\nsize, treatment correlation, and propensity score accuracy vary. Based on our\nbenchmark, we observe that the class of doubly robust treatment effect\nestimators, which are based on simple and intuitive regression adjustment,\ngenerally outperform other more complicated estimators by orders of magnitude.\nTo better support our theoretical understanding of doubly robust estimators, we\nderive a closed form expression for the variance of any such estimator that\nuses dataset splitting to obtain an unbiased estimate. This expression\nmotivates the design of a new doubly robust estimator that uses a novel loss\nfunction when fitting functions for regression adjustment. We release the\ndataset and benchmark in a Python package; the package is built in a modular\nway to facilitate new datasets and estimators.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"9 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm\",\"authors\":\"R. Teal Witter, Christopher Musco\",\"doi\":\"arxiv-2409.04500\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Estimating the effect of treatments from natural experiments, where\\ntreatments are pre-assigned, is an important and well-studied problem. We\\nintroduce a novel natural experiment dataset obtained from an early childhood\\nliteracy nonprofit. Surprisingly, applying over 20 established estimators to\\nthe dataset produces inconsistent results in evaluating the nonprofit's\\nefficacy. To address this, we create a benchmark to evaluate estimator accuracy\\nusing synthetic outcomes, whose design was guided by domain experts. The\\nbenchmark extensively explores performance as real world conditions like sample\\nsize, treatment correlation, and propensity score accuracy vary. Based on our\\nbenchmark, we observe that the class of doubly robust treatment effect\\nestimators, which are based on simple and intuitive regression adjustment,\\ngenerally outperform other more complicated estimators by orders of magnitude.\\nTo better support our theoretical understanding of doubly robust estimators, we\\nderive a closed form expression for the variance of any such estimator that\\nuses dataset splitting to obtain an unbiased estimate. This expression\\nmotivates the design of a new doubly robust estimator that uses a novel loss\\nfunction when fitting functions for regression adjustment. 
We release the\\ndataset and benchmark in a Python package; the package is built in a modular\\nway to facilitate new datasets and estimators.\",\"PeriodicalId\":501425,\"journal\":{\"name\":\"arXiv - STAT - Methodology\",\"volume\":\"9 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - STAT - Methodology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.04500\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Methodology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.04500","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm
Estimating the effect of treatments from natural experiments, where
treatments are pre-assigned, is an important and well-studied problem. We
introduce a novel natural experiment dataset obtained from an early childhood
literacy nonprofit. Surprisingly, applying over 20 established estimators to
the dataset produces inconsistent results in evaluating the nonprofit's
efficacy. To address this, we create a benchmark to evaluate estimator accuracy
using synthetic outcomes designed with guidance from domain experts. The
benchmark extensively explores performance as real-world conditions like sample
size, treatment correlation, and propensity score accuracy vary. Based on our
benchmark, we observe that the class of doubly robust treatment effect
estimators, which are based on simple and intuitive regression adjustment,
generally outperforms other, more complicated estimators by orders of magnitude.
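
For readers unfamiliar with this estimator class, the following is a minimal sketch of a standard doubly robust (AIPW-style) estimator built from regression adjustment plus an inverse-propensity-weighted residual correction. The linear and logistic models here are illustrative placeholders, not the paper's implementation; the estimate is consistent if either the outcome models or the propensity model is correctly specified.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    def aipw_ate(X, T, y):
        """Doubly robust (AIPW) estimate of the average treatment effect.

        X: (n, d) covariates; T: (n,) binary treatment; y: (n,) outcomes.
        """
        # Outcome models: regress y on X separately within each treatment arm.
        mu1 = LinearRegression().fit(X[T == 1], y[T == 1]).predict(X)
        mu0 = LinearRegression().fit(X[T == 0], y[T == 0]).predict(X)

        # Propensity model: probability of treatment given covariates.
        e = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
        e = np.clip(e, 1e-3, 1 - 1e-3)  # guard against extreme weights

        # Regression-adjusted prediction plus a weighted residual correction.
        psi = mu1 - mu0 + T * (y - mu1) / e - (1 - T) * (y - mu0) / (1 - e)
        return psi.mean()
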
To better support our theoretical understanding of doubly robust estimators, we
derive a closed-form expression for the variance of any such estimator that
uses dataset splitting to obtain an unbiased estimate. This expression
motivates the design of a new doubly robust estimator that uses a novel loss
function when fitting functions for regression adjustment. We release the
dataset and benchmark in a Python package; the package is built in a modular
way to facilitate new datasets and estimators.
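
The variance result concerns estimators that use dataset splitting (cross-fitting): nuisance functions are fit on one part of the data and evaluated on the held-out part, so each unit's score comes from models trained without that unit. Below is a hedged sketch of that pattern under the same illustrative model choices as above; it is not the package's API.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression
    from sklearn.model_selection import KFold

    def cross_fit_aipw(X, T, y, n_splits=2):
        """Doubly robust estimate with dataset splitting (cross-fitting)."""
        psi = np.zeros(len(y))
        folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
        for train, test in folds.split(X):
            Xtr, Ttr, ytr = X[train], T[train], y[train]
            # Fit nuisance models on the training fold only.
            mu1 = LinearRegression().fit(Xtr[Ttr == 1], ytr[Ttr == 1]).predict(X[test])
            mu0 = LinearRegression().fit(Xtr[Ttr == 0], ytr[Ttr == 0]).predict(X[test])
            e = LogisticRegression().fit(Xtr, Ttr).predict_proba(X[test])[:, 1]
            e = np.clip(e, 1e-3, 1 - 1e-3)
            # Evaluate the doubly robust score on the held-out fold.
            Tt, yt = T[test], y[test]
            psi[test] = mu1 - mu0 + Tt * (yt - mu1) / e - (1 - Tt) * (yt - mu0) / (1 - e)
        return psi.mean()
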