Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm

R. Teal Witter, Christopher Musco
DOI: arxiv-2409.04500 (https://doi.org/arxiv-2409.04500)
Journal: arXiv - STAT - Methodology
Publication date: 2024-09-06
Citations: 0

Abstract

Estimating the effect of treatments from natural experiments, where treatments are pre-assigned, is an important and well-studied problem. We introduce a novel natural experiment dataset obtained from an early childhood literacy nonprofit. Surprisingly, applying over 20 established estimators to the dataset produces inconsistent results in evaluating the nonprofit's efficacy. To address this, we create a benchmark to evaluate estimator accuracy using synthetic outcomes, whose design was guided by domain experts. The benchmark extensively explores performance as real-world conditions like sample size, treatment correlation, and propensity score accuracy vary. Based on our benchmark, we observe that the class of doubly robust treatment effect estimators, which are based on simple and intuitive regression adjustment, generally outperform other more complicated estimators by orders of magnitude. To better support our theoretical understanding of doubly robust estimators, we derive a closed-form expression for the variance of any such estimator that uses dataset splitting to obtain an unbiased estimate. This expression motivates the design of a new doubly robust estimator that uses a novel loss function when fitting functions for regression adjustment. We release the dataset and benchmark in a Python package; the package is built in a modular way to facilitate new datasets and estimators.
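To make "doubly robust estimation with regression adjustment and dataset splitting" concrete, here is a minimal sketch of one standard member of that class: the cross-fitted AIPW (augmented inverse propensity weighting) estimator of the average treatment effect. It is not the authors' new estimator: it fits the outcome models with plain least squares rather than the paper's novel loss function, and it assumes the propensity scores `p` are already known. The function names `fit_linear` and `aipw_cross_fit`, and the synthetic data below, are hypothetical illustrations and are not taken from the authors' Python package.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def aipw_cross_fit(X, t, y, p, n_folds=2, seed=0):
    """Cross-fitted AIPW (doubly robust) estimate of the average treatment effect.

    X: (n, d) covariates; t: (n,) 0/1 treatment indicator; y: (n,) outcomes;
    p: (n,) propensity scores P(t = 1 | x), assumed known here.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # random, balanced fold labels
    scores = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Regression adjustment: fit treated/control outcome models on the
        # other folds only, so no unit's prediction uses its own outcome.
        mu1 = fit_linear(X[train & (t == 1)], y[train & (t == 1)])
        mu0 = fit_linear(X[train & (t == 0)], y[train & (t == 0)])
        m1, m0 = mu1(X[test]), mu0(X[test])
        tt, yy, pp = t[test], y[test], p[test]
        # AIPW score: regression estimate plus inverse-propensity-weighted
        # residual corrections for the observed arm.
        scores[test] = (m1 - m0
                        + tt * (yy - m1) / pp
                        - (1 - tt) * (yy - m0) / (1 - pp))
    return scores.mean()


# Synthetic outcomes with a known treatment effect of 2.0.
rng = np.random.default_rng(1)
n = 20000
X = rng.normal(size=(n, 3))
p = 1 / (1 + np.exp(-X[:, 0]))  # propensity depends on the first covariate
t = rng.binomial(1, p)
y = 2.0 * t + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)
est = aipw_cross_fit(X, t, y, p)
print(round(est, 3))
```

Because the true effect is set to 2.0, the printed estimate should land close to 2; evaluating estimators against synthetic outcomes with a known ground truth is the same idea the benchmark uses, although the paper's outcome-generating process was designed with domain experts and differs from this toy model. The estimate stays consistent if either the outcome models or the propensity scores are correct, which is what "doubly robust" refers to.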