{"title":"Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm","authors":"R. Teal Witter, Christopher Musco","doi":"arxiv-2409.04500","DOIUrl":null,"url":null,"abstract":"Estimating the effect of treatments from natural experiments, where\ntreatments are pre-assigned, is an important and well-studied problem. We\nintroduce a novel natural experiment dataset obtained from an early childhood\nliteracy nonprofit. Surprisingly, applying over 20 established estimators to\nthe dataset produces inconsistent results in evaluating the nonprofit's\nefficacy. To address this, we create a benchmark to evaluate estimator accuracy\nusing synthetic outcomes, whose design was guided by domain experts. The\nbenchmark extensively explores performance as real world conditions like sample\nsize, treatment correlation, and propensity score accuracy vary. Based on our\nbenchmark, we observe that the class of doubly robust treatment effect\nestimators, which are based on simple and intuitive regression adjustment,\ngenerally outperform other more complicated estimators by orders of magnitude.\nTo better support our theoretical understanding of doubly robust estimators, we\nderive a closed form expression for the variance of any such estimator that\nuses dataset splitting to obtain an unbiased estimate. This expression\nmotivates the design of a new doubly robust estimator that uses a novel loss\nfunction when fitting functions for regression adjustment. We release the\ndataset and benchmark in a Python package; the package is built in a modular\nway to facilitate new datasets and estimators.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"9 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Methodology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.04500","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Estimating the effect of treatments from natural experiments, where
treatments are pre-assigned, is an important and well-studied problem. We
introduce a novel natural experiment dataset obtained from an early childhood
literacy nonprofit. Surprisingly, applying over 20 established estimators to
the dataset produces inconsistent results in evaluating the nonprofit's
efficacy. To address this, we create a benchmark to evaluate estimator accuracy
using synthetic outcomes, whose design was guided by domain experts. The
benchmark extensively explores performance as real-world conditions like sample
size, treatment correlation, and propensity score accuracy vary. Based on our
benchmark, we observe that the class of doubly robust treatment effect
estimators, which are based on simple and intuitive regression adjustment,
generally outperforms other, more complicated estimators by orders of magnitude.
To better support our theoretical understanding of doubly robust estimators, we
derive a closed-form expression for the variance of any such estimator that
uses dataset splitting to obtain an unbiased estimate. This expression
motivates the design of a new doubly robust estimator that uses a novel loss
function when fitting the regression-adjustment functions. We release the
dataset and benchmark in a Python package; the package is built in a modular
way to facilitate adding new datasets and estimators.
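
For readers unfamiliar with the approach, the sketch below shows a standard cross-fitted doubly robust (AIPW) estimator of the average treatment effect: regression adjustment plus an inverse-propensity-weighted residual correction, with dataset splitting of the kind the abstract mentions. This is a minimal illustration built on scikit-learn under assumed variable names; it is not the authors' released package, their benchmark, or their proposed loss function.

```python
# Minimal sketch of a cross-fitted doubly robust (AIPW) ATE estimator.
# Illustrative only; function name and model choices are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold


def aipw_ate(X, t, y, n_splits=2, clip=1e-3, seed=0):
    """Cross-fitted AIPW estimate of the average treatment effect.

    X : (n, d) covariates, t : (n,) binary treatment indicator, y : (n,) outcomes.
    Dataset splitting keeps nuisance-model fitting and evaluation on disjoint
    samples, which is what yields an (approximately) unbiased estimate.
    """
    psi = np.empty(len(y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # Outcome regressions, fit separately on treated and control units.
        mu1 = LinearRegression().fit(X[train][t[train] == 1], y[train][t[train] == 1])
        mu0 = LinearRegression().fit(X[train][t[train] == 0], y[train][t[train] == 0])
        # Propensity score model, clipped away from 0 and 1 for stability.
        ps = LogisticRegression(max_iter=1000).fit(X[train], t[train])
        e_hat = np.clip(ps.predict_proba(X[test])[:, 1], clip, 1 - clip)
        m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
        # Doubly robust pseudo-outcome: regression adjustment plus an
        # inverse-propensity-weighted correction of the residuals.
        psi[test] = (m1 - m0
                     + t[test] * (y[test] - m1) / e_hat
                     - (1 - t[test]) * (y[test] - m0) / (1 - e_hat))
    return psi.mean()
```

The resulting estimate is consistent if either the outcome regressions or the propensity model is well specified, which is the "doubly robust" property the abstract refers to.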