Calibration of Heterogeneous Treatment Effects in Randomized Experiments

Information Systems Research | Pub Date: 2024-01-12 | DOI: 10.1287/isre.2021.0343 | IF 5.1 | Zone 3 (Management) | JCR Q1 (Information Science & Library Science)

Yan Leng, Drew Dimmery



Abstract

Machine learning is commonly used to estimate the heterogeneous treatment effects (HTEs) in randomized experiments. Using large-scale randomized experiments on Facebook and Criteo platforms, we observe substantial discrepancies between machine learning-based treatment effect estimates and difference-in-means estimates directly from the randomized experiment. This paper provides a two-step framework for practitioners and researchers to diagnose and rectify this discrepancy. We first introduce a diagnostic tool to assess whether bias exists in the model-based estimates from machine learning. If bias exists, we then offer a model-agnostic method to calibrate any HTE estimates to known, unbiased, subgroup difference-in-means estimates, ensuring that the sign and magnitude of the subgroup estimates approximate the model-free benchmarks. This calibration method requires no additional data and can be scaled for large data sets. To highlight potential sources of bias, we theoretically show that this bias can result from regularization, and further use synthetic simulation to show biases result from misspecification and high-dimensional features. We demonstrate the efficacy of our calibration method using extensive synthetic simulations and two real-world randomized experiments. We further demonstrate the practical value of this calibration in three typical policy-making settings: a prescriptive, budget-constrained optimization framework; a setting seeking to maximize multiple performance indicators; and a multitreatment uplift modeling setting.
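
As a rough illustration of the diagnose-then-calibrate idea summarized in the abstract, the following Python sketch compares model-based effect estimates with subgroup difference-in-means estimates and then rescales the model estimates toward them. It is not the authors' procedure: the quantile binning, the linear recalibration, the synthetic data, and the function names `subgroup_diff_in_means` and `linear_calibration` are all assumptions made for this example.

```python
import numpy as np

def subgroup_diff_in_means(y, t, tau_hat, n_bins=5):
    """Bin units by predicted effect (assumed quantile bins) and return, per bin,
    the mean predicted effect and the difference-in-means estimate."""
    edges = np.quantile(tau_hat, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, tau_hat, side="right") - 1, 0, n_bins - 1)
    pred, dim = [], []
    for b in range(n_bins):
        m = bins == b
        pred.append(tau_hat[m].mean())
        # Model-free benchmark: treated mean minus control mean within the bin.
        dim.append(y[m & (t == 1)].mean() - y[m & (t == 0)].mean())
    return np.array(pred), np.array(dim)

def linear_calibration(pred, dim):
    """Assumed calibration form: least-squares line mapping predictions to benchmarks."""
    slope, intercept = np.polyfit(pred, dim, 1)
    return slope, intercept

# Synthetic randomized experiment (hypothetical data-generating process).
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)           # random assignment with probability 0.5
tau = 1.0 + x                            # true heterogeneous effect
y = x + t * tau + rng.normal(size=n)
tau_hat = 0.5 * tau                      # stand-in for a shrunken (biased) model estimate

# Step 1 (diagnose): a slope far from 1 or an intercept far from 0 signals bias.
pred, dim = subgroup_diff_in_means(y, t, tau_hat)
slope, intercept = linear_calibration(pred, dim)

# Step 2 (calibrate): rescale every unit-level estimate with the fitted line.
tau_cal = intercept + slope * tau_hat
pred_cal, dim_cal = subgroup_diff_in_means(y, t, tau_cal)

raw_gap = np.abs(pred - dim).max()
cal_gap = np.abs(pred_cal - dim_cal).max()
print(f"slope={slope:.2f} intercept={intercept:.2f} raw_gap={raw_gap:.2f} cal_gap={cal_gap:.2f}")
```

Because the rescaling is monotone, the ranking of units by estimated effect is unchanged; only the sign and magnitude of the subgroup averages move toward the experimental benchmarks. A real application would also need to account for the sampling noise in each subgroup estimate.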


Source journal: Information Systems Research

CiteScore: 9.10
Self-citation rate: 8.20%
Articles published: 120

Journal description: ISR (Information Systems Research) is a journal of INFORMS, the Institute for Operations Research and the Management Sciences. Information Systems Research is a leading international journal of theory, research, and intellectual development, focused on information systems in organizations, institutions, the economy, and society.