Using Wasserstein Generative Adversarial Networks for the design of Monte Carlo simulations

Impact factor 9.9 | CAS zone 3 (Economics) | JCR Q1 (Economics) | Journal of Econometrics | Pub Date: 2024-03-01 | DOI: 10.1016/j.jeconom.2020.09.013

Susan Athey, Guido W. Imbens, Jonas Metzger, Evan Munro

Journal of Econometrics, Volume 240, Issue 2, Article 105076. https://www.sciencedirect.com/science/article/pii/S0304407621000440

Abstract

When researchers develop new econometric methods, it is common practice to compare the performance of the new methods with that of existing methods in Monte Carlo studies. The credibility of such Monte Carlo studies is often limited because of the discretion the researcher has in choosing the Monte Carlo designs reported. To improve credibility, we propose using a class of generative models recently developed in the machine learning literature, termed Generative Adversarial Networks (GANs), which can be used to systematically generate artificial data that closely mimic existing datasets. Thus, in combination with existing real datasets, GANs can be used to limit the researcher's degrees of freedom in designing Monte Carlo studies, making any comparisons more convincing. In addition, if an applied researcher is concerned with the performance of a particular statistical method on a specific dataset (beyond its theoretical properties in large samples), she can use such GANs to assess the performance of the proposed method, e.g., the coverage rate of confidence intervals or the bias of the estimator, using simulated data that closely resemble the exact setting of interest. To illustrate these methods, we apply Wasserstein GANs (WGANs) to the estimation of average treatment effects. In this example, we find that (i) no single estimator outperforms the others in all three settings, so researchers should tailor their analytic approach to a given setting; (ii) systematic simulation studies can be helpful for selecting among competing methods in this situation; and (iii) the generated data closely resemble the actual data.
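The evaluation step described in the abstract can be sketched as a Monte Carlo loop: draw many datasets from a generator, apply each candidate estimator to the same datasets, and summarize bias, root-mean-squared error, and confidence-interval coverage against the known true effect. This Python sketch is illustrative only and rests on stated assumptions. `sample_from_generator` is a hypothetical parametric stand-in for a WGAN trained on real data, so the example runs with NumPy alone. It is assumed to return both potential outcomes, which makes the true average treatment effect of the simulated population known. The two estimators (difference in means, and linear regression adjustment with HC0 robust standard errors) are illustrative choices, not the estimators compared in the paper.

```python
import numpy as np


def sample_from_generator(rng, n):
    """Hypothetical stand-in for a fitted WGAN generator (an assumption).

    Returns covariates x, binary treatment w, and BOTH potential outcomes
    (y0, y1). A fixed parametric model is used only so the sketch runs with
    NumPy alone; in the approach described above, a generator trained on a
    real dataset would play this role.
    """
    x = rng.normal(size=n)
    # Treatment probability rises with x, so assignment is confounded by x.
    w = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
    y0 = 1.0 + 2.0 * x + rng.normal(size=n)
    y1 = y0 + 0.5 + 0.5 * x  # heterogeneous effect with mean 0.5
    return x, w, y0, y1


def diff_in_means(x, w, y):
    """Difference in means with the unequal-variance standard error."""
    y_t, y_c = y[w == 1], y[w == 0]
    est = y_t.mean() - y_c.mean()
    se = np.sqrt(y_t.var(ddof=1) / y_t.size + y_c.var(ddof=1) / y_c.size)
    return est, se


def regression_adjustment(x, w, y):
    """OLS of y on (1, w, x, w * (x - mean x)); returns the w coefficient
    and its HC0 (heteroskedasticity-robust) standard error."""
    X = np.column_stack([np.ones_like(x), w, x, w * (x - x.mean())])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = (X * resid[:, None] ** 2).T @ X
    cov = xtx_inv @ meat @ xtx_inv
    return beta[1], np.sqrt(cov[1, 1])


def monte_carlo(estimators, n=500, reps=500, seed=0):
    """Apply every estimator to the same simulated datasets and summarize
    bias, RMSE, and coverage of nominal 95% intervals."""
    rng = np.random.default_rng(seed)
    # With both potential outcomes available, the population ATE of the
    # generator is known; approximate it with one large draw.
    _, _, y0_pop, y1_pop = sample_from_generator(rng, 1_000_000)
    true_ate = np.mean(y1_pop - y0_pop)

    draws = {name: [] for name in estimators}
    for _ in range(reps):
        x, w, y0, y1 = sample_from_generator(rng, n)
        y = np.where(w == 1, y1, y0)  # only one outcome is observed per unit
        for name, fn in estimators.items():
            draws[name].append(fn(x, w, y))

    summary = {}
    for name, pairs in draws.items():
        est, se = np.array(pairs).T
        err = est - true_ate
        summary[name] = {
            "bias": float(err.mean()),
            "rmse": float(np.sqrt(np.mean(err ** 2))),
            "coverage": float(np.mean(np.abs(err) <= 1.96 * se)),
        }
    return summary


if __name__ == "__main__":
    results = monte_carlo({
        "difference in means": diff_in_means,
        "regression adjustment": regression_adjustment,
    })
    for name, stats in results.items():
        print(name, {k: round(v, 3) for k, v in stats.items()})
```

In this stand-in design, treatment assignment depends on the covariate. The difference in means is therefore heavily biased, and its intervals rarely cover the true effect. The regression adjustment matches the linear form of the hypothetical generator, so it has small bias and close to nominal coverage. With a generator fitted to a particular dataset, the same loop would instead rank estimators for that specific setting.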

Source journal

Journal of Econometrics (Social Sciences, Mathematical Methods)

CiteScore: 8.60
Self-citation rate: 1.60%
Articles published: 220
Review time: 3-8 weeks

Journal description: The Journal of Econometrics serves as an outlet for important, high-quality, new research in both theoretical and applied econometrics. The scope of the Journal includes papers dealing with identification, estimation, testing, decision, and prediction issues encountered in economic research. Classical and Bayesian statistics, and machine learning methods, are decidedly within the range of the Journal's interests. The Annals of Econometrics is a supplement to the Journal of Econometrics.

Latest articles in this journal

GLS under monotone heteroskedasticity
Multivariate spatiotemporal models with low rank coefficient matrix
Inference in cluster randomized trials with matched pairs
Why are replication rates so low?
On the spectral density of fractional Ornstein–Uhlenbeck processes