Treatment of sample under-representation and skewed heavy-tailed distributions in survey-based microsimulation: An analysis of redistribution effects in compulsory health care insurance in Switzerland

AStA Wirtschafts- und Sozialstatistisches Archiv. Pub Date: 2020-09-01. DOI: 10.1007/s11943-020-00275-8

Tobias Schoch, André Müller

Volume 14, Issue 3-4, pp. 267-304. Journal Article. https://link.springer.com/article/10.1007/s11943-020-00275-8

Citations: 1

Abstract

The credibility of microsimulation modeling with the research community and policymakers depends on high-quality baseline surveys. Quality problems with the baseline survey tend to impair the quality of microsimulation built on top of the survey data. We address two potential issues that both relate to skewed and heavy-tailed distributions.

First, we find that ultra-high-income households are under-represented in the baseline household survey. Moreover, the sample estimate of average income underestimates the known population average. Although the Deville–Särndal calibration method corrects the under-representation, it cannot achieve alignment of estimated average income in the right tail of the distribution with known population values without distorting the empirical income distribution. To overcome the problem, we introduce a Pareto tail model. With the help of the tail model, we can adjust the sample income distribution in the tail to meet the alignment targets. Our method can be a useful tool for microsimulation modelers working with survey income data.
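The tail adjustment described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the estimator, the threshold choice, or the adjustment rule, so the function names, the weighted maximum-likelihood (Hill-type) estimator, and the rank-preserving quantile replacement below are all assumptions made for illustration. The sketch fits a Pareto shape parameter to incomes above a threshold, derives the shape parameter whose tail mean equals a known population target, and replaces the sampled tail incomes with Pareto quantiles at their weighted rank positions.

```python
import numpy as np


def weighted_pareto_alpha(income, weights, threshold):
    """Weighted ML (Hill-type) estimate of the Pareto shape above `threshold`."""
    income = np.asarray(income, dtype=float)
    weights = np.asarray(weights, dtype=float)
    tail = income > threshold
    log_excess = np.log(income[tail] / threshold)
    return weights[tail].sum() / np.sum(weights[tail] * log_excess)


def pareto_tail_mean(alpha, threshold):
    """Mean of a Pareto(alpha, threshold) distribution; finite only if alpha > 1."""
    if alpha <= 1:
        raise ValueError("tail mean is infinite for alpha <= 1")
    return alpha * threshold / (alpha - 1)


def alpha_for_target_mean(target_mean, threshold):
    """Shape parameter whose Pareto tail mean equals `target_mean`."""
    if target_mean <= threshold:
        raise ValueError("target mean must exceed the threshold")
    return target_mean / (target_mean - threshold)


def adjust_tail(income, weights, threshold, alpha):
    """Replace tail incomes by Pareto quantiles at their weighted rank positions.

    Incomes at or below the threshold are left unchanged (hypothetical rule).
    """
    income = np.asarray(income, dtype=float).copy()
    weights = np.asarray(weights, dtype=float)
    idx = np.flatnonzero(income > threshold)
    order = idx[np.argsort(income[idx])]
    w = weights[order]
    # Midpoint plotting positions lie strictly inside (0, 1),
    # so the largest quantile stays finite.
    p = (np.cumsum(w) - 0.5 * w) / w.sum()
    income[order] = threshold * (1.0 - p) ** (-1.0 / alpha)
    return income
```

Because the replacement uses a finite number of quantiles, the weighted mean of the adjusted tail only approximates the target mean. A production model would need an additional alignment step and a defensible threshold choice, neither of which is specified in the abstract.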

The second contribution concerns the treatment of an outlier-prone variable that has been added to the survey by record linkage (our empirical example is health care cost). Record linkage does not change the nature of the baseline survey: it still covers only a small part of the population, so the sampling weights are relatively large. An outlying observation combined with a large sampling weight can heavily influence or even ruin an estimate of a population characteristic. We therefore argue that robust estimation and alignment methods are beneficial in terms of mean squared error, because they are less affected by outliers.
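To make the point about outliers and large sampling weights concrete, the following sketch compares an ordinary weighted mean with a weighted Huber M-estimate of location. The abstract does not say which robust estimators the authors use, so the Huber estimator, the tuning constant `k = 1.5`, the weighted-MAD scale, and the function names are assumptions chosen for illustration only.

```python
import numpy as np


def weighted_quantile(x, w, q):
    """Quantile q of x under the weighted empirical distribution."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    cum = np.cumsum(w) / w.sum()
    i = min(int(np.searchsorted(cum, q)), len(x) - 1)
    return x[i]


def huber_weighted_mean(x, w, k=1.5, tol=1e-8, max_iter=100):
    """Weighted Huber M-estimate of location via iteratively reweighted means.

    The scale is fixed at the weighted median absolute deviation (MAD).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mu = weighted_quantile(x, w, 0.5)
    scale = 1.4826 * weighted_quantile(np.abs(x - mu), w, 0.5)
    if scale == 0.0:
        return mu
    for _ in range(max_iter):
        r = np.abs(x - mu) / scale
        # Huber weights: 1 inside [-k, k], k/|r| outside.
        u = np.minimum(1.0, k / np.maximum(r, 1e-12))
        new_mu = np.sum(w * u * x) / np.sum(w * u)
        converged = abs(new_mu - mu) <= tol * scale
        mu = new_mu
        if converged:
            break
    return mu


# Hypothetical data: nine ordinary cost records and one extreme record
# that also carries a large sampling weight.
cost = np.array([10, 12, 11, 9, 10, 13, 11, 10, 12, 500], dtype=float)
weight = np.array([100] * 9 + [400], dtype=float)

plain = np.sum(weight * cost) / np.sum(weight)
robust = huber_weighted_mean(cost, weight)
```

In this toy example the single extreme record dominates the ordinary weighted mean, while the Huber estimate stays close to the bulk of the data. For a genuinely skewed variable such as health care cost, down-weighting the tail introduces bias, which is why the comparison is framed in terms of mean squared error rather than unbiasedness.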

