Estimator Comparison for the Prediction of Election Results

Pub Date: 2024-07-01 | DOI: 10.3390/stats7030040

M. Chalikias, Georgios X. Papageorgiou, Dimitrios P. Zarogiannis



Abstract

Cluster randomized experiments and estimator comparisons are well-documented topics. In this paper, using popular-vote data from the presidential elections of the United States of America (2012, 2016, 2020), we evaluate the properties (standard error, SE, and mean squared error, MSE) of three cluster sampling estimators: the Ratio estimator, the Horvitz–Thompson estimator and the linear regression estimator. While both the Ratio and Horvitz–Thompson estimators are widely used in cluster analysis, we propose a linear regression estimator defined for unequal cluster sizes, which, in many scenarios, performs better than the other two. The objective of this paper is twofold. First, we identify which estimator is best suited to predicting the outcome of the popular vote in the United States of America by applying single-stage cluster sampling to our data under two partitions: in the first, the 50 states plus the District of Columbia serve as primary sampling units, whereas in the second, 3112 counties do. Second, based on these results, we estimate the number of clusters a sample needs in order to achieve a set standard error, while also considering the diminishing returns from increasing the number of clusters in the sample. The linear regression estimator performs best in the majority of the examined cases. This type of comparison can also be used to estimate election results in any other country for which prior voting results are available.
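To make the comparison concrete, the following is a minimal sketch of how the three estimators could be compared by simulation. It is not the authors' implementation: the abstract does not give the form of their proposed regression estimator, so the sketch assumes the textbook linear regression estimator with cluster size (ballots cast) as the auxiliary variable, simple random sampling of clusters without replacement, and a synthetic population rather than real election data. All function names (`ht_total`, `ratio_total`, `regression_total`, `simulate`, `clusters_needed_ht`, `make_population`) are hypothetical. The last function uses the standard variance formula for the expanded total under simple random sampling to illustrate the "number of clusters for a set standard error" question.

```python
import math
import random
import statistics


def ht_total(y, N):
    """Horvitz-Thompson total under SRS without replacement (inclusion prob. n/N)."""
    return N * sum(y) / len(y)


def ratio_total(y, m, M_total):
    """Ratio estimator: sampled vote share scaled by the known total ballot count."""
    return M_total * sum(y) / sum(m)


def regression_total(y, m, N, M_total):
    """Textbook regression estimator using cluster size as the auxiliary variable."""
    y_bar, m_bar = statistics.fmean(y), statistics.fmean(m)
    sxx = sum((mi - m_bar) ** 2 for mi in m)
    sxy = sum((mi - m_bar) * (yi - y_bar) for mi, yi in zip(m, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # equal sizes: reduces to the HT form
    return N * (y_bar + slope * (M_total / N - m_bar))


def make_population(N=51, seed=1):
    """Synthetic clusters (assumption): sizes and candidate votes, not real data."""
    rng = random.Random(seed)
    sizes = [rng.randint(200_000, 10_000_000) for _ in range(N)]
    votes = [round(s * rng.uniform(0.35, 0.65)) for s in sizes]
    return sizes, votes


def simulate(sizes, votes, n, reps=2000, seed=0):
    """Monte Carlo SE and MSE of each estimator of the candidate's total votes."""
    rng = random.Random(seed)
    N, M_total, truth = len(sizes), sum(sizes), sum(votes)
    draws = {"ht": [], "ratio": [], "regression": []}
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # single-stage sample of n clusters
        m = [sizes[i] for i in idx]
        y = [votes[i] for i in idx]
        draws["ht"].append(ht_total(y, N))
        draws["ratio"].append(ratio_total(y, m, M_total))
        draws["regression"].append(regression_total(y, m, N, M_total))
    results = {}
    for name, estimates in draws.items():
        se = statistics.pstdev(estimates)
        mse = statistics.fmean((e - truth) ** 2 for e in estimates)
        results[name] = (se, mse)
    return results


def clusters_needed_ht(votes, target_se):
    """Smallest n with SE(HT total) <= target_se under SRS without replacement.

    Solves N^2 * (1 - n/N) * S^2 / n = target_se^2 for n.
    """
    N = len(votes)
    s2 = statistics.variance(votes)  # S^2 with N - 1 in the denominator
    return math.ceil(N ** 2 * s2 / (target_se ** 2 + N * s2))


if __name__ == "__main__":
    sizes, votes = make_population()
    for name, (se, mse) in simulate(sizes, votes, n=10).items():
        print(f"{name:>10}: SE={se:,.0f}  MSE={mse:,.0f}")
    print("clusters for SE <= 2,000,000 (HT):", clusters_needed_ht(votes, 2_000_000))
```

Dividing any of the estimated totals by the known total number of ballots gives the corresponding estimate of the popular-vote share. Repeating `simulate` over a range of `n` values is one way to see the diminishing returns the abstract refers to, since the standard error shrinks more slowly as more clusters are added.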


