A Supervised Machine Learning Procedure to Detect Electoral Fraud Using Digital Analysis

Francisco Cantú, Sebastián Saiegh
Political Methods: Computational eJournal. Published 2010-04-22. DOI: 10.2139/ssrn.1594406 (https://doi.org/10.2139/ssrn.1594406)
Citations: 9

Abstract

This paper introduces a naive Bayes classifier that detects electoral fraud from digit patterns in vote counts, using both authentic and synthetic data. The procedure has four steps: (1) we create 10,000 simulated electoral contests between two parties using Monte Carlo methods, yielding a training set composed of two disjoint subsets: one containing electoral returns that follow a Benford distribution, and another where the vote counts are purposely "manipulated" by electoral tampering, with a percentage of votes taken away from one party and given to the other; (2) we calibrate membership values of the simulated elections (i.e., clean or fraudulent) using logistic regression; (3) we recover class-conditional densities using the relative frequencies from the training set; (4) we apply Bayes' rule to class-conditional probabilities and class priors to establish the membership probabilities of authentic observations. To illustrate our technique, we examine elections in the province of Buenos Aires (Argentina) between 1932 and 1942, a period with a checkered history of fraud. Our analysis allows us to successfully classify electoral contests according to their degree of fraud. More generally, our findings indicate that Benford's Law is an effective tool for identifying fraud, even when minimal information (i.e., electoral returns) is available.
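The steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: clean vote counts are drawn log-uniformly (which gives Benford-distributed first digits), the only features are the first digits of the two parties' counts, the fraud share is fixed at 30%, the two classes are simulated in equal numbers, and add-one smoothing is applied to the relative frequencies. The logistic-regression calibration of step (2) is omitted. All function names (`simulate_contest`, `train`, `posterior`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter

DIGITS = range(1, 10)

def benford_probs():
    """First-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in DIGITS}

def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

def simulate_contest(rng, fraud_share=0.0):
    """Two-party contest; fraud_share of party B's votes is moved to party A."""
    # Assumption: log-uniform draws over [10, 10_000) give Benford first digits.
    a = int(10 ** rng.uniform(1, 4))
    b = int(10 ** rng.uniform(1, 4))
    moved = int(fraud_share * b)
    return a + moved, b - moved

def digit_features(contest):
    """Assumed feature vector: first digit of each party's vote count."""
    return tuple(first_digit(v) for v in contest)

def train(n_contests=10_000, fraud_share=0.3, seed=0):
    """Simulate a labeled training set and estimate priors and
    class-conditional digit probabilities from relative frequencies."""
    rng = random.Random(seed)
    counts = {"clean": [Counter(), Counter()], "fraud": [Counter(), Counter()]}
    totals = {"clean": 0, "fraud": 0}
    for i in range(n_contests):
        label = "clean" if i % 2 == 0 else "fraud"  # two disjoint subsets
        share = 0.0 if label == "clean" else fraud_share
        for j, d in enumerate(digit_features(simulate_contest(rng, share))):
            counts[label][j][d] += 1
        totals[label] += 1
    # Relative frequencies with add-one smoothing (an assumption of this sketch).
    cond = {
        label: [
            {d: (c[d] + 1) / (totals[label] + 9) for d in DIGITS}
            for c in counts[label]
        ]
        for label in counts
    }
    priors = {label: totals[label] / n_contests for label in totals}
    return priors, cond

def posterior(contest, priors, cond):
    """Bayes' rule with the naive (conditional independence) assumption."""
    feats = digit_features(contest)
    joint = {
        label: priors[label]
        * math.prod(cond[label][j][d] for j, d in enumerate(feats))
        for label in priors
    }
    z = sum(joint.values())
    return {label: p / z for label, p in joint.items()}

if __name__ == "__main__":
    priors, cond = train()
    # Membership probabilities for one (made-up) observed contest.
    print(posterior((5200, 910), priors, cond))
```

With only two first-digit features per contest, the posterior probabilities in this sketch will usually stay close to the priors; the point is the structure (simulate, estimate class-conditional frequencies, apply Bayes' rule), not its discriminating power.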