Anomaly Detection in Election Data and its Representation of U.S. Infrastructure Vulnerability
Jason Green
2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), published 2021-10-27
DOI: https://doi.org/10.1109/iemcon53756.2021.9623111
Citations: 0
Abstract
This paper presents an approach for investigating election data fraud intended to alter outcomes, and assesses whether its implications align with known weaknesses in U.S. infrastructure. Applying supervised and unsupervised machine learning techniques (Decision Tree, Random Forest, and Isolation Forest) to the 2016 U.S. Presidential Election and polling datasets, the paper looks for potential data fraud by way of detected anomalies. The experiments flag roughly 9% of the entries in the polling results dataset as anomalous. Because that dataset has no ground truth, the accuracy of this result cannot be determined, and so no link between the detected anomalies and data fraud attempts can be drawn. Further research is needed to examine this link. Even so, existing publications on the dangers of data manipulation, especially to U.S. infrastructure, already point to an alarming vulnerability.
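To illustrate the unsupervised technique named in the abstract, the following is a minimal sketch of an Isolation Forest written with the Python standard library only. It is not the paper's implementation: the two-column "polling" rows, the injected outliers, the seeds, and the parameters `n_trees` and `sample_size` are all assumptions made for illustration, and the abstract does not say how its roughly 9% figure was obtained. In practice a library implementation such as scikit-learn's `IsolationForest` would likely be used instead.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively partition rows with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        # Simplification: stop when the chosen feature cannot split these rows.
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, row, depth=0):
    """Depth at which row lands in a leaf, adjusted for the leaf's remaining size."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feature, split, left, right = tree
    child = left if row[feature] < split else right
    return path_length(child, row, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=256, seed=0):
    """Return one score per row; scores closer to 1 mean easier to isolate."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(t, row) for t in trees) / (n_trees * norm))
            for row in data]


# Hypothetical data: 200 ordinary rows (two candidates' poll percentages)
# plus 5 hand-made implausible rows. None of this comes from the paper.
data_rng = random.Random(42)
normal = [(data_rng.gauss(46, 3), data_rng.gauss(44, 3)) for _ in range(200)]
injected_rows = [(95.0, 3.0), (4.0, 92.0), (88.0, 88.0), (6.0, 7.0), (70.0, 15.0)]
data = normal + injected_rows
injected = list(range(len(normal), len(data)))

scores = anomaly_scores(data)
ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
for i in ranked[:5]:
    print(i, data[i], round(scores[i], 3))
```

Rows that take few random splits to isolate receive higher scores. As the abstract notes, without ground truth a high score marks an entry only as unusual, not as fraudulent.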