A Graph-Based Spatial Cross-Validation Approach for Assessing Models Learned with Selected Features to Understand Election Results

2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA) Pub Date : 2021-12-01 DOI:10.1109/ICMLA52953.2021.00150

Tiago Pinho da Silva, A. R. Parmezan, Gustavo E. A. P. A. Batista

Pages: 909-915

Citations: 1

Abstract

Elections are complex activities fundamental to any democracy. Contextualized analysis of election data helps us understand electoral behavior and the factors that influence it. Multidisciplinary studies have prioritized the predictive modeling of electoral features from thousands of explanatory features, taking into account the geographic and spatial aspects inherent to the data. A model built for this purpose must be rigorously evaluated to estimate its prediction error on future test cases. Although cross-validation is widely used for this task, it yields optimistic results because the resampling does not ensure spatial independence between test and training data. On the other hand, alternatives that deal with spatial dependence may fall into a pessimistic scenario by assuming total spatial independence between the test and training sets regardless of the size of the test set, increasing the probability of overfitting. This paper addresses these issues by proposing a graph-based spatial cross-validation approach to assess models learned with selected features from spatially contextualized electoral datasets. Our approach takes advantage of the spatial graph structure provided by lattice-type spatial objects to define a local training set for each test fold. We generate the local training sets by removing spatially close data that are highly correlated and irrelevant distant data that may interfere with error estimates. Experiments involving the second round of the 2018 Brazilian presidential election demonstrate that our approach contributes to the fair evaluation of models by enabling more realistic and local modeling.
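To make the idea of a local training set concrete, the following is a minimal sketch and not the authors' implementation. It assumes the spatial units are nodes of an adjacency graph and that "spatially close" and "distant" can be approximated by hop counts from the test fold. The names `hop_distances`, `local_training_set`, `buffer_hops`, and `max_hops` are hypothetical, and the abstract does not state how the paper actually chooses these thresholds.

```python
from collections import deque


def hop_distances(adjacency, sources):
    """Multi-source BFS: hops from the nearest source to each reachable node."""
    dist = {node: 0 for node in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def local_training_set(adjacency, test_fold, buffer_hops, max_hops):
    """Keep nodes farther than buffer_hops but no farther than max_hops.

    Assumption: hop count stands in for spatial proximity. Nodes within
    buffer_hops are dropped as too correlated with the test fold; nodes
    beyond max_hops are dropped as irrelevant distant data.
    """
    dist = hop_distances(adjacency, test_fold)
    return sorted(n for n, d in dist.items() if buffer_hops < d <= max_hops)


# Toy lattice: a chain of 8 neighboring areas, 0-1-2-...-7.
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
train = local_training_set(adjacency, test_fold=[0], buffer_hops=1, max_hops=4)
print(train)
```

In this toy chain, area 1 is excluded as a buffer around the test area 0, and areas 5 to 7 are excluded as too distant, leaving areas 2 to 4 for training. A real study would build the adjacency graph from the contiguity of the spatial units and would need a principled way to set both thresholds.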


