Stop trying to predict elections only with twitter – There are other data sources and technical issues to be improved
Kellyton Brito, Rogério Luiz Cardoso Silva Filho, Paulo Jorge Leitão Adeodato
Government Information Quarterly, 41(1), Article 101899. Published 2023-12-23. DOI: 10.1016/j.giq.2023.101899
https://www.sciencedirect.com/science/article/pii/S0740624X23000990
Abstract
Since the popularization of social media (SM) platforms, researchers have been trying to use their data to predict electoral results. Previous surveys point out that the most used approach is based on volume and sentiment analysis of posts on Twitter. However, they are almost unanimous in reporting that the results are no better than chance. In this context, this study aims to investigate the feasibility of predicting electoral results based only on Twitter, identify the main issues, and draw guidelines for alternative future directions. To this end, we reviewed the evolution of election polling and predictions, including the “polling crises” of 1936 and 1948, and their similarities with current approaches. We also built on the official SM platforms' documentation and on our experience collecting and analyzing large-scale data from many SM platforms. Lastly, we analyzed nine reviews on predicting elections with SM data from 2013 to 2021. We observed that, contrary to initial expectations, most current research with Twitter has been unable to solve many of the challenges encountered since the initial studies, and also shares many characteristics of the unsuccessful straw polls performed before 1936. We illustrate this by highlighting the impracticability of polling over Twitter due to several biases and technical barriers, the need for external data, the high dependency on arbitrary decisions by researchers, and the constant change in the platforms' scenarios, which may invalidate specific models. Lastly, we indicate some possible future directions, such as focusing on creating repeatable processes; using SM data as part of statistical models, instead of as polling; diversifying the input data sources, including multiple SM platforms and non-SM data such as polls and economic indicators; using machine learning for regression of the vote share, rather than for sentiment analysis; and dealing with the uncertainty of highly divergent polling results.
About the journal:
Government Information Quarterly (GIQ) delves into the convergence of policy, information technology, government, and the public. It explores the impact of policies on government information flows, the role of technology in innovative government services, and the dynamic between citizens and governing bodies in the digital age. GIQ serves as a premier journal, disseminating high-quality research and insights that bridge the realms of policy, information technology, government, and public engagement.