Stop trying to predict elections only with twitter – There are other data sources and technical issues to be improved

IF 10 | Zone 1, Management | Q1 Information Science & Library Science | Government Information Quarterly | Pub Date: 2023-12-23 | DOI: 10.1016/j.giq.2023.101899

Kellyton Brito, Rogério Luiz Cardoso Silva Filho, Paulo Jorge Leitão Adeodato

Government Information Quarterly 41(1), Article 101899. Full text: https://www.sciencedirect.com/science/article/pii/S0740624X23000990

Citations: 0

Abstract

Since the popularization of social media (SM) platforms, researchers have been trying to use their data to predict electoral results. Previous surveys point out that the most common approach is based on the volume and sentiment analysis of posts on Twitter. However, they are almost unanimous in reporting that the results are no better than chance. In this context, this study aims to investigate the feasibility of predicting electoral results based only on Twitter, identify the main issues, and draw guidelines for future alternative directions. To this end, we reviewed the evolution of election polling and predictions, including the “polling crises” of 1936 and 1948, and their similarities with current approaches. We also drew on the official SM platforms' documentation and on our experience collecting and analyzing large-scale data from many SM platforms. Lastly, we analyzed nine reviews on predicting elections with SM data from 2013 to 2021. We observed that, contrary to initial expectations, most current research with Twitter has been unable to solve many of the challenges encountered since the initial studies, and also shares many characteristics of the unsuccessful straw polls conducted before 1936. We illustrate this by highlighting the impracticability of polling over Twitter due to several biases and technical barriers, the need for external data, the high dependency on researchers' arbitrary decisions, and the constant changes in the platforms' scenarios, which may invalidate specific models. Lastly, we indicate some possible future directions, such as focusing on creating repeatable processes; using SM data as part of statistical models, instead of polling; diversifying the input data sources, including multiple SM platforms and non-SM data such as polls and economic indicators; using machine learning for regression of the vote share, rather than for sentiment analysis; and dealing with the uncertainty of the highly divergent polling results.
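One of the directions named in the abstract, regressing the vote share on several input sources instead of counting sentiment, can be illustrated with a small sketch. This is not the authors' method. The feature names (`poll_average`, `sm_engagement_share`, `gdp_growth`), the plain linear model, and every number below are assumptions invented for the example; the toy targets are generated from assumed weights only so that the fit can be checked.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; the model cannot be fitted")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_vote_share(rows: Sequence[Sequence[float]],
                   vote_shares: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations.

    Returns the weights with the intercept first.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, vote_shares)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_vote_share(weights: Sequence[float], row: Sequence[float]) -> float:
    """Predict a vote share, clipped to the valid range [0, 1]."""
    raw = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return min(1.0, max(0.0, raw))


# Hypothetical rows, one per candidate:
# (poll_average, sm_engagement_share, gdp_growth). All values are made up.
TOY_ROWS = [
    (0.40, 0.30, 0.02),
    (0.35, 0.45, 0.02),
    (0.15, 0.10, -0.01),
    (0.10, 0.15, 0.01),
    (0.50, 0.35, 0.03),
    (0.25, 0.40, 0.00),
]
# Assumed weights (intercept first), used only to generate toy targets.
TOY_WEIGHTS = [0.02, 0.7, 0.2, 1.0]
TOY_TARGETS = [
    TOY_WEIGHTS[0] + sum(w * x for w, x in zip(TOY_WEIGHTS[1:], row))
    for row in TOY_ROWS
]

if __name__ == "__main__":
    weights = fit_vote_share(TOY_ROWS, TOY_TARGETS)
    print("fitted weights:", [round(w, 4) for w in weights])
    print("prediction:", round(predict_vote_share(weights, (0.30, 0.20, 0.01)), 4))
```

The SM signal enters as one feature among several, next to polls and an economic indicator, which is the "part of a statistical model" framing rather than treating post counts as a poll. A real study would use held-out elections and a model with uncertainty estimates rather than this noise-free toy fit.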




Source journal

Government Information Quarterly (Information Science & Library Science)

CiteScore: 15.70

Self-citation rate: 16.70%

Articles published: 106

Journal description: Government Information Quarterly (GIQ) delves into the convergence of policy, information technology, government, and the public. It explores the impact of policies on government information flows, the role of technology in innovative government services, and the dynamic between citizens and governing bodies in the digital age. GIQ serves as a premier journal, disseminating high-quality research and insights that bridge the realms of policy, information technology, government, and public engagement.