{"title":"Predicting the Political Polarity of Tweets Using Supervised Machine Learning","authors":"Michelle Voong, Keerthana Gunda, S. Gokhale","doi":"10.1109/COMPSAC48688.2020.000-9","DOIUrl":null,"url":null,"abstract":"With the advent of social media; politicians, media outlets, and ordinary citizens alike are routinely turning to Twitter to share their thoughts and feelings. Discerning politically biased tweets from neutral ones can assist in determining the propensity of an elected official or a media outlet in engaging in political rhetoric. This paper presents a supervised machine learning approach to predict whether a tweet is politically biased or neutral. The approach uses a labeled data set available at Crowdflower, where each tweet is tagged with a partisan/neutral label plus its message type and audience. The approach considers a combination of linguistic features including Term Frequency-Inverse Document Frequency (TF-IDF), bigrams, and trigrams along with metadata features including mentions, retweets, and URLs, as well as the additional labels of message type and audience. It trains both simple and ensemble classifiers and assesses their performance using precision, recall, and F1-score. The results demonstrate that the classifiers can predict the polarity of a tweet accurately when trained on a combination of TF-IDF and metadata features that can be extracted automatically from the tweets, eliminating the need for additional tagging which is manual, cumbersome and error prone.","PeriodicalId":430098,"journal":{"name":"2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSAC48688.2020.000-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
With the advent of social media; politicians, media outlets, and ordinary citizens alike are routinely turning to Twitter to share their thoughts and feelings. Discerning politically biased tweets from neutral ones can assist in determining the propensity of an elected official or a media outlet in engaging in political rhetoric. This paper presents a supervised machine learning approach to predict whether a tweet is politically biased or neutral. The approach uses a labeled data set available at Crowdflower, where each tweet is tagged with a partisan/neutral label plus its message type and audience. The approach considers a combination of linguistic features including Term Frequency-Inverse Document Frequency (TF-IDF), bigrams, and trigrams along with metadata features including mentions, retweets, and URLs, as well as the additional labels of message type and audience. It trains both simple and ensemble classifiers and assesses their performance using precision, recall, and F1-score. The results demonstrate that the classifiers can predict the polarity of a tweet accurately when trained on a combination of TF-IDF and metadata features that can be extracted automatically from the tweets, eliminating the need for additional tagging which is manual, cumbersome and error prone.