Eric Baucom, Azade Sanjari, Xiaozhong Liu, Miao Chen
In recent years, social media has been used to characterize and predict real-world events; in this research, we investigate how closely Twitter mirrors the real world. Specifically, we characterize the relationship between the language used on Twitter and the results of the 2011 NBA Playoff games. We hypothesize that the language of Twitter users is useful for classifying each tweet by a combined label: the user's location together with which team is leading at that point in the game. This is based on the common assumption that "fans" of a team express more positive sentiment, and therefore use different language, when their team is doing well. We test this hypothesis by labeling each tweet according to the location of the user along with the team in the lead at the time of the tweet. The hypothesized difference in language (as measured by tf-idf) should then have predictive power over the tweet labels. We find that it does, and we experiment further by adding semantic orientation (SO) information to the feature set. SO offers little improvement over tf-idf alone. We discuss the relative strengths of the two types of features for our data.
Mirroring the real world in social media: twitter, geolocation, and sentiment analysis. Proceedings of the 2013 international workshop on Mining unstructured big data using natural language processing, published 2013-10-28. DOI: 10.1145/2513549.2513559 (https://doi.org/10.1145/2513549.2513559).
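The labeling-and-classification setup described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the tokenizer, the `location|team` label format, the smoothed idf formula, the nearest-centroid classifier, and the example tweets are all hypothetical choices. The abstract does not say which classifier was used, and SO features are omitted here.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text: str) -> list[str]:
    # Lowercase; keep word tokens, optionally prefixed by # or @.
    return re.findall(r"[#@]?\w+", text.lower())

def combined_label(location: str, leading_team: str) -> str:
    # Hypothetical label format: user location plus the team leading at tweet time.
    return f"{location}|{leading_team}"

def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf variant: log((1 + n) / (1 + df)) + 1, so no weight is zero.
    return {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}

def normalize(vec: dict[str, float]) -> dict[str, float]:
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Raw term counts weighted by idf; terms unseen in training are dropped.
    return normalize({t: c * idf[t] for t, c in Counter(doc).items() if t in idf})

def dot(a: dict[str, float], b: dict[str, float]) -> float:
    # With L2-normalised inputs this is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def train_centroids(tweets):
    """tweets: iterable of (text, location, leading_team) triples."""
    tweets = list(tweets)
    docs = [tokenize(text) for text, _, _ in tweets]
    idf = fit_idf(docs)
    sums = defaultdict(lambda: defaultdict(float))
    for doc, (_, location, leader) in zip(docs, tweets):
        label = combined_label(location, leader)
        for t, w in tfidf(doc, idf).items():
            sums[label][t] += w
    # One L2-normalised centroid per combined label.
    return idf, {label: normalize(vec) for label, vec in sums.items()}

def classify(text: str, idf, centroids) -> str:
    # Pick the label whose centroid is most cosine-similar to the tweet.
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda label: dot(vec, centroids[label]))

# Hypothetical toy data: (tweet text, user location, team leading at tweet time).
EXAMPLE_TWEETS = [
    ("lets go mavs great shot dirk", "Dallas", "DAL"),
    ("dirk is on fire go mavs", "Dallas", "DAL"),
    ("ugh heat up again refs terrible", "Dallas", "MIA"),
    ("lebron dunk heat nation", "Miami", "MIA"),
    ("go heat wade and lebron", "Miami", "MIA"),
]

if __name__ == "__main__":
    idf, centroids = train_centroids(EXAMPLE_TWEETS)
    print(classify("go mavs dirk", idf, centroids))  # prints Dallas|DAL
```

Nearest-centroid is used only because it is short and dependency-free; any classifier over the same tf-idf vectors (and, in the paper's second experiment, additional SO features) would fit the same labeling scheme.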
Carlos Castillo, G. D. F. Morales, Marcelo Mendoza, Nasir Khan
We perform an automatic analysis of television news programs based on the closed captions that accompany them. Specifically, we collect all news broadcast on over 140 US television channels during a six-month period. We start by automatically segmenting, processing, and annotating the closed captions. Next, we use NLP methods to analyze their linguistic style and their mentions of people. We present a series of key insights about news providers and people in the news, and we discuss the biases that automatic methods can uncover. We cross-check these insights by examining the data from multiple points of view, including qualitative assessment.
Says who?: automatic text-based content analysis of television news. Proceedings of the 2013 international workshop on Mining unstructured big data using natural language processing, published 2013-07-18. DOI: 10.1145/2513549.2513558 (https://doi.org/10.1145/2513549.2513558).
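The caption segmentation and person-mention steps mentioned in this abstract can be sketched as follows. The sketch assumes the convention, common in US broadcast captions, in which `>>>` marks a new story and `>>` marks a change of speaker; the abstract does not describe the authors' actual segmentation method. The whole-word regex matching against a fixed list of names (and the example names) is a hypothetical simplification, not a named-entity recognizer.

```python
import re
from collections import Counter

def segment_captions(stream: str) -> list[list[str]]:
    """Split a caption stream into stories, each a list of speaker turns.

    Assumes ">>>" marks a new story and ">>" a change of speaker.
    """
    stories = []
    for story in stream.split(">>>"):
        turns = [turn.strip() for turn in story.split(">>") if turn.strip()]
        if turns:
            stories.append(turns)
    return stories

def count_mentions(stories: list[list[str]], names: list[str]) -> Counter:
    # Naive whole-word, case-insensitive matching; a real system would use NER.
    counts = Counter()
    for turns in stories:
        for turn in turns:
            for name in names:
                pattern = r"\b" + re.escape(name) + r"\b"
                counts[name] += len(re.findall(pattern, turn, flags=re.IGNORECASE))
    return counts

if __name__ == "__main__":
    stream = (">>> GOOD EVENING. >> THANKS. THE PRESIDENT SPOKE TODAY. "
              ">>> IN SPORTS, THE HEAT WON.")
    stories = segment_captions(stream)
    print(len(stories))  # prints 2
    print(count_mentions(stories, ["president", "heat"])["president"])  # prints 1
```

Matching is case-insensitive because broadcast captions are often all upper case.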
Proceedings of the 2013 international workshop on Mining unstructured big data using natural language processing. DOI: 10.1145/2513549 (https://doi.org/10.1145/2513549).