D. Archambault, Derek Greene, P. Cunningham, N. Hurley
Users of social media sites, such as Twitter, rapidly generate large volumes of text content on a daily basis. Visual summaries are needed to understand what groups of people are saying collectively in this unstructured text data. Users will typically discuss a wide variety of topics, where the number of authors talking about a specific topic can quickly grow or diminish over time, and what the collective is saying about the subject can shift as a situation develops. In this paper, we present a technique that summarises what collections of Twitter users are saying about certain topics over time. As the correct resolution for inspecting the data is unknown in advance, the users are clustered hierarchically over a fixed time interval based on the similarity of their posts. The visualisation technique takes this data structure as its input. Given a topic, it finds the correct resolution of users at each time interval and provides tags to summarise what the collective is discussing. The technique is tested on a large microblogging corpus, consisting of millions of tweets and over a million users.
ThemeCrowds: multiresolution summaries of twitter usage. D. Archambault, Derek Greene, P. Cunningham, N. Hurley. SMUC '11, 2011-10-28. DOI: 10.1145/2065023.2065041 (https://doi.org/10.1145/2065023.2065041)
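The clustering step described in the ThemeCrowds abstract (users grouped hierarchically by the similarity of their posts within one time interval) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' method: the bag-of-words representation, the cosine measure, the merge rule (summing term counts of merged clusters), and the names `cosine` and `cluster_users` are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_users(posts: dict) -> object:
    """Agglomeratively merge users into a binary tree of nested tuples.

    `posts` maps a user name to the concatenated text that user posted in
    one time interval (an assumed input format). At each step the two most
    similar clusters are merged and their term counts are summed.
    """
    clusters = {user: Counter(text.lower().split()) for user, text in posts.items()}
    while len(clusters) > 1:
        names = list(clusters)
        # Score every unordered pair and keep the most similar one.
        _, left, right = max(
            ((cosine(clusters[x], clusters[y]), x, y)
             for i, x in enumerate(names) for y in names[i + 1:]),
            key=lambda scored: scored[0],
        )
        merged = clusters.pop(left) + clusters.pop(right)
        clusters[(left, right)] = merged
    return next(iter(clusters))


if __name__ == "__main__":
    example_posts = {
        "ann": "flood rain river",
        "bob": "flood rain river warning",
        "cat": "football match goal",
        "dan": "goal football league",
    }
    print(cluster_users(example_posts))
```

Cutting the resulting tree at different depths gives coarser or finer groups of users, which is the sense in which a hierarchy lets the resolution be chosen after the fact rather than in advance.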
This talk is focused on a key task in the area of Opinion Mining and Sentiment Analysis: polarity classification of social media documents (e.g. blog posts). Estimating polarity is much more demanding than estimating topicality. As a matter of fact, the effectiveness of polarity classification is still modest and does not compare with the effectiveness of standard retrieval tasks. Polarity estimation is severely affected by parts of the text that are off-topic or that simply do not express any opinion. In fact, the key sentiments in a document often appear in specific locations of the text. Furthermore, there are usually conflicting opinions in a given document and this mixed set of opinions harms the performance of automatic methods designed to estimate the overall orientation of the text. In this talk, I will argue that understanding the flow of sentiments in a text is a major challenge for effectively predicting the document's orientation towards a given topic. I will briefly outline some possible avenues to address this challenging issue and review some recent papers that take steps in this direction.
The challenge of understanding the flow of sentiments in social media documents. D. Losada. SMUC '11, 2011-10-28. DOI: 10.1145/2065023.2065025 (https://doi.org/10.1145/2065023.2065025)
Folksonomies are becoming increasingly popular, both among users, who find them simple and intuitive to use, and among scientists, who see them as interesting research objects. Folksonomies can be viewed as large informal sources of semantics. Harnessing these semantics for search or concept extraction requires us to be able to recognize linguistic similarity between tags. In this paper, we propose an approach that combines morpho-syntactic and semantic similarity measures, without using any external linguistic resources, to mine tag pairs that can be reduced to base tags. Our approach is based on the Levenshtein distance for morpho-syntactic similarity and tag signatures for semantic similarity. The evaluation of our approach, based on a data set crawled from Delicious, shows that we are able to recognize a wide range of linguistic variations with high quality.
Mining tag similarity in folksonomies. Geir Solskinnsbakk, J. Gulla. SMUC '11, 2011-10-28. DOI: 10.1145/2065023.2065037 (https://doi.org/10.1145/2065023.2065037)
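The two similarity measures named in the folksonomy abstract can be sketched as follows. The Levenshtein distance is the standard dynamic-programming edit distance. The abstract does not define tag signatures, so this sketch assumes a signature is a dictionary of co-occurring tags with weights and compares signatures by cosine similarity; that representation, the length normalisation, and the function names are hypothetical.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1] by the longer length (an assumed normalisation)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def signature_similarity(sig_a: dict, sig_b: dict) -> float:
    """Cosine similarity between two tag signatures (assumed: co-occurring tag -> weight)."""
    dot = sum(weight * sig_b[tag] for tag, weight in sig_a.items() if tag in sig_b)
    norm_a = math.sqrt(sum(w * w for w in sig_a.values()))
    norm_b = math.sqrt(sum(w * w for w in sig_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    print(levenshtein("blog", "blogs"), string_similarity("blog", "blogs"))
    print(signature_similarity({"web": 2, "news": 1}, {"web": 1, "news": 1}))
```

A pair such as `blog` and `blogs` scores high on the string measure; requiring the signatures to agree as well is one way to avoid merging tags that merely look alike, which is the motivation the abstract gives for combining the two measures.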