The Social News site reddit.com is composed of thousands of independent user-created subreddits where people use the site's submission and voting features in a variety of ways. This paper offers a brief overview of different types of subreddit and how user activity is distributed between these.
{"title":"Reddit.com: A census of subreddits","authors":"Richard A. Mills","doi":"10.1145/2786451.2786491","DOIUrl":"https://doi.org/10.1145/2786451.2786491","url":null,"PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80731026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introducing Social Machines as web-enabled entities integrating social energies and computational powers into a socio-technical system (whether purposeful or not) where social dynamics animate communities, this paper proposes a theoretical framework in which to observe them. Attempting to strike a balance between the roles of humans and non-humans, and aware of the difficulties that this heterogeneity presents, we propose to approach the question of capturing the social dynamics of a social machine through prosopography. Prosopography is a method, used in particular by historians, that allows the systematic study of a collection of biographies, be they of persons, artefacts, infrastructures or groups thereof. Systematization is achieved by designing an appropriate questionnaire to gather homogeneous data across the biographies. Our questionnaire design relies on the identification of five archetypal elements in biographical narratives. Illustrating our method with three examples, we demonstrate how our archetypal narratives have the potential to describe at least some aspects of the social dynamics in social machines.
{"title":"Archetypal Narratives in Social Machines: Approaching Sociality through Prosopography","authors":"S. Tarte, P. Willcox, H. Glaser, D. D. Roure","doi":"10.1145/2786451.2786471","DOIUrl":"https://doi.org/10.1145/2786451.2786471","url":null,"PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"83 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79344536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Helena Webb, M. Jirotka, B. Stahl, W. Housley, Adam Edwards, M. Williams, R. Procter, O. Rana, P. Burnap
The increasing popularity of social media platforms such as Facebook, Twitter, Instagram and Tumblr has been accompanied by concerns over the growing prevalence of 'harmful' online interactions. The term 'digital wildfire' has been coined to characterise the capacity for provocative content on social media to propagate rapidly and cause offline harm. The apparent risks posed by digital wildfires create questions over the suitable governance of digital social spaces. This paper provides an overview of some preliminary findings of an ongoing research project that seeks to build an empirically grounded methodology for the study and advancement of the responsible governance of social media.
{"title":"'Digital Wildfires': a challenge to the governance of social media?","authors":"Helena Webb, M. Jirotka, B. Stahl, W. Housley, Adam Edwards, M. Williams, R. Procter, O. Rana, P. Burnap","doi":"10.1145/2786451.2786929","DOIUrl":"https://doi.org/10.1145/2786451.2786929","url":null,"PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88064442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Despite a considerable number of studies on risk factors for asthma onset, very little is known about their relative importance. To obtain a full picture of these factors, both categories of data, personal and environmental, have to be taken into account simultaneously, which previous studies have not done. We propose a framework to rank the risk factors from heterogeneous data sources of the two categories. Built on top of EventShop and Personal EventShop, this framework extracts about 400 features and analyzes them with a gradient boosting tree. The features come from sources including personal profile and life-event data, and environmental data on air pollution, weather and PM2.5 emission sources. The top-ranked risk factors derived from our framework agree well with the general medical consensus. Thus, our framework is a reliable approach, and the discovered rankings of the relative importance of risk factors can provide insights for the prevention of asthma.
{"title":"Habits vs Environment: What Really Causes Asthma?","authors":"Mengfan Tang, Pranav Agrawal, R. Jain","doi":"10.1145/2786451.2786481","DOIUrl":"https://doi.org/10.1145/2786451.2786481","url":null,"PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"82 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90629453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information communication technology has enabled criminals to remain distant from the crimes they commit with reduced risk. However, by moving this underground criminal activity online, digital evidence of communication with members of the crime group, and also victims, presents an interesting research opportunity into human trafficking and may reveal actionable information for law enforcement agencies. Specifically, this research paper investigates whether a webscraping tool could be employed to gather intelligence on organized crime groups at the recruitment stage of the trafficking operation as a means to understand their modus operandi. Preliminary findings presented in this paper indicate that the UK is a popular destination country for job advertisements hosted in Romania and further analysis will be undertaken to identify if there are in fact indicators of trafficking evident in these identified websites.
{"title":"Webscraping as an Investigation Tool to Identify Potential Human Trafficking Operations in Romania","authors":"R. McAlister","doi":"10.1145/2786451.2786510","DOIUrl":"https://doi.org/10.1145/2786451.2786510","url":null,"PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"128 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77208918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We analyse the usage of quotes in forum.rpg.net, the largest online forum on tabletop roleplaying games. Quote usage appears pervasive and surprisingly consistent over time and users; it seems to have a role in aiding intra-thread navigation; and it reveals an underlying "social" structure in a community that otherwise lacks all trappings (from friends and followers to reputations) of today's social networks. This is the first work to investigate community structure and interaction through the lens of quotes in an online forum.
{"title":"Quotes in forum.rpg.net","authors":"Mattia Samory, E. Peserico","doi":"10.1145/2786451.2786928","DOIUrl":"https://doi.org/10.1145/2786451.2786928","url":null,"PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76072563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Julia Perl, Claudia Wagner, Jérôme Kunegis, Steffen Staab
It has been widely observed that many public figures, and politicians in particular, use Twitter as a medium for communication with their fans or followers. However, Twitter is also used by public figures for communication among themselves, allowing Twitter to be used as a tool to observe the social network among such public figures -- a network which is otherwise much more difficult to observe. Accordingly, in this paper we study the behavior of German politicians with respect to their social interconnections on Twitter, by asking whether the following and unfollowing between them can be predicted accurately. We show which measures are useful for predicting the formation and dissolution of social ties in the network of German politicians, and quantify the added value of unlinking information for both prediction tasks. Our results show that interesting differences exist in the factors that are related to the formation and dissolution of social ties.
{"title":"Twitter as a Political Network: Predicting the Following and Unfollowing Behavior of German Politicians","authors":"Julia Perl, Claudia Wagner, Jérôme Kunegis, Steffen Staab","doi":"10.1145/2786451.2786506","DOIUrl":"https://doi.org/10.1145/2786451.2786506","url":null,"PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84669428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper examines the feature selection procedures of sentiment analysis on a multi-dialectal language. We analyzed a dataset of over 6 million microblogs from China, a multi-dialectal country, deployed a sentiment classifier to examine the positive/negative emotion carried by the microblogs, and explored the regional variations in the optimal feature vectors. The results suggest that localized feature vectors can maximize classification accuracy in some of China's regions, and show that the geographical distance between provinces and the use of a common dialect help to explain the provincial differences in the feature vectors. This research can be applied to other multicultural countries for feature vector optimization in sentiment analysis.
{"title":"Does Dialectal Variation Matter in Term-Based Feature Selection of Sentiment Analysis?: An Investigation into Multi-dialectal Chinese Microblogs","authors":"K. C. Chan, King-wa Fu, Chung-hong Chan","doi":"10.1145/2786451.2786924","DOIUrl":"https://doi.org/10.1145/2786451.2786924","url":null,"PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89054886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The purpose of this study is to investigate the temporal association between cyberbalkanization and real-life polarization of public opinion during the Hong Kong Occupy Movement in 2014. For the period from July 1 to December 15, 2014, 1,387 Facebook Pages about Hong Kong were collected, their publicly accessible posts were retrieved, and a post sharing network (1,397 nodes and 41,404 edges) was constructed. Network communities were computationally extracted to determine the community membership of each Facebook page. The daily degree of cyberbalkanization was quantified as the number of shares through strong-tie (intra-community) connections. The level of political polarization was derived from opinion poll data as the proportion of respondents who gave extreme ratings to the government leader in Hong Kong. In a time series analysis, the daily degree of cyberbalkanization, as measured by the number of shares through strong ties, was significantly associated with the level of political polarization, particularly with the opinion poll results of the younger age group. This is the first study to provide empirical evidence that cyberbalkanization can serve as a leading indicator of the polarization of public opinion at least 10 days ahead, suggesting that social media data analysis can supplement traditional public opinion research methods, such as phone surveys, during social controversy.
{"title":"Predicting Political Polarization from Cyberbalkanization: Time series analysis of Facebook pages and Opinion Poll during the Hong Kong Occupy Movement","authors":"Chung-hong Chan, King-wa Fu","doi":"10.1145/2786451.2786509","DOIUrl":"https://doi.org/10.1145/2786451.2786509","url":null,"PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77798322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
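The daily cyberbalkanization measure described in the abstract above (the number of shares that stay within a single network community) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the `(day, source_page, target_page)` tuple layout and the `community_of` mapping are all assumptions made for illustration, and community detection itself is taken as already done.

```python
from collections import Counter

def daily_intra_community_shares(shares, community_of):
    """Count, per day, the shares whose two pages belong to the same community.

    shares: iterable of (day, source_page, target_page) tuples (hypothetical layout).
    community_of: dict mapping a page id to its community id (assumed precomputed).
    """
    counts = Counter()
    for day, source_page, target_page in shares:
        source_community = community_of.get(source_page)
        target_community = community_of.get(target_page)
        # Skip pages without a known community; count only intra-community shares.
        if source_community is not None and source_community == target_community:
            counts[day] += 1
    return dict(counts)

# Toy data, invented purely to show the call shape.
example_communities = {"A": 1, "B": 1, "C": 2, "D": 2}
example_shares = [
    ("2014-10-01", "A", "B"),  # same community
    ("2014-10-01", "A", "C"),  # crosses communities
    ("2014-10-01", "C", "D"),  # same community
    ("2014-10-02", "B", "A"),  # same community
    ("2014-10-02", "D", "E"),  # "E" has no community, ignored
]
example_daily = daily_intra_community_shares(example_shares, example_communities)
```

The resulting per-day counts form the kind of daily series that could then be compared against opinion poll figures in a time series analysis.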
Are web search results usually dominated by major websites and therefore lacking diversity? In this paper, we aim to answer this question by quantitatively modelling the diversity of search results for popular queries using two diversity measures well-studied in ecology, namely Simpson's diversity index and Shannon's diversity index. Our theoretical analysis shows how the diversity of search results is determined by the Zipfian distribution of websites. Our empirical analysis reveals that comparing Google and Bing, the former is more diverse in the top-50 search results, while the latter is more diverse in the top-10 search results.
{"title":"Diversity Analysis of Web Search Results","authors":"Suneel Kumar Kingrani, M. Levene, Dell Zhang","doi":"10.1145/2786451.2786502","DOIUrl":"https://doi.org/10.1145/2786451.2786502","url":null,"PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91169241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
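The two ecological measures named in the abstract above can be sketched for a single result list as follows. The abstract does not say which variant of Simpson's index the paper uses, so this sketch assumes the Gini-Simpson form, 1 - sum(p_i^2), and Shannon's index with the natural logarithm, -sum(p_i ln p_i), where p_i is the share of results coming from website i. The function names and the example domains are hypothetical.

```python
import math
from collections import Counter

def site_proportions(result_domains):
    """Return the share of results contributed by each distinct website."""
    counts = Counter(result_domains)
    total = sum(counts.values())
    return [count / total for count in counts.values()]

def simpson_diversity(result_domains):
    """Gini-Simpson index, 1 - sum(p_i^2); an assumed variant of Simpson's index."""
    return 1.0 - sum(p * p for p in site_proportions(result_domains))

def shannon_diversity(result_domains):
    """Shannon index, -sum(p_i * ln(p_i)), using the natural logarithm."""
    return -sum(p * math.log(p) for p in site_proportions(result_domains))

# Invented top-10 list: four results from one site, three, two and one from others.
example_top10 = ["a.com"] * 4 + ["b.com"] * 3 + ["c.com"] * 2 + ["d.com"]
example_simpson = simpson_diversity(example_top10)   # 1 - (0.16 + 0.09 + 0.04 + 0.01) = 0.70
example_shannon = shannon_diversity(example_top10)   # roughly 1.28
```

Both measures are 0 when every result comes from the same website and grow as results are spread more evenly over more websites, which is what makes them usable for comparing top-10 and top-50 result lists.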