Despite a considerable number of studies on risk factors for asthma onset, very little is known about their relative importance. To obtain a full picture of these factors, both categories of data, personal and environmental, have to be taken into account simultaneously, which previous studies have not done. We propose a framework to rank the risk factors from heterogeneous data sources of the two categories. Built on top of EventShop and Personal EventShop, this framework extracts about 400 features and analyzes them with a gradient boosting tree. The features come from sources including personal profile and life-event data, and environmental data on air pollution, weather and PM2.5 emission sources. The top-ranked risk factors derived from our framework agree well with the general medical consensus. Thus, our framework is a reliable approach, and the discovered rankings of relative importance of risk factors can provide insights for the prevention of asthma.
{"title":"Habits vs Environment: What Really Causes Asthma?","authors":"Mengfan Tang, Pranav Agrawal, R. Jain","doi":"10.1145/2786451.2786481","DOIUrl":"https://doi.org/10.1145/2786451.2786481","url":null,"abstract":"Despite considerable number of studies on risk factors for asthma onset, very little is known about their relative importance. To have a full picture of these factors, both categories, personal and environmental data, have to be taken into account simultaneously, which is missing in previous studies. We propose a framework to rank the risk factors from heterogeneous data sources of the two categories. Established on top of EventShop and Personal EventShop, this framework extracts about 400 features, and analyzes them by employing a gradient boosting tree. The features come from sources including personal profile and life-event data, and environmental data on air pollution, weather and PM2.5 emission sources. The top ranked risk factors derived from our framework agree well with the general medical consensus. Thus, our framework is a reliable approach, and the discovered rankings of relative importance of risk factors can provide insights for the prevention of asthma.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"82 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90629453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Helena Webb, M. Jirotka, B. Stahl, W. Housley, Adam Edwards, M. Williams, R. Procter, O. Rana, P. Burnap
The increasing popularity of social media platforms such as Facebook, Twitter, Instagram and Tumblr has been accompanied by concerns over the growing prevalence of 'harmful' online interactions. The term 'digital wildfire' has been coined to characterise the capacity for provocative content on social media to propagate rapidly and cause offline harm. The apparent risks posed by digital wildfires create questions over the suitable governance of digital social spaces. This paper provides an overview of some preliminary findings of an ongoing research project that seeks to build an empirically grounded methodology for the study and advancement of the responsible governance of social media.
{"title":"'Digital Wildfires': a challenge to the governance of social media?","authors":"Helena Webb, M. Jirotka, B. Stahl, W. Housley, Adam Edwards, M. Williams, R. Procter, O. Rana, P. Burnap","doi":"10.1145/2786451.2786929","DOIUrl":"https://doi.org/10.1145/2786451.2786929","url":null,"abstract":"The increasing popularity of social media platforms such as Facebook, Twitter, Instagram and Tumblr has been accompanied by concerns over the growing prevalence of 'harmful' online interactions. The term 'digital wildfire' has been coined to characterise the capacity for provocative content on social media to propagate rapidly and cause offline harm. The apparent risks posed by digital wildfires create questions over the suitable governance of digital social spaces. This paper provides an overview of some preliminary findings of an ongoing research project that seeks to build an empirically grounded methodology for the study and advancement of the responsible governance of social media.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88064442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different, openly available resource such as Wikipedia page views. We found that frequently searched keywords have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.
{"title":"Wikipedia Page View Reflects Web Search Trend","authors":"Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto","doi":"10.1145/2786451.2786495","DOIUrl":"https://doi.org/10.1145/2786451.2786495","url":null,"abstract":"The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different resource such as Wikipedia page views of open data. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"88 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84042228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
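The comparison described in the abstract above can be illustrated with a small correlation sketch. The abstract does not say which correlation coefficient was used, so the Pearson coefficient here is an assumption, and the function name and the sample series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric series.

    Assumption: the abstract only reports "high correlations"; Pearson's r
    is used here purely as an illustration.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily counts for one keyword (not data from the paper).
search_frequency = [120, 150, 400, 380, 160]
page_views = [1000, 1300, 3900, 3500, 1500]
r = pearson(search_frequency, page_views)
```

A value of `r` close to 1 for a keyword would indicate that its page views track its search frequency closely over the sampled days.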
How are the reputations of individuals in a social network influenced, quantitatively, by their communities? This work observes the collaborative events among individuals in a social network to obtain such knowledge. We propose a Factorization Machine approach to uncover the latent social influence among individuals based on their collaborations. Experiments conducted on a real-world DBLP dataset verify that the proposed approach can discover the latent social influence among individuals and provides a better predictive model than several baselines.
{"title":"Social Influencer Analysis with Factorization Machines","authors":"Ming-Feng Tsai, Chuan-Ju Wang, Zhe-Li Lin","doi":"10.1145/2786451.2786490","DOIUrl":"https://doi.org/10.1145/2786451.2786490","url":null,"abstract":"How will the reputations of individuals in a social network be influenced by their communities in a quantitative way? This work attempts to observe the collaborative events occurring at individuals involved in a social network to obtain such crucial knowledge. We propose a Factorization Machine approach to find out the latent social influence among the individuals based on their collaborations. Experiments conducted on a real-world DBLP dataset verify that the proposed approach can discover the latent social influence among individuals and provide a better predictive model than several baselines.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84197057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
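For readers unfamiliar with the model named in the abstract above, the following is a minimal sketch of the standard second-order factorization machine prediction, y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. How the paper encodes collaborations as features is not stated in the abstract, so the feature vector, weights, and function name below are hypothetical.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction.

    x  : feature vector (list of floats)
    w0 : global bias
    w  : per-feature linear weights
    V  : latent factors, V[i] is the k-dimensional vector of feature i

    The pairwise term uses the usual identity
    sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [(sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2],
    which costs O(k * n) instead of O(k * n^2).
    """
    n = len(x)
    k = len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        total = sum(V[i][f] * x[i] for i in range(n))
        total_of_squares = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (total * total - total_of_squares)
    return w0 + linear + pairwise


# Hypothetical example: features 0 and 1 are active (e.g. two co-authors).
x = [1.0, 1.0, 0.0]
w0 = 0.1
w = [0.2, 0.3, 0.4]
V = [[1.0, 0.0], [0.5, 1.0], [2.0, 2.0]]
score = fm_predict(x, w0, w, V)
```

In this reading, the inner product of two individuals' latent vectors plays the role of their latent pairwise influence; that interpretation is an illustration, not a claim about the paper's exact formulation.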
This paper introduces an approach to the classification and formalization of interdisciplinary social research with the web. The research project built upon an initial arraying work of Richard Rogers that introduced digital methods as a new form of research with the web as a source of perceptions about society [1]. Our work formalized the digital methods domain by constructing an ontology with the help of the Web Ontology Language (OWL), and interpreted the resulting representation for universal perceptions about web-based social research, such as the identification of accumulations of research activities, and predictions about epistemological shifts in the future.
{"title":"Digitizing »Digital Methods« The Journey of a Research Domain from a Book into the Semantic Web","authors":"Miriam Schmitz, Kristian Fischer","doi":"10.1145/2786451.2786503","DOIUrl":"https://doi.org/10.1145/2786451.2786503","url":null,"abstract":"This paper introduces an approach to classification and formalization of interdisciplinary social research with the web. The research project built upon an initial arraying work of Richard Rogers that introduced digital methods as a new form of research with the web as a source of perceptions about society [1]. Our work formalized the digital methods domain by construing an ontology with help of the Web Ontology Language (OWL), and interpreted the resulting representation for universal perceptions about web-based social research, such as the identification of accumulations of research activities, and predictions about epistemological shifts in the future.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73444341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information communication technology has enabled criminals to remain distant from the crimes they commit, with reduced risk. However, because this underground criminal activity has moved online, digital evidence of communication with members of the crime group, and also with victims, presents an interesting research opportunity into human trafficking and may reveal actionable information for law enforcement agencies. Specifically, this research paper investigates whether a webscraping tool could be employed to gather intelligence on organized crime groups at the recruitment stage of the trafficking operation as a means to understand their modus operandi. Preliminary findings presented in this paper indicate that the UK is a popular destination country for job advertisements hosted in Romania. Further analysis will be undertaken to identify whether there are in fact indicators of trafficking evident in these identified websites.
{"title":"Webscraping as an Investigation Tool to Identify Potential Human Trafficking Operations in Romania","authors":"R. McAlister","doi":"10.1145/2786451.2786510","DOIUrl":"https://doi.org/10.1145/2786451.2786510","url":null,"abstract":"Information communication technology has enabled criminals to remain distant from the crimes they commit with reduced risk. However, by moving this underground criminal activity online, digital evidence of communication with members of the crime group, and also victims, presents an interesting research opportunity into human trafficking and may reveal actionable information for law enforcement agencies. Specifically, this research paper investigates whether a webscraping tool could be employed to gather intelligence on organized crime groups at the recruitment stage of the trafficking operation as a means to understand their modus operandi. Preliminary findings presented in this paper indicate that the UK is a popular destination country for job advertisements hosted in Romania and further analysis will be undertaken to identify if there are in fact indicators of trafficking evident in these identified websites.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"128 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77208918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Are web search results usually dominated by major websites and therefore lacking in diversity? In this paper, we aim to answer this question by quantitatively modelling the diversity of search results for popular queries using two diversity measures well studied in ecology, namely Simpson's diversity index and Shannon's diversity index. Our theoretical analysis shows how the diversity of search results is determined by the Zipfian distribution of websites. Our empirical comparison of Google and Bing reveals that the former is more diverse in the top-50 search results, while the latter is more diverse in the top-10 search results.
{"title":"Diversity Analysis of Web Search Results","authors":"Suneel Kumar Kingrani, M. Levene, Dell Zhang","doi":"10.1145/2786451.2786502","DOIUrl":"https://doi.org/10.1145/2786451.2786502","url":null,"abstract":"Are web search results usually dominated by major websites and therefore lacking diversity? In this paper, we aim to answer this question by quantitatively modelling the diversity of search results for popular queries using two diversity measures well-studied in ecology, namely Simpson's diversity index and Shannon's diversity index. Our theoretical analysis shows how the diversity of search results is determined by the Zipfian distribution of websites. Our empirical analysis reveals that comparing Google and Bing, the former is more diverse in the top-50 search results, while the latter is more diverse in the top-10 search results.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91169241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
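The two ecological measures named in the abstract above can be sketched as follows. Simpson's index appears in several forms in the literature; the Gini-Simpson form 1 - sum(p_i^2) is assumed here, and the abstract does not say which form or logarithm base the authors used. The function names and the sample result list are hypothetical.

```python
import math
from collections import Counter


def site_proportions(result_sites):
    """Share of the result list taken by each distinct website."""
    counts = Counter(result_sites)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def simpson_diversity(result_sites):
    """Gini-Simpson index, 1 - sum(p_i^2) (assumed form; higher = more diverse)."""
    return 1.0 - sum(p * p for p in site_proportions(result_sites))


def shannon_diversity(result_sites):
    """Shannon index, -sum(p_i * ln p_i), using the natural logarithm."""
    return -sum(p * math.log(p) for p in site_proportions(result_sites))


# Hypothetical top-4 result list where one site appears twice.
top_results = ["a.example", "a.example", "b.example", "c.example"]
simpson = simpson_diversity(top_results)
shannon = shannon_diversity(top_results)
```

Both measures are zero when a single website fills every slot and grow as results spread more evenly across more websites, which is why they suit the question of whether major sites dominate a result list.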
This paper examines the feature selection procedures of sentiment analysis on a multi-dialectal language. We analyzed a dataset of over 6 million microblogs in China, a multi-dialectal country, deployed a sentiment classifier to examine the positive/negative emotion carried by the microblogs, and explored the regional variations in the optimal feature vectors. The results suggest that localized feature vectors can maximize classification accuracy in some regions of China, and show that the geographical distance between provinces and the use of a common dialect help explain the provincial differences in the feature vectors. This research can be applied to other multicultural countries for feature vector optimization in sentiment analysis.
{"title":"Does Dialectal Variation Matter in Term-Based Feature Selection of Sentiment Analysis?: An Investigation into Multi-dialectal Chinese Microblogs","authors":"K. C. Chan, King-wa Fu, Chung-hong Chan","doi":"10.1145/2786451.2786924","DOIUrl":"https://doi.org/10.1145/2786451.2786924","url":null,"abstract":"This paper examines the feature selection procedures of sentiment analysis on a multi-dialectal language. We analyzed a dataset with over 6 million microblogs in China, a multi-dialectal country, deployed sentiment classifier to examine the positive/negative emotion carried by the microblogs, and explored the regional variations in the optimal feature vectors. The results support a localized feature vectors in some China's regions can maximize the classification accuracy and show that geographical distance between provinces and common dialect used contribute to explaining the provincial difference in the feature vectors. This research can be applied to other multicultural countries for feature vector optimization in sentiment analysis.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89054886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rafael Huber, B. Scheibehenne, Alexandre Chapiro, Seth Frey, R. Sumner
In an increasingly competitive media environment, producers of online content need analytics that can predict the success of a video. In recent years the field of visual computation has produced a variety of mathematical models that quantify an image's salience, that is, its potential to capture attention. To test how a video's content might predict its success, we applied the standard saliency model of Itti, Koch, and Niebur [1] to more than 1000 video clips that were broadcast on a large video streaming website. We also obtained fine-grained data on the viewership of these clips. Based on a survival analysis, we find that people prefer more salient videos. The results were robust to the inclusion of other predictors such as the genre of the video, but not to video length, which remains correlated with salience even after comparing videos only within show and genre. Our analyses suggest that visual salience provides an objective and easy-to-compute supplement to previously suggested predictors of video consumption behavior.
{"title":"The influence of visual salience on video consumption behavior: A survival analysis approach","authors":"Rafael Huber, B. Scheibehenne, Alexandre Chapiro, Seth Frey, R. Sumner","doi":"10.1145/2786451.2786507","DOIUrl":"https://doi.org/10.1145/2786451.2786507","url":null,"abstract":"In an increasingly competitive media environment, producers of online content need analytics that can predict the success of a video. In recent years the field of visual computation has produced a variety of mathematical models that quantify an image's salience, that is, its potential to capture attention. To test how a video's content might predict its success, we applied the standard saliency model of Itti, Koch, and Niebur [1] to more than 1000 video clips that were broadcast on a large video streaming website. We also obtained fine-grained data on the viewership of these clips. Based on a survival analysis, we find that people prefer more salient videos. The results were robust towards the inclusion of other predictors such as the genre of the video, but not to video length, which remains correlated with salience even after comparing videos only within show and genre. Our analyses suggest that visual salience provides an objective and easy-to-compute supplement to previously suggested predictors of video consumption behavior.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"57 7","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91435039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The purpose of this study is to investigate the temporal association between cyberbalkanization and real-life polarization of public opinion during the Hong Kong Occupy Movement in 2014. 1,387 Facebook Pages about Hong Kong between July 1 and December 15, 2014 were collected, their publicly accessible posts were retrieved, and a post sharing network (1,397 nodes and 41,404 edges) was constructed. Network communities were computationally extracted to determine the community membership of each Facebook page. The daily degree of cyberbalkanization was quantified by the number of shares through strong-tie (intra-community) connections. The level of political polarization was derived from opinion poll data as the proportion of respondents who gave extreme ratings to the government leader in Hong Kong. In a time series analysis, the daily degree of cyberbalkanization, as measured by the number of shares through strong ties, was significantly associated with the level of political polarization, particularly with the opinion poll results of the younger age group. This is the first study to provide empirical evidence that cyberbalkanization can serve as a leading indicator of the polarization of public opinion at least 10 days ahead, suggesting that social media data analysis can supplement traditional public opinion research methods, such as phone surveys, during social controversy.
{"title":"Predicting Political Polarization from Cyberbalkanization: Time series analysis of Facebook pages and Opinion Poll during the Hong Kong Occupy Movement","authors":"Chung-hong Chan, King-wa Fu","doi":"10.1145/2786451.2786509","DOIUrl":"https://doi.org/10.1145/2786451.2786509","url":null,"abstract":"The purpose of this study is to investigate the temporal association between cyberbalkanization and real life polarization of public opinion during the Hong Kong Occupy Movement in 2014. 1,387 Facebook Pages about Hong Kong during July 1 to December 15, 2014 were collected, their publicly accessible posts were retrieved, and a post sharing network (1,397 nodes and 41,404 edges) was constructed. Network communities were computationally extracted to determine the community membership for each Facebook page. Daily degree of cyberbalkanization was quantified with the number of sharings through strong ties (intra-community sharing) connections. The level of political polarization was derived from the opinion polls data with the proportion of respondents who gave extreme ratings to the government leader in Hong Kong. In a time series analysis, the daily degree of cyberbalkanization, as measured by the number of sharing through the strong ties, was significantly associated with the level of political polarization, particularly with the younger age group's opinion poll result. This is the first study that provides empirical evidence for supporting cyberbalkanization to serve as a leading predictive indicator of the polarization of public opinion for at least 10 days ahead, suggesting that social media data analysis can supplement traditional public opinion research methods, such as phone survey, during social controversy.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77798322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}