Matt McVicar, Cédric Mesnage, Jefrey Lijffijt, Eirini Spyropoulou, T. D. Bie
As in any dynamic market, supply and demand of music are in a constant state of disequilibrium. Music charts have for many years documented the demand for the most popular music, but a more comprehensive understanding of this market has remained beyond reach. In this paper, we provide a proof of concept for how web resources now make it possible to study both demand and supply sides, accounting also for smaller, independent artists.
{"title":"Supply and demand of independent UK music artists on the web","authors":"Matt McVicar, Cédric Mesnage, Jefrey Lijffijt, Eirini Spyropoulou, T. D. Bie","doi":"10.1145/2786451.2786488","DOIUrl":"https://doi.org/10.1145/2786451.2786488","url":null,"abstract":"As in any dynamic market, supply and demand of music are in a constant state of disequilibrium. Music charts have for many years documented the demand for the most popular music, but a more comprehensive understanding of this market has remained beyond reach. In this paper, we provide a proof of concept for how web resources now make it possible to study both demand and supply sides, accounting also for smaller, independent artists.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89942604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Social networking sites (SNS) aimed at academics have the potential to enhance academic practice through developing an online academic identity and as a portal to further opportunities for collaboration and communication. This paper explores part of the communicative affordance offered by academic SNS through an analysis of the questions posed by academics via the Academia.edu website.
{"title":"What do academics ask their online networks?: An analysis of questions posed via Academia.edu","authors":"Katy Jordan","doi":"10.1145/2786451.2786501","DOIUrl":"https://doi.org/10.1145/2786451.2786501","url":null,"abstract":"Social networking sites (SNS) aimed at academics have the potential to enhance academic practice through developing an online academic identity and as a portal to further opportunities for collaboration and communication. This paper explores part of the communicative affordance offered by academic SNS through an analysis of the questions posed by academics via the Academia.edu website.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"os-3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87682098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article analyzes the degradation of data accuracy in large-scale longitudinal data sets. Recent research points to a number of issues with large-scale data, including problems of reliability, accuracy and quality over time. Simultaneously, large-scale data is increasingly being utilized in the social sciences. As scholars work to produce theoretically grounded research utilizing "small-scale" methods, it is important for researchers to better understand the critical issues associated with the analysis of large-scale data. To illustrate these issues, a case study analysis of archival Internet data is presented, focusing on the degradation of data accuracy over time. Suggestions for future studies are given.
{"title":"Big Data?: Big Issues Degradation in Longitudinal Data and Implications for Social Sciences","authors":"Matthew S. Weber, Hai Nguyen","doi":"10.1145/2786451.2786482","DOIUrl":"https://doi.org/10.1145/2786451.2786482","url":null,"abstract":"This article analyzes the issue of degradation of data accuracy in large-scale longitudinal data sets. Recent research points to a number of issues with large-scale data, including problems of reliability, accuracy and quality over time. Simultaneously, large-scale data is increasingly being utilized in the social sciences. As scholars work to produce theoretically grounded research utilized \"small-scale\" methods, it is important for researchers to better understand the critical issues associated with the analysis of large-scale data. In order to illustrate the issues associated with this type of research, a case study analysis of archival Internet data is presented focusing on the issues of degradation of data accuracy over time. Suggestions for future studies are given.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83590623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper studies malware propagation on social media and community link prediction. Twitter is taken as the social media platform, and data is collected using Twitter4j and MongoDB. A high-interaction client honeypot is used to classify URLs as benign or malicious. The retweet volume and the links between users are then analyzed. In addition, the work aims to detect communities that arise from these links between users with the help of the BigCLAM algorithm.
{"title":"Prediction of Malware Propagation and Links within Communities in Social Media Based Events","authors":"Abinaya Sowriraghavan, P. Burnap","doi":"10.1145/2786451.2786494","DOIUrl":"https://doi.org/10.1145/2786451.2786494","url":null,"abstract":"This paper is aimed at studying malware propagation on social media and community link prediction. Twitter is taken as the social media platform and data is collected using Twitter4j and MongoDB. A high interaction client honeypot is used to classify benign and malicious URL's. The retweet volume and links between the users are then analyzed. Further to this, the work aims to detect communities that arise from these links between users with the help of BIGClam algorithm.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85304063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexey Tikhonov, L. Ostroumova, A. Chelnokov, Ivan Bogatyy, Gleb Gusev
In this paper, we suggest a novel approach to studying user browsing behavior, i.e., the ways users get to different pages on the Web. Namely, we classify all user browsing paths leading to web pages into several types, or browsing patterns. To define browsing patterns, we consider several important points of the browsing path: its origin, the last page before the user gets to the domain of the target page, and the target page referrer. Each point can be of several types, which leads to 56 possible patterns. The distribution of the browsing paths over these patterns forms the navigational profile of a web page. We conducted a comprehensive large-scale study of navigational profiles of different web pages. First, we demonstrated that the navigational profile of a web page carries crucial information about the properties of this page (e.g., its popularity and age). Second, we found that the Web consists of several typical non-overlapping clusters formed by pages with similar ranges of incoming traffic. These clusters can be characterized by the functionality of their pages.
{"title":"What can be Found on the Web and How: A Characterization of Web Browsing Patterns","authors":"Alexey Tikhonov, L. Ostroumova, A. Chelnokov, Ivan Bogatyy, Gleb Gusev","doi":"10.1145/2786451.2786468","DOIUrl":"https://doi.org/10.1145/2786451.2786468","url":null,"abstract":"In this paper, we suggest a novel approach to studying user browsing behavior, i.e., the ways users get to different pages on the Web. Namely, we classified all user browsing paths leading to web pages into several types or browsing patterns. In order to define browsing patterns, we consider several important points of the browsing path: its origin, the last page before the user gets to the domain of the target page, and the target page referrer. Each point can be of several types, which leads to 56 possible patterns. The distribution of the browsing paths over these patterns forms the navigational profile of a web page. We conducted a comprehensive large-scale study of navigational profiles of different web pages. First, we demonstrated that the navigational profile of a web page carry crucial information about the properties of this page (e.g., its popularity and age). Second, we found that the Web consists of several typical non-overlapping clusters formed by pages of similar ranges of incoming traffic. These clusters can be characterized by the functionality of their pages.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. 
ACM Web Science Conference","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81598846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
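The browsing-patterns abstract above defines a page's navigational profile as the distribution of its incoming browsing paths over patterns. The following is a minimal sketch of that idea only, not the authors' implementation: the function name `navigational_profile` and the pattern labels in the example are hypothetical, since the abstract does not list the 56 actual patterns.

```python
from collections import Counter


def navigational_profile(paths):
    """Return the share of browsing paths that fall into each pattern.

    `paths` is an iterable of hashable pattern labels, one per observed
    browsing path that ended on the page of interest.
    """
    counts = Counter(paths)
    total = sum(counts.values())
    if total == 0:
        return {}  # no observed paths, so no profile
    return {pattern: n / total for pattern, n in counts.items()}


# Hypothetical patterns written as (origin, last page before the target
# domain, target page referrer); these labels are illustrative only.
paths = [
    ("search", "search", "internal"),
    ("search", "search", "internal"),
    ("social", "social", "external"),
    ("bookmark", "none", "none"),
]
profile = navigational_profile(paths)
```

Two pages could then be compared by any distance between their profile vectors; which distance suits the clustering described in the abstract is not stated there.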
Complementing studies on the economic impact of Open Government Data (OGD), we investigate how this novel Web-enabled movement supports sustainability. An analysis of OGD-based applications reveals that: (1) OGD supports all three pillars of sustainability; (2) citizens and app developers alike are receptive to and motivated by the sustainability implications of OGD; and (3) few regional differences exist between Vienna and New York City. We derive recommendations for further improving the sustainability impact of OGD.
{"title":"Sustainability Implications of Open Government Data: A Cross-Regional Study","authors":"Alison Koczanski, M. Sabou","doi":"10.1145/2786451.2786463","DOIUrl":"https://doi.org/10.1145/2786451.2786463","url":null,"abstract":"Complementing studies on the economic impact of Open Government Data (OGD), we investigate how this novel Web-enabled movement supports sustainability. An analysis of OGD-based applications reveals that: (1) OGD supports all three pillars of sustainability; (2) citizens and app developers alike are receptive and motivated by sustainability implications of OGD; and (3) few regional differences exist between Vienna and New York City. We derive recommendations for further improving the sustainability impact of OGD.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88267741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a multilingual study on, per single post of microblog text, (a) how much can be said, (b) how much is written in terms of characters and bytes, and (c) how much is said in terms of information content in posts by different organizations in different languages. Focusing on three different languages (English, Chinese, and Japanese), this research analyses Weibo and Twitter accounts of major embassies and news agencies. We first establish our criterion for quantifying "how much can be said" in a digital text based on the openly available Universal Declaration of Human Rights and the translated subtitles from TED talks. These parallel corpora allow us to determine the number of characters and bits needed to represent the same content in different languages and character encodings. We then derive the amount of information that is actually contained in microblog posts authored by selected accounts on Weibo and Twitter. Our results confirm that languages with larger character sets such as Chinese and Japanese contain more information per character than English, but the actual information content contained within a microblog text varies depending on both the type of organization and the language of the post. We conclude with a discussion on the design implications of microblog text limits for different languages.
{"title":"How much is said in a microblog?: A multilingual inquiry based on Weibo and Twitter","authors":"H. Liao, King-wa Fu, Scott A. Hale","doi":"10.1145/2786451.2786486","DOIUrl":"https://doi.org/10.1145/2786451.2786486","url":null,"abstract":"This paper presents a multilingual study on, per single post of microblog text, (a) how much can be said, (b) how much is written in terms of characters and bytes, and (c) how much is said in terms of information content in posts by different organizations in different languages. Focusing on three different languages (English, Chinese, and Japanese), this research analyses Weibo and Twitter accounts of major embassies and news agencies. We first establish our criterion for quantifying \"how much can be said\" in a digital text based on the openly available Universal Declaration of Human Rights and the translated subtitles from TED talks. These parallel corpora allow us to determine the number of characters and bits needed to represent the same content in different languages and character encodings. We then derive the amount of information that is actually contained in microblog posts authored by selected accounts on Weibo and Twitter. Our results confirm that languages with larger character sets such as Chinese and Japanese contain more information per character than English, but the actual information content contained within a microblog text varies depending on both the type of organization and the language of the post. We conclude with a discussion on the design implications of microblog text limits for different languages.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. 
ACM Web Science Conference","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87926107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
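The microblog abstract above distinguishes how much is written (characters and bytes) from how much is said (information content). As a rough illustration, the sketch below counts characters and encoded bytes and computes a unigram Shannon entropy per character. It rests on assumptions: the function name `text_stats` is hypothetical, and a character-frequency entropy over a single string is a much cruder estimate than a measurement based on parallel corpora, so it should not be read as the authors' method.

```python
import math
from collections import Counter


def text_stats(text, encoding="utf-8"):
    """Return (characters, bytes, entropy in bits per character)."""
    chars = len(text)
    nbytes = len(text.encode(encoding))
    if chars == 0:
        return 0, 0, 0.0
    counts = Counter(text)
    # Shannon entropy of the empirical character distribution.
    entropy = -sum((n / chars) * math.log2(n / chars) for n in counts.values())
    return chars, nbytes, entropy


# The same phrase in English and Chinese: fewer characters in Chinese,
# but each character takes three bytes in UTF-8.
english = text_stats("human rights")
chinese = text_stats("人权")
```

Here `chinese[:2]` is `(2, 6)` while `english[:2]` is `(12, 12)`, which shows why a limit counted in characters and a limit counted in bytes treat the two languages differently.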
This study analyzes political interactions in the European Parliament (EP) by considering how the political agenda of the plenary sessions has evolved over time and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making Parliamentary speeches. It does so by considering the context in which speeches are made, and the content of those speeches. To detect latent themes in legislative speeches over time, speech content is analyzed using a new dynamic topic modeling method, based on two layers of matrix factorization. This method is applied to a new corpus of all English language legislative speeches in the EP plenary from the period 1999-2014. Our findings suggest that the political agenda of the EP has evolved significantly over time, is affected by the committee structure of the Parliament, and reacts to exogenous events such as EU Treaty referenda and the emergence of the Euro-crisis, which have a significant impact on what is being discussed in Parliament.
{"title":"Unveiling the Political Agenda of the European Parliament Plenary: A Topical Analysis","authors":"Derek Greene, J. Cross","doi":"10.1145/2786451.2786464","DOIUrl":"https://doi.org/10.1145/2786451.2786464","url":null,"abstract":"This study analyzes political interactions in the European Parliament (EP) by considering how the political agenda of the plenary sessions has evolved over time and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making Parliamentary speeches. It does so by considering the context in which speeches are made, and the content of those speeches. To detect latent themes in legislative speeches over time, speech content is analyzed using a new dynamic topic modeling method, based on two layers of matrix factorization. This method is applied to a new corpus of all English language legislative speeches in the EP plenary from the period 1999-2014. Our findings suggest that the political agenda of the EP has evolved significantly over time, is impacted upon by the committee structure of the Parliament, and reacts to exogenous events such as EU Treaty referenda and the emergence of the Euro-crisis have a significant impact on what is being discussed in Parliament.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90580967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. V. Kleek, Dave Murray-Rust, Amy Guy, Daniel A. Smith, K. O’Hara, N. Shadbolt
Portraying matters as other than they truly are is an important part of everyday human communication. In this paper, we use a survey to examine ways in which people fabricate, omit or alter the truth online. Many reasons are found, including creative expression, hiding sensitive information, role-playing, and avoiding harassment or discrimination. The results suggest lying is often used for benign purposes, and we conclude that its use may be essential to maintaining a humane online society.
{"title":"Self Curation, Social Partitioning, Escaping from Prejudice and Harassment: the Many Dimensions of Lying Online","authors":"M. V. Kleek, Dave Murray-Rust, Amy Guy, Daniel A. Smith, K. O’Hara, N. Shadbolt","doi":"10.1145/2786451.2786461","DOIUrl":"https://doi.org/10.1145/2786451.2786461","url":null,"abstract":"Portraying matters as other than they truly are is an important part of everyday human communication. In this paper, we use a survey to examine ways in which people fabricate, omit or alter the truth online. Many reasons are found, including creative expression, hiding sensitive information, role-playing, and avoiding harassment or discrimination. The results suggest lying is often used for benign purposes, and we conclude that its use may be essential to maintaining a humane online society.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference","volume":"209 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75679671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abigail Z. Jacobs, Samuel F. Way, J. Ugander, A. Clauset
Online social networks represent a popular and diverse class of social media systems. Despite this variety, each of these systems undergoes a general process of online social network assembly, which represents the complicated and heterogeneous changes that transform newly born systems into mature platforms. However, little is known about this process. For example, how much of a network's assembly is driven by simple growth? How does a network's structure change as it matures? How does network structure vary with adoption rates and user heterogeneity, and do these properties play different roles at different points in the assembly? We investigate these and other questions using a unique dataset of online connections among the roughly one million users at the first 100 colleges admitted to Facebook, captured just 20 months after its launch. We first show that different vintages and adoption rates across this population of networks reveal temporal dynamics of the assembly process, and that assembly is only loosely related to network growth. We then exploit natural experiments embedded in this dataset and complementary data obtained via Internet archaeology to show that different subnetworks matured at different rates toward similar end states. These results shed light on the processes and patterns of online social network assembly, and may facilitate more effective design for online social systems.
{"title":"Assembling thefacebook: Using Heterogeneity to Understand Online Social Network Assembly","authors":"Abigail Z. Jacobs, Samuel F. Way, J. Ugander, A. Clauset","doi":"10.1145/2786451.2786477","DOIUrl":"https://doi.org/10.1145/2786451.2786477","url":null,"abstract":"Online social networks represent a popular and diverse class of social media systems. Despite this variety, each of these systems undergoes a general process of online social network assembly, which represents the complicated and heterogeneous changes that transform newly born systems into mature platforms. However, little is known about this process. For example, how much of a network's assembly is driven by simple growth? How does a network's structure change as it matures? How does network structure vary with adoption rates and user heterogeneity, and do these properties play different roles at different points in the assembly? We investigate these and other questions using a unique dataset of online connections among the roughly one million users at the first 100 colleges admitted to Facebook, captured just 20 months after its launch. We first show that different vintages and adoption rates across this population of networks reveal temporal dynamics of the assembly process, and that assembly is only loosely related to network growth. We then exploit natural experiments embedded in this dataset and complementary data obtained via Internet archaeology to show that different subnetworks matured at different rates toward similar end states. These results shed light on the processes and patterns of online social network assembly, and may facilitate more effective design for online social systems.","PeriodicalId":93136,"journal":{"name":"Proceedings of the ... ACM Web Science Conference. 
ACM Web Science Conference","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86833995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}