Konstantinos Semertzidis, E. Pitoura, Panayiotis Tsaparas
Twitter, being both a micro-blogging service and a social network, has become one of the primary means of communicating and disseminating information online. As such, a significant amount of research has been devoted to analyzing the Twitter graph, the tweets, and the behavior of its users. In this work, we undertake a study of the user profile bios on Twitter. The goal of our study is two-fold: first, to understand what Twitter users choose to expose about themselves in their profile bio, and second, to investigate whether the information in the user bio can be exploited for tasks such as predicting connections between Twitter users.
How people describe themselves on Twitter. Konstantinos Semertzidis, E. Pitoura, Panayiotis Tsaparas. ACM SIGMOD Workshop on Databases and Social Networks, 2013-06-22. DOI: https://doi.org/10.1145/2484702.2484708
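The abstract asks whether bio text can help predict connections between users. As a minimal sketch, assuming bios are compared as bags of lowercase tokens, one simple feature is the Jaccard similarity of two users' bio token sets, used to rank candidate connections. The tokenizer, stopword list, and function names are hypothetical illustrations, not the authors' method.

```python
import re

# Hypothetical preprocessing: lowercase words and hashtags, minus a tiny stopword list.
TOKEN_RE = re.compile(r"#?\w+")
STOPWORDS = {"a", "an", "and", "the", "of", "in", "i", "my", "to", "for"}


def bio_tokens(bio):
    """Return the set of lowercase, non-stopword tokens in a profile bio."""
    return {t for t in TOKEN_RE.findall(bio.lower()) if t not in STOPWORDS}


def bio_jaccard(bio_a, bio_b):
    """Jaccard similarity of two bios' token sets (0.0 when both are empty)."""
    a, b = bio_tokens(bio_a), bio_tokens(bio_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(user_bio, candidates):
    """Rank (user_id, bio) candidates by bio similarity to user_bio, highest first."""
    scored = [(uid, bio_jaccard(user_bio, bio)) for uid, bio in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, "Data scientist, coffee" and "coffee lover, data nerd" share two of five distinct tokens, giving a similarity of 0.4. A real predictor would combine such a score with graph features.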
F. Alvanaki, Evica Milchevski, S. Michel, A. Stupar
Everything is relative. Cars are compared by gas per mile, websites by PageRank, students by GPA, scientists by number of publications, and celebrities by beauty or wealth. In this paper, we study the characteristics of such entity rankings based on a set of rankings obtained from a popular Web portal. The insights obtained are integrated into our approach, called Pantheon. Pantheon maintains sets of top-k rankings and reports identified changes in a way that appeals to users, using a novel combination of characteristics such as competitiveness, information entropy, and scale of change. Entity rankings are assembled by combining entity type attributes with data-driven categorical constraints and sorting criteria on numeric attributes. We report on the results of an experimental evaluation using real-world data obtained from a basketball statistics website.
Interesting event detection through hall of fame rankings. F. Alvanaki, Evica Milchevski, S. Michel, A. Stupar. ACM SIGMOD Workshop on Databases and Social Networks, 2013-06-22. DOI: https://doi.org/10.1145/2484702.2484704
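Pantheon's scoring of ranking changes is not specified in the abstract. As a hedged sketch of two ingredients it names, the code below computes the Shannon entropy of a top-k list's normalized scores (even scores suggest a competitive ranking) and reports which entities entered or left the top-k between two snapshots. The definitions and function names are assumptions for illustration only.

```python
import math


def score_entropy(scores):
    """Shannon entropy (bits) of non-negative scores normalized to a distribution.

    Higher entropy means the scores are spread evenly, i.e. a more
    competitive ranking; this definition is an assumption, not Pantheon's.
    """
    total = sum(scores)
    if total <= 0:
        return 0.0
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)


def top_k(scores_by_entity, k):
    """Return the k entities with the highest scores, best first."""
    return sorted(scores_by_entity, key=scores_by_entity.get, reverse=True)[:k]


def rank_changes(old, new, k):
    """Report entities that entered or left the top-k between two snapshots."""
    before, after = set(top_k(old, k)), set(top_k(new, k))
    return {"entered": sorted(after - before), "left": sorted(before - after)}
```

Four equal scores give the maximum entropy for k = 4 (2 bits), while a single dominant score gives 0 bits.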
In this work, we consider the continuous computation of set correlations over a stream of set-valued attributes, such as tweets and their hashtags, social annotations of blog posts obtained through RSS, or updates to set-valued attributes of databases. To compute tag correlations in a distributed fashion, all necessary information has to be present at the computing node(s). Our approach uses a partitioning scheme based on set covers for efficient, replication-lean information flow. We report on the results of a preliminary performance evaluation using tweets obtained through Twitter's streaming API.
Scalable, continuous tracking of tag co-occurrences between short sets using (almost) disjoint tag partitions. F. Alvanaki, S. Michel. ACM SIGMOD Workshop on Databases and Social Networks, 2013-06-22. DOI: https://doi.org/10.1145/2484702.2484705
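The abstract's requirement that each computing node see all the information it needs can be illustrated with a much simpler partitioning than the authors' set-cover scheme. In this sketch, each tag is hash-assigned to a node, a tag set is routed to the owners of its tags, and a pair (x, y) with x < y is counted only on x's owner, so every pair is counted exactly once without a global counter. The hashing scheme and function names are hypothetical.

```python
import zlib
from collections import Counter
from itertools import combinations


def owner(tag, num_nodes):
    """Deterministically map a tag to a node (stable CRC32 hash; hypothetical scheme)."""
    return zlib.crc32(tag.encode("utf-8")) % num_nodes


def route(tags, num_nodes):
    """Nodes that must receive this tag set: the owners of its tags."""
    return {owner(t, num_nodes) for t in tags}


def count_on_node(node, stream, num_nodes):
    """Count co-occurring tag pairs owned by `node`.

    A pair (x, y) with x < y is owned by the node that owns x, so each
    pair is counted on exactly one node.
    """
    counts = Counter()
    for tags in stream:
        if node not in route(tags, num_nodes):
            continue  # this tag set was never sent to this node
        for x, y in combinations(sorted(set(tags)), 2):
            if owner(x, num_nodes) == node:
                counts[(x, y)] += 1
    return counts
```

Merging the per-node counters yields the same totals as a centralized count. Unlike this sketch, the set-cover approach in the paper aims to reduce how many nodes each tag set must be replicated to.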
We study the problem of anonymizing social networks to prevent the re-identification of individuals from both structural (node degrees) and textual (edge labels) information. We introduce the concept of Structural and Textual (ST)-equivalence of individuals at two levels (strict and loose), and formally define the problem as Structure and Text aware K-anonymity of social networks (STK-Anonymity). In an STK-anonymized network, each individual is ST-equivalent to at least K-1 other nodes. The major challenge in achieving STK-Anonymity comes from the correlation of edge labels, which causes edge anonymization to propagate. To address this challenge, we present a two-phase approach. In particular, a set-enumeration-tree-based approach and three pruning strategies are introduced in the second phase to avoid the propagation problem during anonymization. Experimental results on both real and synthetic datasets show the effectiveness and efficiency of our approaches.
STK-anonymity: k-anonymity of social networks containing both structural and textual information. Yifan Hao, H. Cao, K. Bhattarai, S. Misra. ACM SIGMOD Workshop on Databases and Social Networks, 2013-06-22. DOI: https://doi.org/10.1145/2484702.2484707
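The K-anonymity condition stated in the abstract (each node ST-equivalent to at least K-1 others) can be checked by grouping nodes into equivalence classes. This sketch assumes a node's signature is its degree plus the sorted multiset of its incident edge labels; that is an illustrative stand-in for the paper's strict ST-equivalence, and the function names are hypothetical. It only verifies the property and does not perform anonymization.

```python
from collections import Counter, defaultdict


def st_signatures(edges):
    """Map each node to an assumed ST signature.

    `edges` is an iterable of (u, v, label) for an undirected labeled graph.
    Signature = (degree, sorted multiset of incident edge labels).
    Isolated nodes do not appear, since they have no edges.
    """
    labels = defaultdict(list)
    for u, v, label in edges:
        labels[u].append(label)
        labels[v].append(label)
    return {n: (len(ls), tuple(sorted(ls))) for n, ls in labels.items()}


def is_stk_anonymous(edges, k):
    """True if every node shares its signature with at least k-1 other nodes."""
    classes = Counter(st_signatures(edges).values())
    return all(size >= k for size in classes.values())
```

A 4-cycle whose edges all carry the label "friend" puts every node in one class of size 4, so it is 4-anonymous under this signature but not 5-anonymous.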
Eunsu Ryu, Yao Rong, Jie Li, Ashwin Machanavajjhala
While social networking platforms allow users to control how their private information is shared, recent research has shown that a user's sensitive attribute can be inferred from friendship links and group memberships, even when the attribute value is not shared with anyone. Thus, existing access control mechanisms cannot protect against such privacy breaches. Our research goal is to develop tools that help a user, Alice, become aware of privacy breaches via attribute inference. In this paper, we focus on two problems: (a) whether Alice's sensitive attribute can be inferred from public information in Alice's neighborhood, and (b) whether making Alice's sensitive attribute public leads to the disclosure of sensitive information about another user, Bob, in Alice's neighborhood. We propose three algorithms to detect these privacy breaches. We limit our scope to Alice's one-hop neighbors, that is, the information visible to an app executed on Alice's behalf. Our results indicate that analyzing local networks is sufficient to extract a significant amount of information about most users.
curso: protect yourself from curse of attribute inference: a social network privacy-analyzer. Eunsu Ryu, Yao Rong, Jie Li, Ashwin Machanavajjhala. ACM SIGMOD Workshop on Databases and Social Networks, 2013-06-22. DOI: https://doi.org/10.1145/2484702.2484706
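Both problems in the abstract can be illustrated with a deliberately simple baseline: majority-vote inference over one-hop neighbors whose attribute is public. This is not one of the paper's three algorithms. The 0.6 confidence threshold and all function names are hypothetical. The second function measures how much more confidently Bob's true value can be inferred once Alice publishes hers, which corresponds to problem (b).

```python
from collections import Counter


def infer_attribute(neighbors, public_attr):
    """Guess a hidden attribute by majority vote over one-hop neighbors.

    `public_attr` maps users to the attribute value they made public
    (users who hid it are absent). Returns (guess, confidence), or
    (None, 0.0) if no neighbor's value is public.
    """
    votes = Counter(public_attr[n] for n in neighbors if n in public_attr)
    if not votes:
        return None, 0.0
    guess, count = votes.most_common(1)[0]
    return guess, count / sum(votes.values())


def is_breach(neighbors, public_attr, true_value, threshold=0.6):
    """Problem (a): flag a breach when the guess is correct and confident enough."""
    guess, confidence = infer_attribute(neighbors, public_attr)
    return guess == true_value and confidence >= threshold


def exposure_if_published(bob_neighbors, public_attr, alice, alice_value, bob_value):
    """Problem (b): gain in confidence for Bob's true value if Alice publishes hers."""
    def confidence_for_truth(attrs):
        guess, confidence = infer_attribute(bob_neighbors, attrs)
        return confidence if guess == bob_value else 0.0

    after = {**public_attr, alice: alice_value}
    return confidence_for_truth(after) - confidence_for_truth(public_attr)
```

A positive result from `exposure_if_published` suggests that Alice publishing her value would make Bob's hidden value easier to guess.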
Storing and querying large social networks is a challenging problem, due both to the scale of the data and to intricate querying requirements. One common type of query over a social network is link prediction, which is used to suggest new friends for existing nodes in the network. There is no gold-standard metric for predicting new links. However, past work has identified a number of metrics that work well for this problem. These metrics differ vastly from one another in their computational complexity; e.g., they may consider only a small neighborhood of the node for which new links should be predicted, or they may perform random walks over the entire social network graph. This paper considers the problem of implementing link-prediction metrics for a social network over different types of database systems. We consider the use of a relational database, a key-value store, and a graph database. We show that the type of database system affects the ease with which link prediction may be performed. Our results are empirically validated by extensive experimentation over real social networks of varying sizes.
Implementing link-prediction for social networks in a database system. Sara Cohen, Netanel Cohen-Tzemach. ACM SIGMOD Workshop on Databases and Social Networks, 2013-06-22. DOI: https://doi.org/10.1145/2484702.2484710
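A neighborhood-based metric such as common neighbors maps naturally onto a relational self-join. The sketch below assumes a single `edges(src, dst)` table that stores each undirected edge in both directions, and uses Python's built-in `sqlite3` as the relational engine. The schema and function names are illustrative and not taken from the paper.

```python
import sqlite3


def build_db(edges):
    """Store an undirected graph as a two-column edge table (both directions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, PRIMARY KEY (src, dst))")
    conn.executemany(
        "INSERT OR IGNORE INTO edges VALUES (?, ?)",
        [(u, v) for u, v in edges] + [(v, u) for u, v in edges],
    )
    return conn


COMMON_NEIGHBORS_SQL = """
SELECT e2.dst AS candidate, COUNT(*) AS score
FROM edges AS e1
JOIN edges AS e2 ON e1.dst = e2.src           -- user -> friend -> candidate
WHERE e1.src = :user
  AND e2.dst <> :user                          -- not the user themself
  AND e2.dst NOT IN (SELECT dst FROM edges WHERE src = :user)  -- not already a friend
GROUP BY e2.dst
ORDER BY score DESC, candidate
LIMIT :limit
"""


def suggest_friends(conn, user, limit=10):
    """Rank non-neighbors of `user` by their number of common neighbors."""
    return conn.execute(COMMON_NEIGHBORS_SQL, {"user": user, "limit": limit}).fetchall()
```

With edges a-b, a-c, b-d, c-d, and c-e, `suggest_friends(conn, "a")` ranks d (two shared friends) ahead of e (one). Metrics that need random walks over the whole graph do not fit a single join this neatly, which is consistent with the abstract's point that the choice of database system matters.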
Andreas Weiler, M. Scholl, Franz Wanner, Christian Rohrdantz
The unprecedented success and active use of social media services produce massive amounts of user-generated data. Growing interest in the information contained in this data has led to increasingly sophisticated analysis and visualization applications. Because news spreads quickly and widely through social media, social media data is an appropriate source for identifying events and directly displaying their occurrence to analysts or other users. This paper presents a method for event identification in local areas using the Twitter data stream. We implement a combined log-likelihood ratio approach over the geographic and time dimensions of real-life Twitter data in predefined areas of the world to detect events occurring in the message contents. We present a case study with two scenarios to show the usefulness of our approach.
Event identification for local areas using social media streaming data. Andreas Weiler, M. Scholl, Franz Wanner, Christian Rohrdantz. ACM SIGMOD Workshop on Databases and Social Networks, 2013-06-22. DOI: https://doi.org/10.1145/2484702.2484703
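The log-likelihood ratio mentioned in the abstract can be sketched with Dunning's G² statistic on a 2x2 table. It compares a term's count in one area and time window against a reference corpus and flags terms that are significantly over-represented. This sketch covers a single contingency table and is a simplification of the authors' combined geographic and temporal approach. The cut-off and function names are illustrative assumptions.

```python
import math


def log_likelihood_ratio(k1, n1, k2, n2):
    """Dunning's G^2 for a term seen k1 times in n1 window tokens vs k2 in n2 baseline tokens."""
    observed = [k1, n1 - k1, k2, n2 - k2]
    total = n1 + n2
    term_total = k1 + k2
    expected = [
        n1 * term_total / total,
        n1 * (total - term_total) / total,
        n2 * term_total / total,
        n2 * (total - term_total) / total,
    ]
    # Cells with zero observations contribute nothing (0 * log 0 is taken as 0).
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def bursty_terms(window, baseline, threshold=10.83):
    """Terms over-represented in `window` relative to `baseline` (term -> count maps).

    10.83 is the chi-squared critical value for p = 0.001 with one degree
    of freedom; it is an illustrative cut-off, not the paper's setting.
    """
    n1, n2 = sum(window.values()), sum(baseline.values())
    if n1 == 0 or n2 == 0:
        return []
    hits = []
    for term, k1 in window.items():
        k2 = baseline.get(term, 0)
        if k1 / n1 > k2 / n2 and log_likelihood_ratio(k1, n1, k2, n2) >= threshold:
            hits.append(term)
    return sorted(hits)
```

Requiring a higher rate in the window keeps only bursts. G² alone is also large for terms that are unusually rare in the window.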
Cache Augmented Database Management Systems (CADBMSs) speed up simple operations that read and write a small amount of data from big data. They are best suited to applications whose workloads exhibit a high read-to-write ratio, e.g., interactive social networking actions. This study surveys the state of the art in CADBMSs and presents physical data independence as the next step in their evolution. We detail the requirements of this evolution, technological trends and software practices, and our research efforts in this area.
Cache augmented database management systems. Shahram Ghandeharizadeh, Jason Yap. ACM SIGMOD Workshop on Databases and Social Networks, 2013-06-22. DOI: https://doi.org/10.1145/2484702.2484709
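A common way to augment a DBMS with a cache is the cache-aside pattern: reads check the cache first and populate it on a miss, and writes update the database and then invalidate the cached entry. This sketch uses a plain dict in place of a key-value cache such as memcached and `sqlite3` as the DBMS. The class and schema are hypothetical, and the sketch ignores the concurrent-update races that a real CADBMS must handle.

```python
import sqlite3


class CacheAugmentedStore:
    """Cache-aside reads with invalidate-on-write over a relational store (illustrative)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE profiles (user_id TEXT PRIMARY KEY, name TEXT)")
        self.cache = {}  # stands in for a key-value cache server
        self.hits = 0
        self.misses = 0

    def get_name(self, user_id):
        if user_id in self.cache:  # hit: no database round trip
            self.hits += 1
            return self.cache[user_id]
        self.misses += 1  # miss: query the DBMS, then populate the cache
        row = self.db.execute(
            "SELECT name FROM profiles WHERE user_id = ?", (user_id,)
        ).fetchone()
        value = row[0] if row else None  # absent users are cached as None too
        self.cache[user_id] = value
        return value

    def set_name(self, user_id, name):
        # Write to the authoritative DBMS first, then invalidate the stale cache entry.
        self.db.execute(
            "INSERT OR REPLACE INTO profiles (user_id, name) VALUES (?, ?)", (user_id, name)
        )
        self.db.commit()
        self.cache.pop(user_id, None)
```

With a high read-to-write ratio, most `get_name` calls are served from the dict, which is the workload the abstract identifies as the best fit for CADBMSs.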
Recent studies show that vertex similarity measures are good at predicting link formation over the near term, but are less effective in predicting over the long term. This indicates that, generally, as links age, their degree of influence diminishes. However, few papers have systematically studied this phenomenon. In this paper, we apply a supervised learning approach to study age as a factor for link formation. Experiments on several real-world datasets show that younger links are more informative than older ones in predicting the formation of new links. Since older links become less useful, it might be appropriate to remove them when studying network evolution. Several previously observed network properties and network evolution phenomena, such as "the number of edges grows super-linearly in the number of nodes" and "the diameter is decreasing as the network grows", may need to be reconsidered under a dynamic network model where old, inactive links are removed.
The predictive value of young and old links in a social network. Hung-Hsuan Chen, David J. Miller, C. Lee Giles. ACM SIGMOD Workshop on Databases and Social Networks, 2013-06-22. DOI: https://doi.org/10.1145/2484702.2484711
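One way to expose link age to a supervised model, as the abstract describes, is to compute the same similarity feature over several age windows. This sketch computes common-neighbor counts using only links younger than each window, plus a count over all links. It also mimics the "dynamic network model where old, inactive links are removed." The window sizes and function names are hypothetical, and the authors' feature set is not given in the abstract.

```python
from collections import defaultdict


def adjacency(edges, now, max_age=None):
    """Build neighbor sets from (u, v, created_at) edges.

    If `max_age` is given, links older than `max_age` time units are dropped.
    """
    adj = defaultdict(set)
    for u, v, created_at in edges:
        if max_age is not None and now - created_at > max_age:
            continue
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def age_features(edges, now, u, v, windows=(1, 5)):
    """Common-neighbor counts under each age window, followed by the all-links count."""
    feats = [common_neighbors(adjacency(edges, now, w), u, v) for w in windows]
    feats.append(common_neighbors(adjacency(edges, now), u, v))
    return feats
```

If a trained model puts more weight on the windowed counts than on the all-links count, that would be in line with the paper's finding that younger links are more informative.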
Tags are words that users add to shared multimedia content as metadata to support better categorization and improved sharing experiences. With the rapid growth of shared images and videos on online social networks, a huge number of tags are added every day to public or shared databases. While one major reason for tagging a photo or video is the functional need to organize the shared object, people also use tags to communicate their emotions to family, friends, and other contacts. The linguistic diversity of these tags reveals interesting patterns that reflect different facets of how people manage the impression they make on their social peers online. This paper investigates how some linguistic features of tags associated with Flickr photos change with the distance between the user's home location and the location where the photo was taken. In our exploratory analysis, "affective" and "relativ" words and their multiplicative interaction show correlations with this distance. These initial findings give a better understanding of online social phenomena related to expressing emotions and sharing information. They may also have indirect implications for understanding impression management in online communities.
Distance matters: an exploratory analysis of the linguistic features of Flickr photo tag metadata in relation to impression management. Syed Ishtiaque Ahmed, Shion Guha. ACM SIGMOD Workshop on Databases and Social Networks, 2012-05-20. DOI: https://doi.org/10.1145/2304536.2304538
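The analysis relates a linguistic feature of a photo's tags to the distance between the user's home and the photo location. As a minimal sketch, the code below computes that distance with the haversine great-circle formula, measures each photo's share of tags from a toy "affective" lexicon, and computes a Pearson correlation between the two. The lexicon, function names, and choice of Pearson correlation are illustrative assumptions, not the authors' word categories or statistics.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy lexicon standing in for an "affective" word category.
AFFECTIVE = {"love", "happy", "beautiful", "miss", "sad"}


def affective_share(tags):
    """Fraction of a photo's tags that fall in the affective lexicon."""
    return sum(t.lower() in AFFECTIVE for t in tags) / len(tags) if tags else 0.0
```

For a set of photos, one would compute `haversine_km(home_lat, home_lon, photo_lat, photo_lon)` and `affective_share(tags)` for each photo and pass the two lists to `pearson`.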