Yang Guo, Xuekui Zhang, F. Esfahani, Venkatesh Srinivasan, Alex Thomo, Li Xing
Mining dense subgraphs, in which vertices are closely connected to each other, is a common task in graph analysis. A very popular notion in subgraph analysis is core decomposition. Recently, Esfahani et al. presented a probabilistic core decomposition algorithm based on graph peeling and the Central Limit Theorem (CLT) that is capable of handling very large graphs. Their peeling algorithm (PA) starts from the lowest-degree vertices and recursively deletes them, assigning core numbers and updating the degrees of neighbouring vertices until it reaches the maximum core. However, in many applications, particularly in biology, the more valuable information lies in dense sub-communities, and small cores whose vertices interact little with others are of no interest. To focus the PA on dense subgraphs, we propose a multi-stage graph peeling algorithm (M-PA) that adds a two-stage data screening procedure before the PA. Removing vertices from the graph according to user-defined thresholds substantially reduces graph complexity without affecting the vertices in the subgraphs of interest. We show that M-PA is more efficient than the PA and that, with a properly set filtering threshold, it produces dense subgraphs very similar, if not identical, to those of the PA in terms of graph density and clustering coefficient.
{"title":"Multi-stage graph peeling algorithm for probabilistic core decomposition","authors":"Yang Guo, Xuekui Zhang, F. Esfahani, Venkatesh Srinivasan, Alex Thomo, Li Xing","doi":"10.1145/3487351.3489470","DOIUrl":"https://doi.org/10.1145/3487351.3489470","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"94 1","pages":"0"},"publicationDate":"2021-08-13","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"115681757"}
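The peel-then-screen idea in this abstract can be sketched on an ordinary deterministic graph. This is only an illustration under stated assumptions: the paper works with probabilistic graphs and a CLT-based degree estimate, and its screening has two stages, whereas the sketch below uses plain integer degrees and a single degree threshold. The function names `build_adj`, `core_numbers`, and `screen_low_degree` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbours)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def core_numbers(adj):
    """Deterministic peeling: repeatedly remove a minimum-degree vertex.

    The core number of a vertex is the largest minimum degree seen
    up to the moment it is removed.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Linear scan for the minimum; simple rather than fast.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def screen_low_degree(adj, threshold):
    """Assumed screening step: iteratively drop vertices with degree < threshold."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        low = [v for v, nbrs in adj.items() if len(nbrs) < threshold]
        changed = bool(low)
        for v in low:
            for u in adj.pop(v):
                if u in adj:
                    adj[u].discard(v)
    return adj


# Toy graph: a 4-clique {a, b, c, d} with a sparse tail d - e - f.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("b", "d"), ("c", "d"), ("d", "e"), ("e", "f")]
graph = build_adj(edges)
full = core_numbers(graph)
screened = core_numbers(screen_low_degree(graph, threshold=3))
```

In this deterministic setting, iteratively removing vertices of degree below the threshold leaves exactly the threshold-core, so the vertices in the dense part keep the core numbers they had on the full graph while the peeling runs on a smaller input.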
Communities in social networks evolve over time as people enter and leave the network and their activity behaviors shift. The task of predicting structural changes in communities over time is known as community evolution prediction. Existing work in this area has focused on developing frameworks for defining events, while relying on traditional classification methods for the actual prediction. We present a novel graph neural network for predicting community evolution events from structural and temporal information. The model (GNAN) includes a group-node attention component that supports variable-sized inputs and learns representations of groups from member and neighbor node features. In a comparative evaluation against standard baseline methods, we demonstrate that our model outperforms the baselines. Additionally, we show the effects of network trends on model performance.
{"title":"Group-node attention for community evolution prediction","authors":"Matt Revelle, C. Domeniconi, Ben U. Gelman","doi":"10.1145/3487351.3488348","DOIUrl":"https://doi.org/10.1145/3487351.3488348","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"17 1","pages":"0"},"publicationDate":"2021-07-09","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"127180643"}
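To make concrete how an attention component can accept variable-sized groups, the sketch below pools any number of member feature vectors into one fixed-size group vector using softmax attention weights. It is a generic attention-pooling example, not the GNAN architecture, whose details are not given in the abstract; the function name `attention_pool` and the fixed `query` vector (which a real model would learn) are assumptions.

```python
import math


def attention_pool(member_feats, query):
    """Pool a variable-sized list of feature vectors into one vector.

    Each member is scored by its dot product with `query`; the scores
    are turned into weights with a softmax, and the output is the
    weighted sum of the member features.
    """
    scores = [sum(q * x for q, x in zip(query, feats)) for feats in member_feats]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    pooled = [sum(w * feats[d] for w, feats in zip(weights, member_feats))
              for d in range(dim)]
    return pooled, weights


query = [1.0, 0.0, -1.0]  # stand-in for a learned parameter
small_group = [[0.2, 0.1, 0.0], [0.9, 0.3, 0.1]]
large_group = [[0.1, 0.5, 0.2], [0.4, 0.4, 0.4], [0.0, 1.0, 0.3],
               [0.7, 0.2, 0.1], [0.3, 0.3, 0.9]]

small_vec, small_w = attention_pool(small_group, query)
large_vec, large_w = attention_pool(large_group, query)
```

Both groups map to vectors of the same length, which is what lets a downstream classifier treat communities of different sizes uniformly.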
Shangbin Feng, Herun Wan, Ningnan Wang, Minnan Luo
Twitter bot detection is an important and challenging task. Existing bot detection measures fail to address the challenges of community and disguise, falling short of detecting bots that disguise themselves as genuine users and attack collectively. To address these two challenges, we propose BotRGCN, short for Bot detection with Relational Graph Convolutional Networks. BotRGCN addresses the challenge of community by constructing a heterogeneous graph from follow relationships and applying relational graph convolutional networks to it. In addition, BotRGCN uses multi-modal user semantic and property information to avoid feature engineering and to improve its ability to capture bots with diverse disguises. Extensive experiments demonstrate that BotRGCN outperforms competitive baselines on TwiBot-20, a comprehensive benchmark that provides follow relationships.
{"title":"BotRGCN: Twitter bot detection with relational graph convolutional networks","authors":"Shangbin Feng, Herun Wan, Ningnan Wang, Minnan Luo","doi":"10.1145/3487351.3488336","DOIUrl":"https://doi.org/10.1145/3487351.3488336","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"29 1","pages":"0"},"publicationDate":"2021-06-24","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"115629450"}
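A relational graph convolution keeps a separate weight matrix per edge type, which is what lets a model treat different kinds of follow relationships differently. The sketch below implements one such layer in plain Python, using the standard formulation (self transform plus a per-relation mean over incoming neighbours, followed by ReLU). It is not the BotRGCN code: the relation names `follows` and `followed_by`, the toy features, and the hand-picked weights are all assumptions for illustration.

```python
def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def rgcn_layer(features, edges_by_relation, rel_weights, self_weight):
    """One relational graph convolution layer.

    For each node i:
        h_i' = ReLU( W_0 h_i + sum_r mean_{j in N_r(i)} W_r h_j )
    where N_r(i) holds the sources of relation-r edges pointing at i.
    """
    out = {}
    for node, h in features.items():
        acc = matvec(self_weight, h)
        for rel, edges in edges_by_relation.items():
            sources = [src for src, dst in edges if dst == node]
            for src in sources:
                msg = matvec(rel_weights[rel], features[src])
                # Normalise by the number of relation-r neighbours.
                acc = [a + m / len(sources) for a, m in zip(acc, msg)]
        out[node] = [max(0.0, a) for a in acc]  # ReLU
    return out


features = {"u": [1.0, 0.0], "v": [0.0, 1.0], "w": [1.0, 1.0]}
edges_by_relation = {
    "follows": [("u", "v"), ("w", "v")],   # messages flow src -> dst
    "followed_by": [("v", "u")],
}
rel_weights = {
    "follows": [[2.0, 0.0], [0.0, 2.0]],
    "followed_by": [[0.0, 1.0], [1.0, 0.0]],
}
self_weight = [[1.0, 0.0], [0.0, 1.0]]

hidden = rgcn_layer(features, edges_by_relation, rel_weights, self_weight)
```

In a trained model the weight matrices would be learned, and several such layers would be stacked before a classification head.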
Although it is publicly available, collecting and processing the full Bitcoin blockchain data is not trivial. Its sheer size, history, and other features raise quite specific challenges, which we address in this paper. The strengths of our approach are the following: it relies on very basic and standard tools, which makes the procedure reliable and easily reproducible; it is a purely lossless procedure, ensuring that we capture and preserve all existing data; and it provides additional indexing that makes it easy to further process the whole dataset and select appropriate subsets of it. We present our procedure in detail and provide an implementation online, as well as the resulting dataset.
{"title":"Full Bitcoin blockchain data made easy","authors":"Jules Azad Emery, Matthieu Latapy","doi":"10.1145/3487351.3488326","DOIUrl":"https://doi.org/10.1145/3487351.3488326","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"35643 1","pages":"0"},"publicationDate":"2021-06-15","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"123658074"}
A. Singh, Chirag Jain, Jivitesh Jain, R. Jain, Shradha Sehgal, Tanisha Pandey, P. Kumaraguru
Social media has grown exponentially in a short period, coming to the forefront of communications and online interactions. Despite their rapid growth, social media platforms have been unable to scale to different languages globally and remain inaccessible to many. In this paper, we characterize Koo, a multilingual micro-blogging site that rose in popularity in 2021 as an Indian alternative to Twitter. We collected a dataset of 4.07 million users, 163.12 million follower-following relationships, and their content and activity across 12 languages. We study user demographics along the lines of language, location, gender, and profession. The prominent presence of Indian languages in the discourse on Koo indicates the platform's success in promoting regional languages. We observe that Koo's follower-following network is much denser than Twitter's and comprises closely-knit linguistic communities. An N-gram analysis of posts on Koo shows a #KooVsTwitter rhetoric, revealing the debate comparing the two platforms. Our characterization highlights the dynamics of the multilingual social network and its diverse Indian user base.
{"title":"What's kooking?: characterizing India's emerging social network, Koo","authors":"A. Singh, Chirag Jain, Jivitesh Jain, R. Jain, Shradha Sehgal, Tanisha Pandey, P. Kumaraguru","doi":"10.1145/3487351.3488354","DOIUrl":"https://doi.org/10.1145/3487351.3488354","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"31 1","pages":"0"},"publicationDate":"2021-03-24","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"115772270"}
Online social media enables mass-level, transparent, and democratized discussion on numerous socio-political issues. Because of this openness, these platforms often suffer from manipulation and misinformation, with negative consequences. To prevent such harmful activities, platform moderators employ countermeasures against actors who violate their rules. However, the relationship between publicly outlined policies and the actions actually taken is less clear to the general public. In this work, we examine violations and subsequent moderation related to discussion of the 2020 U.S. Presidential Election on Twitter. We focus on quantifying plausible reasons for suspension, drawing on Twitter's rules and policies, by identifying suspended users (Case) and comparing their activities and properties with those of (as yet) non-suspended users (Control). Using a dataset of 240M election-related tweets made by 21M unique users, we observe that Suspended users violate Twitter's rules at a higher rate (statistically significant) than Control users across all the considered aspects: hate speech, offensiveness, spamming, and civic integrity. Moreover, through the lens of Twitter's suspension mechanism, we qualitatively examine the topics targeted for manipulation.
{"title":"Examining factors associated with Twitter account suspension following the 2020 U.S. presidential election","authors":"Farhan Asif Chowdhury, Dheeman Saha, Md Rashidul Hasan, Koustuv Saha, A. Mueen","doi":"10.1145/3487351.3492715","DOIUrl":"https://doi.org/10.1145/3487351.3492715","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"45 1","pages":"0"},"publicationDate":"2021-01-23","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"131312570"}
The COVID-19 pandemic has been accompanied by an information outbreak on social media, leading to an infodemic. Predicting the popularity of online content, known as cascade prediction, makes it possible not only to catch in advance hot information that deserves attention, but also to identify false information that will spread widely and requires a quick response to mitigate its impact. Among the various information diffusion patterns leveraged in previous work, the spillover effect of the information users are exposed to on their decision to participate in diffusing certain information has not yet been studied. In this paper, we focus on the diffusion of information related to COVID-19 preventive measures. Using a Twitter dataset that we collected, we validate the existence of this spillover effect. Building on this finding, we propose extensions to three cascade prediction methods based on Graph Neural Networks (GNNs). Experiments on our dataset demonstrate that using the identified spillover effect significantly improves state-of-the-art GNN methods in predicting the popularity not only of preventive measure messages, but also of other COVID-19 related messages.
{"title":"From #jobsearch to #mask: improving COVID-19 cascade prediction with spillover effects","authors":"Ninghan Chen, Zhiqiang Zhong, Jun Pang","doi":"10.1145/3487351.3488555","DOIUrl":"https://doi.org/10.1145/3487351.3488555","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"1 1","pages":"0"},"publicationDate":"2020-12-13","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"129661949"}
The spread of COVID-19 has sparked racism and hate on social media targeted towards Asian communities. However, little is known about how racial hate spreads during a pandemic and the role of counterspeech in mitigating this spread. In this work, we study the evolution and spread of anti-Asian hate speech through the lens of Twitter. We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech, spanning 14 months and containing over 206 million tweets and a social network with over 127 million nodes. Using a novel hand-labeled dataset of 3,355 tweets, we train a text classifier that identifies hateful and counterspeech tweets and achieves an average macro-F1 score of 0.832. Using this dataset, we conduct a longitudinal analysis of tweets and users. Analysis of the social network reveals that hateful and counterspeech users interact and engage extensively with one another, instead of living in isolated polarized communities. We find that nodes were highly likely to become hateful after being exposed to hateful content in the year 2020. Notably, counterspeech messages discourage users from turning hateful, potentially suggesting a solution to curb hate on the web and social media platforms. Data and code are available at http://claws.cc.gatech.edu/covid.
{"title":"Racism is a virus: anti-asian hate and counterspeech in social media during the COVID-19 crisis","authors":"Caleb Ziems, Bing He, Sandeep Soni, Srijan Kumar","doi":"10.1145/3487351.3488324","DOIUrl":"https://doi.org/10.1145/3487351.3488324","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"44 1","pages":"0"},"publicationDate":"2020-05-25","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"123510855"}
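The abstract reports classifier quality as a macro-F1 score, which averages per-class F1 without weighting by class frequency, so rare classes count as much as common ones. The sketch below shows how that metric is computed from scratch. The three class labels and the six toy predictions are invented for illustration and have no connection to the COVID-HATE data or its reported score.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical): two tweets per class, two misclassified.
y_true = ["hate", "hate", "counter", "counter", "neutral", "neutral"]
y_pred = ["hate", "counter", "counter", "counter", "neutral", "hate"]

score = macro_f1(y_true, y_pred)
```

Here the per-class F1 values are 0.8 for `counter`, 0.5 for `hate`, and 2/3 for `neutral`, so the macro average is roughly 0.656.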
Online participatory media platforms that enable one-to-many communication among users see a significant amount of user-generated content and consequently face the problem of recommending a subset of this content to their users. We address the problem of recommending and ranking this content such that different viewpoints about a topic get exposure in a fair and diverse manner. We build our model in the context of a voice-based participatory media platform running in rural central India, for low-income and less-literate communities, that plays audio messages in a ranked list to users over a phone call and allows them to contribute their own messages. In this paper, we describe our model and evaluate it using call logs from the platform, comparing its fairness and diversity performance with that of the manual editorial processes currently being followed. Our models are generic and can be adapted and applied to other participatory media platforms as well.
{"title":"Fairness and diversity in the recommendation and ranking of participatory media content","authors":"Muskaan, Mehak Preet Dhaliwal, Aaditeshwar Seth","doi":"10.1145/3487351.3488363","DOIUrl":"https://doi.org/10.1145/3487351.3488363","journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"7 1","pages":"0"},"publicationDate":"2019-07-16","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"115321680"}