Pub Date: 2022-05-01 DOI: 10.1016/j.osnem.2022.100209
Usman Anjum, Vladimir Zadorozhny, Prashant Krishnamurthy
Event localization is the task of finding the location of an event. Commonly, event localization using microblogging services such as Twitter uses the contents of the messages and the geographical information associated with them. In this paper, we propose a novel approach called SPARE (SPAtial REconstruction) that bypasses the need for geographical or semantic information to localize tweets. We assume there are reference coordinates at known locations that scrape the microblog (tweet) counts in time and space (circular regions around the reference coordinate). The tweet counts are aggregated and then disaggregated to identify event patterns. A change in tweet counts is indicative of an event pattern. We show, using real data, that the change in tweet counts manifests as peaks. The peaks from multiple reference coordinates can be used as input to trilateration techniques to pinpoint the location of an event. We introduce metrics to assess the quality of disaggregation of fine-grained data and examine techniques such as filtering to improve the accuracy of event location. The experimental results show that our method can identify the location of an event with high accuracy.
{"title":"Localization of Unidentified Events with Raw Microblogging Data","authors":"Usman Anjum, Vladimir Zadorozhny, Prashant Krishnamurthy","doi":"10.1016/j.osnem.2022.100209","journal":{"name":"Online Social Networks and Media"},"publicationDate":"2022-05-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"128221538"}
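The trilateration step named in the abstract above can be illustrated with a minimal sketch. It assumes three reference coordinates on a flat plane and one distance estimate per reference (for example, the radius at which a peak appears); the function name `trilaterate` and the planar simplification are assumptions made for illustration, not the authors' implementation.

```python
import math


def trilaterate(anchors, distances):
    """Estimate (x, y) from three planar anchors and their distances to the target."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    # Subtracting circle 1 from circles 2 and 3 removes the quadratic terms
    # and leaves a 2x2 linear system in x and y.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("reference coordinates are collinear")
    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical example: an event at (3, 4) seen from three reference points.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
distances = [math.hypot(3.0 - ax, 4.0 - ay) for ax, ay in anchors]
x, y = trilaterate(anchors, distances)
```

With noisy distance estimates or more than three reference coordinates, a least-squares fit over all circle equations would likely be more robust than this exact three-point solve.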
Pub Date: 2022-05-01 DOI: 10.1016/j.osnem.2022.100206
Ahmad Zareie, Rizos Sakellariou
The spread of rumours in social networks has become a significant challenge in recent years. Blocking so-called critical edges, that is, edges that have a significant role in the spreading process, has attracted considerable attention as a means to minimize the spread of rumours. Although detecting the sources of a rumour may help identify critical edges, it incurs an overhead that source-ignorant approaches try to eliminate. Several source-ignorant edge blocking methods have been proposed, most of which determine critical edges on the basis of centrality. Taking into account additional features of edges (beyond centrality) may help determine more accurately which edges to block. In this paper, a new source-ignorant method is proposed to identify a set of critical edges by considering, for each edge, the impact of blocking and the influence of the nodes connected to the edge. Experimental results demonstrate that the proposed method can identify critical edges more accurately than other source-ignorant methods.
{"title":"Rumour spread minimization in social networks: A source-ignorant approach","authors":"Ahmad Zareie, Rizos Sakellariou","doi":"10.1016/j.osnem.2022.100206","journal":{"name":"Online Social Networks and Media"},"publicationDate":"2022-05-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2468696422000106/pdfft?md5=5c46e8ade686686c561918b3c01408b9&pid=1-s2.0-S2468696422000106-main.pdf","platform":"Semanticscholar","paperid":"130196186"}
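The abstract above mentions centrality-based baselines for choosing which edges to block. As a rough illustration of that family only (not the proposed method, whose scoring is not given here), the sketch below ranks edges by the product of their endpoints' degrees and returns the top k. The scoring rule and the function name `top_edges_by_degree_product` are assumptions.

```python
from collections import Counter


def top_edges_by_degree_product(edges, k):
    """Rank undirected edges by deg(u) * deg(v) and return the k highest."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # sorted() is stable, so ties keep their original input order.
    ranked = sorted(edges, key=lambda e: degree[e[0]] * degree[e[1]], reverse=True)
    return ranked[:k]


# Hypothetical toy network.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")]
blocked = top_edges_by_degree_product(edges, 2)
```

A source-ignorant method in the sense of the abstract would replace the degree product with a score that also reflects the impact of blocking the edge and the influence of its endpoints.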
Pub Date: 2022-03-01 DOI: 10.1016/j.osnem.2021.100194
Rafael M.O. Cruz, Woshington V. de Sousa, George D.C. Cavalcanti
Hate speech is a major issue in social networks due to the high volume of data generated daily. Recent works demonstrate the usefulness of machine learning (ML) in dealing with the nuances required to distinguish hateful posts from mere sarcasm or offensive language. Many ML solutions for hate speech detection have been proposed by changing either how features are extracted from the text or the classification algorithm employed. However, most works consider only one type of feature extraction and classification algorithm. This work argues that a combination of multiple feature extraction techniques and different classification models is needed. We propose a framework to analyze the relationship between multiple feature extraction and classification techniques to understand how they complement each other. The framework is used to select a subset of complementary techniques to compose a robust multiple classifiers system (MCS) for hate speech detection. The experimental study, considering four hate speech classification datasets, demonstrates that the proposed framework is a promising methodology for analyzing and designing high-performing MCS for this task. The MCS obtained using the proposed framework significantly outperforms the combination of all models and the homogeneous and heterogeneous selection heuristics, demonstrating the importance of having a proper selection scheme. Source code, figures and dataset splits can be found in the GitHub repository: https://github.com/Menelau/Hate-Speech-MCS.
{"title":"Selecting and combining complementary feature representations and classifiers for hate speech detection","authors":"Rafael M.O. Cruz, Woshington V. de Sousa, George D.C. Cavalcanti","doi":"10.1016/j.osnem.2021.100194","journal":{"name":"Online Social Networks and Media"},"publicationDate":"2022-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"91737144"}
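To make the idea of a multiple classifiers system concrete, the sketch below combines the label predictions of several already-trained models by simple majority vote. Majority voting is only one possible combination rule and is an assumption here; the abstract does not say which rule the authors use, and the selection of complementary models is not shown.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by majority vote."""
    combined = []
    # zip(*...) regroups the predictions so that each tuple holds all votes for one sample.
    for sample_votes in zip(*predictions_per_model):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three models on three posts.
predictions = [
    ["hate", "none", "none"],
    ["hate", "hate", "none"],
    ["none", "hate", "none"],
]
ensemble = majority_vote(predictions)
```

Using an odd number of models avoids ties in the two-class case; with ties, `Counter.most_common` returns the label that was seen first.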
Pub Date: 2022-03-01 DOI: 10.1016/j.osnem.2021.100196
Leonardo Tonetto, Malintha Adikari, Nitinder Mohan, Aaron Yi Ding, Jörg Ott
Human mobility shapes our daily lives, our urban environment and even the trajectory of a global pandemic. While various aspects of human mobility and inter-personal contact duration have already been studied separately, little is known about how these two key aspects of our daily lives are fundamentally connected. A better understanding of such interconnected human behaviors is crucial for studying infectious diseases, as well as opportunistic content forwarding. To address this gap, we conducted a study on a mobile social network of human mobility and contact duration, using data from 71 persons based on GPS and Bluetooth logs collected over 2 months in 2018. We augment these data with location APIs, enabling a finer-grained characterization of the users' mobility in addition to contact patterns. We model stop durations to reveal how time-unbounded stops (e.g., bars or restaurants) follow a log-normal distribution while time-bounded stops (e.g., offices, hotels) follow a power-law distribution. Furthermore, our analysis reveals that contact duration adheres to a log-normal distribution, which we use to model the duration of contacts as a function of the duration of stays. We further extend our understanding of contact duration during trips by modeling these times as a Weibull distribution whose parameters are a function of trip length. These results could better inform models for information or epidemic spreading, helping guide the future design of network protocols as well as policy decisions.
{"title":"Contact duration: Intricacies of human mobility","authors":"Leonardo Tonetto, Malintha Adikari, Nitinder Mohan, Aaron Yi Ding, Jörg Ott","doi":"10.1016/j.osnem.2021.100196","journal":{"name":"Online Social Networks and Media"},"publicationDate":"2022-03-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2468696421000720/pdfft?md5=3f4081e0dafc13110ea3b0ba03ef6285&pid=1-s2.0-S2468696421000720-main.pdf","platform":"Semanticscholar","paperid":"91696282"}
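Fitting a log-normal distribution to durations, as the abstract above describes for contact durations, has a simple closed form: the maximum-likelihood parameters are the mean and the population standard deviation of the log-durations. The sketch below shows that calculation on made-up values; the function name `fit_lognormal` and the sample data are assumptions, and the authors' actual fitting procedure is not stated here.

```python
import math
import statistics


def fit_lognormal(durations):
    """Return the maximum-likelihood (mu, sigma) of a log-normal fit."""
    logs = [math.log(d) for d in durations]
    mu = statistics.fmean(logs)
    # pstdev divides by n, which matches the maximum-likelihood estimate.
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


# Hypothetical contact durations in seconds whose logarithms are 1, 2 and 3.
durations = [math.exp(1.0), math.exp(2.0), math.exp(3.0)]
mu, sigma = fit_lognormal(durations)
median_duration = math.exp(mu)  # the median of a log-normal is exp(mu)
```

All durations must be strictly positive, since the logarithm is undefined otherwise.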
Pub Date: 2022-03-01 DOI: 10.1016/j.osnem.2022.100201
Billy Spann, Esther Mead, Maryam Maleki, Nitin Agarwal, Therese Williams
This research proposes a conceptual framework for determining the adoption trajectory of information diffusion in connective action campaigns. This approach reveals whether an information campaign is accelerating, has reached critical mass, or is decelerating during its life cycle. The experimental approach taken in this study builds on the diffusion of innovations theory, critical mass theory, and previous s-shaped production function research to provide ideas for modeling future connective action campaigns. Most social science research on connective action has taken a qualitative approach. There are limited quantitative studies, but most focus on statistical validation of the qualitative approach, such as surveys, or focus on only one aspect of connective action. In this study, we extend the social science research on connective action theory by applying a mixed-method computational analysis to examine the affordances and features offered through online social networks (OSNs) and then present a new method to quantify the emergence of these action networks. Using the s-curves revealed by plotting the information campaigns' usage, we apply a diffusion of innovations lens to the analysis to categorize users into different stages of adoption of information campaigns. We then categorize the users in each campaign by examining their affordance and interdependence relationships, assigning retweets, mentions, and original tweets to the type of relationship they exhibit. This analysis provides a foundation for mathematical characterization of connective action signatures and, further, offers policymakers insights about campaigns as they evolve. To evaluate our framework, we present a comprehensive analysis of COVID-19 Twitter data. Establishing this theoretical framework will help researchers develop predictive models to more accurately model campaign dynamics.
{"title":"Applying diffusion of innovations theory to social networks to understand the stages of adoption in connective action campaigns","authors":"Billy Spann, Esther Mead, Maryam Maleki, Nitin Agarwal, Therese Williams","doi":"10.1016/j.osnem.2022.100201","journal":{"name":"Online Social Networks and Media"},"publicationDate":"2022-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"90019833"}
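The abstract above categorizes users into stages of adoption. A minimal sketch of that idea follows, using the classic diffusion-of-innovations shares (2.5% innovators, 13.5% early adopters, 34% early majority, 34% late majority, 16% laggards) applied to users ranked by the time of their first campaign post. Ranking by first post time and using fixed rank quantiles are simplifying assumptions; the authors derive stages from the campaign s-curves, which is not reproduced here.

```python
# Cumulative share of adopters at the upper edge of each category.
CUTOFFS = [
    (0.025, "innovators"),
    (0.16, "early adopters"),
    (0.50, "early majority"),
    (0.84, "late majority"),
    (1.0, "laggards"),
]


def categorize_adopters(first_post_time):
    """Map each user to an adopter category from a {user: first_post_time} dict."""
    ordered = sorted(first_post_time, key=first_post_time.get)
    total = len(ordered)
    labels = {}
    for rank, user in enumerate(ordered, start=1):
        fraction = rank / total
        for cutoff, name in CUTOFFS:
            if fraction <= cutoff:
                labels[user] = name
                break
    return labels


# Hypothetical campaign with 200 users adopting one after another.
first_post_time = {f"user{i}": i for i in range(200)}
labels = categorize_adopters(first_post_time)
```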
Pub Date: 2022-03-01 DOI: 10.1016/j.osnem.2022.100198
Lynnette Hui Xian Ng, Dawn C. Robertson, Kathleen M. Carley
Social media bots have been characterized by their use in digital activism and information manipulation, owing to their role in information diffusion. The detection of bots has been a major task within the field of social media computation, and many datasets and bot detection algorithms have been developed. With these algorithms, the stability of the bot score is key in estimating the impact of bots on the diffusion of information. In several experiments on Twitter agents, we quantify the amount of data required for consistent bot predictions and analyze agent bot classification behavior. Through this study, we developed a methodology to establish parameters for stabilizing the bot probability score through threshold, temporal and volume analysis, eventually quantifying suitable threshold values for bot classification (i.e., whether the agent is a bot or not) and reasonable data collection sizes (i.e., number of days of tweets or number of tweets) for stable scores and bot classification.
{"title":"Stabilizing a supervised bot detection algorithm: How much data is needed for consistent predictions?","authors":"Lynnette Hui Xian Ng, Dawn C. Robertson, Kathleen M. Carley","doi":"10.1016/j.osnem.2022.100198","journal":{"name":"Online Social Networks and Media"},"publicationDate":"2022-03-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2468696422000027/pdfft?md5=879d4a241d8634d464a12524eaf23546&pid=1-s2.0-S2468696422000027-main.pdf","platform":"Semanticscholar","paperid":"91696283"}
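One way to picture the stability question raised in the abstract above: score the same account on progressively larger samples of its tweets, then find the point after which the bot/not-bot decision no longer flips. The sketch below does this for a list of scores. The 0.5 threshold, the function name `stable_from`, and the example scores are all hypothetical; the abstract does not report the threshold values or data sizes the authors found.

```python
def stable_from(scores, threshold=0.5):
    """Return (index, label) where label is the final classification and index
    is the earliest position from which the classification never changes."""
    labels = [score >= threshold for score in scores]
    final_label = labels[-1]
    index = len(labels) - 1
    # Walk backwards while the earlier label still agrees with the final one.
    while index > 0 and labels[index - 1] == final_label:
        index -= 1
    return index, final_label


# Hypothetical bot scores computed on growing batches of tweets.
scores = [0.40, 0.60, 0.45, 0.70, 0.80, 0.75]
first_stable_index, is_bot = stable_from(scores)
```

Repeating this over many accounts and several thresholds would give a rough picture of how much data is needed before predictions become consistent.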
Pub Date: 2022-01-01 DOI: 10.1016/j.osnem.2021.100184
Valerio Arnaboldi, Marco Conti, Andrea Passarella, Robin I.M. Dunbar
{"title":"Erratum to Online Social Networks and information diffusion: The role of ego networks: Online Social Networks and Media, Volume 1 (June 2017), Pages 44-55","authors":"Valerio Arnaboldi, Marco Conti, Andrea Passarella, Robin I.M. Dunbar","doi":"10.1016/j.osnem.2021.100184","journal":{"name":"Online Social Networks and Media"},"publicationDate":"2022-01-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2468696421000628/pdfft?md5=7b90bb651c421f310f601ebc13af3388&pid=1-s2.0-S2468696421000628-main.pdf","platform":"Semanticscholar","paperid":"131380779"}
Pub Date: 2022-01-01 DOI: 10.1016/j.osnem.2021.100180
Jaeheon Kim, Donghee Yvette Wohn, Meeyoung Cha
The latest advances in NLP (natural language processing) have led to the launch of much-needed machine-driven toxic chat detection. Nevertheless, people continuously find new forms of hateful expression that are easily identified by humans, but not by machines. One such common expression is the mix of text and emotes, a type of visual toxic chat that is increasingly used to evade algorithmic moderation and that remains an under-studied aspect of the problem of online toxicity. This research analyzes chat conversations from the popular streaming platform Twitch to understand the varied types of visual toxic chat. Emotes were sometimes used to replace a letter, to seek attention, or to express emotion. We created a labeled dataset that contains 29,721 cases of emotes replacing letters. Based on the dataset, we built a neural network classifier that identified visual toxic chat that would otherwise go undetected by traditional methods, and caught an additional 1.3% of examples of toxic chat out of 15 million chat utterances.
{"title":"Understanding and identifying the use of emotes in toxic chat on Twitch","authors":"Jaeheon Kim, Donghee Yvette Wohn, Meeyoung Cha","doi":"10.1016/j.osnem.2021.100180","journal":{"name":"Online Social Networks and Media"},"publicationDate":"2022-01-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2468696421000598/pdfft?md5=74d9b0d4cdd5859c36ea8a0c200c176d&pid=1-s2.0-S2468696421000598-main.pdf","platform":"Semanticscholar","paperid":"123624066"}
Pub Date: 2022-01-01 DOI: 10.1016/j.osnem.2021.100182
Aiqi Jiang, Xiaohan Yang, Yang Liu, Arkaitz Zubiaga
Online sexism has become an increasing concern on social media platforms as it has affected the healthy development of the Internet and can have negative effects on society. While research in the sexism detection domain is growing, most of this research focuses on English as the language and on Twitter as the platform. Our objective here is to broaden the scope of this research by considering the Chinese language on Sina Weibo. We propose the first Chinese sexism dataset, the Sina Weibo Sexism Review (SWSR) dataset, as well as a large Chinese lexicon, SexHateLex, made of abusive and gender-related terms. We introduce our data collection and annotation process, and provide an exploratory analysis of the dataset characteristics to validate its quality and to show how sexism is manifested in Chinese. The SWSR dataset provides labels at different levels of granularity including (i) sexism or non-sexism, (ii) sexism category and (iii) target type, which can be exploited, among other uses, for building computational methods to identify and investigate finer-grained gender-related abusive language. We conduct experiments for the three sexism classification tasks making use of state-of-the-art machine learning models. Our results show competitive performance, providing a benchmark for sexism detection in the Chinese language, as well as an error analysis highlighting open challenges needing more research in Chinese NLP. The SWSR dataset and SexHateLex lexicon are publicly available.
{"title":"SWSR: A Chinese dataset and lexicon for online sexism detection","authors":"Aiqi Jiang, Xiaohan Yang, Yang Liu, Arkaitz Zubiaga","doi":"10.1016/j.osnem.2021.100182","journal":{"name":"Online Social Networks and Media"},"publicationDate":"2022-01-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"124038451"}
Pub Date: 2022-01-01 DOI: 10.1016/j.osnem.2021.100178
James R. Ashford, Liam D. Turner, Roger M. Whitaker, Alun Preece, Diane Felmlee
Online social networks serve as a convenient way to connect, share, and promote content with others. As a result, these networks can be used with malicious intent, causing disruption and harm to public debate through the sharing of misinformation. However, automatically identifying such content through its use of natural language is a significant challenge compared to our solution which uses less computational resources, language-agnostic and without the need for complex semantic analysis. Consequently alternative and complementary approaches are highly valuable. In this paper, we assess content that has the potential for misinformation and focus on patterns of user association with online social media communities (subreddits) in the popular Reddit social media platform, and generate networks of behaviour capturing user interaction with different subreddits. We examine these networks using both global and local metrics, in particular noting the presence of induced substructures (graphlets) assessing posts from 96,634 users. From subreddits identified as having potential for misinformation, we note that the associated networks have strongly defined local features relating to node degree — these are evident both from analysis of dominant graphlets and degree-related global metrics. We find that these local features support high accuracy classification of subreddits that are categorised as having the potential for misinformation. Consequently we observe that induced local substructures of high degree are fundamental metrics for subreddit classification, and support automatic detection capabilities for online misinformation independent from any particular language.
{"title":"Understanding the characteristics of COVID-19 misinformation communities through graphlet analysis","authors":"James R. Ashford, Liam D. Turner, Roger M. Whitaker, Alun Preece, Diane Felmlee","doi":"10.1016/j.osnem.2021.100178","DOIUrl":"10.1016/j.osnem.2021.100178","url":null,"abstract":"Online social networks serve as a convenient way to connect, share, and promote content with others. As a result, these networks can be used with malicious intent, causing disruption and harm to public debate through the sharing of misinformation. However, automatically identifying such content through its use of natural language is a significant challenge, whereas our solution uses fewer computational resources, is language-agnostic, and needs no complex semantic analysis. Consequently, alternative and complementary approaches are highly valuable. In this paper, we assess content that has the potential for misinformation and focus on patterns of user association with online social media communities (subreddits) on the popular Reddit social media platform, generating networks of behaviour that capture user interaction with different subreddits. We examine these networks using both global and local metrics, in particular noting the presence of induced substructures (graphlets), assessing 7,876,064 posts from 96,634 users. From subreddits identified as having potential for misinformation, we note that the associated networks have strongly defined local features relating to node degree; these are evident both from analysis of dominant graphlets and from degree-related global metrics. We find that these local features support high-accuracy classification of subreddits that are categorised as having the potential for misinformation. Consequently, we observe that induced local substructures of high degree are fundamental metrics for subreddit classification, and support automatic detection capabilities for online misinformation independent of any particular language.","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2468696421000586/pdfft?md5=7bf5933a81760cdedf22974545a1b7e2&pid=1-s2.0-S2468696421000586-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115540283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}