Pub Date: 2022-07-01, DOI: 10.1016/j.osnem.2022.100210
Wenjie Yin, Arkaitz Zubiaga
While social media offers freedom of self-expression, abusive language carries significant negative social impact. Driven by the importance of the issue, research on the automated detection of abusive language has grown and improved. However, these detection models rely on strongly indicative keywords, such as slurs and profanity. As a result, they can (1a) miss abuse that lacks such keywords, (1b) falsely flag non-abusive content that contains them, and (2) perform poorly on unseen data. Although these problems are recognised, gaps and inconsistencies remain in the literature. In this study, we analyse in detail the impact of keywords from dataset construction to model behaviour, focusing on how models make mistakes of types (1a) and (1b), and how (1a) and (1b) interact with (2). Based on this analysis, we provide suggestions for future research to address all three problems.
Published as “Hidden behind the obvious: Misleading keywords and implicitly abusive language on social media” in Online Social Networks and Media.
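The keyword reliance described above can be illustrated with a deliberately naive detector. The following is a minimal, hypothetical sketch, not a model from the paper: the lexicon `ABUSIVE_KEYWORDS` (mild placeholder insults standing in for real slurs) and the labelled example posts are invented, and the rule simply flags any post that contains a lexicon keyword.

```python
# Hypothetical placeholder lexicon standing in for "strongly indicative"
# keywords such as slurs and profanity.
ABUSIVE_KEYWORDS = {"idiot", "moron", "trash"}


def keyword_flag(text: str) -> bool:
    """Flag a post as abusive if any token matches the lexicon."""
    tokens = {tok.strip(".,!?\"'").lower() for tok in text.split()}
    return not ABUSIVE_KEYWORDS.isdisjoint(tokens)


def outcome(text: str, is_abusive: bool) -> str:
    """Label the detector's decision on a post with a known ground truth."""
    flagged = keyword_flag(text)
    if is_abusive and not flagged:
        return "(1a) missed implicit abuse"
    if flagged and not is_abusive:
        return "(1b) false flag on non-abuse"
    return "correct"


# Invented examples: explicit abuse, implicit abuse, counter-speech, neutral.
examples = [
    ("You are an idiot.", True),
    ("People like you don't belong here.", True),
    ("Calling someone an idiot is not okay.", False),
    ("Have a nice day!", False),
]

for text, label in examples:
    print(f"{outcome(text, label):30} {text}")
```

The second and third examples fail in opposite directions: the implicit abuse carries no keyword, while the counter-speech that quotes a keyword is flagged. A model trained on keyword-heavy data may learn something close to this shortcut, which would also help explain poor transfer to unseen data (2).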
Pub Date: 2022-07-01, DOI: 10.1016/j.osnem.2022.100211
Brett Drury, Samuel Morais Drury, Md Arafatur Rahman, Ihsan Ullah
Social media is used both to commit and to detect crimes. With automated methods, it is possible to scale both crime and its detection to large numbers of people. The ability of criminals to reach large audiences has made this area a frequent subject of study, and consequently several surveys have reviewed specific crimes committed on social platforms. Until now, however, no review article has considered all types of crimes on social media, their similarities, and their detection. Demonstrating the similarity between crimes and between their detection methods allows techniques and data to be transferred between domains. This survey therefore documents the crimes that have been committed on social media and demonstrates their similarity through a taxonomy of crimes. It also documents publicly available datasets. Finally, it provides suggestions for further research in this field.
Published as “A social network of crime: A review of the use of social networks for crime and the detection of crime” in Online Social Networks and Media.
Pub Date: 2022-07-01, DOI: 10.1016/j.osnem.2022.100208
Tu My Doan, Jon Atle Gulla
Political viewpoints identification (PVI) is a Natural Language Processing task that takes political texts and recognizes the writer’s opinions on a political matter. PVI reduces ambiguity in texts by identifying their underlying meaning and clarifying the bias margin along the political spectrum (bias leaning). As a result, even non-experts can better understand political texts; for instance, they can identify misinformation, bias, and hidden political agendas. In this paper, we formally define the concept of political viewpoints identification, explain its importance, and discuss to what extent current techniques can be used to extract political views from text. Existing techniques address the problem of PVI inadequately. We outline their deficiencies and present a research agenda to advance PVI.
Published as “A Survey on Political Viewpoints Identification” in Online Social Networks and Media.
Pub Date: 2022-07-01, DOI: 10.1016/j.osnem.2022.100220
Lucio La Cava, Sergio Greco, Andrea Tagarelli
Decentralized Online Social Networks (DOSNs) represent a growing trend in the social media landscape, in contrast to their well-known centralized counterparts, which are often in the spotlight due to privacy concerns and a vision typically focused on monetizing user relationships. By exploiting open-source software, DOSNs allow users to create their own servers, or instances, thus favoring the proliferation of platforms that are independent yet transparently interconnected. Nonetheless, the resulting cooperation model, commonly known as the Fediverse, remains largely unexplored, since existing studies have mainly focused on a limited number of structural aspects of DOSNs.
In this work, we aim to fill the gap in the study of user relations and roles in DOSNs by taking two main actions: understanding the impact of decentralization on how users relate to each other within the instance they belong to and/or across different instances, and unveiling user roles that can explain two interrelated axes of social behavioral phenomena, namely information consumption and boundary spanning. To this end, we build our analysis on user networks from Mastodon, the most widely used DOSN platform. We believe that the findings of our study on Mastodon users’ roles and information flow can pave the way for further research on DOSNs.
Published as “Information consumption and boundary spanning in Decentralized Online Social Networks: The case of Mastodon users” in Online Social Networks and Media.
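On Mastodon, every account handle carries its instance (`user@instance`), so a first, rough view of cross-instance relations can be read directly off the follow graph. The sketch below is a hypothetical illustration, not the role-detection method of the paper: it assumes a list of (follower, followee) pairs, uses invented handles on `.example` domains, and treats the share of a user's followees hosted on other instances as a crude boundary-spanning indicator.

```python
from collections import defaultdict


def instance_of(handle: str) -> str:
    """Return the instance domain of a handle such as 'alice@one.example'."""
    return handle.rsplit("@", 1)[1]


def cross_instance_ratio(follows: list[tuple[str, str]]) -> dict[str, float]:
    """For each follower, the fraction of its followees hosted on another instance."""
    total: dict[str, int] = defaultdict(int)
    external: dict[str, int] = defaultdict(int)
    for follower, followee in follows:
        total[follower] += 1
        if instance_of(follower) != instance_of(followee):
            external[follower] += 1
    return {user: external[user] / total[user] for user in total}


# Invented follow edges across three hypothetical instances.
follows = [
    ("alice@one.example", "bob@one.example"),
    ("alice@one.example", "carol@two.example"),
    ("bob@one.example", "alice@one.example"),
    ("carol@two.example", "dave@three.example"),
    ("carol@two.example", "erin@one.example"),
]
ratios = cross_instance_ratio(follows)
# alice: 0.5, bob: 0.0, carol: 1.0
```

Under this toy metric, `carol` links only to other instances and would be the candidate boundary spanner, while `bob` stays within his home instance. A fuller analysis would likely also need interaction data (boosts, replies), not just follows.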
Pub Date: 2022-07-01, DOI: 10.1016/j.osnem.2022.100207
Mustafa Toprak, Chiara Boldrini, Andrea Passarella, Marco Conti
Ego networks have proved to be a valuable tool for understanding the relationships that individuals establish with their peers, in both offline and online social networks. Particularly interesting are the cognitive constraints associated with the interactions between the ego and the members of their ego network, which limit individuals to maintaining meaningful interactions with no more than 150 people, on average, and to arranging such relationships along concentric circles of decreasing engagement. In this work, we focus on the ego networks of journalists on Twitter across 17 countries, and we investigate whether they feature the same characteristics observed for other relevant classes of Twitter users, such as politicians and generic users. We find that journalists are generally more active and interact with more people than generic users, regardless of their country. Their ego network structure closely matches reference models derived in anthropology and observed in general human ego networks. Remarkably, the similarity is even higher than that of politicians’ and generic users’ ego networks. This may imply a greater cognitive involvement with Twitter for journalists than for other user categories. From a dynamic perspective, journalists have stable short-term relationships that do not change much over time. In the longer term, though, ego networks can be quite dynamic, especially in the innermost circles. Moreover, the ego-alter ties of journalists are often information-driven, as they are mediated by hashtags both at their inception and during their lifetime. Finally, we found that relationships between journalists are assortative in popularity: journalists tend to engage with other journalists of similar popularity in all layers, but especially in the innermost ones. In contrast, when journalists interact with generic users, this assortativity is present only in the innermost layers.
Published as “Journalists’ ego networks in Twitter: Invariant and distinctive structural features” in Online Social Networks and Media.
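The layered structure mentioned above is often described with reference circle sizes of roughly 5, 15, 50 and 150 alters, each about three times the previous one. The following is a simplified sketch that assumes these cumulative sizes and assigns alters to circles purely by rank of contact frequency; ego-network studies usually fit the layers by clustering contact frequencies instead, so this illustrates the concept rather than the paper's procedure. The contact frequencies are invented.

```python
from collections import Counter

# Assumed cumulative circle sizes from the anthropology literature (~5/15/50/150).
REFERENCE_SIZES = (5, 15, 50, 150)


def assign_circles(contact_freq: dict[str, float],
                   sizes: tuple[int, ...] = REFERENCE_SIZES) -> dict[str, int]:
    """Map each alter to a circle index (0 = innermost) by descending contact frequency.

    Alters ranked beyond the outermost size are left out of the ego network.
    """
    ranked = sorted(contact_freq, key=contact_freq.get, reverse=True)
    circles: dict[str, int] = {}
    for rank, alter in enumerate(ranked):
        for layer, cumulative_size in enumerate(sizes):
            if rank < cumulative_size:
                circles[alter] = layer
                break
    return circles


# Invented data: 200 alters with strictly decreasing interactions per year.
freqs = {f"alter{i}": 200.0 - i for i in range(200)}
circles = assign_circles(freqs)
layer_sizes = dict(sorted(Counter(circles.values()).items()))
# layer_sizes == {0: 5, 1: 10, 2: 35, 3: 100}; 50 alters fall outside the 150 limit
```

Each entry of `layer_sizes` counts only the alters of that ring (10, 35, 100), whereas the reference sizes are cumulative: a circle together with all circles inside it.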
Pub Date: 2022-05-01, DOI: 10.1016/j.osnem.2022.100199
Pantelis Agathangelou, Ioannis Katakis
Sentiment analysis is a fast-accelerating discipline that develops algorithms for knowledge discovery from opinionated content. However, analyzing user reviews poses many challenges. Poor-quality and informal language, and a lack of labels, are only a few of the obstacles. Most importantly, users, consciously or subconsciously, take different approaches to expressing their opinion about a product or a service. Some go sentence by sentence, mentioning positive and negative aspects, whereas others write a mixed piece of text in which the reader must see the big picture to understand the message. In this work, we propose a novel neural network that handles both situations. By combining convolutional, recurrent and attention neural networks, our method can extract rich linguistic patterns that reveal the user’s sentiment towards the entity under review. We evaluate our method on nine datasets covering both binary and multi-class classification tasks. Experimental evaluation indicates that our method outperforms well-established deep learning approaches, beating the competing methods in 8 out of 9 cases.
Published as “Balancing between holistic and cumulative sentiment classification” in Online Social Networks and Media.
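One ingredient named above, attention, can be sketched independently of the rest of the architecture. The following minimal example assumes that earlier convolutional and recurrent layers have already produced one vector per sentence and shows attention pooling: each vector is scored against a context vector `context` (a learned parameter in a real model, hand-picked here), the scores are softmax-normalized, and the review representation is the weighted sum. It is a generic illustration, not the authors' network.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: list[list[float]],
                   context: list[float]) -> tuple[list[float], list[float]]:
    """Return (pooled vector, attention weights) for a sequence of hidden states."""
    # Relevance score of each state: dot product with the context vector.
    scores = [sum(h_j * u_j for h_j, u_j in zip(h, context)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(w * h[j] for w, h in zip(weights, states)) for j in range(dim)]
    return pooled, weights


# Hand-picked toy sentence vectors; the context favours the first dimension.
sentence_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(sentence_vectors, context=[2.0, 0.0])
# Sentences 1 and 3 share the highest weight; sentence 2 is down-weighted.
```

The weights let the pooled vector lean on a few decisive sentences or spread over many; this is only an intuition for why attention may suit both writing styles described above, not a result from the paper.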
Pub Date: 2022-05-01, DOI: 10.1016/j.osnem.2022.100203
Nicolás E. Díaz Ferreyra, Tobias Hecking, Esma Aïmeur, Maritta Heisel, H. Ulrich Hoppe
Access-Control Lists (ACLs) (a.k.a. “friend lists”) are one of the most important privacy features of Online Social Networks (OSNs), as they allow users to restrict the audience of their publications. Nevertheless, creating and maintaining custom ACLs can impose a high cognitive burden on average OSN users, since it normally requires assessing the trustworthiness of a large number of contacts. In principle, community detection algorithms can be leveraged to support the generation of ACLs by mapping a set of examples (i.e. contacts labelled as “untrusted”) to the emerging communities inside the user’s ego-network. However, unlike users’ access-control preferences, traditional community-detection algorithms do not take the homophily characteristics of such communities (i.e. attributes shared among members) into account. Consequently, this strategy may lead to inaccurate ACL configurations and privacy breaches under certain homophily scenarios. This work investigates the use of community-detection algorithms for the automatic generation of ACLs in OSNs. In particular, it analyses the performance of this approach under different homophily conditions through a simulation model. Furthermore, since private information may reach untrusted recipients through the re-sharing affordances of OSNs, information diffusion processes are also modelled and explicitly taken into account. Finally, the removal of gatekeeper nodes is explored as a strategy to counteract unwanted data dissemination.
Published as “Community detection for access-control decisions: Analysing the role of homophily and information diffusion in Online Social Networks” in Online Social Networks and Media.
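The mapping from labelled examples to communities, and the leak through re-sharing, can be illustrated on a toy ego network. The sketch below assumes the communities have already been found by some community-detection algorithm, uses one simple hypothetical rule (exclude every community that contains a contact labelled untrusted), and models diffusion as deterministic re-sharing along known edges, which is much cruder than a simulation model. All names are invented.

```python
from collections import deque


def build_acl(communities: list[set[str]], untrusted: set[str]) -> set[str]:
    """Allowed audience: members of every community that contains no untrusted example."""
    allowed: set[str] = set()
    for community in communities:
        if community.isdisjoint(untrusted):
            allowed |= community
    return allowed


def reachable(audience: set[str], reshares: dict[str, set[str]]) -> set[str]:
    """Everyone a post can reach if each recipient re-shares it along `reshares` (BFS)."""
    seen = set(audience)
    queue = deque(audience)
    while queue:
        node = queue.popleft()
        for neighbour in reshares.get(node, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


communities = [{"ann", "ben", "cy"}, {"dan", "eve"}, {"fay", "gus"}]
acl = build_acl(communities, untrusted={"eve"})
# acl == {"ann", "ben", "cy", "fay", "gus"}; "dan" is dropped together with "eve".

reshares = {"gus": {"eve"}}  # hypothetical: gus tends to re-share to eve
leaks = "eve" in reachable(acl, reshares)  # True: eve is reached via gus
safe = "eve" not in reachable(acl - {"gus"}, reshares)  # True once gatekeeper gus is removed
```

Both failure modes from the abstract show up: `dan` may be a trusted contact excluded only because he shares a community with `eve`, and `eve` still receives the post through `gus`, who acts as a gatekeeper.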
Pub Date: 2022-05-01, DOI: 10.1016/j.osnem.2022.100200
Alexandre Magno Sousa, Jussara M. Almeida, Flavio Figueiredo
A number of recent studies have explicitly introduced curiosity models into the analysis of online information consumption, most notably in the design of recommendation systems. However, most prior efforts have neglected the role of social influence as a component of the curiosity stimulation process, which has been referred to as social curiosity. In this paper, we propose a number of metrics to quantify social curiosity and apply them to WhatsApp, a widely used communication platform. We show that our metrics capture aspects that are complementary to other variables previously related to curiosity stimulation, and we use them to offer a broad characterization of user curiosity as a driving force behind communication in WhatsApp.
Published as “Metrics of social curiosity: The WhatsApp case” in Online Social Networks and Media.
Pub Date: 2022-05-01, DOI: 10.1016/j.osnem.2022.100204
Lucas E.B. Skora, Helen C.M. Senefonte, Myriam Regattieri Delgado, Ricardo Lüders, Thiago H. Silva
A better understanding of tourist behavior is strategic for improving services in global tourism, a competitive and economically important segment. Key studies in the literature often explore the issue using traditional data, such as questionnaires or interviews. Traditional approaches provide valuable information; however, they make it challenging to obtain large-scale data, which hinders the study of worldwide patterns. Location-based social networks (LBSNs) can potentially mitigate such issues due to the relatively low cost of acquiring large amounts of behavioral data. Nevertheless, before using such data to study tourists’ behavior, it is necessary to verify whether it adequately reveals the behavior measured with traditional data, which is considered the ground truth. Thus, the present work investigates in which countries the global tourism network measured with an LBSN agrees with the behavior estimated by the World Tourism Organization using traditional methods. Although we found exceptions, the results suggest that, for most countries, LBSN data can satisfactorily represent the behavior studied. Our results indicate that, in countries with high correlations between the results obtained from both datasets, LBSN data can be used in research on tourist mobility in the studied context.
Published as “Comparing global tourism flows measured by official census and social sensing” in Online Social Networks and Media.
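Checking whether LBSN data agrees with official statistics for a destination country amounts to correlating two sets of flows over the same origin countries. The abstract does not say which correlation measure the paper uses; the sketch below assumes Pearson's r, computed per destination over shared origins, with a hypothetical agreement threshold of 0.7. The flow figures and the function name `agreement_by_destination` are invented for illustration.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one sample is constant
    return cov / math.sqrt(var_x * var_y)


def agreement_by_destination(official: dict[str, dict[str, float]],
                             lbsn: dict[str, dict[str, float]]) -> dict[str, float]:
    """Correlate inbound flows per destination over the origins present in both sources."""
    result = {}
    for dest in official.keys() & lbsn.keys():
        origins = sorted(official[dest].keys() & lbsn[dest].keys())
        if len(origins) < 3:
            continue  # too few shared origins for a meaningful correlation
        result[dest] = pearson([official[dest][o] for o in origins],
                               [lbsn[dest][o] for o in origins])
    return result


# Invented inbound flows: official arrivals vs. LBSN-inferred trips, by origin.
official = {"PT": {"ES": 100, "FR": 80, "BR": 60}, "JP": {"CN": 90, "KR": 70, "US": 20}}
lbsn = {"PT": {"ES": 50, "FR": 42, "BR": 29}, "JP": {"CN": 10, "KR": 30, "US": 40}}

r = agreement_by_destination(official, lbsn)
THRESHOLD = 0.7  # hypothetical cut-off for "LBSN data is usable here"
usable = sorted(dest for dest, value in r.items() if value >= THRESHOLD)
# usable == ["PT"]; "JP" correlates negatively in this toy data
```

Pearson's r rewards linear agreement in volumes; a rank correlation such as Spearman's would be a reasonable alternative if only the ordering of origin countries matters.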
Pub Date: 2022-05-01, DOI: 10.1016/j.osnem.2022.100205
Zhixue Zhao, Ziqi Zhang, Frank Hopfgartner
Toxic comment classification models are often found to be biased towards identity terms, i.e., terms characterizing a specific group of people, such as “Muslim” and “black”. Such bias is commonly reflected in false positive predictions, i.e., non-toxic comments containing identity terms being classified as toxic. In this work, we propose a novel approach to debiasing toxic comment classification models that leverages the subjectivity level of a comment and the presence of identity terms. We hypothesize that toxic comments containing identity terms are more likely to be expressions of subjective feelings or opinions. Therefore, the subjectivity level of a comment containing identity terms can help classify toxic comments and mitigate identity term bias. To implement this idea, we propose a model based on BERT and study two different methods of measuring the subjectivity level. The first method uses a lexicon-based tool. The second method calculates the embedding similarity between a comment and a relevant Wikipedia text about the identity term in the comment. We thoroughly evaluate our method on four datasets collected from different social media platforms. Our results show that: (1) our models that incorporate both subjectivity and identity term features consistently outperform strong SOTA baselines, with our best-performing model achieving an F1 improvement of 4.75% on a Twitter dataset; (2) our idea of measuring subjectivity based on similarity to the relevant Wikipedia text is very effective for toxic comment classification, as our model using it achieved the best performance on 3 out of 4 datasets and comparable performance on the remaining one. We further test our method on RoBERTa to evaluate its generality, and the results show an F1 improvement of up to 1.29% (on a dataset from a white supremacist online forum).
Title: Utilizing subjectivity level to mitigate identity term bias in toxic comments classification
Journal: Online Social Networks and Media (Journal Article)