Online Social Networks and Media: Latest Articles, Page 7

Information consumption and boundary spanning in Decentralized Online Social Networks: The case of Mastodon users

Q1 Social Sciences

Online Social Networks and Media

Pub Date : 2022-07-01 DOI: 10.1016/j.osnem.2022.100220

Lucio La Cava, Sergio Greco, Andrea Tagarelli

Decentralized Online Social Networks (DOSNs) represent a growing trend in the social media landscape, as opposed to the well-known centralized peers, which are often in the spotlight due to privacy concerns and a vision typically focused on monetization through user relationships. By exploiting open-source software, DOSNs allow users to create their own servers, or instances, thus favoring the proliferation of platforms that are independent yet interconnected with each other in a transparent way. Nonetheless, the resulting cooperation model, commonly known as the Fediverse, still represents a world to be fully discovered, since existing studies have mainly focused on a limited number of structural aspects of interest in DOSNs.

In this work, we aim to fill a lack of study on user relations and roles in DOSNs, by taking two main actions: understanding the impact of decentralization on how users relate to each other within their membership instance and/or across different instances, and unveiling user roles that can explain two interrelated axes of social behavioral phenomena, namely information consumption and boundary spanning. To this purpose, we build our analysis on user networks from Mastodon, since it represents the most widely used DOSN platform. We believe that the findings drawn from our study on Mastodon users’ roles and information flow can pave a way for further development of fascinating research on DOSNs.


Volume 30, Article 100220

Citations: 9

Journalists’ ego networks in Twitter: Invariant and distinctive structural features

Q1 Social Sciences

Online Social Networks and Media

Pub Date : 2022-07-01 DOI: 10.1016/j.osnem.2022.100207

Mustafa Toprak, Chiara Boldrini, Andrea Passarella, Marco Conti

Ego networks have proved to be a valuable tool for understanding the relationships that individuals establish with their peers, both in offline and online social networks. Particularly interesting are the cognitive constraints associated with the interactions between the ego and the members of their ego network, which limit individuals to maintaining meaningful interactions with no more than 150 people, on average, and to arranging such relationships along concentric circles of decreasing engagement. In this work, we focus on the ego networks of journalists on Twitter, considering 17 different countries, and we investigate whether they feature the same characteristics observed for other relevant classes of Twitter users, like politicians and generic users. Our findings are that journalists are generally more active and interact with more people than generic users, regardless of their country. Their ego network structure is closely aligned with reference models derived in anthropology and observed in general human ego networks. Remarkably, the similarity is even higher than that of the ego networks of politicians and generic users. This may imply a greater cognitive involvement with Twitter for journalists than for other user categories. From a dynamic perspective, journalists have stable short-term relationships that do not change much over time. In the longer term, though, ego networks can be quite dynamic, especially in the innermost circles. Moreover, the ego-alter ties of journalists are often information-driven, as they are mediated by hashtags both at their inception and during their lifetime. Finally, we found that relationships between journalists are assortative in popularity: journalists tend to engage with other journalists of similar popularity, in all layers but especially in their innermost ones. Instead, when journalists interact with generic users, this assortativity is only present in the innermost layers.


Volume 30, Article 100207

Citations: 0

Balancing between holistic and cumulative sentiment classification

Q1 Social Sciences

Online Social Networks and Media

Pub Date : 2022-05-01 DOI: 10.1016/j.osnem.2022.100199

Pantelis Agathangelou, Ioannis Katakis

Sentiment analysis is a fast-accelerating discipline that develops algorithms for knowledge discovery from opinionated content. However, analyzing user reviews poses many challenges. Poor-quality, informal language and a lack of labels are only a few of the obstacles. Most importantly, users, consciously or subconsciously, use different approaches for expressing their opinion about a product or a service. Some of them go sentence by sentence, mentioning some positive and negative aspects, whereas others provide a mixed piece of text where the reader is supposed to see the big picture to understand the message. In this work, we propose a novel neural network that deals with both situations. Our method, by combining convolutional, recurrent and attention neural networks, can extract rich linguistic patterns that reveal the user’s sentiment towards the entity under review. We evaluate our method on nine datasets that represent both binary and multi-class classification tasks. Experimental evaluation indicates that our method outperforms well-established deep learning approaches. Our approach outperformed the competing methods in 8 out of 9 cases.


Volume 29, Article 100199

Citations: 1

Community detection for access-control decisions: Analysing the role of homophily and information diffusion in Online Social Networks

Q1 Social Sciences

Online Social Networks and Media

Pub Date : 2022-05-01 DOI: 10.1016/j.osnem.2022.100203

Nicolás E. Díaz Ferreyra , Tobias Hecking , Esma Aïmeur , Maritta Heisel , H. Ulrich Hoppe

Access-Control Lists (ACLs) (a.k.a. “friend lists”) are one of the most important privacy features of Online Social Networks (OSNs) as they allow users to restrict the audience of their publications. Nevertheless, creating and maintaining custom ACLs can introduce a high cognitive burden on average OSNs users since it normally requires assessing the trustworthiness of a large number of contacts. In principle, community detection algorithms can be leveraged to support the generation of ACLs by mapping a set of examples (i.e. contacts labelled as “untrusted”) to the emerging communities inside the user’s ego-network. However, unlike users’ access-control preferences, traditional community-detection algorithms do not take the homophily characteristics of such communities into account (i.e. attributes shared among members). Consequently, this strategy may lead to inaccurate ACL configurations and privacy breaches under certain homophily scenarios. This work investigates the use of community-detection algorithms for the automatic generation of ACLs in OSNs. Particularly, it analyses the performance of the aforementioned approach under different homophily conditions through a simulation model. Furthermore, since private information may reach the scope of untrusted recipients through the re-sharing affordances of OSNs, information diffusion processes are also modelled and taken explicitly into account. Altogether, the removal of gatekeeper nodes is further explored as a strategy to counteract unwanted data dissemination.
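
The mapping from labelled examples to communities described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' simulation model: the function name `acl_from_communities`, the contact names, and the rule that a single "untrusted" example flags its whole community are all hypothetical, and the communities are assumed to have been produced beforehand by some community-detection algorithm.

```python
def acl_from_communities(communities, untrusted_examples):
    """Return the set of contacts to exclude from the audience.

    communities: iterable of sets of contact ids (assumed output of any
        community-detection algorithm run on the user's ego-network).
    untrusted_examples: set of contact ids the user labelled "untrusted".
    """
    blocked = set()
    for community in communities:
        # Assumed rule: one labelled member flags the whole community.
        if community & untrusted_examples:
            blocked |= community
    return blocked


# Hypothetical ego-network split into three communities.
communities = [{"ana", "bo", "cy"}, {"dee", "eli"}, {"fay"}]
blocked = acl_from_communities(communities, {"bo"})
```

A sketch like this also makes the abstract's concern visible: if the detected communities do not match the attributes the user actually cares about, a whole group is blocked or admitted on the strength of one example.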


Volume 29, Article 100203
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2468696422000076/pdfft?md5=75c4fc7d96a2eb8f7b982d6070762c80&pid=1-s2.0-S2468696422000076-main.pdf

Citations: 1

Metrics of social curiosity: The WhatsApp case

Q1 Social Sciences

Online Social Networks and Media

Pub Date : 2022-05-01 DOI: 10.1016/j.osnem.2022.100200

Alexandre Magno Sousa , Jussara M. Almeida , Flavio Figueiredo

A number of recent studies have explicitly introduced curiosity models into the analysis of online information consumption, most notably in the design of recommendation systems. However, most prior efforts have neglected the role of social influence as a component of the curiosity stimulation process, which has been referred to as social curiosity. In this paper, we propose a number of metrics to quantify social curiosity applying them to WhatsApp, a widely used communication platform. We show that our metrics capture aspects that are complementary to other variables priorly related to curiosity stimulation and use them to offer a broad characterization of user curiosity as a driving force behind communication in WhatsApp.


Citations: 2

Comparing global tourism flows measured by official census and social sensing

Q1 Social Sciences

Online Social Networks and Media

Pub Date : 2022-05-01 DOI: 10.1016/j.osnem.2022.100204

Lucas E.B. Skora , Helen C.M. Senefonte , Myriam Regattieri Delgado , Ricardo Lüders , Thiago H. Silva

A better understanding of the behavior of tourists is strategic for improving services in the competitive and important economic segment of global tourism. Critical studies in the literature often explore the issue using traditional data, such as questionnaires or interviews. Traditional approaches provide precious information; however, they impose challenges to obtaining large-scale data, making it hard to study worldwide patterns. Location-based social networks (LBSNs) can potentially mitigate such issues due to the relatively low cost of acquiring large amounts of behavioral data. Nevertheless, before using such data for studying tourists’ behavior, it is necessary to verify whether the information adequately reveals the behavior measured with traditional data — considered the ground truth. Thus, the present work investigates in which countries the global tourism network measured with an LBSN agreeably reflects the behavior estimated by the World Tourism Organization using traditional methods. Although we could find exceptions, the results suggest that, for most countries, LBSN data can satisfactorily represent the behavior studied. We have an indication that, in countries with high correlations between results obtained from both datasets, LBSN data can be used in research regarding the mobility of the tourists in the studied context.
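
The per-country comparison between the two data sources can be pictured with a small correlation sketch. The abstract does not say which correlation measure is used, so Pearson correlation is an assumption here, and the function name `pearson` and all numbers below are made up purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up arrivals into one destination country from four origin countries.
official_arrivals = [100.0, 200.0, 300.0, 400.0]  # hypothetical official figures
lbsn_checkins = [12.0, 19.0, 33.0, 41.0]          # hypothetical LBSN-derived counts
r = pearson(official_arrivals, lbsn_checkins)
```

Under this reading, a value of `r` close to 1 for a country would suggest that the LBSN data tracks the official figures there, while a low value would mark one of the exceptions the abstract mentions.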


Volume 29, Article 100204

Citations: 0

Utilizing subjectivity level to mitigate identity term bias in toxic comments classification

Q1 Social Sciences

Online Social Networks and Media

Pub Date : 2022-05-01 DOI: 10.1016/j.osnem.2022.100205

Zhixue Zhao, Ziqi Zhang, Frank Hopfgartner

Toxic comment classification models are often found biased towards identity terms, i.e., terms characterizing a specific group of people such as “Muslim” and “black”. Such bias is commonly reflected in false positive predictions, i.e., non-toxic comments with identity terms. In this work, we propose a novel approach to debias the model in toxic comment classification, leveraging the notion of subjectivity level of a comment and the presence of identity terms. We hypothesize that toxic comments containing identity terms are more likely to be expressions of subjective feelings or opinions. Therefore, the subjectivity level of a comment containing identity terms can be helpful for classifying toxic comments and mitigating the identity term bias. To implement this idea, we propose a model based on BERT and study two different methods of measuring the subjectivity level. The first method uses a lexicon-based tool. The second method is based on the idea of calculating the embedding similarity between a comment and a relevant Wikipedia text of the identity term in the comment. We thoroughly evaluate our method on an extensive collection of four datasets collected from different social media platforms. Our results show that: (1) our models that incorporate both features of subjectivity and identity terms consistently outperform strong SOTA baselines, with our best performing model achieving an improvement in F1 of 4.75% over a Twitter dataset; (2) our idea of measuring subjectivity based on the similarity to the relevant Wikipedia text is very effective on toxic comment classification as our model using this has achieved the best performance on 3 out of 4 datasets while obtaining comparative performance on the remaining dataset. We further test our method on RoBERTa to evaluate the generality of our method and the results show the biggest improvement in F1 of up to 1.29% (on a dataset from a white supremacist online forum).
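
The second subjectivity measure rests on an embedding similarity between a comment and a Wikipedia text. A minimal sketch of one common similarity, cosine similarity, is given below. The abstract does not state which similarity function is used or how the score is turned into a subjectivity level, so both are assumptions, and the toy vectors stand in for real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for real embeddings (made up for illustration).
comment_vec = [1.0, 0.0, 1.0]  # hypothetical embedding of a comment
wiki_vec = [1.0, 0.0, 0.0]     # hypothetical embedding of the Wikipedia text
similarity = cosine_similarity(comment_vec, wiki_vec)
```

One plausible reading is that a comment whose embedding sits close to the encyclopedic text is more descriptive and less subjective, but that interpretation is not confirmed by the abstract.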


Volume 29, Article 100205

Citations: 1

Localization of Unidentified Events with Raw Microblogging Data

Q1 Social Sciences

Online Social Networks and Media

Pub Date : 2022-05-01 DOI: 10.1016/j.osnem.2022.100209

Usman Anjum, Vladimir Zadorozhny, Prashant Krishnamurthy

Event localization is the task of finding the location of an event. Commonly, event localization using microblogging services, like Twitter, uses the contents of the messages and the geographical information associated with them. In this paper, we propose a novel approach called SPARE (SPAtial REconstruction) that bypasses the need for geographical or semantic information to localize tweets. We assume there are reference coordinates at known locations that scrape the microblog (tweet) counts in time and space (circular regions around the reference coordinate). The tweet counts are aggregated and then disaggregated to identify event patterns. A change in tweet counts would be indicative of an event pattern. We show, using real data, that the change in tweet counts is manifested as peaks. The peaks from multiple reference coordinates can be used as an input to trilateration techniques to pinpoint the location of an event. We introduce metrics to identify the quality of disaggregation of fine-grained data and examine techniques like filtering to improve the accuracy of event location. The experimental results show that our method can identify the location of an event with high accuracy.
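
As a rough illustration of the trilateration step only (not of SPARE itself), the sketch below recovers a point in the plane from three reference coordinates and a distance to each. The function name `trilaterate`, the flat 2D coordinates, and the assumption that a distance from each reference coordinate to the event is already available are all hypothetical; the abstract does not describe how peaks are converted into distances.

```python
def trilaterate(p1, r1, p2, r2, p3, r3):
    """Return (x, y) lying at distance r_i from each reference point p_i."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Subtracting circle 1 from circles 2 and 3 removes the quadratic terms
    # and leaves two linear equations:
    #   a11*x + a12*y = b1
    #   a21*x + a22*y = b2
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear")
    # Solve the 2x2 system with Cramer's rule.
    x = (b1 * a22 - b2 * a12) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical setup: three reference coordinates and distances to an event
# placed at (3, 4) for the sake of the example.
x, y = trilaterate((0, 0), 5.0, (10, 0), 65 ** 0.5, (0, 10), 45 ** 0.5)
```

With noisy distances or more than three reference coordinates, a least-squares fit would be the more natural choice; the exact three-point solution is used here only to keep the example short.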


Volume 29, Article 100209

Citations: 0

Rumour spread minimization in social networks: A source-ignorant approach

Q1 Social Sciences

Online Social Networks and Media

Pub Date : 2022-05-01 DOI: 10.1016/j.osnem.2022.100206

Ahmad Zareie, Rizos Sakellariou

The spread of rumours in social networks has become a significant challenge in recent years. Blocking so-called critical edges, that is, edges that play a significant role in the spreading process, has attracted considerable attention as a means of minimizing the spread of rumours. Although detecting the sources of a rumour may help identify critical edges, it incurs an overhead that source-ignorant approaches try to eliminate. Several source-ignorant edge blocking methods have been proposed, most of which determine critical edges on the basis of centrality. Taking into account additional features of edges (beyond centrality) may help determine more accurately which edges to block. In this paper, a new source-ignorant method is proposed to identify a set of critical edges by considering, for each edge, the impact of blocking it and the influence of the nodes connected to it. Experimental results demonstrate that the proposed method identifies critical edges more accurately than other source-ignorant methods.
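To make the idea of source-ignorant edge blocking concrete, here is a minimal sketch of the centrality-style baseline the abstract refers to, not the authors' proposed method. It assumes an undirected graph given as an edge list and uses the product of endpoint degrees as a simple stand-in score. The function names and the scoring rule are illustrative assumptions.

```python
from collections import defaultdict


def top_k_edges_by_degree_product(edges, k):
    """Return the k edges whose endpoints have the largest degree product.

    This is a hypothetical degree-based proxy for edge criticality; it needs
    no knowledge of where a rumour starts, which is what makes it
    source-ignorant.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(edges, key=lambda e: degree[e[0]] * degree[e[1]], reverse=True)
    return ranked[:k]


def block_edges(edges, blocked):
    """Return the edge list with the blocked (undirected) edges removed."""
    blocked_set = {frozenset(e) for e in blocked}
    return [e for e in edges if frozenset(e) not in blocked_set]
```

A scoring function that also reflects the impact of blocking and the influence of the adjacent nodes, as the abstract describes, could be swapped in for the degree product without changing the surrounding code.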


Volume 29, Article 100206. PDF: https://www.sciencedirect.com/science/article/pii/S2468696422000106/pdfft?md5=5c46e8ade686686c561918b3c01408b9&pid=1-s2.0-S2468696422000106-main.pdf

Citations: 8

Selecting and combining complementary feature representations and classifiers for hate speech detection

Q1 Social Sciences

Online Social Networks and Media

Pub Date : 2022-03-01 DOI: 10.1016/j.osnem.2021.100194

Rafael M.O. Cruz, Woshington V. de Sousa, George D.C. Cavalcanti

Hate speech is a major issue in social networks due to the high volume of data generated daily. Recent works demonstrate the usefulness of machine learning (ML) in dealing with the nuances required to distinguish hateful posts from mere sarcasm or offensive language. Many ML solutions for hate speech detection have been proposed by changing either how features are extracted from the text or the classification algorithm employed. However, most works consider only one type of feature extraction and classification algorithm. This work argues that a combination of multiple feature extraction techniques and different classification models is needed. We propose a framework to analyze the relationship between multiple feature extraction and classification techniques to understand how they complement each other. The framework is used to select a subset of complementary techniques to compose a robust multiple classifier system (MCS) for hate speech detection. The experimental study considering four hate speech classification datasets demonstrates that the proposed framework is a promising methodology for analyzing and designing high-performing MCSs for this task. The MCS obtained using the proposed framework significantly outperforms the combination of all models and the homogeneous and heterogeneous selection heuristics, demonstrating the importance of having a proper selection scheme. Source code, figures and dataset splits can be found in the GitHub repository: https://github.com/Menelau/Hate-Speech-MCS.
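As a rough illustration of what combining complementary models can mean, the sketch below shows two generic building blocks: a pairwise disagreement measure (one common way to gauge how complementary two models are) and majority voting over per-model predictions. This is an assumption-laden toy, not the selection scheme from the paper or the code in the linked repository, and the function names are hypothetical.

```python
from collections import Counter


def disagreement(preds_a, preds_b):
    """Fraction of samples on which two models predict different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def majority_vote(predictions):
    """Combine per-model label lists (one list per model) by majority vote.

    Ties are resolved by Counter's ordering, so an odd number of models is
    assumed for binary labels.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

Models that disagree often while each remaining reasonably accurate are the ones most likely to help an ensemble, which is the intuition behind selecting a complementary subset rather than combining every available model.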


Volume 28, Article 100194

Citations: 5