Proceedings of the International AAAI Conference on Web and Social Media: Latest Articles, Page 5

Beyond Discrete Genres: Mapping News Items onto a Multidimensional Framework of Genre Cues

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22167

Zilin Lin, Kasper Welbers, Susan Vermeer, Damian Trilling

In the contemporary media landscape, with its vast and diverse supply of news, it is increasingly challenging to study such an enormous number of items without a standardized framework. Although attempts have been made to organize and compare news items on the basis of news values, news genres have received little attention, especially genres as perceived by news consumers. Yet perceived news genres are an essential component in exploring how news has developed, as well as a precondition for understanding media effects. We approach this concept by conceptualizing and operationalizing a non-discrete framework for mapping news items in terms of genre cues. As a starting point, we propose a preliminary set of dimensions consisting of “factuality” and “formality”. To automatically analyze a large number of news items, we deliver two computational models that score news sentences on these two dimensions. Such predictions could then be used to locate news items within our framework. The proposed approach, which positions news items on a multidimensional grid, helps deepen our insight into the evolving nature of news genres.

Citations: 0

AnnoBERT: Effectively Representing Multiple Annotators’ Label Choices to Improve Hate Speech Detection

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22198

Wenjie Yin, Vibhor Agarwal, Aiqi Jiang, Arkaitz Zubiaga, Nishanth Sastry

Supervised machine learning approaches often rely on a "ground truth" label. However, obtaining one label through majority voting ignores important subjectivity information in tasks such as hate speech detection. Existing neural network models principally regard labels as categorical variables, ignoring the semantic information in diverse label texts. In this paper, we propose AnnoBERT, a first-of-its-kind architecture that integrates annotator characteristics and label text with a transformer-based model to detect hate speech. It builds unique representations based on each annotator's characteristics via Collaborative Topic Regression (CTR) and integrates label text to enrich textual representations. During training, the model associates annotators with their label choices given a piece of text; during evaluation, when label information is not available, the model predicts the aggregated label given by the participating annotators by utilising the learnt association. The proposed approach displayed an advantage in detecting hate speech, especially in the minority class and in edge cases with annotator disagreement. The improvement in overall performance is largest when the dataset is more label-imbalanced, suggesting practical value in identifying real-world hate speech, since the volume of hate speech in the wild on social media is extremely small compared with normal (non-hate) speech. Through ablation studies, we show the relative contributions of annotator embeddings and label text to model performance, and we test a range of alternative annotator embeddings and label text combinations.
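
The point about majority voting can be made concrete with a small sketch. The data, label names, and function names below are hypothetical and are not taken from the paper; the sketch only shows that two posts with different levels of annotator agreement collapse to the same aggregated label.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def agreement(labels):
    """Fraction of annotators who chose the majority label."""
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)

# Hypothetical annotations: item id -> labels from three annotators.
annotations = {
    "post_a": ["hate", "hate", "hate"],
    "post_b": ["hate", "not_hate", "hate"],
}

# Both posts receive the same single label after aggregation...
aggregated = {item: majority_vote(labels) for item, labels in annotations.items()}
# ...even though the annotators disagreed on only one of them.
agreement_rates = {item: agreement(labels) for item, labels in annotations.items()}
```

After aggregation both posts are labelled "hate", and the disagreement on post_b is visible only in `agreement_rates`, which is the kind of information a single majority label discards.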

Citations: 2

Authority without Care: Moral Values behind the Mask Mandate Response

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22173

Yelena Mejova, Kyriaki Kalimeri, Gianmarco De Francisci Morales

Face masks are one of the cheapest and most effective non-pharmaceutical interventions available against airborne diseases such as COVID-19. Unfortunately, they have been met with resistance by a substantial fraction of the populace, especially in the U.S. In this study, we uncover the latent moral values that underpin the response to the mask mandate, and paint them against the country's political backdrop. We monitor the discussion about masks on Twitter, which involves almost 600k users over a time span of 7 months. By using a combination of graph mining, natural language processing, topic modeling, content analysis, and time series analysis, we characterize the responses to the mask mandate of both those in favor of it and those against it. We base our analysis on the theoretical frameworks of Moral Foundations Theory and Hofstede's cultural dimensions. Our results show that, while the anti-mask stance is associated with a conservative political leaning, the moral values expressed by its adherents diverge from the ones typically used by conservatives. In particular, the expected emphasis on the values of authority and purity is accompanied by an atypical dearth of in-group loyalty. We find that after the mandate, both pro- and anti-mask sides decrease their emphasis on care for others and increase their attention to authority and fairness, further politicizing the issue. In addition, the mask mandate reverses the expression of Individualism-Collectivism between the two sides, with an increase of individualism in the anti-mask narrative and a decrease in the pro-mask one. We argue that monitoring the dynamics of moral positioning is crucial for designing effective public health campaigns that are sensitive to the underlying values of the target audience.

Citations: 1

The Amplification Paradox in Recommender Systems

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22223

Manoel Horta Ribeiro, Veniamin Veselovsky, Robert West

Automated audits of recommender systems found that blindly following recommendations leads users to increasingly partisan, conspiratorial, or false content. At the same time, studies using real user traces suggest that recommender systems are not the primary driver of attention toward extreme content; on the contrary, such content is mostly reached through other means, e.g., other websites. In this paper, we explain the following apparent paradox: if the recommendation algorithm favors extreme content, why is it not driving its consumption? With a simple agent-based model where users attribute different utilities to items in the recommender system, we show through simulations that the collaborative-filtering nature of recommender systems and the nicheness of extreme content can resolve the apparent paradox: although blindly following recommendations would indeed lead users to niche content, users rarely consume niche content when given the option because it is of low utility to them, which can lead the recommender system to deamplify such content. Our results call for a nuanced interpretation of "algorithmic amplification" and highlight the importance of modeling the utility of content to users when auditing recommender systems. Code available: https://github.com/epfl-dlab/amplification_paradox.
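
The mechanism described in the abstract can be illustrated with a deliberately tiny, deterministic toy. This is not the authors' agent-based model (their code is at the URL above); the popularity-style score is a crude stand-in for collaborative filtering, and every number, name, and parameter below is a hypothetical choice made for illustration.

```python
def simulate(utilities, initial_scores, steps, follow_blindly, slate_size=3):
    """Run a toy recommender loop and return how often each item was consumed."""
    scores = list(initial_scores)          # recommender's running score per item
    consumed = [0] * len(scores)
    for step in range(steps):
        user = step % len(utilities)       # users take turns
        # Slate: the highest-scoring items, ties broken by item index.
        slate = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:slate_size]
        if follow_blindly:
            choice = slate[0]              # audit-style agent: always take the top recommendation
        else:
            # Utility-driven agent: pick the most useful item among those shown.
            choice = max(slate, key=lambda i: utilities[user][i])
        consumed[choice] += 1
        scores[choice] += 1                # consumption feeds back into the ranking
    return consumed

# Hypothetical setup: item 3 is the niche item, valued by only one of five users.
mainstream = [0.6, 0.5, 0.4, 0.1]
niche_fan = [0.2, 0.2, 0.2, 0.9]
utilities = [mainstream] * 4 + [niche_fan]
initial_scores = [1, 1, 1, 5]              # the recommender starts out favoring the niche item

blind = simulate(utilities, initial_scores, steps=100, follow_blindly=True)
utility_driven = simulate(utilities, initial_scores, steps=100, follow_blindly=False)
```

With these made-up numbers, the blind follower consumes the niche item in all 100 steps, while utility-driven users choose it in only 20 of 100 steps and its score is overtaken by item 0. The toy is only meant to show why an audit that always clicks the top recommendation and a trace of real choices can disagree.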

Citations: 1

Identifying Influential Brokers on Social Media from Social Network Structure

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22193

Sho Tsugawa, Kohei Watabe

Identifying influencers in a given social network has become an important research problem for various applications, including accelerating the spread of information in viral marketing and preventing the spread of fake news and rumors. The literature contains a rich body of studies on identifying influential source spreaders who can spread their own messages to many other nodes. In contrast, the identification of influential brokers who can spread other nodes' messages to many nodes has not been fully explored. Theoretical and empirical studies suggest that the involvement of both influential source spreaders and brokers is key to facilitating large-scale information diffusion cascades. Therefore, this paper explores ways to identify influential brokers from a given social network. By using three social media datasets, we investigate the characteristics of influential brokers by comparing them with influential source spreaders and central nodes obtained from centrality measures. Our results show that (i) most of the influential source spreaders are not influential brokers (and vice versa) and (ii) the overlap between central nodes and influential brokers is small (less than 15%) in Twitter datasets. We also tackle the problem of identifying influential brokers from centrality measures and node embeddings, and we examine the effectiveness of social network features in the broker identification task. Our results show that (iii) although a single centrality measure cannot characterize influential brokers well, prediction models using node embedding features achieve F1 scores of 0.35 to 0.68, suggesting the effectiveness of social network features for identifying influential brokers.
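
The overlap comparison in result (ii) can be sketched as follows. The graph, the broker set, and the choice of plain degree as the centrality measure are all assumptions made for illustration; the paper's datasets and measures are not reproduced here.

```python
def degree(edges):
    """Number of distinct neighbours per node in an undirected edge list."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return {node: len(adj) for node, adj in neighbours.items()}

def top_k(scores, k):
    """The k highest-scoring nodes, ties broken alphabetically."""
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return set(ranked[:k])

# Hypothetical toy network and a hypothetical set of known influential brokers.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e"), ("e", "f")]
brokers = {"a", "e"}

central = top_k(degree(edges), k=2)
# Share of the central nodes that are also influential brokers.
overlap = len(central & brokers) / len(central)
```

Here the two highest-degree nodes are "a" and "b", and only "a" is also in the hypothetical broker set, so the overlap is 0.5. The same set comparison applies to any other centrality score that returns one number per node.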

Citations: 0

SexWEs: Domain-Aware Word Embeddings via Cross-Lingual Semantic Specialisation for Chinese Sexism Detection in Social Media

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22159

Aiqi Jiang, Arkaitz Zubiaga

The goal of sexism detection is to mitigate negative online content targeting certain gender groups of people. However, the limited availability of labeled sexism-related datasets makes it problematic to identify online sexism for low-resource languages. In this paper, we address the task of automatic sexism detection in social media for one low-resource language: Chinese. Rather than collecting new sexism data or building cross-lingual transfer learning models, we develop a cross-lingual domain-aware semantic specialisation system in order to make the most of existing data. Semantic specialisation is a technique for retrofitting pre-trained distributional word vectors by integrating external linguistic knowledge (such as lexico-semantic relations) into the specialised feature space. To do this, we leverage semantic resources for sexism from a high-resource language (English) to specialise pre-trained word vectors in the target language (Chinese) to inject domain knowledge. We demonstrate the benefit of our sexist word embeddings (SexWEs) specialised by our framework via intrinsic evaluation of word similarity and extrinsic evaluation of sexism detection. Compared with other specialisation approaches and Chinese baseline word vectors, our SexWEs show an average score improvement of 0.033 and 0.064 in intrinsic and extrinsic evaluations, respectively. The ablative results and visualisation of SexWEs also prove the effectiveness of our framework on retrofitting word vectors in low-resource languages.

Citations: 2

Auditing Elon Musk’s Impact on Hate Speech and Bots

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22222

Daniel Hickey, Matheus Schmitz, Daniel Fessler, Paul E. Smaldino, Goran Muric, Keith Burghardt

On October 27th, 2022, Elon Musk purchased Twitter, becoming its new CEO and firing many top executives in the process. Musk listed fewer restrictions on content moderation and removal of spam bots among his goals for the platform. Given findings of prior research on moderation and hate speech in online communities, the promise of less strict content moderation poses the concern that hate will rise on Twitter. We examine the levels of hate speech and prevalence of bots before and after Musk's acquisition of the platform. We find that hate speech rose dramatically upon Musk purchasing Twitter and the prevalence of most types of bots increased, while the prevalence of astroturf bots decreased.

Citations: 4

Team Resilience under Shock: An Empirical Analysis of GitHub Repositories during Early COVID-19 Pandemic

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22170

Xuan Lu, Wei Ai, Yixin Wang, Qiaozhu Mei

While many organizations have shifted to working remotely during the COVID-19 pandemic, how the remote workforce and remote teams are influenced by, and would respond to, this and future shocks remains largely unknown. Software developers relied on remote collaboration long before the pandemic, working in virtual teams (GitHub repositories). The dynamics of these repositories through the pandemic provide a unique opportunity to understand how remote teams react under shock. This work presents a systematic analysis. We measure the overall effect of the early pandemic on public GitHub repositories by comparing their sizes and productivity with the counterfactual outcomes forecasted as if there were no pandemic. We find that the productivity level and the number of active members of these teams vary significantly during different periods of the pandemic. We then conduct a finer-grained investigation and study the heterogeneous effects of the shock on individual teams. We find that the resilience of a team is highly correlated with certain properties of the team before the pandemic. Through a bootstrapped regression analysis, we reveal which types of teams are robust or fragile to the shock.
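
As a rough illustration of what a bootstrapped regression involves, the sketch below fits a one-variable least-squares slope and builds a percentile interval by resampling teams with replacement. The variables, the toy numbers, and the percentile method are assumptions for illustration; the abstract does not specify the authors' regression model or features.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_interval(xs, ys, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when every resampled x is identical
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data: pre-pandemic team size vs. change in productivity.
team_size = [1, 2, 3, 4, 5]
productivity_change = [2, 4, 5, 4, 5]

point_estimate = ols_slope(team_size, productivity_change)
lower, upper = bootstrap_slope_interval(team_size, productivity_change)
```

If the resulting interval excludes zero, the team property would be read as associated with resilience in this toy setting. With only five made-up points the interval is wide and serves only to show the procedure.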

Citations: 1

Identifying and Characterizing Behavioral Classes of Radicalization within the QAnon Conspiracy on Twitter

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22197

Emily L. Wang, Luca Luceri, Francesco Pierri, Emilio Ferrara

Social media provide a fertile ground where conspiracy theories and radical ideas can flourish, reach broad audiences, and sometimes lead to hate or violence beyond the online world itself. QAnon represents a notable example of a political conspiracy that started out on social media but turned mainstream, in part due to public endorsement by influential political figures. Nowadays, QAnon conspiracies often appear in the news, are part of political rhetoric, and are espoused by significant swaths of people in the United States. It is therefore crucial to understand how such a conspiracy took root online, and what led so many social media users to adopt its ideas. In this work, we propose a framework that exploits both social interaction and content signals to uncover evidence of user radicalization or support for QAnon. Leveraging a large dataset of 240M tweets collected in the run-up to the 2020 US Presidential election, we define and validate a multivariate metric of radicalization. We use it to separate users into distinct, naturally emerging classes of behavior associated with radicalization processes, from self-declared QAnon supporters to hyperactive conspiracy promoters. We also analyze the impact of Twitter's moderation policies on the interactions among different classes: we discover aspects of moderation that succeed, yielding a substantial reduction in the endorsement received by hyperactive QAnon accounts. But we also uncover where moderation fails, showing that QAnon content amplifiers are not deterred or affected by the Twitter intervention. Our findings refine our understanding of online radicalization processes, reveal effective and ineffective aspects of moderation, and call for further investigation of the role social media play in the spread of conspiracies.

Citations: 1

SciLander: Mapping the Scientific News Landscape

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22144

Maurício Gruppi, Panayiotis Smeros, Sibel Adalı, Carlos Castillo, Karl Aberer

The COVID-19 pandemic has fueled the spread of misinformation on social media and the Web as a whole. The phenomenon dubbed “infodemic” has taken the challenges of information veracity and trust to new heights by massively introducing seemingly scientific and technical elements into misleading content. Despite the existing body of work on modeling and predicting misinformation, the coverage of very complex scientific topics with inherent uncertainty and an evolving set of findings, such as COVID-19, poses many new challenges that are not easily solved by existing tools. To address these issues, we introduce SciLander, a method for learning representations of news sources reporting on science-based topics. We extract four heterogeneous indicators for the sources: two generic indicators that capture (1) the copying of news stories between sources and (2) the use of the same terms to mean different things (semantic shift), and two scientific indicators that capture (1) the usage of jargon and (2) the stance towards specific citations. We use these indicators as signals of source agreement, sampling positive (similar) and negative (dissimilar) pairs of sources, and combine them in a unified framework to train unsupervised news source embeddings with a triplet margin loss objective. We evaluate our method on a novel COVID-19 dataset containing nearly 1M news articles from 500 sources spanning a period of 18 months since the beginning of the pandemic in 2020. Our results show that the features learned by our model outperform state-of-the-art baseline methods on the task of news veracity classification. Furthermore, a clustering analysis suggests that the learned representations encode information about the reliability, political leaning, and partisan bias of these sources.
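The triplet margin loss objective named in the abstract can be illustrated with a short sketch. It shows the standard form of the loss, max(0, d(a, p) - d(a, n) + margin), for one anchor source, one similar (positive) source, and one dissimilar (negative) source. The Euclidean distance, the margin of 1.0, and the two-dimensional example embeddings are assumptions chosen for illustration; the abstract does not state which distance or margin SciLander uses.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss for a single (anchor, positive, negative) triplet.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is; otherwise it grows linearly with the violation.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D source embeddings, for illustration only.
source_a = [0.0, 0.0]        # anchor source
similar_source = [0.0, 1.0]  # e.g. agrees with the anchor on an indicator
dissimilar_source = [3.0, 4.0]

well_separated = triplet_margin_loss(source_a, similar_source, dissimilar_source)
violated = triplet_margin_loss(source_a, dissimilar_source, similar_source)
print(well_separated, violated)
```

In training, this quantity would be averaged over many sampled triplets and minimized with respect to the embeddings, which pulls similar sources together and pushes dissimilar ones apart.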

Citations: 0