Proceedings of the International AAAI Conference on Web and Social Media: Latest Articles

Popular Support for Balancing Equity and Efficiency in Resource Allocation: A Case Study in Online Advertising to Increase Welfare Program Awareness
Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22163
Allison Koenecke, Eric Giannella, Robb Willer, Sharad Goel
Algorithmically optimizing the provision of limited resources is commonplace across domains from healthcare to lending. Optimization can lead to efficient resource allocation, but, if deployed without additional scrutiny, can also exacerbate inequality. Little is known about popular preferences regarding acceptable efficiency-equity trade-offs, making it difficult to design algorithms that are responsive to community needs and desires. Here we examine this trade-off and concomitant preferences in the context of GetCalFresh, an online service that streamlines the application process for California’s Supplemental Nutrition Assistance Program (SNAP, formerly known as food stamps). GetCalFresh runs online advertisements to raise awareness of its multilingual SNAP application service. We first demonstrate that when ads are optimized to garner the most enrollments per dollar, a disproportionately small number of Spanish speakers enroll due to the relatively higher costs of non-English language advertising. Embedding these results in a survey (N = 1,532) of a diverse set of Americans, we find broad popular support for valuing equity in addition to efficiency: respondents generally preferred reducing total enrollments to facilitate increased enrollment of Spanish speakers. These results buttress recent calls to reevaluate the efficiency-centric paradigm popular in algorithmic resource allocation.
Citations: 1
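The efficiency-equity trade-off described in this abstract can be illustrated with a small budget calculation. The sketch below is not the authors' analysis: the function name, the budget, and the two costs per enrollment are hypothetical values chosen only to show how shifting spend toward a costlier audience lowers total enrollments while raising that group's enrollments.

```python
def enrollments(budget, spanish_share, cost_en=10.0, cost_es=25.0):
    """Return (english, spanish, total) enrollments for a split ad budget.

    cost_en and cost_es are hypothetical costs per enrollment in dollars.
    They are illustrative assumptions, not figures from the paper.
    """
    spend_es = budget * spanish_share   # dollars spent on Spanish-language ads
    spend_en = budget - spend_es        # remainder goes to English-language ads
    english = spend_en / cost_en
    spanish = spend_es / cost_es
    return english, spanish, english + spanish


if __name__ == "__main__":
    # Compare three hypothetical ways of splitting a $1,000 budget.
    for share in (0.0, 0.25, 0.5):
        en, es, total = enrollments(1000.0, share)
        print(f"Spanish budget share {share:.2f}: "
              f"{en:.0f} English + {es:.0f} Spanish = {total:.0f} total")
```

With these made-up costs, moving from a 0% to a 50% Spanish budget share lowers total enrollments from 100 to 70 while raising Spanish-speaker enrollments from 0 to 20, which is the kind of choice the survey respondents were asked to weigh.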
AnnoBERT: Effectively Representing Multiple Annotators’ Label Choices to Improve Hate Speech Detection
Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22198
Wenjie Yin, Vibhor Agarwal, Aiqi Jiang, Arkaitz Zubiaga, Nishanth Sastry
Supervised machine learning approaches often rely on a "ground truth" label. However, obtaining one label through majority voting ignores the important subjectivity information in tasks such as hate speech detection. Existing neural network models principally regard labels as categorical variables, while ignoring the semantic information in diverse label texts. In this paper, we propose AnnoBERT, a first-of-its-kind architecture that integrates annotator characteristics and label text with a transformer-based model to detect hate speech. It builds unique representations from each annotator's characteristics via Collaborative Topic Regression (CTR) and integrates label text to enrich textual representations. During training, the model associates annotators with their label choices given a piece of text; during evaluation, when label information is not available, the model predicts the aggregated label given by the participating annotators by utilising the learnt association. The proposed approach displayed an advantage in detecting hate speech, especially in the minority class and edge cases with annotator disagreement. Improvement in the overall performance is the largest when the dataset is more label-imbalanced, suggesting its practical value in identifying real-world hate speech, as the volume of hate speech in-the-wild is extremely small on social media, when compared with normal (non-hate) speech. Through ablation studies, we show the relative contributions of annotator embeddings and label text to the model performance, and test a range of alternative annotator embeddings and label text combinations.
Citations: 2
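The abstract's point that majority voting discards subjectivity information can be shown in a few lines. This is a minimal sketch, not the AnnoBERT model: the function names and the example labels are hypothetical, and the code only contrasts a single majority label with the per-label share of annotators.

```python
from collections import Counter


def majority_label(labels):
    """Collapse annotator labels into one label by majority vote.

    Ties are resolved in favour of the label seen first.
    """
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels):
    """Return the fraction of annotators choosing each label.

    Unlike the majority label, this keeps the level of disagreement.
    """
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical annotations for two posts (illustrative only).
    unanimous = ["hate", "hate", "hate"]
    contested = ["hate", "not_hate", "hate"]
    for name, labels in (("unanimous", unanimous), ("contested", contested)):
        print(name, majority_label(labels), label_distribution(labels))
```

Both posts receive the same majority label, yet only the distribution reveals that one of them was contested, which is the information an annotator-aware model can exploit.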
Authority without Care: Moral Values behind the Mask Mandate Response
Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22173
Yelena Mejova, Kyriaki Kalimeri, Gianmarco De Francisci Morales
Face masks are one of the cheapest and most effective non-pharmaceutical interventions available against airborne diseases such as COVID-19. Unfortunately, they have been met with resistance by a substantial fraction of the populace, especially in the U.S. In this study, we uncover the latent moral values that underpin the response to the mask mandate, and paint them against the country's political backdrop. We monitor the discussion about masks on Twitter, which involves almost 600k users in a time span of 7 months. By using a combination of graph mining, natural language processing, topic modeling, content analysis, and time series analysis, we characterize the responses to the mask mandate of both those in favor of it and those against it. We base our analysis on the theoretical frameworks of Moral Foundations Theory and Hofstede's cultural dimensions. Our results show that, while the anti-mask stance is associated with a conservative political leaning, the moral values expressed by its adherents diverge from the ones typically used by conservatives. In particular, the expected emphasis on the values of authority and purity is accompanied by an atypical dearth of in-group loyalty. We find that after the mandate, both pro- and anti-mask sides decrease their emphasis on care about others, and increase their attention on authority and fairness, further politicizing the issue. In addition, the mask mandate reverses the expression of Individualism-Collectivism between the two sides, with an increase of individualism in the anti-mask narrative, and a decrease in the pro-mask one. We argue that monitoring the dynamics of moral positioning is crucial for designing effective public health campaigns that are sensitive to the underlying values of the target audience.
Citations: 1
The Amplification Paradox in Recommender Systems
Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22223
Manoel Horta Ribeiro, Veniamin Veselovsky, Robert West
Automated audits of recommender systems found that blindly following recommendations leads users to increasingly partisan, conspiratorial, or false content. At the same time, studies using real user traces suggest that recommender systems are not the primary driver of attention toward extreme content; on the contrary, such content is mostly reached through other means, e.g., other websites. In this paper, we explain the following apparent paradox: if the recommendation algorithm favors extreme content, why is it not driving its consumption? With a simple agent-based model where users attribute different utilities to items in the recommender system, we show through simulations that the collaborative-filtering nature of recommender systems and the nicheness of extreme content can resolve the apparent paradox: although blindly following recommendations would indeed lead users to niche content, users rarely consume niche content when given the option because it is of low utility to them, which can lead the recommender system to deamplify such content. Our results call for a nuanced interpretation of "algorithmic amplification" and highlight the importance of modeling the utility of content to users when auditing recommender systems. Code available: https://github.com/epfl-dlab/amplification_paradox.
Citations: 1
Identifying Influential Brokers on Social Media from Social Network Structure
Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22193
Sho Tsugawa, Kohei Watabe
Identifying influencers in a given social network has become an important research problem for various applications, including accelerating the spread of information in viral marketing and preventing the spread of fake news and rumors. The literature contains a rich body of studies on identifying influential source spreaders who can spread their own messages to many other nodes. In contrast, the identification of influential brokers who can spread other nodes' messages to many nodes has not been fully explored. Theoretical and empirical studies suggest that involvement of both influential source spreaders and brokers is a key to facilitating large-scale information diffusion cascades. Therefore, this paper explores ways to identify influential brokers from a given social network. By using three social media datasets, we investigate the characteristics of influential brokers by comparing them with influential source spreaders and central nodes obtained from centrality measures. Our results show that (i) most of the influential source spreaders are not influential brokers (and vice versa) and (ii) the overlap between central nodes and influential brokers is small (less than 15%) in Twitter datasets. We also tackle the problem of identifying influential brokers from centrality measures and node embeddings, and we examine the effectiveness of social network features in the broker identification task. Our results show that (iii) although a single centrality measure cannot characterize influential brokers well, prediction models using node embedding features achieve F1 scores of 0.35 to 0.68, suggesting the effectiveness of social network features for identifying influential brokers.
Citations: 0
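The two quantities reported in this abstract, the overlap between central nodes and influential brokers and the F1 score of a broker classifier, are simple set computations. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the node names, scores, and helper functions are hypothetical, and the top-k cut-off is an arbitrary choice.

```python
def top_k(scores, k):
    """Return the set of k node ids with the highest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def overlap_fraction(a, b):
    """Share of the nodes in set a that also appear in set b."""
    return len(a & b) / len(a) if a else 0.0


def f1_score(predicted, actual):
    """F1 score of a predicted node set against the true node set."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical scores for four nodes (illustrative only).
    centrality = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2}
    broker_influence = {"a": 0.1, "b": 0.8, "c": 0.2, "d": 0.9}
    central = top_k(centrality, 2)
    brokers = top_k(broker_influence, 2)
    print("overlap:", overlap_fraction(central, brokers))
    print("F1 of using centrality as the predictor:", f1_score(central, brokers))
```

Treating the top-k central nodes as the predicted brokers makes the link between the two findings concrete: a small overlap directly implies a low F1 for any single centrality measure used as a classifier.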
SexWEs: Domain-Aware Word Embeddings via Cross-Lingual Semantic Specialisation for Chinese Sexism Detection in Social Media
Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22159
Aiqi Jiang, Arkaitz Zubiaga
The goal of sexism detection is to mitigate negative online content targeting certain gender groups of people. However, the limited availability of labeled sexism-related datasets makes it problematic to identify online sexism for low-resource languages. In this paper, we address the task of automatic sexism detection in social media for one low-resource language: Chinese. Rather than collecting new sexism data or building cross-lingual transfer learning models, we develop a cross-lingual domain-aware semantic specialisation system in order to make the most of existing data. Semantic specialisation is a technique for retrofitting pre-trained distributional word vectors by integrating external linguistic knowledge (such as lexico-semantic relations) into the specialised feature space. To do this, we leverage semantic resources for sexism from a high-resource language (English) to specialise pre-trained word vectors in the target language (Chinese) to inject domain knowledge. We demonstrate the benefit of our sexist word embeddings (SexWEs) specialised by our framework via intrinsic evaluation of word similarity and extrinsic evaluation of sexism detection. Compared with other specialisation approaches and Chinese baseline word vectors, our SexWEs show average score improvements of 0.033 and 0.064 in the intrinsic and extrinsic evaluations, respectively. The ablation results and visualisation of SexWEs also prove the effectiveness of our framework on retrofitting word vectors in low-resource languages.
Citations: 2
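The idea of retrofitting pre-trained word vectors with external lexical knowledge can be sketched as an iterative averaging step. This is a generic retrofitting-style update, not the SexWEs framework, and it leaves out the cross-lingual part entirely: the function name, the toy vectors, and the equal weighting of the original vector and its neighbours are all assumptions made for illustration.

```python
def retrofit(vectors, lexicon, iterations=10):
    """Pull each word vector toward the mean of its lexicon neighbours.

    vectors: dict mapping word -> list of floats (pre-trained embedding).
    lexicon: dict mapping word -> list of related words (external knowledge).
    Each update averages the ORIGINAL vector with the mean of the CURRENT
    neighbour vectors, weighting both equally (an illustrative choice).
    """
    updated = {word: list(vec) for word, vec in vectors.items()}
    for _ in range(iterations):
        for word, related in lexicon.items():
            neighbours = [n for n in related if n in updated]
            if word not in updated or not neighbours:
                continue  # nothing to pull toward, keep the vector as is
            dim = len(updated[word])
            mean = [sum(updated[n][d] for n in neighbours) / len(neighbours)
                    for d in range(dim)]
            updated[word] = [(vectors[word][d] + mean[d]) / 2
                             for d in range(dim)]
    return updated


if __name__ == "__main__":
    # Hypothetical two-dimensional embeddings (illustrative only).
    vectors = {"x": [0.0, 0.0], "y": [1.0, 1.0], "z": [5.0, 5.0]}
    lexicon = {"x": ["y"], "y": ["x"]}  # x and y are marked as related
    for word, vec in retrofit(vectors, lexicon).items():
        print(word, vec)
```

Words linked in the lexicon move closer together while unlinked words keep their original vectors, which is the behaviour the abstract describes as injecting domain knowledge into the embedding space.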
Auditing Elon Musk’s Impact on Hate Speech and Bots
Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22222
Daniel Hickey, Matheus Schmitz, Daniel Fessler, Paul E. Smaldino, Goran Muric, Keith Burghardt
On October 27th, 2022, Elon Musk purchased Twitter, becoming its new CEO and firing many top executives in the process. Musk listed fewer restrictions on content moderation and removal of spam bots among his goals for the platform. Given findings of prior research on moderation and hate speech in online communities, the promise of less strict content moderation raises the concern that hate will rise on Twitter. We examine the levels of hate speech and prevalence of bots before and after Musk's acquisition of the platform. We find that hate speech rose dramatically after Musk purchased Twitter and the prevalence of most types of bots increased, while the prevalence of astroturf bots decreased.
Citations: 4
Team Resilience under Shock: An Empirical Analysis of GitHub Repositories during Early COVID-19 Pandemic
Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22170
Xuan Lu, Wei Ai, Yixin Wang, Qiaozhu Mei
While many organizations have shifted to working remotely during the COVID-19 pandemic, how the remote workforce and the remote teams are influenced by and would respond to this and future shocks remains largely unknown. Software developers have relied on remote collaborations long before the pandemic, working in virtual teams (GitHub repositories). The dynamics of these repositories through the pandemic provide a unique opportunity to understand how remote teams react under shock. This work presents a systematic analysis. We measure the overall effect of the early pandemic on public GitHub repositories by comparing their sizes and productivity with the counterfactual outcomes forecasted as if there were no pandemic. We find that the productivity level and the number of active members of these teams vary significantly during different periods of the pandemic. We then conduct a finer-grained investigation and study the heterogeneous effects of the shock on individual teams. We find that the resilience of a team is highly correlated with certain properties of the team before the pandemic. Through a bootstrapped regression analysis, we reveal which types of teams are robust or fragile to the shock.
Citations: 1
Identifying and Characterizing Behavioral Classes of Radicalization within the QAnon Conspiracy on Twitter
Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22197
Emily L. Wang, Luca Luceri, Francesco Pierri, Emilio Ferrara
Social media provide a fertile ground where conspiracy theories and radical ideas can flourish, reach broad audiences, and sometimes lead to hate or violence beyond the online world itself. QAnon represents a notable example of a political conspiracy that started out on social media but turned mainstream, in part due to public endorsement by influential political figures. Nowadays, QAnon conspiracies often appear in the news, are part of political rhetoric, and are espoused by significant swaths of people in the United States. It is therefore crucial to understand how such a conspiracy took root online, and what led so many social media users to adopt its ideas. In this work, we propose a framework that exploits both social interaction and content signals to uncover evidence of user radicalization or support for QAnon. Leveraging a large dataset of 240M tweets collected in the run-up to the 2020 US Presidential election, we define and validate a multivariate metric of radicalization. We use that metric to separate users into distinct, naturally emerging classes of behavior associated with radicalization processes, from self-declared QAnon supporters to hyperactive conspiracy promoters. We also analyze the impact of Twitter's moderation policies on the interactions among different classes: we discover aspects of moderation that succeed, yielding a substantial reduction in the endorsement received by hyperactive QAnon accounts. But we also uncover where moderation fails, showing how QAnon content amplifiers are not deterred or affected by the Twitter intervention. Our findings refine our understanding of online radicalization processes, reveal effective and ineffective aspects of moderation, and call for further investigation of the role social media play in the spread of conspiracies.
Citations: 1
How Much User Context Do We Need? Privacy by Design in Mental Health NLP Applications
Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22186
Ramit Sawhney, Atula Neerkaje, Ivan Habernal, Lucie Flek
Clinical NLP tasks, such as mental health assessment from text, must take social constraints into account: performance maximization must be constrained by the utmost importance of guaranteeing the privacy of user data. Consumer protection regulations, such as GDPR, generally handle privacy by restricting data availability, for example by requiring that user data be limited to 'what is necessary' for a given purpose. In this work, we reason that providing stricter formal privacy guarantees, while increasing the volume of user data in the model, in most cases increases the benefit for all parties involved, especially for the user. We demonstrate our arguments on two existing suicide risk assessment datasets of Twitter and Reddit posts. We present the first analysis juxtaposing user history length and differential privacy budgets, and elaborate on how modeling additional user context enables utility preservation while maintaining acceptable user privacy guarantees.
Citations: 0