Proceedings of the International AAAI Conference on Web and Social Media: Latest Articles (Page 4)

Cross-Lingual and Cross-Domain Crisis Classification for Low-Resource Scenarios

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22185

Cinthia Sánchez, Hernan Sarmiento, Andres Abeliuk, Jorge Pérez, Barbara Poblete

Social media data has emerged as a useful source of timely information about real-world crisis events. One of the main tasks related to the use of social media for disaster management is the automatic identification of crisis-related messages. Most studies on this topic have focused on data for a particular type of event in a specific language. This limits the generalizability of existing approaches, because models cannot be directly applied to new types of events or other languages. In this work, we study the task of automatically classifying messages that are related to crisis events by leveraging cross-language and cross-domain labeled data. Our goal is to make use of labeled data from high-resource languages to classify messages from other (low-resource) languages and/or of new (previously unseen) types of crisis situations. For our study, we consolidated from the literature a large unified dataset containing multiple crisis events and languages. Our empirical findings show that it is indeed possible to leverage data from crisis events in English to classify the same type of event in other languages, such as Spanish and Italian (80.0% F1-score). Furthermore, we achieve good performance for the cross-domain task (80.0% F1-score) in a cross-lingual setting. Overall, our work contributes to alleviating the data scarcity problem that is so important for multilingual crisis classification, in particular by mitigating cold-start situations in emergency events, when time is of the essence.
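The two transfer settings described in this abstract (same event type in a new language, and an unseen event type in a new language) can be pictured as different ways of splitting a labeled corpus. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `Message` record, its field names, the toy messages, and the `transfer_split` helper are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical record layout; the real dataset schema is not given in the abstract.
    text: str
    language: str    # e.g. "en", "es", "it"
    event_type: str  # e.g. "earthquake", "flood"
    label: int       # 1 = crisis-related, 0 = not related


def transfer_split(messages, source_lang, target_lang, event_type, cross_domain=False):
    """Return (train, test) lists for one transfer scenario.

    cross_domain=False: train on the same event type in the source language.
    cross_domain=True:  train on all *other* event types in the source language,
                        so the target event type is unseen during training.
    """
    if cross_domain:
        train = [m for m in messages
                 if m.language == source_lang and m.event_type != event_type]
    else:
        train = [m for m in messages
                 if m.language == source_lang and m.event_type == event_type]
    test = [m for m in messages
            if m.language == target_lang and m.event_type == event_type]
    return train, test


# Toy corpus, invented for illustration only.
corpus = [
    Message("Strong earthquake felt downtown", "en", "earthquake", 1),
    Message("Great concert last night", "en", "earthquake", 0),
    Message("River overflowing near the bridge", "en", "flood", 1),
    Message("Terremoto fuerte en la costa", "es", "earthquake", 1),
    Message("Alluvione in centro", "it", "flood", 1),
]

cl_train, cl_test = transfer_split(corpus, "en", "es", "earthquake")
cd_train, cd_test = transfer_split(corpus, "en", "es", "earthquake", cross_domain=True)
print(len(cl_train), len(cl_test), len(cd_train), len(cd_test))
```

Any classifier could then be trained on `train` and scored on `test`. Keeping the split logic separate from the model makes it easier to check that no target-language or target-event messages leak into training.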

Citations: 0

An Open-Source Cultural Consensus Approach to Name-Based Gender Classification

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22195

Ian Van Buskirk, Aaron Clauset, Daniel B. Larremore

Name-based gender classification has enabled hundreds of otherwise infeasible scientific studies of gender. Yet, the lack of standardization, reliance on paid services, understudied limitations, and conceptual debates cast a shadow over many applications. To address these problems we develop and evaluate an ensemble-based open-source method built on publicly available data of empirical name-gender associations. Our method integrates 36 distinct sources—spanning over 150 countries and more than a century—via a meta-learning algorithm inspired by Cultural Consensus Theory (CCT). We also construct a taxonomy with which names themselves can be classified. We find that our method's performance is competitive with paid services and that our method, and others, approach the upper limits of performance; we show that conditioning estimates on additional metadata (e.g. cultural context), further combining methods, or collecting additional name-gender association data is unlikely to meaningfully improve performance. This work definitively shows that name-based gender classification can be a reliable part of scientific research and provides a pair of tools, a classification method and a taxonomy of names, that realize this potential.

Citations: 3

Retweet-BERT: Political Leaning Detection Using Language Features and Information Diffusion on Social Networks

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22160

Julie Jiang, Xiang Ren, Emilio Ferrara

Estimating the political leanings of social media users is a challenging and ever more pressing problem given the increase in social media consumption. We introduce Retweet-BERT, a simple and scalable model to estimate the political leanings of Twitter users. Retweet-BERT leverages the retweet network structure and the language used in users' profile descriptions. Our assumptions stem from patterns of network and linguistic homophily among people who share similar ideologies. Retweet-BERT demonstrates competitive performance against other state-of-the-art baselines, achieving 96%-97% macro-F1 on two recent Twitter datasets (a COVID-19 dataset and a 2020 United States presidential elections dataset). We also manually validate the performance of Retweet-BERT on users not in the training data. Finally, in a case study of COVID-19, we illustrate the presence of political echo chambers on Twitter and show that they exist primarily among right-leaning users. Our code is open-sourced and our data is publicly available.
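For readers unfamiliar with the macro-F1 metric reported here, it is the unweighted mean of the per-class F1 scores, so a minority class counts as much as a majority class. The sketch below computes it from scratch; the labels and predictions are invented toy values, not data from the paper.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy example (hypothetical labels): three left-leaning users, one right-leaning.
y_true = ["left", "left", "left", "right"]
y_pred = ["left", "left", "right", "right"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

In this toy case the per-class scores are 0.8 for "left" and about 0.667 for "right", giving a macro-F1 of about 0.733.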

Citations: 0

Effects of Algorithmic Trend Promotion: Evidence from Coordinated Campaigns in Twitter’s Trending Topics

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22187

Joseph Schlessinger, Kiran Garimella, Maurice Jakesch, Dean Eckles

In addition to more personalized content feeds, some leading social media platforms give a prominent role to content that is more widely popular. On Twitter, "trending topics" identify popular topics of conversation on the platform, thereby promoting popular content which users might not have otherwise seen through their network. Hence, "trending topics" potentially play important roles in influencing the topics users engage with on a particular day. Using two carefully constructed data sets from India and Turkey, we study the effects of a hashtag appearing on the trending topics page on the number of tweets produced with that hashtag. We specifically aim to answer the question: How many new tweets using that hashtag appear because a hashtag is labeled as trending? We distinguish the effects of the trending topics page from network exposure and find there is a statistically significant, but modest, return to a hashtag being featured on trending topics. Analysis of the types of users impacted by trending topics shows that the feature helps less popular and new users to discover and spread content outside their network, which they otherwise might not have been able to do.

Citations: 0

Non-polar Opposites: Analyzing the Relationship between Echo Chambers and Hostile Intergroup Interactions on Reddit

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22138

Alexandros Efstratiou, Jeremy Blackburn, Tristan Caulfield, Gianluca Stringhini, Savvas Zannettou, Emiliano De Cristofaro

Previous research has documented the existence of both online echo chambers and hostile intergroup interactions. In this paper, we explore the relationship between these two phenomena by studying the activity of 5.97M Reddit users and 421M comments posted over 13 years. We examine whether users who are more engaged in echo chambers are more hostile when they comment on other communities. We then create a typology of relationships between political communities based on whether their users are toxic to each other, whether echo chamber-like engagement with these communities has a polarizing effect, and on the communities' political leanings. We observe both the echo chamber and hostile intergroup interaction phenomena, but neither holds universally across communities. Contrary to popular belief, we find that polarizing and toxic speech is more dominant between communities on the same, rather than opposing, sides of the political spectrum, especially on the left; however, this mostly points to the collective targeting of political outgroups.

Citations: 0

Bridging Nations: Quantifying the Role of Multilinguals in Communication on Social Media

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22174

Julia Mendelsohn, Sayan Ghosh, David Jurgens, Ceren Budak

Social media enables the rapid spread of many kinds of information, from pop culture memes to social movements. However, little is known about how information crosses linguistic boundaries. We apply causal inference techniques on the European Twitter network to quantify the structural role and communication influence of multilingual users in cross-lingual information exchange. Overall, multilinguals play an essential role; posting in multiple languages increases betweenness centrality by 13%, and having a multilingual network neighbor increases monolinguals’ odds of sharing domains and hashtags from another language 16-fold and 4-fold, respectively. We further show that multilinguals have a greater impact on diffusing information that is less accessible to their monolingual compatriots, such as information from far-away countries and content about regional politics, nascent social movements, and job opportunities. By highlighting information exchange across borders, this work sheds light on a crucial component of how information and ideas spread around the world.
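The 16-fold and 4-fold figures above are multipliers on odds, not on probabilities, and the two can differ substantially. The short sketch below converts an odds multiplier into a probability. The baseline probabilities are hypothetical values chosen only to illustrate the arithmetic; the abstract does not report baseline sharing rates.

```python
def apply_odds_multiplier(p_baseline, multiplier):
    """Scale the odds p/(1-p) by `multiplier` and convert back to a probability."""
    odds = p_baseline / (1.0 - p_baseline)
    new_odds = odds * multiplier
    return new_odds / (1.0 + new_odds)


# Hypothetical baselines: a rare behaviour (1%) and a common one (50%).
rare = apply_odds_multiplier(0.01, 16)    # 16/115
common = apply_odds_multiplier(0.50, 16)  # 16/17
print(round(rare, 4), round(common, 4))
```

With a 1% baseline, a 16-fold increase in odds raises the probability to roughly 14%, close to a 14-fold increase. With a 50% baseline, the same multiplier raises it to roughly 94%, less than a 2-fold increase.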

Citations: 0

Contextualizing Online Conversational Networks

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22171

Thomas Magelinski, Kathleen M. Carley

Online social connections occur within a specific conversational context. Prior work in network analysis of social media data attempts to contextualize data through filtering. We propose a method of contextualizing online conversational connections automatically and illustrate this method with Twitter data. Specifically, we detail a graph neural network model capable of representing tweets in a vector space based on their text, hashtags, URLs, and neighboring tweets. Once tweets are represented, clusters of tweets uncover conversational contexts. We apply our method to a dataset with 4.5 million tweets discussing the 2020 US election. We find that even filtered data contains many different conversational contexts, with users engaging in multiple conversations. While users engage in multiple conversations, the overlap between any two pairs of conversations tends to be only 30-40%, giving very different networks for different conversations. Even accounting for this variation, we show that the relative social status of users varies considerably across contexts, with tau=0.472 on average. Our findings imply that standard network analysis on social media data can be unreliable in the face of multiple conversational contexts.
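The abstract reports "tau=0.472" without naming the statistic. Assuming it refers to Kendall's rank correlation between user rankings in two conversational contexts, the idea can be sketched as follows. The status scores, user IDs, and the choice of the tau-a variant (which ignores ties) are all assumptions made for illustration, not details from the paper.

```python
from itertools import combinations


def kendall_tau_a(scores_a, scores_b):
    """Kendall's tau-a over the users that appear in both contexts.

    scores_a, scores_b: dicts mapping user id -> status score in one context.
    Returns (concordant - discordant) / total pairs; tied pairs count as neither.
    """
    users = sorted(set(scores_a) & set(scores_b))
    n_pairs = len(users) * (len(users) - 1) // 2
    if n_pairs == 0:
        return float("nan")
    concordant = discordant = 0
    for u, v in combinations(users, 2):
        direction = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / n_pairs


# Hypothetical status scores (e.g. a centrality value) for four users in two contexts.
context_a = {"u1": 10, "u2": 7, "u3": 3, "u4": 1}
context_b = {"u1": 2, "u2": 9, "u3": 4, "u4": 1}
tau = kendall_tau_a(context_a, context_b)
print(round(tau, 4))
```

A value of 1 would mean the two contexts rank users identically, and a value near 0 would mean the rankings are unrelated. In this toy example four of the six user pairs keep their order and two swap, so tau is about 0.333.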

Citations: 0

Scope of Pre-trained Language Models for Detecting Conflicting Health Information

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22140

Joseph Gatto, Madhusudan Basak, Sarah Masud Preum

An increasing number of people now rely on online platforms to meet their health information needs. Thus identifying inconsistent or conflicting textual health information has become a safety-critical task. Health advice data poses a unique challenge where information that is accurate in the context of one diagnosis can be conflicting in the context of another. For example, people suffering from diabetes and hypertension often receive conflicting health advice on diet. This motivates the need for technologies which can provide contextualized, user-specific health advice. A crucial step towards contextualized advice is the ability to compare health advice statements and detect if and how they are conflicting. This is the task of health conflict detection (HCD). Given two pieces of health advice, the goal of HCD is to detect and categorize the type of conflict. It is a challenging task, as (i) automatically identifying and categorizing conflicts requires a deeper understanding of the semantics of the text, and (ii) the amount of available data is quite limited. In this study, we are the first to explore HCD in the context of pre-trained language models. We find that DeBERTa-v3 performs best with a mean F1 score of 0.68 across all experiments. We additionally investigate the challenges posed by different conflict types and how synthetic data improves a model's understanding of conflict-specific semantics. Finally, we highlight the difficulty in collecting real health conflicts and propose a human-in-the-loop synthetic data augmentation approach to expand existing HCD datasets. Our HCD training dataset is over 2x bigger than the existing HCD dataset and is made publicly available on Github.

Citations: 0

Popular Support for Balancing Equity and Efficiency in Resource Allocation: A Case Study in Online Advertising to Increase Welfare Program Awareness

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22163

Allison Koenecke, Eric Giannella, Robb Willer, Sharad Goel

Algorithmically optimizing the provision of limited resources is commonplace across domains from healthcare to lending. Optimization can lead to efficient resource allocation, but, if deployed without additional scrutiny, can also exacerbate inequality. Little is known about popular preferences regarding acceptable efficiency-equity trade-offs, making it difficult to design algorithms that are responsive to community needs and desires. Here we examine this trade-off and concomitant preferences in the context of GetCalFresh, an online service that streamlines the application process for California’s Supplemental Nutrition Assistance Program (SNAP, formerly known as food stamps). GetCalFresh runs online advertisements to raise awareness of their multilingual SNAP application service. We first demonstrate that when ads are optimized to garner the most enrollments per dollar, a disproportionately small number of Spanish speakers enroll due to relatively higher costs of non-English language advertising. Embedding these results in a survey (N = 1,532) of a diverse set of Americans, we find broad popular support for valuing equity in addition to efficiency: respondents generally preferred reducing total enrollments to facilitate increased enrollment of Spanish speakers. These results buttress recent calls to reevaluate the efficiency-centric paradigm popular in algorithmic resource allocation.

Citations: 1

Online Emotions during the Storming of the U.S. Capitol: Evidence from the Social Media Network Parler

Proceedings of the International AAAI Conference on Web and Social Media

Pub Date : 2023-06-02 DOI: 10.1609/icwsm.v17i1.22157

Johannes Jakubik, Michael Vössing, Nicolas Pröllochs, Dominik Bär, Stefan Feuerriegel

The storming of the U.S. Capitol on January 6, 2021 led to the killing of five people and is widely regarded as an attack on democracy. The storming was largely coordinated through social media networks such as Twitter and "Parler". Yet little is known about how users interacted on Parler during the storming of the Capitol. In this work, we examine the emotion dynamics on Parler during the storming with regard to heterogeneity across time and users. For this, we segment the user base into different groups (e.g., Trump supporters and QAnon supporters). We use affective computing to infer the emotions in content, which allows us to provide a comprehensive assessment of online emotions. Our evaluation is based on a large-scale dataset from Parler comprising 717,300 posts from 144,003 users. We find that the user base responded to the storming of the Capitol with an overall negative sentiment. Similarly, Trump supporters expressed a negative sentiment and high levels of unbelief. In contrast, QAnon supporters did not express a more negative sentiment during the storming. We further provide a cross-platform analysis and compare the emotion dynamics on Parler and Twitter. Our findings point to a less negative response to the incidents on Parler than on Twitter, accompanied by higher levels of disapproval and outrage. Our contribution to research is three-fold: (1) we identify online emotions that were characteristic of the storming; (2) we assess emotion dynamics across different user groups on Parler; (3) we compare the emotion dynamics on Parler and Twitter. Thereby, our work offers important implications for actively managing online emotions to prevent similar incidents in the future.



Citations: 3