Unifying the Extremes: Developing a Unified Model for Detecting and Predicting Extremist Traits and Radicalization
Pub Date: 2025-06-07 · DOI: 10.1609/icwsm.v19i1.35860
Allison Lahnala, Vasudha Varadarajan, Lucie Flek, H Andrew Schwartz, Ryan L Boyd
The proliferation of ideological movements into extremist factions via social media has become a global concern. While radicalization has been studied extensively within the context of specific ideologies, our ability to accurately characterize extremism in more generalizable terms remains underdeveloped. In this paper, we propose a novel method for extracting and analyzing extremist discourse across a range of online ideological community forums. By focusing on verbal behavioral signatures of extremist traits, we develop a framework for quantifying extremism at both the user and community levels. Our research identifies 11 distinct factors, which we term "The Extremist Eleven," as a generalized psychosocial model of extremism. Applying our method to various online communities, we demonstrate an ability to characterize ideologically diverse communities across the 11 extremist traits. We then illustrate the power of this method by analyzing user histories from members of the incel community. We find that our framework accurately predicts which users will join the incel community up to 10 months before their actual entry, with an AUC of > 0.6 that steadily increases to ~ 0.9 three to four months before the event. Further, we find that upon entry into an ideological forum, users tend to maintain their level of extremist traits within the community while still remaining distinguishable from general online discourse. Our findings contribute to the study of extremism by introducing a more holistic, cross-ideological approach that transcends traditional, trait-specific models.
Proceedings of the International AAAI Conference on Weblogs and Social Media, vol. 19, pp. 1051-1067. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12584583/pdf/
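As a rough illustration of the forecasting setup this abstract describes, the sketch below scores trait-based features against eventual community entry at increasing lead times and reports an AUC per lead time. The data layout, classifier choice, and synthetic example are assumptions for illustration, not the authors' pipeline.

```python
# Minimal sketch (not the authors' code): how well do 11 trait features
# predict eventual community entry at each lead time (months before entry)?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def auc_by_lead_time(features_by_month, labels):
    """features_by_month: {months_before_entry: (n_users, 11) trait matrix};
    labels: 1 if the user eventually joined the target community."""
    aucs = {}
    for months, X in sorted(features_by_month.items()):
        clf = LogisticRegression(max_iter=1000)
        probs = cross_val_predict(clf, X, labels, cv=5, method="predict_proba")[:, 1]
        aucs[months] = roc_auc_score(labels, probs)
    return aucs

# Synthetic stand-in for real trait scores: signal grows as entry approaches.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
fake = {m: rng.normal(y[:, None] * 0.1 * (10 - m), 1.0, (200, 11)) for m in range(1, 11)}
print(auc_by_lead_time(fake, y))  # AUC should rise as lead time shrinks
```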
Supporters and Skeptics: LLM-based Analysis of Engagement with Mental Health (Mis)Information Content on Video-sharing Platforms
Pub Date: 2025-06-07 · DOI: 10.1609/icwsm.v19i1.35875
Viet Cuong Nguyen, Mini Jain, Abhijat Chauhan, Heather Jaime Soled, Santiago Alvarez Lesmes, Zihang Li, Michael L Birnbaum, Sunny X Tang, Srijan Kumar, Munmun De Choudhury
Over one in five adults in the US lives with a mental illness. In the face of a shortage of mental health professionals and offline resources, online short-form video content has grown to serve as a crucial conduit for disseminating mental health help and resources. However, the ease of content creation and access also contributes to the spread of misinformation, posing risks to accurate diagnosis and treatment. Detecting and understanding engagement with such content is crucial to mitigating its harmful effects on public health. We perform the first quantitative study of this phenomenon, using YouTube Shorts and BitChute as the sites of study. We contribute MentalMisinfo, a novel labeled mental health misinformation (MHMisinfo) dataset of 739 videos (639 from YouTube and 100 from BitChute) and 135,372 comments, built using an expert-driven annotation schema. We first find that few-shot in-context learning with large language models (LLMs) is effective in detecting MHMisinfo videos. Next, we discover distinct and potentially alarming linguistic patterns in how audiences engage with MHMisinfo videos through commentary on both video-sharing platforms. Across the two platforms, comments can exacerbate prevailing stigma, with some groups showing heightened susceptibility to, and alignment with, MHMisinfo. We discuss technical and public health-driven adaptive solutions to tackling the "epidemic" of mental health misinformation online.
Proceedings of the International AAAI Conference on Weblogs and Social Media, vol. 19, pp. 1329-1345. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12365693/pdf/
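The few-shot in-context learning step might look like the following sketch. The labels, the two demonstration examples, and the `llm_complete` client are hypothetical stand-ins, not the MentalMisinfo annotation schema or the authors' prompts.

```python
# Hedged sketch of few-shot in-context classification of video transcripts.
# `llm_complete` is a hypothetical Callable[[str], str] wrapping whatever
# chat/completions client is available.
FEW_SHOT = [
    ("This herb cures depression in days, stop your meds.", "misinformation"),
    ("Therapy and medication together often help with depression.", "not_misinformation"),
]

def build_prompt(transcript: str) -> str:
    # Assemble instruction + labeled demonstrations + the unlabeled instance.
    lines = ["Label each transcript as misinformation or not_misinformation."]
    for text, label in FEW_SHOT:
        lines.append(f"Transcript: {text}\nLabel: {label}")
    lines.append(f"Transcript: {transcript}\nLabel:")
    return "\n\n".join(lines)

def classify(transcript: str, llm_complete) -> str:
    # Take the first token of the completion as the predicted label.
    return llm_complete(build_prompt(transcript)).strip().split()[0]
```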
Large-Scale Analysis of Online Questions Related to Opioid Use Disorder on Reddit
Pub Date: 2025-01-01 · Epub Date: 2025-06-07 · DOI: 10.1609/icwsm.v19i1.35861
Tanmay Laud, Akadia Kacha-Ochana, Steven A Sumner, Vikram Krishnasamy, Royal Law, Lyna Schieber, Munmun De Choudhury, Mai ElSherief
Opioid use disorder (OUD) is a leading health problem that affects individual well-being as well as general public health. For a variety of reasons, including the stigma faced by people using opioids, online communities for recovery and support have formed on different social media platforms. In these communities, people share their experiences and solicit information by asking questions about opioid use and recovery. However, these communities do not always contain clinically verified information. In this paper, we study natural language questions asked in the context of OUD-related discourse on Reddit. We adopt transformer-based question detection along with hierarchical clustering across 19 subreddits to identify six coarse-grained and 69 fine-grained categories of OUD-related questions. Our analysis uncovers ten areas of information seeking among Reddit users in the context of OUD during the study period of 2018-2021: drug sales, specific drug-related questions, OUD treatment, drug use, side effects, withdrawal, lifestyle, drug testing, pain management, and others. Our work is a major step toward improving the understanding of the OUD-related questions people ask unobtrusively on Reddit. Finally, we discuss technological interventions and public health harm-reduction techniques based on the topics of these questions.
Proceedings of the International AAAI Conference on Weblogs and Social Media, vol. 19, pp. 1068-1084. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12766712/pdf/
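A minimal sketch of the embed-then-cluster stage described above, assuming a generic sentence encoder and a naive question-mark filter in place of the paper's transformer-based question detector:

```python
# Rough two-stage pipeline: pull out question-like sentences, embed them,
# then group them with hierarchical (agglomerative) clustering.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

def cluster_questions(posts, n_coarse=6):
    # Naive question filter; the paper uses a trained transformer detector.
    questions = [s.strip() for p in posts for s in p.split("\n")
                 if s.strip().endswith("?")]
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
    embeddings = model.encode(questions)
    labels = AgglomerativeClustering(n_clusters=n_coarse).fit_predict(embeddings)
    return list(zip(questions, labels))
```

Rerunning the clustering with a larger `n_clusters` (e.g., 69) would yield the fine-grained categories analogous to those reported above.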
Reliability Analysis of Psychological Concept Extraction and Classification in User-penned Text
Pub Date: 2024-05-31 · Epub Date: 2024-05-28 · DOI: 10.1609/icwsm.v18i1.31324
Muskan Garg, Msvpj Sathvik, Shaina Raza, Amrit Chadha, Sunghwan Sohn
The social NLP research community has witnessed a recent surge in computational advances in mental health analysis, aimed at building responsible AI models for the complex interplay between language use and self-perception. Such responsible AI models aid in quantifying psychological concepts from user-penned texts on social media. Thinking beyond the low-level (classification) task, we advance the existing binary classification dataset toward the higher-level task of reliability analysis through the lens of explanations, posing it as one of the safety measures. We annotate the LoST dataset to capture nuanced textual cues that suggest the presence of low self-esteem in the posts of Reddit users. We further observe that NLP models developed for detecting low self-esteem focus on three types of textual cues: (i) Trigger: words that trigger mental disturbance, (ii) LoST indicators: text indicators emphasizing low self-esteem, and (iii) Consequences: words describing the consequences of mental disturbance. We implement existing classifiers to examine the attention mechanism in pre-trained language models (PLMs) for a domain-specific, psychology-grounded task. Our findings suggest the need to shift the focus of PLMs from Trigger and Consequences to a more comprehensive explanation that emphasizes LoST indicators when determining low self-esteem in Reddit posts.
Proceedings of the International AAAI Conference on Weblogs and Social Media, vol. 18, pp. 422-434. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11881108/pdf/
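To give a flavor of the attention analysis, the sketch below measures how much attention, averaged over layers and heads, a generic BERT checkpoint places on an annotated cue span. The checkpoint and the mean-pooling scheme are assumptions; the paper examines fine-tuned classifiers.

```python
# Illustrative attention-mass probe: what fraction of total attention lands
# on tokens belonging to a cue phrase (e.g., a LoST indicator)?
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

def attention_mass(text: str, cue: str) -> float:
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        attn = model(**enc).attentions        # tuple: layers x (1, heads, T, T)
    weights = torch.stack(attn).mean(dim=(0, 2))[0]  # average layers, heads -> (T, T)
    cue_ids = set(tok(cue, add_special_tokens=False)["input_ids"])
    ids = enc["input_ids"][0].tolist()
    cols = [i for i, t in enumerate(ids) if t in cue_ids]
    return weights[:, cols].sum().item() / weights.sum().item() if cols else 0.0

print(attention_mass("I am worthless and nothing helps.", "worthless"))
```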
Negative Associations in Word Embeddings Predict Anti-black Bias across Regions-but Only via Name Frequency
Pub Date: 2022-05-31 · DOI: 10.1609/icwsm.v16i1.19399
Austin van Loon, Salvatore Giorgi, Robb Willer, Johannes Eichstaedt
The word embedding association test (WEAT) is an important method for measuring linguistic biases against social groups such as ethnic minorities in large text corpora. It does so by comparing the semantic relatedness of words prototypical of the groups (e.g., names unique to those groups) and attribute words (e.g., 'pleasant' and 'unpleasant' words). We show that anti-Black WEAT estimates from geo-tagged social media data at the level of metropolitan statistical areas strongly correlate with several measures of racial animus, even when controlling for sociodemographic covariates. However, we also show that every one of these correlations is explained by a third variable: the frequency of Black names in the underlying corpora relative to White names. This occurs because word embeddings tend to group positive (negative) words and frequent (rare) words together in the estimated semantic space. As the frequency of Black names on social media is strongly correlated with Black Americans' prevalence in the population, this results in spuriously high anti-Black WEAT estimates wherever few Black Americans live. This suggests that research using the WEAT to measure bias should consider term frequency, and also demonstrates the potential consequences of using black-box models like word embeddings to study human cognition and behavior.
Proceedings of the International AAAI Conference on Weblogs and Social Media, vol. 16, pp. 1419-1424. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10147343/pdf/nihms-1842382.pdf
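For reference, the WEAT statistic the abstract refers to (in Caliskan et al.'s formulation) can be computed in a few lines of NumPy; `vec` is assumed to map words to embeddings from whatever corpus-specific model is under study:

```python
# WEAT test statistic: s(X, Y, A, B) = sum_{x in X} s(x, A, B) - sum_{y in Y} s(y, A, B),
# where s(w, A, B) is the difference in mean cosine similarity of w to the
# attribute sets A and B.
import numpy as np

def _cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def _assoc(w, A, B, vec):
    # s(w, A, B): mean similarity to attributes A minus mean similarity to B
    return (np.mean([_cos(vec[w], vec[a]) for a in A])
            - np.mean([_cos(vec[w], vec[b]) for b in B]))

def weat(X, Y, A, B, vec):
    # X, Y: target word sets (e.g., group names); A, B: attribute word sets
    return (sum(_assoc(x, A, B, vec) for x in X)
            - sum(_assoc(y, A, B, vec) for y in Y))
```

The paper's core finding is that regional variation in this quantity can be driven by how often the target names appear in each regional corpus, so frequency should be checked before interpreting the statistic as bias.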
Correcting Sociodemographic Selection Biases for Population Prediction from Social Media
Pub Date: 2022-05-31
Salvatore Giorgi, Veronica E Lynn, Keshav Gupta, Farhan Ahmed, Sandra Matz, Lyle H Ungar, H Andrew Schwartz
Social media is increasingly used for large-scale population predictions, such as estimating community health statistics. However, social media users are not typically a representative sample of the intended population, a "selection bias". Within the social sciences, such a bias is typically addressed with restratification techniques, where observations are reweighted according to how under- or over-sampled their socio-demographic groups are. Yet, restratification is rarely evaluated for improving prediction. In this two-part study, we first evaluate standard, "out-of-the-box" restratification techniques, finding that they provide no improvement and often even degrade prediction accuracy across four tasks of estimating U.S. county population health statistics from Twitter. The core reasons for the degraded performance appear to be tied to their reliance on either sparse or shrunken estimates of each population's socio-demographics. In the second part of our study, we develop and evaluate Robust Poststratification, which consists of three methods to address these problems: (1) estimator redistribution to account for shrinking, as well as (2) adaptive binning and (3) informed smoothing to handle sparse socio-demographic estimates. We show that each of these methods leads to significant improvements in prediction accuracy over the standard restratification approaches. Taken together, Robust Poststratification enables state-of-the-art prediction accuracies, yielding a 53.0% increase in variance explained (R²) in the case of surveyed life satisfaction, and a 17.8% average increase across all tasks.
Proceedings of the International AAAI Conference on Weblogs and Social Media, vol. 16, pp. 228-240. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9714525/pdf/nihms-1842768.pdf
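A simplified sketch of poststratification reweighting with an informed-smoothing twist, paraphrasing the general technique rather than the authors' exact Robust Poststratification procedure; the smoothing form and `alpha` are assumptions:

```python
# Reweight sample users so socio-demographic bins match population shares,
# shrinking sparse sample-share estimates toward a prior (e.g., a national
# distribution) instead of trusting raw counts.
import numpy as np

def poststrat_weights(sample_bins, pop_share, prior_share, alpha=10.0):
    """sample_bins: array of bin ids, one per user; pop_share / prior_share:
    dicts mapping bin id -> share; alpha: strength of smoothing toward prior."""
    bins, counts = np.unique(sample_bins, return_counts=True)
    n = len(sample_bins)
    per_bin = {}
    for b, c in zip(bins, counts):
        smoothed_sample_share = (c + alpha * prior_share[b]) / (n + alpha)
        per_bin[b] = pop_share[b] / smoothed_sample_share
    w = np.array([per_bin[b] for b in sample_bins])
    return w * n / w.sum()  # normalize so weights average to 1
```

With `alpha=0` this reduces to standard poststratification, which, as the abstract notes, can degrade prediction when bin counts are sparse.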
Classifying Minority Stress Disclosure on Social Media with Bidirectional Long Short-Term Memory
Pub Date: 2022-05-31 · DOI: 10.1609/icwsm.v16i1.19390
Cory J Cascalheira, Shah Muhammad Hamdi, Jillian R Scheer, Koustuv Saha, Soukaina Filali Boubrahimi, Munmun De Choudhury
Because of their stigmatized social status, sexual and gender minority (SGM; e.g., gay, transgender) people experience minority stress (i.e., identity-based stress arising from adverse social conditions). Given that minority stress is the leading framework for understanding health inequity among SGM people, researchers and clinicians need accurate methods to detect minority stress. Since social media fulfills important developmental, affiliative, and coping functions for SGM people, social media may be an ecologically valid channel for detecting minority stress. In this paper, we propose a bidirectional long short-term memory (BI-LSTM) network for classifying minority stress disclosed on Reddit. Our experiments on a dataset of 12,645 Reddit posts resulted in an average accuracy of 65%.
Proceedings of the International AAAI Conference on Weblogs and Social Media, vol. 16, pp. 1373-1377. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235017/pdf/nihms-1816009.pdf
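A bare-bones PyTorch rendering of the kind of BiLSTM classifier the abstract describes; the vocabulary size, dimensions, and mean-pooling head are illustrative choices, not the authors' hyperparameters:

```python
# Bidirectional LSTM over token ids, mean-pooled into a linear classifier.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)  # 2x: forward + backward states

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        out, _ = self.lstm(self.emb(token_ids))   # (batch, seq_len, 2*hidden)
        return self.fc(out.mean(dim=1))           # mean-pool over time -> logits

logits = BiLSTMClassifier()(torch.randint(1, 30000, (4, 50)))  # smoke test
```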
Tweet Classification to Assist Human Moderation for Suicide Prevention
Pub Date: 2021-06-04
Ramit Sawhney, Harshit Joshi, Alicia Nobles, Rajiv Ratn Shah
Social media platforms already leverage existing online socio-technical systems to deliver just-in-time suicide-prevention interventions to the public. These efforts primarily rely on self-reports of potential self-harm content, which are reviewed by moderators. Most recently, platforms have employed automated models to identify self-harm content, but acknowledge that these automated models still struggle to understand the nuance of human language (e.g., sarcasm). By explicitly focusing on Twitter posts that could easily be misidentified by a model as expressing suicidal intent (i.e., they contain similar phrases such as "wanting to die"), our work examines the temporal differences in historical expressions of general and emotional language prior to a clear expression of suicidal intent. Additionally, we analyze time-aware neural models that build on these language variants and factor in the historical, emotional spectrum of a user's tweeting activity. The strongest model achieves high (statistically significant) performance (macro F1 = 0.804, recall = 0.813) in identifying social media posts indicative of suicidal intent. Using three case studies of tweets with phrases common to suicidal intent, we qualitatively analyze and interpret how such models decided whether suicidal intent was present, and discuss how these analyses may be used to alleviate the burden on human moderators within the known constraints of how moderation is performed (e.g., no access to the user's timeline). Finally, we discuss the ethical implications of such data-driven models and of inferences about suicidal intent from social media. Content warning: this article discusses self-harm and suicide.
Proceedings of the International AAAI Conference on Weblogs and Social Media, pp. 609-620. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8843106/pdf/nihms-1774843.pdf
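One way to render the "time-aware" idea of weighting a user's emotional history is the decay-weighted profile below, where older tweets contribute less to the user representation; the exponential decay form and half-life are assumptions for illustration, not the paper's architecture:

```python
# Summarize historical emotion features with weights that decay for older
# tweets, so recent emotional shifts dominate the user profile.
import numpy as np

def time_decayed_profile(emotion_vectors, days_before_post, half_life=30.0):
    """emotion_vectors: (n_tweets, n_emotions) per-tweet emotion scores;
    days_before_post: (n_tweets,) age of each tweet relative to the post
    being assessed; half_life: days for a tweet's weight to halve."""
    w = 0.5 ** (np.asarray(days_before_post, dtype=float) / half_life)
    w = w / w.sum()
    return w @ np.asarray(emotion_vectors)  # weighted mean emotion vector
```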
Well-Being Depends on Social Comparison: Hierarchical Models of Twitter Language Suggest That Richer Neighbors Make You Less Happy
Pub Date: 2021-01-01 · DOI: 10.1609/icwsm.v15i1.18132
Salvatore Giorgi, Sharath Chandra Guntuku, Johannes C Eichstaedt, Claire Pajot, H Andrew Schwartz, Lyle H Ungar
Psychological research has shown that subjective well-being is sensitive to social comparison effects; individuals report decreased happiness when their neighbors earn more than they do. In this work, we use Twitter language to estimate the well-being of users, and model both individual and neighborhood income using hierarchical modeling across counties in the United States (US). We show that language-based estimates from a sample of 5.8 million Twitter users replicate results obtained from large-scale well-being surveys: having relatively richer neighbors leads to lower well-being, even when controlling for absolute income. Furthermore, predicting individual-level happiness using hierarchical models (i.e., individuals nested within their communities) out-predicts standard baselines. We also explore language associated with relative income differences and find that individuals with lower income than their community tend to swear (f*ck, sh*t, b*tch) and to express anger (pissed, bullsh*t, wtf), hesitation (don't, anymore, idk, confused), and acts of social deviance (weed, blunt, drunk). These results suggest that social comparison robustly affects reported well-being, and that Twitter language analyses can be used both to measure these effects and to shed light on their underlying psychological dynamics.
Proceedings of the International AAAI Conference on Weblogs and Social Media, vol. 15, pp. 1069-1074. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10099468/pdf/nihms-1854629.pdf
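The hierarchical comparison the abstract describes maps naturally onto a mixed-effects regression: individual well-being on own income and community income, with a county-level random intercept. A sketch with statsmodels, where the column names are hypothetical:

```python
# Mixed-effects rendering of the social-comparison test: holding user_income
# fixed, a negative county_income coefficient is the relative-income effect.
import statsmodels.formula.api as smf

def fit_social_comparison(df):
    """df columns (assumed): well_being, user_income, county_income, county."""
    model = smf.mixedlm("well_being ~ user_income + county_income",
                        data=df, groups=df["county"])
    result = model.fit()
    return result.summary()
```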