Background: User demographics are often hidden in social media data due to privacy concerns. However, demographic information on substance use (SU) can provide valuable insights, allowing public health policy makers to focus on specific cohorts and develop efficient prevention strategies, especially during global crises such as the COVID-19 pandemic.
Objective: This study aimed to analyze SU trends at the user level across different demographic dimensions, such as age, gender, race, and ethnicity, with a focus on the COVID-19 pandemic. The study also establishes a baseline for SU trends using social media data.
Methods: The study was conducted using large-scale English-language data from Twitter (now known as X) over a 3-year period (2019, 2020, and 2021), comprising 1.13 billion posts. Following preprocessing, the SU posts were identified using our custom-trained deep learning model (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach [RoBERTa]), which resulted in the identification of 9 million SU posts. Then, demographic attributes, such as user type, age, gender, race, and ethnicity, as well as sentiments and emotions associated with each post, were extracted via a collection of natural language processing modules. Finally, various qualitative analyses were performed to obtain insight into user behaviors based on demographics.
Results: User participation in SU discussions peaked in 2020, when it was 22.18% higher than in 2019 and 25.24% higher than in 2021. Throughout the study period, male users and teenagers increasingly dominated the SU discussions across all substance types. During the COVID-19 pandemic, female users' participation was notably higher in prescription medication discussions than in discussions of other substance types. In addition, alcohol use increased by 80% within 2 weeks after the global pandemic declaration in 2020.
Conclusions: This study presents a large-scale, fine-grained analysis of SU on social media data, examining trends by age, gender, race, and ethnicity before, during, and after the COVID-19 pandemic. Our findings, contextualized with sociocultural and pandemic-specific factors, provide actionable insights for targeted public health interventions. This study establishes social media data (powered by artificial intelligence and natural language processing tools) as a valuable source for real-time SU surveillance and prevention during crises.
Background: In recent years, there has been a dramatic increase in the popularity and use of glucagon-like peptide-1 receptor agonists (GLP-1 RAs) for weight loss. As such, it is essential to understand users' real-world discussions of short-term, long-term, and co-occurrent adverse events associated with currently used GLP-1 RA medications.
Objective: This study aims to quantitatively analyze temporal and co-occurrent GLP-1 RA adverse event trends through discussions of GLP-1 RA weight loss medications on Facebook from 2022 to 2024.
Methods: We collected 64,202 Facebook posts (59,293 posts after removing duplicate posts) from January 1, 2022, to May 31, 2024, through CrowdTangle, a public insights tool from Meta. Using English language social media posts from the United States, we examined discussions of adverse event mentions for posts referencing 7 GLP-1 RA weight loss product categories (ie, semaglutide, Ozempic, Wegovy, tirzepatide, Mounjaro, Zepbound, and GLP-1 RA as a class). All analyses were conducted using Python (version 3; Python Software Foundation) in a Google Colab environment.
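The deduplication and adverse event mention steps described above can be sketched as follows. This is a minimal illustration only, not the study's pipeline: the term lexicon, the normalization rule for duplicates, the substring matching, and the sample posts are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lexicon; the study's actual term lists are not given here.
ADVERSE_EVENT_TERMS = {
    "nausea": ["nausea", "nauseous"],
    "vomiting": ["vomiting", "vomit"],
    "headache": ["headache"],
}

def deduplicate(posts):
    """Drop exact duplicates after lowercasing and collapsing whitespace (assumed rule)."""
    seen, unique = set(), []
    for text in posts:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

def events_in(text):
    """Return the set of adverse events whose terms appear in the text."""
    lowered = text.lower()
    return {event for event, terms in ADVERSE_EVENT_TERMS.items()
            if any(term in lowered for term in terms)}

def mention_and_cooccurrence_counts(posts):
    """Count posts mentioning each event and posts mentioning each pair of events."""
    mentions, pairs = Counter(), Counter()
    for text in posts:
        found = sorted(events_in(text))
        mentions.update(found)
        pairs.update(combinations(found, 2))
    return mentions, pairs

# Made-up example posts, not data from the study.
posts = [
    "Week 2 on Wegovy: nausea and vomiting every morning",
    "week 2 on wegovy:  nausea and vomiting every morning",
    "Mounjaro gave me a headache and some nausea",
    "Loving my results so far",
]
unique_posts = deduplicate(posts)
mentions, pairs = mention_and_cooccurrence_counts(unique_posts)
```

Dividing each count in `mentions` by the number of posts for a product category gives a prevalence of the kind reported as n/N, and the `pairs` counter is the sort of edge weight a co-occurrence network could be built from.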
Results: Time series analysis revealed that mentions of GLP-1 RA adverse events on social media aligned with several key events: the Food and Drug Administration's approval of Wegovy for pediatric weight management in December 2022, increased media coverage in August 2023, celebrity endorsement in December 2023, and Medicare Part D coverage expansion for weight loss medications in March 2024. Gastrointestinal (GI)-related adverse events (general term) were most prevalent for posts mentioning the GLP-1 RA class (210/4885, 4.30%) and Mounjaro (241/4031, 5.98%). In contrast, the most prevalent adverse event mentions noted for tirzepatide were headache (78/4202, 1.86%) and joint pain (71/4202, 1.69%). Hypertension (13/1769, 0.73%) was frequently mentioned in Zepbound posts, while pancreatitis was commonly mentioned in Mounjaro posts (44/4031, 1.08%) and in posts broadly referring to the GLP-1 RA class (139/4885, 2.85%). Furthermore, an integrated node network analysis revealed 3 distinct clusters of mentioned GLP-1 RA adverse events. Cluster 1 (purple) contained allergies, anxiety, depression, chronic obstructive pulmonary disease, fatigue, fever, hypertension, indigestion, insomnia, gastroesophageal reflux disease, hives, swelling, restlessness, and seizures. Cluster 2 (pink) contained constipation, dehydration, headache, diarrhea, dizziness, hypoglycemia, sweating, and jaundice. Cluster 3 (brown) contained GI symptoms, such as nausea, pancreatitis, rash, and vomiting. The GI symptoms, such as nausea, vomiting, pancreatitis, diarrhea, and indigestion, were strongly associated with one another (≥100 co-occurrence mentions), while the mentioned neurological symptoms, such as anxiety, depression, and insomnia, were highly correlated with each other (50-100 co-occurrence men
Background: Stroke has become a leading cause of death and disability worldwide, resulting in a significant loss of healthy life years and imposing a considerable economic burden on patients, their families, and caregivers. However, despite the growing role of online videos as an emerging source of health information, the credibility and quality of stroke prevention education videos, especially those in Chinese, remain unclear.
Objective: This study aims to assess the basic characteristics, overall quality, and reliability of Chinese-language online videos related to public health education on stroke prevention.
Methods: We systematically searched and screened stroke prevention-related video resources from 4 popular Chinese domestic video platforms (Bilibili, Douyin, Haokan, and Xigua). General information, including upload date, duration, views, likes, comments, and shares, was extracted and recorded. Two validated evaluation tools were used: the modified DISCERN questionnaire to assess content reliability and the Global Quality Scale (GQS) to evaluate overall quality. Finally, Spearman correlation analysis was conducted to examine potential associations between general video metrics and their quality and reliability.
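As a rough illustration of the Spearman correlation step, the sketch below ranks two variables (averaging the ranks of ties) and then computes the Pearson correlation of the ranks. The function names and the duration and GQS values are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: video duration in seconds and GQS score (1-5).
durations = [45, 120, 300, 60, 600]
gqs_scores = [2, 3, 4, 2, 5]
rho = spearman_rho(durations, gqs_scores)
```

In practice a library routine such as `scipy.stats.spearmanr` would also return a P value; the hand-rolled version is shown only to make the rank-then-correlate logic explicit.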
Results: After searching and screening, a total of 313 eligible videos were included for analysis: 68 from Bilibili, 74 from Douyin, 86 from Haokan, and 85 from Xigua. Among these, 113 (36.1%) were created by health care professionals, followed by news agencies (n=95, 30.4%) and general individual users (n=40, 12.8%). The median scores for the modified DISCERN and GQS were 2 and 3, respectively, suggesting that the included stroke prevention-related videos were relatively unreliable and of moderate quality. Most videos focused on primary stroke prevention and commonly recommended adopting a healthy diet; engaging in physical activity; and managing blood pressure, glucose, and lipid levels. Additionally, videos with longer durations and more comments tended to be more reliable and of higher quality. A positive association was also observed between video quality and reliability.
Conclusions: Overall, the quality and reliability of Chinese-language online videos as a source of stroke prevention information remain unsatisfactory and should be approached with caution by viewers. To address this issue, several measures should be implemented, including establishing an online monitoring and correction system, strengthening the video review process through collaboration with health care professionals, and encouraging more selective and cautious sharing of controversial content. These steps are essential to help curb the spread of online misinformation and minimize its ongoing impact.
Background: Abortion access in the United States has been in a state of rapid change and increasing restriction since the Dobbs v Jackson Women's Health Organization decision from the US Supreme Court in June 2022. With further constraints on access to abortion since Dobbs, the internet and online communities are playing an increasingly important role in people's abortion trajectories. There is a need for a broader understanding of how online resources are used for abortion and how they may reflect changes in the sociopolitical and legal context of abortion access. Research using online information and leveraging methods to work efficiently with large textual datasets has the potential to accelerate knowledge generation and provide novel insights into changing abortion-related experiences following Dobbs, helping address these knowledge gaps.
Objective: This project sought to use natural language processing techniques, specifically topic modeling, to explore the content of posts to 1 online community for abortion (r/abortion) in 2022 and assess how community use changed during that time.
Methods: This analysis described and explored posts shared throughout 2022 and for 3 subperiods of interest: before the Dobbs leak (December 24, 2021-May 1, 2022), Dobbs leak to decision (May 2, 2022-June 23, 2022), and after the Dobbs decision (June 24, 2022-December 23, 2022). We used topic modeling to obtain descriptive topics for the year and each subperiod and then classified posts. Topics were then aggregated into conceptual groups based on a combination of quantitative and qualitative assessments. The proportion of posts classified in each conceptual group was used to assess change in community interests across the 3 study subperiods.
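The final step, comparing the share of posts in each conceptual group across subperiods, could look roughly like the sketch below. The date boundaries come from the Methods above; the group labels, function names, and sample posts are assumptions for illustration, and the topic modeling step that would produce the group labels is not shown.

```python
from collections import Counter
from datetime import date

# Subperiod boundaries as stated in the Methods (inclusive on both ends).
SUBPERIODS = [
    ("pre_leak", date(2021, 12, 24), date(2022, 5, 1)),
    ("leak_to_decision", date(2022, 5, 2), date(2022, 6, 23)),
    ("post_decision", date(2022, 6, 24), date(2022, 12, 23)),
]

def subperiod_of(posted_on):
    """Return the subperiod name for a date, or None if it falls outside all of them."""
    for name, start, end in SUBPERIODS:
        if start <= posted_on <= end:
            return name
    return None

def group_proportions(posts):
    """posts: iterable of (date, conceptual_group). Returns {period: {group: share}}."""
    counts = {name: Counter() for name, _, _ in SUBPERIODS}
    for posted_on, group in posts:
        period = subperiod_of(posted_on)
        if period is not None:
            counts[period][group] += 1
    return {
        period: {group: n / sum(c.values()) for group, n in c.items()}
        for period, c in counts.items() if c
    }

# Made-up posts with hypothetical group labels.
posts = [
    (date(2022, 1, 10), "decision_making"),
    (date(2022, 3, 2), "access_barriers"),
    (date(2022, 5, 20), "self_managed"),
    (date(2022, 7, 4), "self_managed"),
    (date(2022, 9, 15), "access_barriers"),
    (date(2022, 12, 30), "access_barriers"),  # outside the study window, ignored
]
shares = group_proportions(posts)
```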
Results: The 7273 posts shared in r/abortion in 2022 included in our analyses were categorized into 8 conceptual groups: abortion decision-making, navigating abortion access barriers, clinical abortion care, medication abortion processes, postabortion physical experiences, potential pregnancy, and self-managed abortion processes. Posts related to navigating access barriers were most common. The proportion of posts about abortion decision-making and self-management changed significantly across study periods (P=.006 and P<.001, respectively); abortion decision-making posts were more common before the Dobbs leak, whereas those related to self-management increased following the leak and decision.
Conclusions: This analysis provides a holistic view of r/abortion posts in 2022, highlighting the important role of online communities as abortion-supportive online resources and changing interests among posters with abortion policy changes. As policies and pathways to abortion access continue to change across the United States, approaches leveraging natural language processing with sufficiently large samples of textual data pr
Background: Mental health organizations have the vital and difficult task of shaping public discourse and providing important information. Social media platforms such as X (formerly known as Twitter) serve as channels for this communication, and analyzing organizational health information offers valuable insights into their guidance and linguistic patterns, which can enhance communication strategies for health campaigns and interventions. The findings inform strategies to enhance public engagement, trust, and the effectiveness of mental health messaging.
Objective: This study examines the predominant themes and linguistic characteristics of messages from mental health organizations, focusing on how these messages structure information, engage audiences, and contribute to public information and discourse on mental health.
Methods: A computational content analysis was conducted to identify thematic clusters within messages from 17 unique mental health organizations, totaling 326,967 tweets and approximately 7.2 million words. In addition, Linguistic Inquiry and Word Count (LIWC) was used to analyze affective, social, and cognitive processes in messages with positive versus negative sentiment. Differences in sentiment were assessed using a Mann-Whitney U test.
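For readers unfamiliar with the Mann-Whitney U test, the following sketch computes the U statistic by direct pairwise comparison, along with the rank-biserial correlation as one common effect size. The effect size measure the study used is not stated here, so that choice is an assumption, as are the function name and the sample scores.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, rank-biserial correlation).

    U counts the pairs in which the value from sample_a exceeds the value
    from sample_b, with ties counted as half a win.
    """
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    n_pairs = len(sample_a) * len(sample_b)
    rank_biserial = 2 * u_a / n_pairs - 1
    return u_a, rank_biserial

# Made-up per-message scores for one LIWC category (hypothetical values).
positive_messages = [3.1, 4.0, 5.2, 4.4]
negative_messages = [1.0, 2.5, 3.1]
u_stat, effect_size = mann_whitney_u(positive_messages, negative_messages)
```

The pairwise loop is quadratic and is meant only to show what U measures; for a corpus of this size a library routine such as `scipy.stats.mannwhitneyu`, which also returns a P value, would be the practical choice.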
Results: The analysis revealed that organizations predominantly emphasize themes related to community, well-being, and workplace mental health. Sentiment analysis indicated significant differences in affect (P<.001), social processes (P<.001), and cognitive processing (P<.001) between positive and negative messages, with effect sizes that were small to medium. Notably, while messages frequently conveyed positive sentiment and social engagement, there was a lower emphasis on cognitive processing, suggesting that more complex discussions about mental health challenges may be underrepresented.
Conclusions: Organizations use social media to promote engagement and support, often through positively valenced messages. Yet the limited emphasis on cognitive processing may indicate a gap in how organizations address more nuanced or complex mental health issues. Findings demonstrate the need for communication strategies that balance information with depth and clarity, ensuring that messages are trustworthy, actionable, and responsive to multiple mental health needs. By refining digital messaging strategies, organizations can enhance the effectiveness of health communication and improve engagement with mental health resources.
Background: The outbreak of SARS-CoV-2 in 2019 was accompanied by a rise in the popularity of conspiracy theories. These theories often undermined vaccination efforts. There is evidence that the spread of misinformation about COVID-19 is associated with online social media use. Online social media enables network effects that influence the dissemination of information. It is important to distinguish between the effects of using social media and the network effects that occur within the platform.
Objective: This study aims to investigate the association between the modularity of online social networks and the spread of, as well as attitudes toward, information and misinformation about COVID-19.
Methods: This study used data from the social network structure of the online social media platform Vkontakte (VK) to construct an adjusted modularity index (fragmentation index) for 166 Russian towns. VK is a widely used Russian social media platform. The study combined town-level network indices with data from the poll "Research on COVID-19 in Russia's Regions" (RoCIRR), which included responses from 23,000 individuals. The study measured respondents' knowledge of both fake and true statements about COVID-19, as well as their attitudes toward these statements.
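To make the notion of network modularity concrete, the sketch below computes the standard Newman modularity of an undirected graph for a given community assignment. The study's adjusted index (fragmentation index) is not specified here, so this is the unadjusted quantity, and the function name, toy graph, and community labels are assumptions.

```python
def modularity(edges, community_of):
    """Newman modularity Q = sum over communities of e_c/m - (d_c/(2m))^2,

    where m is the number of edges, e_c the number of edges with both ends
    in community c, and d_c the total degree of nodes in community c.
    """
    m = len(edges)
    internal_edges = {}
    total_degree = {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in total_degree.items()
    )

# Toy network: two triangles joined by a single bridging edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {node: 1 for node in two_groups}

q_split = modularity(edges, two_groups)   # 5/14, about 0.357
q_single = modularity(edges, one_group)   # 0.0
```

Higher values indicate that ties are concentrated within communities rather than between them, which is the intuition behind treating modularity as a measure of fragmentation.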
Results: A positive association was observed between town-level fragmentation and individuals' knowledge of fake statements, and a negative association with knowledge of true statements. There was a strong negative association between fragmentation and the average attitude toward true statements (P<.001), while the association with attitudes toward fake statements was positive but not statistically significant (P=.55). Additionally, a strong association was found between network fragmentation and ideological differences in attitudes toward true versus fake statements.
Conclusions: While social media use plays an important role in the diffusion of health-related information, the structure of social networks can amplify these effects. Social network modularity plays a key role in the spread of information, with differing impacts on true and fake statements. These differences in information dissemination contribute to variations in attitudes toward true and fake statements about COVID-19. Ultimately, fragmentation was associated with individual-level polarization on medical topics. Future research should further explore the interaction between social media use and underlying network effects.
Background: There has been an increase in the prevalence of noncommunicable diseases in Malaysia. These diseases can be prevented and managed through the adoption of healthy lifestyle behaviors, including not smoking, avoiding alcohol consumption, maintaining a balanced diet, and being physically active. The growing importance of using social media to deliver information on healthy behaviors has led health care professionals (HCPs) to lead these efforts. To ensure effective delivery of information on healthy lifestyle behaviors, HCPs should begin by understanding users' current opinions about these behaviors and whether the users are receptive to recommended health practices. Nevertheless, there has been limited research conducted in Malaysia that aims to identify the sentiments and content of posts, as well as how well users' perceptions align with recommended health practices.
Objective: This study aims to examine social media posts related to various lifestyle behaviors, by using a combination of sentiment analysis to analyze users' sentiments and manual content analysis to explore the content of the posts and how well users' perceptions align with recommended health practices.
Methods: Using keywords based on lifestyle behaviors, posts originating from X (formerly known as Twitter) and published in Malaysia between November and December 2022 were scraped for sentiment analysis. Posts with positive and negative sentiments were randomly selected for content analysis. A codebook was developed to code the selected posts according to content and alignment of users' perceptions with recommended health practices.
Results: A total of 3320 posts were selected for sentiment analysis. Significant associations were observed between sentiment class and lifestyle behaviors (χ²(6)=67.64; P<.001), with positive sentiments higher than negative sentiments for all lifestyle behaviors. Findings from content analysis of 1328 posts revealed that most of the posts were about users' narratives (492/1328), general statements (203/1328), and planned actions toward the conduct of their behavior (196/1328). More than half of tobacco-, diet-, and activity-related posts were aligned with recommended health practices, whereas most of the alcohol-related posts were not aligned with recommended health practices (63/112).
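The chi-square test of association reported above can be illustrated with a short sketch that computes the test statistic and degrees of freedom from a contingency table. Six degrees of freedom would be consistent with, for example, a table of 4 lifestyle behaviors by 3 sentiment classes, but that layout is an assumption, and the counts below are invented for illustration.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    table is a list of rows of observed counts. Expected counts are
    row_total * column_total / grand_total under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts: rows are behaviors, columns are
# positive / neutral / negative sentiment (assumed layout, made-up numbers).
observed = [
    [120, 200, 80],   # tobacco
    [150, 180, 70],   # alcohol
    [300, 250, 90],   # diet
    [280, 220, 60],   # physical activity
]
statistic, df = chi_square_independence(observed)
```

A routine such as `scipy.stats.chi2_contingency` would additionally return the P value and the expected counts.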
Conclusions: As most of the alcohol-related posts did not align with recommended health practices, the findings reflect a need for HCPs to increase their delivery of health information on alcohol consumption. It is also important to ensure the ongoing health promotion of the other 3 lifestyle behaviors on social media, while continuing to monitor the discussions made by social media users.
Background: Social media is widely used by the general public as a source of health information because of its convenience. However, the increasing prevalence of health misinformation on social media is becoming a serious concern, and it remains unclear how the general public identifies and responds to it.
Objective: This study aims to explore the approaches used by the general public for identifying and responding to health misinformation on social media.
Methods: Semistructured interviews were conducted with 22 respondents from the Malaysian general public. The theory of motivated information management was used as a guiding framework for conducting the interviews. Audio-taped interviews were transcribed verbatim and imported into ATLAS.ti software for analysis. Themes were identified from the qualitative data using a thematic analysis method.
Results: The 3 main themes identified were emotional responses and impacts of health misinformation, approaches used to identify health misinformation, and responses to health misinformation. The spread of health misinformation through social media platforms has caused uncertainty and triggered a range of emotional responses, including anxiety and feelings of vulnerability, among respondents who encountered it. The approaches to identifying health misinformation on social media included examining message characteristics and sources. Messages were deemed to be misinformation if they contradicted credible sources or exhibited illogical and exaggerated content. Respondents described multiple response approaches to health misinformation based on the situation. Verification was chosen if the information was deemed important, while misinformation was often ignored to avoid conflict. Respondents were compelled to take action if misinformation affected their family members, had been corrected by others, or if they were knowledgeable about the topic. Taking action involved correcting the misinformation and reporting the misinformation to relevant social media, enforcement authorities, and government bodies.
Conclusions: This study highlights the factors and motivations influencing the general public's identification and response to health misinformation on social media. Addressing the challenges of health misinformation identified in this study requires collaborative efforts from all stakeholders to reduce the spread of health misinformation and reduce the general public's belief in it.
Background: The rapid emergence of artificial intelligence-based large language models (LLMs) in 2022 has initiated extensive discussions within the academic community. While proponents highlight LLMs' potential to improve writing and analytical tasks, critics caution against the ethical and cultural implications of widespread reliance on these models. Existing literature has explored various aspects of LLMs, including their integration, performance, and utility, yet there is a gap in understanding the nature of these discussions and how public perception contrasts with expert opinion in the field of public health.
Objective: This study sought to explore how the general public's views and sentiments regarding LLMs, using OpenAI's ChatGPT as an example, differ from those of academic researchers and experts in the field, with the goal of gaining a more comprehensive understanding of the future role of LLMs in health care.
Methods: We used a hybrid sentiment analysis approach, integrating the Syuzhet package in R (R Core Team) with GPT-3.5, achieving an 84% accuracy rate in sentiment classification. Also, structural topic modeling was applied to identify and analyze 8 key discussion topics, capturing both optimistic and critical perspectives on LLMs.
Results: Findings revealed a predominantly positive sentiment toward LLM integration in health care, particularly in areas such as patient care and clinical decision-making. However, concerns were raised regarding their suitability for mental health support and patient communication, highlighting potential limitations and ethical challenges.
Conclusions: This study underscores the transformative potential of LLMs in public health while emphasizing the need to address ethical and practical concerns. By comparing public discourse with academic perspectives, our findings contribute to the ongoing scholarly debate on the opportunities and risks associated with LLM adoption in health care.

