Understanding Health-Related Discussions on Reddit: Development of a Topic Assignment Method and Exploratory Analysis.

IF 2 Q3 HEALTH CARE SCIENCES & SERVICES JMIR Formative Research Pub Date : 2025-01-29 DOI:10.2196/55309
Garrett J Chan, Mark Fung, Jill Warrington, Sarah A Nowak
JMIR Formative Research, volume 9, e55309. Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11822319/pdf/
Citation count: 0

Abstract

Background: Social media has become a widely used way for people to share opinions about health care and medical topics. Social media data can be leveraged to understand patient concerns and provide insight into why patients may turn to the internet instead of the health care system for health advice.

Objective: This study aimed to develop a method to investigate Reddit posts discussing health-related conditions. Our goal was to characterize these topics and identify trends in these social media-based medical discussions.

Methods: Using an initial query, we collected 1 year of Reddit posts containing the phrases "get tested" and "get checked." These posts were manually reviewed, and subreddits containing irrelevant posts were excluded from analysis. This selection of posts was manually read by the investigators to categorize posts into topics. A script was developed to automatically assign topics to additional posts based on keywords. Topic and keyword selections were refined based on manual review for more accurate topic assignment. Topic assignment was then performed on the entire 1-year Reddit dataset containing 347,130 posts. Related topics were grouped into broader medical disciplines. Analysis of the topic assignments was then conducted to assess condition and medical topic frequencies in medical condition-focused subreddits and general subreddits.
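The keyword-based assignment step described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the topic names, keyword lists, and the choice to let one post receive several topics are all assumptions, since the abstract does not list the actual keywords or state how overlapping matches were handled.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the study's real topics and keywords
# are not given in the abstract.
TOPIC_KEYWORDS = {
    "STIs": ["sti", "std", "chlamydia", "herpes"],
    "anxiety": ["anxiety", "panic attack"],
    "pregnancy": ["pregnant", "pregnancy"],
}

# The two query phrases named in the Methods.
TRIGGER = re.compile(r"\bget (?:tested|checked)\b", re.IGNORECASE)


def compile_patterns(topic_keywords):
    """Build one case-insensitive, whole-word regex per topic."""
    return {
        topic: re.compile(
            r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
            re.IGNORECASE,
        )
        for topic, keywords in topic_keywords.items()
    }


PATTERNS = compile_patterns(TOPIC_KEYWORDS)


def assign_topics(text):
    """Return every topic whose keywords appear in the post text.

    Assumption: a post may match more than one topic.
    """
    return [topic for topic, pattern in PATTERNS.items() if pattern.search(text)]


def topic_frequencies(posts):
    """Count topic assignments over posts that contain a query phrase."""
    counts = Counter()
    for post in posts:
        if TRIGGER.search(post):
            counts.update(assign_topics(post))
    return counts


if __name__ == "__main__":
    sample = [
        "Should I get tested for herpes?",
        "I am pregnant and feeling fine.",  # no query phrase, so skipped
        "My anxiety is bad, should I get checked?",
    ]
    print(topic_frequencies(sample))
```

Whole-word matching keeps short keywords such as "sti" from firing inside unrelated words such as "still". Refining the lists after manual review, as the Methods describe, would amount to editing `TOPIC_KEYWORDS` and rerunning the assignment.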

Results: We created an automated algorithm to assign medical topics to Reddit posts. By iterating through multiple rounds of topic assignment, we improved the accuracy of the algorithm. Ultimately, this algorithm created 82 topics sorted into 17 broader medical disciplines. Of all topics, sexually transmitted infections (STIs), eye disorders, anxiety, and pregnancy had the highest post frequency overall. STIs comprised 7.44% (5876/78,980) of posts, and anxiety comprised 5.43% (4289/78,980) of posts. A total of 34% (28/82) of the topics comprised 80% (63,184/78,980) of all posts. Of the medical disciplines, those with the most posts were psychiatry and mental health; genitourinary and reproductive health; infectious diseases; and endocrinology, nutrition, and metabolism. Psychiatry and mental health comprised 26.6% (21,009/78,980) of posts, and genitourinary and reproductive health comprised 13.6% (10,741/78,980) of posts. Overall, most posts were also classified under these 4 medical disciplines. During analysis, subreddits were also classified as general if they did not focus on a specific health issue and topic-specific if they discussed a specific medical issue. Topics that appeared most frequently in the top 5 in general subreddits included addiction and drug anxiety, attention-deficit/hyperactivity disorder, abuse, and STIs. In topic-specific subreddits, most posts were found to discuss the topic of that subreddit.
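The percentages reported in the Results can be recomputed from the counts given there. The sketch below does that, and adds a small helper showing how a statement such as "28 of 82 topics account for 80% of posts" could be derived from per-topic counts. The helper name `topics_needed` and the example counts passed to it are hypothetical; the abstract does not give the per-topic breakdown.

```python
TOTAL_POSTS = 78_980  # denominator used throughout the Results

# (post count, percentage as reported in the abstract)
REPORTED = {
    "STIs": (5_876, 7.44),
    "anxiety": (4_289, 5.43),
    "top 28 topics": (63_184, 80.0),
    "psychiatry and mental health": (21_009, 26.6),
    "genitourinary and reproductive health": (10_741, 13.6),
}


def share(count, total=TOTAL_POSTS):
    """Percentage of `total` represented by `count`."""
    return 100 * count / total


def topics_needed(counts, target_pct=80):
    """Smallest number of topics, taken largest first, whose combined
    post count reaches `target_pct` percent of all posts."""
    total = sum(counts)
    running = 0
    for n, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running * 100 >= target_pct * total:
            return n
    return len(counts)


if __name__ == "__main__":
    for label, (count, reported) in REPORTED.items():
        print(f"{label}: {share(count):.2f}% (reported {reported}%)")
    print(f"28 of 82 topics: {share(28, 82):.0f}% of topics")
    # Hypothetical per-topic counts, for illustration only.
    print(topics_needed([50, 30, 10, 10]))
```

Each recomputed share agrees with the reported value at the precision given in the abstract.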

Conclusions: Certain health topics and medical disciplines are predominant on Reddit. These include topics such as STIs, eye disorders, anxiety, and pregnancy. Most posts were classified under the medical disciplines of psychiatry and mental health, as well as genitourinary and reproductive health.

Source journal
JMIR Formative Research (Medicine: Medicine (miscellaneous))
CiteScore: 2.70
Self-citation rate: 9.10%
Articles published: 579
Review time: 12 weeks