Differing Content and Language Based on Poster-Patient Relationships on the Chinese Social Media Platform Weibo: Text Classification, Sentiment Analysis, and Topic Modeling of Posts on Breast Cancer.

JMIR Cancer (IF 3.3, JCR Q2, Oncology) | Pub Date: 2024-05-09 | DOI: 10.2196/51332

Zhouqing Zhang, Kongmeng Liew, Roeline Kuijer, Wan Jou She, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki

JMIR Cancer, volume 10, e51332. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11117131/pdf/

Citations: 0

Abstract

Background: Breast cancer affects the lives of not only those diagnosed but also the people around them. Many of those affected share their experiences on social media. However, these narratives may differ according to who the poster is and what their relationship with the patient is; a patient posting about their own experiences may post different content from someone whose friend or family member has breast cancer. Weibo is one of the most popular social media platforms in China, and breast cancer-related posts are frequently found there.

Objective: To understand the different experiences of those affected by breast cancer in China, we aimed to explore how the content and language of relevant posts differ according to who the poster is and what their relationship with the patient is. Specifically, we examined whether emotional expression and topic content differ depending on whether the patient is the poster themselves or a friend, family member, relative, or acquaintance.

Methods: We used Weibo as a resource to examine how posts differ across poster-patient relationships. We collected a total of 10,322 relevant Weibo posts. Using a 2-step analysis method, we fine-tuned 2 Chinese RoBERTa (Robustly Optimized Bidirectional Encoder Representations from Transformers [BERT] Pretraining Approach) models on this data set, which was annotated with poster-patient relationships. The models were applied in sequence to classify poster-patient relationships: first a binary classifier (no_patient or patient) and then a multiclass classifier (post_user, family_members, friends_relatives, acquaintances, heard_relation). Next, we used the Linguistic Inquiry and Word Count (LIWC) lexicon to conduct sentiment analysis across 5 emotion categories (positive and negative emotions, anger, sadness, and anxiety), followed by topic modeling (BERTopic).
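The 2-step classification and the lexicon-based emotion scoring described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class labels come from the abstract, but the function names, the toy keyword classifiers, and the toy lexicon are all hypothetical stand-ins. In practice, the two callables would wrap the fine-tuned models, the lexicon would be the LIWC dictionary, and Chinese posts would first need word segmentation (assumed to have happened before `emotion_rates` is called).

```python
from collections import Counter
from typing import Callable, Dict, List, Set

# Label names are taken from the abstract; everything else is hypothetical.
NO_PATIENT = "no_patient"
RELATION_LABELS = ("post_user", "family_members", "friends_relatives",
                   "acquaintances", "heard_relation")


def classify_relationship(post: str,
                          has_patient: Callable[[str], bool],
                          relation_of: Callable[[str], str]) -> str:
    """Two-step cascade: a binary gate first, then a multiclass model."""
    if not has_patient(post):
        return NO_PATIENT          # step 1 says no patient is mentioned
    label = relation_of(post)      # step 2 runs only on "patient" posts
    if label not in RELATION_LABELS:
        raise ValueError(f"unexpected relation label: {label!r}")
    return label


def emotion_rates(tokens: List[str],
                  lexicon: Dict[str, Set[str]]) -> Dict[str, float]:
    """Share of tokens that fall into each emotion category of a lexicon."""
    if not tokens:
        return {category: 0.0 for category in lexicon}
    counts: Counter = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    return {category: counts[category] / len(tokens) for category in lexicon}


# Toy stand-ins so the sketch runs; they are NOT the models from the paper.
def toy_has_patient(post: str) -> bool:
    return "diagnosed" in post


def toy_relation_of(post: str) -> str:
    return "family_members" if "mother" in post else "post_user"


# Tiny made-up lexicon; the real LIWC dictionary is far larger.
TOY_LEXICON = {"sadness": {"cry", "sad"}, "anxiety": {"worried", "afraid"}}

if __name__ == "__main__":
    post = "my mother was diagnosed and i am worried"
    print(classify_relationship(post, toy_has_patient, toy_relation_of))
    print(emotion_rates(post.split(), TOY_LEXICON))
```

Keeping the binary gate separate from the multiclass step means the second model only ever sees posts that the first model flagged as mentioning a patient, which mirrors the sequential design described above.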

Results: Our binary model (F₁-score=0.92) and multiclass model (F₁-score=0.83) were largely able to classify poster-patient relationships accurately. Subsequent sentiment analysis showed significant differences in emotion categories across all poster-patient relationships. Notably, negative emotions and anger were higher for the "no_patient" class, but sadness and anxiety were higher for the "family_members" class. Focusing on the top 30 topics, we also noted that topics on fears and anger toward cancer were more prevalent in the "no_patient" class, but topics on cancer treatment were more prevalent in the "family_members" class.
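For readers unfamiliar with the metric, the F₁-score is the harmonic mean of precision and recall, which for one class reduces to 2·TP / (2·TP + FP + FN). The sketch below computes it per class and then averages across classes (macro averaging). The abstract does not state which averaging scheme was used for the reported scores, so the macro average here is an assumption, and the function names and example labels are illustrative only.

```python
from typing import Dict, Sequence


def f1_per_class(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Per-class F1 = 2*TP / (2*TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (averaging scheme is an assumption)."""
    scores = f1_per_class(y_true, y_pred)
    if not scores:
        raise ValueError("no labels to score")
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Made-up labels for illustration; not data from the study.
    gold = ["patient", "patient", "no_patient", "no_patient"]
    pred = ["patient", "no_patient", "no_patient", "no_patient"]
    print(f1_per_class(gold, pred))
    print(macro_f1(gold, pred))
```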

Conclusions: Chinese users post different types of content depending on the poster-patient relationship. If the patient is family, posts are sadder and more anxious but also contain more content on treatments. However, if no patient is detected, posts show higher levels of anger. We think that this anger may stem from posters ranting, which may help with emotion regulation and gathering social support.


